Aaron S. Knister
2008-Mar-04  20:31 UTC
[Lustre-discuss] Cannot send after transport endpoint shutdown (-108)
This morning both my InfiniBand and TCP Lustre clients hiccuped. They were evicted from the server, presumably as a result of their high load and consequent timeouts. My question is: why don't the clients reconnect? Both the InfiniBand and TCP clients give the following message when I type "df":

Cannot send after transport endpoint shutdown (-108)

I've been battling with this on and off for a few months now. I've upgraded my InfiniBand switch firmware, and all the clients and servers are running the latest version of Lustre and the Lustre-patched kernel. Any ideas?

-Aaron
Charles Taylor
2008-Mar-04  20:41 UTC
[Lustre-discuss] Cannot send after transport endpoint shutdown (-108)
We've seen this before as well. Our experience is that the obd_timeout is far too small for large clusters (ours is 400+ nodes), and the only way we avoid these errors is by setting it to 1000, which seems high to us but appears to work and puts an end to the transport endpoint shutdowns.

On the MDS....

    lctl conf_param srn.sys.timeout=1000

You may have to do this on the OSSs as well unless you restart the OSSs, but I could be wrong on that. You should check it everywhere with...

    cat /proc/sys/lustre/timeout
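The per-node check described in this message can be scripted. What follows is a minimal sketch, not something posted in the thread: the function name check_lustre_timeout and its optional second argument (an alternate file path, present only so the function can be tried on a machine without Lustre) are assumptions made for illustration. The /proc path is the one quoted in the message.

```shell
# Compare the timeout a node is actually using with the value we expect.
# $1 = expected timeout in seconds
# $2 = file to read (hypothetical override; defaults to the path from the thread)
check_lustre_timeout() {
    expected="$1"
    file="${2:-/proc/sys/lustre/timeout}"
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 2
    fi
    actual=$(cat "$file")
    if [ "$actual" = "$expected" ]; then
        echo "ok: timeout=$actual"
        return 0
    fi
    echo "mismatch: timeout=$actual, expected $expected"
    return 1
}
```

On a real node this would be called as `check_lustre_timeout 1000`. Running it on every OSS and client, for example through whatever parallel shell the site already uses, is left out of the sketch.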
Aaron S. Knister
2008-Mar-04  20:55 UTC
[Lustre-discuss] Cannot send after transport endpoint shutdown (-108)
I think I tried that before and it didn't help, but I will try it again. Thanks for the suggestion.

-Aaron
Brian J. Murrell
2008-Mar-04  21:04 UTC
[Lustre-discuss] Cannot send after transport endpoint shutdown (-108)
On Tue, 2008-03-04 at 15:55 -0500, Aaron S. Knister wrote:
> I think I tried that before and it didn't help, but I will try it
> again. Thanks for the suggestion.

Just so you guys know, 1000 seconds for the obd_timeout is very, very large! As you could probably guess, we have some very, very big Lustre installations and to the best of my knowledge none of them are using anywhere near that. AFAIK (and perhaps a Sun engineer with closer experience to some of these very large clusters might correct me) the largest value that the largest clusters are using is in the neighbourhood of 300s. There has to be some other problem at play here that you need 1000s.

Can you both please report your Lustre and kernel versions? I know you said "latest" Aaron, but some version numbers might be more solid to go on.

b.
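For anyone gathering the version numbers requested here, a small helper along these lines could be run on each node. This is only a sketch under stated assumptions: the function name report_versions is made up for illustration, and /proc/fs/lustre/version is assumed to be where a 1.6-era Lustre node exposes its version; neither appears in the thread. The optional argument lets the function be pointed at another file.

```shell
# Print the running kernel release and whatever the Lustre version file says.
# $1 = version file (assumed default; not taken from the thread)
report_versions() {
    version_file="${1:-/proc/fs/lustre/version}"
    echo "kernel: $(uname -r)"
    if [ -r "$version_file" ]; then
        # Print the file as-is; its exact layout may differ between releases.
        cat "$version_file"
    else
        echo "lustre: unknown (cannot read $version_file)"
    fi
}
```

Calling `report_versions` with no arguments on a client or server prints the kernel line first, followed by the contents of the version file if it is readable.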
Aaron Knister
2008-Mar-04  22:42 UTC
[Lustre-discuss] Cannot send after transport endpoint shutdown (-108)
I made this change and clients are still being evicted. This is very frustrating. It happens over both TCP and InfiniBand. My timeout is 1000. Does anybody know why the clients don't reconnect?

Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies

(301) 595-7000
aaron at iges.org
Craig Prescott
2008-Mar-05  00:37 UTC
[Lustre-discuss] Cannot send after transport endpoint shutdown (-108)
Hi Aaron;

As Charlie mentioned, we have 400 clients and a timeout value of 1000 is "enough" for us. How many clients do you have? If it is more than 400, or the ratio of your o2ib/tcp clients is not like ours (80/20), you may need a bigger value.

Also, we have observed that occasionally we set the timeout on our MGS/MDS machine via:

    lctl conf_param <fsname>.sys.timeout=1000

but it does not "take" everywhere. That is, you should check your OSSes and clients to confirm that the correct timeout is reflected in /proc/sys/lustre/timeout. If it isn't, just echo the correct number in there.

If you already checked this, maybe try a bigger value?

Hope that helps,
Craig Prescott
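The "check, and echo the right number in if it did not take" step from Craig's message can be sketched as a small shell function. This is an illustration only: the name ensure_lustre_timeout and the optional file argument are hypothetical, added so the logic can be tried against an ordinary file; on a real node the write needs root, and whether a value echoed this way persists across a remount is not something the thread addresses.

```shell
# Make sure the local timeout matches the wanted value, rewriting it if not.
# $1 = wanted timeout in seconds
# $2 = file to read and write (hypothetical override; defaults to the thread's path)
ensure_lustre_timeout() {
    wanted="$1"
    file="${2:-/proc/sys/lustre/timeout}"
    current=$(cat "$file") || return 2
    if [ "$current" = "$wanted" ]; then
        echo "unchanged: $current"
        return 0
    fi
    # Same effect as "echo 1000 > /proc/sys/lustre/timeout" done by hand.
    echo "$wanted" > "$file" || return 2
    echo "changed: $current -> $wanted"
}
```

A node where the conf_param setting already took reports "unchanged", so the function is safe to run repeatedly.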
Charles Taylor
2008-Mar-05  11:56 UTC
[Lustre-discuss] Cannot send after transport endpoint shutdown (-108)
Sure, we will provide you with more details of our installation but let me first say that, if recollection serves, we did not pull that number out of a hat. I believe that there is a formula in one of the Lustre tuning manuals for calculating the recommended timeout value. I'll have to take a moment to go back and find it. Anyway, if you use that formula for our cluster, the recommended timeout value, I think, comes out to be *much* larger than 1000.

Later this morning, we will go back and find that formula and share with the list how we came up with our timeout. Perhaps you can show us where we are going wrong.

One more comment.... We just brought up our second large Lustre file system. It is 80+ TB served by 24 OSTs on two (pretty beefy) OSSs. We just achieved over 2 GB/sec of sustained (large block, sequential) I/O from an aggregate of 20 clients. Our design target was 1.0 GB/sec/OSS and we hit that pretty comfortably. That said, when we first mounted the new (1.6.4.2) file system across all 400 nodes in our cluster, we immediately started getting "transport endpoint failures" and evictions. We looked rather intensively for network/fabric problems (we have both o2ib and tcp nids) and could find none. All of our MPI apps are/were running just fine. The only way we could get rid of the evictions and transport endpoint failures was by increasing the timeout. Also, we knew to do this based on our experience with our first Lustre file system (1.6.3 + patches) where we had to do the same thing.

Like I said, a little bit later, Craig or I will post more details about our implementation. If we are doing something wrong with regard to this timeout business, I would love to know what it is.

Thanks,

Charlie Taylor
UF HPC Center
Frank Leers
2008-Mar-05  16:03 UTC
[Lustre-discuss] Cannot send after transport endpoint shutdown (-108)
On Tue, 2008-03-04 at 22:04 +0100, Brian J. Murrell wrote:
> AFAIK (and perhaps a Sun engineer with closer
> experience to some of these very large clusters might correct me) the
> largest value that the largest clusters are using is in the
> neighbourhood of 300s. There has to be some other problem at play here
> that you need 1000s.

I can confirm that at a recent large installation with several thousand clients, the default of 100 is in effect.
Aaron Knister
2008-Mar-05  16:08 UTC
[Lustre-discuss] Cannot send after transport endpoint shutdown (-108)
That's very strange. What interconnect is that site using?

My versions are -

Lustre - 1.6.4.2
Kernel (servers) - 2.6.18-8.1.14.el5_lustre.1.6.4.2smp
Kernel (clients) - 2.6.18-53.1.13.el5
Frank Leers
2008-Mar-05  16:33 UTC
[Lustre-discuss] Cannot send after transport endpoint shutdown (-108)
On Wed, 2008-03-05 at 11:08 -0500, Aaron Knister wrote:
> That's very strange. What interconnect is that site using?

Not really strange, but -

SDR IB/OFED

lustre 1.6.4.2
2.6.18.8 clients
2.6.9-55.0.9 servers
Charles Taylor
2008-Mar-05  16:34 UTC
[Lustre-discuss] Cannot send after transport endpoint shutdown (-108)
Well, go figure. We are running...

Lustre: 1.6.4.2 on clients and servers
Kernel: 2.6.18-8.1.14.el5Lustre (clients and servers)
Platform: X86_64 (opteron 275s, mostly)
Interconnect: IB, Ethernet
IB Stack: OFED 1.2

We already posted our procedure for patching the kernel, building OFED, and building Lustre so I don't think I'll go into that again. Like I said, we just brought a new file system online. Everything looked fine at first with just a few clients mounted. Once we mounted all 408 (or so), we started getting all kinds of "transport endpoint failures" and the MGSs and OSTs were evicting clients left and right. We looked for network problems and could not find any of any substance. Once we increased the obd/lustre/system timeout setting as previously discussed, the errors vanished. This was consistent with our experience with 1.6.3 as well. That file system has been online since early December. Both file systems appear to be working well.

I'm not sure what to make of it. Perhaps we are just masking another problem. Perhaps there are some other, related values that need to be tuned. We've done the best we could but I'm sure there is still much about Lustre we don't know. We'll try to get someone out to the next class but until then, we're on our own, so to speak.

Charlie Taylor
UF HPC Center
Aaron Knister
2008-Mar-05  18:09 UTC
[Lustre-discuss] Cannot send after transport endpoint shutdown (-108)
Are you running DDR or SDR IB? Also, what hardware are you using for your storage?
Charles Taylor
2008-Mar-05  18:30 UTC
[Lustre-discuss] Cannot send after transport endpoint shutdown (-108)
SDR on the IB side. Our storage is RAID Inc. Falcon 3s, host attached via 4Gb qlogic FC HCAs.

http://www.raidinc.com/falcon_III.php

Regards,

Charlie
Aaron Knister
2008-Mar-05  18:37 UTC
[Lustre-discuss] Cannot send after transport endpoint shutdown (-108)
Could you tell me what version of OFED was being used? Was it the version that ships with the kernel?

-Aaron
Aaron Knister
2008-Mar-05  18:39 UTC
[Lustre-discuss] Cannot send after transport endpoint shutdown (-108)
I wonder if the issue is related to the kernels being run on the servers. Both Mr. Taylor's setup and mine are running the 2.6.18 kernel on the servers; however, the setup mentioned with a timeout of 100 was using the 2.6.9 kernel on the servers.

-Aaron
Frank Leers
2008-Mar-05  19:03 UTC
[Lustre-discuss] Cannot send after transport endpoint shutdown (-108)
On Wed, 2008-03-05 at 13:37 -0500, Aaron Knister wrote:
> Could you tell me what version of OFED was being used? Was it the version that ships with the kernel?

OFED version is 1.2.5.4

> -Aaron
>
> On Mar 5, 2008, at 11:33 AM, Frank Leers wrote:
>
> > On Wed, 2008-03-05 at 11:08 -0500, Aaron Knister wrote:
> >> That's very strange. What interconnect is that site using?
> >
> > Not really strange, but -
> >
> > SDR IB/OFED
> >
> > lustre 1.6.4.2
> > 2.6.18.8 clients
> > 2.6.9-55.0.9 servers
> >
> >> My versions are -
> >>
> >> Lustre - 1.6.4.2
> >> Kernel (servers) - 2.6.18-8.1.14.el5_lustre.1.6.4.2smp
> >> Kernel (clients) - 2.6.18-53.1.13.el5
> >>
> >> On Mar 5, 2008, at 11:03 AM, Frank Leers wrote:
> >>
> >>> On Tue, 2008-03-04 at 22:04 +0100, Brian J. Murrell wrote:
> >>>> On Tue, 2008-03-04 at 15:55 -0500, Aaron S. Knister wrote:
> >>>>> I think I tried that before and it didn't help, but I will try it again. Thanks for the suggestion.
> >>>>
> >>>> Just so you guys know, 1000 seconds for the obd_timeout is very, very large! As you could probably guess, we have some very, very big Lustre installations and to the best of my knowledge none of them are using anywhere near that. AFAIK (and perhaps a Sun engineer with closer experience to some of these very large clusters might correct me) the largest value that the largest clusters are using is in the neighbourhood of 300s. There has to be some other problem at play here that you need 1000s.
> >>>
> >>> I can confirm that at a recent large installation with several thousand clients, the default of 100 is in effect.
> >>>
> >>>> Can you both please report your lustre and kernel versions? I know you said "latest" Aaron, but some version numbers might be more solid to go on.
> >>>>
> >>>> b.
Aaron Knister
2008-Mar-05  23:00 UTC
[Lustre-discuss] Cannot send after transport endpoint shutdown (-108)
Are the clients SuSE, Red Hat, or another distro? I can't get OFED 1.2.5.4 to build with RHEL 5, but I'm working on that.

On Mar 5, 2008, at 2:03 PM, Frank Leers wrote:
> On Wed, 2008-03-05 at 13:37 -0500, Aaron Knister wrote:
>> Could you tell me what version of OFED was being used? Was it the version that ships with the kernel?
>
> OFED version is 1.2.5.4

Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies
(301) 595-7000
aaron at iges.org