search for: obd_timeout

Displaying 4 results from an estimated 4 matches for "obd_timeout".

Did you mean: add_timeout
2007 Nov 07
9
How To change server recovery timeout
Hi, Our lustre environment is: 2.6.9-55.0.9.EL_lustre.1.6.3smp I would like to change recovery timeout from default value 250s to something longer I tried example from manual: set_timeout <secs> Sets the timeout (obd_timeout) for a server to wait before failing recovery. We performed that experiment on our test lustre installation with one OST. storage02 is our OSS [root at storage02 ~]# lctl dl 0 UP mgc MGC10.143.245.3 at tcp 31259d9b-e655-cdc4-c760-45d3df426d86 5 1 UP ost OSS OSS_uuid 3 2 UP obdfilter h...
2008 Mar 04
16
Cannot send after transport endpoint shutdown (-108)
This morning I''ve had both my infiniband and tcp lustre clients hiccup. They are evicted from the server presumably as a result of their high load and consequent timeouts. My question is- why don''t the clients re-connect. The infiniband and tcp clients both give the following message when I type "df" - Cannot send after transport endpoint shutdown (-108). I''ve
2008 Feb 04
32
Luster clients getting evicted
on our cluster that has been running lustre for about 1 month. I have 1 MDT/MGS and 1 OSS with 2 OST''s. Our cluster uses all Gige and has about 608 nodes 1854 cores. We have allot of jobs that die, and/or go into high IO wait, strace shows processes stuck in fstat(). The big problem is (i think) I would like some feedback on it that of these 608 nodes 209 of them have in dmesg
2008 Feb 12
0
Lustre-discuss Digest, Vol 25, Issue 17
...ent was evicted because of this lock can not be released >>>>>>> on client >>>>>>> on time. Could you provide the stack strace of client at that >>>>>>> time? >>>>>>> >>>>>>> I assume increase obd_timeout could fix your problem. Then maybe >>>>>>> you should wait 1.6.5 released, including a new feature >>>>>>> adaptive_timeout, >>>>>>> which will adjust the timeout value according to the network >>>>>>> congestion &g...