Displaying 4 results from an estimated 4 matches for "obd_timeout".
2007 Nov 07
9
How To change server recovery timeout
Hi,
Our lustre environment is:
2.6.9-55.0.9.EL_lustre.1.6.3smp
I would like to change the recovery timeout from the default value of 250s to
something longer.
I tried the example from the manual:
set_timeout <secs> Sets the timeout (obd_timeout) for a server
to wait before failing recovery.
We performed that experiment on our test Lustre installation with one
OST.
storage02 is our OSS.
[root@storage02 ~]# lctl dl
0 UP mgc MGC10.143.245.3@tcp 31259d9b-e655-cdc4-c760-45d3df426d86 5
1 UP ost OSS OSS_uuid 3
2 UP obdfilter h...
2008 Mar 04
16
Cannot send after transport endpoint shutdown (-108)
This morning both my InfiniBand and TCP Lustre clients hiccuped. They were evicted from the server, presumably as a result of their high load and the consequent timeouts. My question is: why don't the clients reconnect? The InfiniBand and TCP clients both give the following message when I type "df": Cannot send after transport endpoint shutdown (-108). I've
2008 Feb 04
32
Lustre clients getting evicted
on our cluster that has been running Lustre for about 1 month. I have
1 MDT/MGS and 1 OSS with 2 OSTs.
Our cluster uses all GigE and has about 608 nodes and 1854 cores.
We have a lot of jobs that die and/or go into high IO wait; strace
shows processes stuck in fstat().
The big problem (I think, and I would like some feedback on it) is that
209 of these 608 nodes have in dmesg
2008 Feb 12
0
Lustre-discuss Digest, Vol 25, Issue 17
...ent was evicted because this lock cannot be released
>>>>>>> on the client
>>>>>>> in time. Could you provide the stack trace of the client at that
>>>>>>> time?
>>>>>>>
>>>>>>> I assume increasing obd_timeout could fix your problem. Then maybe
>>>>>>> you should wait for 1.6.5 to be released, which includes a new feature,
>>>>>>> adaptive_timeout,
>>>>>>> which will adjust the timeout value according to the network
>>>>>>> congestion
>...