Displaying 7 results from an estimated 7 matches for "conf_param".
2007 Nov 07
9
How To change server recovery timeout
...rage02 is our OSS
[root@storage02 ~]# lctl dl
0 UP mgc MGC10.143.245.3@tcp 31259d9b-e655-cdc4-c760-45d3df426d86 5
1 UP ost OSS OSS_uuid 3
2 UP obdfilter home-md-OST0001 home-md-OST0001_UUID 7
[root@storage02 ~]# lctl --device 2 set_timeout 600
set_timeout has been deprecated. Use conf_param instead.
e.g. conf_param lustre-MDT0000 obd_timeout=50
usage: conf_param obd_timeout=<secs>
run <command> after connecting to device <devno>
--device <devno> <command [args ...]>
[root@storage02 ~]# lctl --device 1 conf_param obd_timeout=600
No device found for na...
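
For comparison, another message in these results sets the timeout with the form `lctl conf_param nobackup-MDT0000.sys.timeout=300`, that is, a target name followed by `.sys.timeout=<secs>` and no `--device` option. The sketch below only prints a command of that shape so it can be reviewed before being run by hand. `timeout_cmd` is a hypothetical helper, not part of lctl, and the right target name for a given filesystem is an assumption to check against your own setup.

```shell
#!/bin/sh
# Hypothetical helper (not part of lctl): prints, but does not run,
# a conf_param command in the form quoted elsewhere in these results.
timeout_cmd() {
    target="$1"   # target name, e.g. nobackup-MDT0000 (taken from the quoted message)
    secs="$2"     # timeout in seconds
    case "$secs" in
        # Reject empty or non-numeric values before building the command.
        ''|*[!0-9]*)
            echo "timeout must be a whole number of seconds" >&2
            return 1
            ;;
    esac
    printf 'lctl conf_param %s.sys.timeout=%s\n' "$target" "$secs"
}

timeout_cmd nobackup-MDT0000 300
```

The printed line can then be pasted on the appropriate server once the target name has been confirmed.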
2007 Mar 20
15
How to bypass failed OST without blocking?
Hi
I want Lustre to behave as follows when an OST fails: if a file
has stripe data on the failed OST, any operation on that file should
return an I/O error without blocking; at the same time I should still be
able to create and read/write new files, or read/write files which have no
stripe data on the failed OST, without blocking.
What should I do? How do I configure this?
Thanks!
swin
2008 Feb 04
32
Lustre clients getting evicted
on our cluster that has been running Lustre for about 1 month. I have
1 MDT/MGS and 1 OSS with 2 OSTs.
Our cluster uses GigE throughout and has about 608 nodes and 1854 cores.
We have a lot of jobs that die and/or go into high I/O wait; strace
shows processes stuck in fstat().
The big problem (I think, and I would like some feedback on it) is that
209 of these 608 nodes have in dmesg
2008 Mar 04
16
Cannot send after transport endpoint shutdown (-108)
This morning I've had both my InfiniBand and TCP Lustre clients hiccup. They are evicted from the server, presumably as a result of their high load and consequent timeouts. My question is: why don't the clients reconnect? The InfiniBand and TCP clients both give the following message when I type "df": Cannot send after transport endpoint shutdown (-108). I've
2007 Nov 06
4
Checksum Algorithm
Hi,
We have seen a huge performance drop in 1.6.3, due to the checksum being enabled by default. I looked at the algorithm being used, and it is actually a CRC32, which is a very strong algorithm for detecting all sorts of problems, such as single bit errors, swapped bytes, and missing bytes.
I've been experimenting with using a simple XOR algorithm. I've been able to recover
2010 Jul 13
4
Enable async journals
Hi all,
we use SLES 11 and Lustre 1.8.1.1 + patches and would like to convert a Lustre FS
using external journals to one with async journals enabled.
Question is whether the procedure:
umount <filesystem> on all clients
umount <osts> on all OSSes
e2fsck <ost-device> on all OSSes for all OSTs
tune2fs -O ^has_journal <ost-device> on all
2008 Feb 12
0
Lustre-discuss Digest, Vol 25, Issue 17
...or the next version of lustre might be the best
>>>>>> thing. I
>>>>>> had upped the timeout a few days back but the next day I had
>>>>>> errors on the MDS box. I have switched it back:
>>>>>>
>>>>>> lctl conf_param nobackup-MDT0000.sys.timeout=300
>>>>>>
>>>>>> I would love to give you that trace but I don't know how to get
>>>>>> it. Is there a debug option to turn on in the clients?
>>>>> You can get that by echo t > /proc/sys...