Displaying 7 results from an estimated 7 matches for "conf_param".

2007 Nov 07
9
How To change server recovery timeout
...rage02 is our OSS [root@storage02 ~]# lctl dl 0 UP mgc MGC10.143.245.3@tcp 31259d9b-e655-cdc4-c760-45d3df426d86 5 1 UP ost OSS OSS_uuid 3 2 UP obdfilter home-md-OST0001 home-md-OST0001_UUID 7 [root@storage02 ~]# lctl --device 2 set_timeout 600 set_timeout has been deprecated. Use conf_param instead. e.g. conf_param lustre-MDT0000 obd_timeout=50 usage: conf_param obd_timeout=<secs> run <command> after connecting to device <devno> --device <devno> <command [args ...]> [root@storage02 ~]# lctl --device 1 conf_param obd_timeout=600 No device found for na...
2007 Mar 20
15
How to bypass failed OST without blocking?
Hi, I want my Lustre setup to behave as follows when an OST fails: if a file has stripe data on the failed OST, any operation on that file should return an I/O error without blocking, while I can still create, read, and write new files, or read and write existing files that have no stripe data on the failed OST, also without blocking. What should I do? How do I configure this? Thanks! swin
2008 Feb 04
32
Lustre clients getting evicted
on our cluster that has been running Lustre for about 1 month. I have 1 MDT/MGS and 1 OSS with 2 OSTs. Our cluster uses all GigE and has about 608 nodes and 1854 cores. We have a lot of jobs that die and/or go into high IO wait; strace shows processes stuck in fstat(). The big problem is (I think, and I would like some feedback on it) that of these 608 nodes 209 of them have in dmesg
2008 Mar 04
16
Cannot send after transport endpoint shutdown (-108)
This morning I've had both my InfiniBand and TCP Lustre clients hiccup. They are evicted from the server, presumably as a result of their high load and consequent timeouts. My question is: why don't the clients reconnect? The InfiniBand and TCP clients both give the following message when I type "df": Cannot send after transport endpoint shutdown (-108). I've
2007 Nov 06
4
Checksum Algorithm
Hi, We have seen a huge performance drop in 1.6.3, due to the checksum being enabled by default. I looked at the algorithm being used, and it is actually a CRC32, which is a very strong algorithm for detecting all sorts of problems, such as single bit errors, swapped bytes, and missing bytes. I've been experimenting with using a simple XOR algorithm. I've been able to recover
2010 Jul 13
4
Enable async journals
Hi all, we use SLES 11 and Lustre 1.8.1.1 + patches and would like to convert a Lustre FS using external journals to one with async journals enabled. The question is whether the procedure: umount <filesystem> on all clients; umount <osts> on all OSSes; e2fsck <ost-device> on all OSSes for all OSTs; tune2fs -O ^has_journal <ost-device> on all
2008 Feb 12
0
Lustre-discuss Digest, Vol 25, Issue 17
...or the next version of lustre might be the best >>>>>> thing. I >>>>>> had upped the timeout a few days back but the next day I had >>>>>> errors on the MDS box. I have switched it back: >>>>>> >>>>>> lctl conf_param nobackup-MDT0000.sys.timeout=300 >>>>>> >>>>>> I would love to give you that trace but I don't know how to get >>>>>> it. Is there a debug option to turn on in the clients? >>>>> You can get that by echo t > /proc/sys...