thr3ads.net - search: "conf

Displaying 7 results from an estimated 7 matches for "conf_param".

2007 Nov 07

How To change server recovery timeout

...rage02 is our OSS
[root@storage02 ~]# lctl dl
0 UP mgc MGC10.143.245.3@tcp 31259d9b-e655-cdc4-c760-45d3df426d86 5
1 UP ost OSS OSS_uuid 3
2 UP obdfilter home-md-OST0001 home-md-OST0001_UUID 7
[root@storage02 ~]# lctl --device 2 set_timeout 600
set_timeout has been deprecated. Use conf_param instead.
e.g. conf_param lustre-MDT0000 obd_timeout=50
usage: conf_param obd_timeout=<secs>
run <command> after connecting to device <devno>
--device <devno> <command [args ...]>
[root@storage02 ~]# lctl --device 1 conf_param obd_timeout=600
No device found for na...

How to bypass failed OST without blocking?

2007 Mar 20

Hi, I want Lustre to behave as follows when an OST fails: if a file has stripe data on the failed OST, any operation on that file returns an I/O error without blocking, while I can still create and read/write new files, or read/write files that have no stripe data on the failed OST, without blocking. What should I do? How do I configure this? Thanks! swin

Luster clients getting evicted

2008 Feb 04

on our cluster that has been running Lustre for about 1 month. I have 1 MDT/MGS and 1 OSS with 2 OSTs. Our cluster uses all GigE and has about 608 nodes and 1854 cores. We have a lot of jobs that die and/or go into high I/O wait; strace shows processes stuck in fstat(). The big problem (I think, and I would like some feedback on it) is that 209 of these 608 nodes have in dmesg

Cannot send after transport endpoint shutdown (-108)

2008 Mar 04

This morning I've had both my InfiniBand and TCP Lustre clients hiccup. They were evicted from the server, presumably as a result of their high load and consequent timeouts. My question is: why don't the clients reconnect? The InfiniBand and TCP clients both give the following message when I type "df": Cannot send after transport endpoint shutdown (-108). I've

Checksum Algorithm

2007 Nov 06

Hi, We have seen a huge performance drop in 1.6.3, due to the checksum being enabled by default. I looked at the algorithm being used, and it is actually a CRC32, which is a very strong algorithm for detecting all sorts of problems, such as single bit errors, swapped bytes, and missing bytes. I've been experimenting with using a simple XOR algorithm. I've been able to recover

Enable async journals

2010 Jul 13

Hi all, we use SLES 11 and Lustre 1.8.1.1 + patches and would like to convert a Lustre FS using external journals to one with async journals enabled. Question is whether the procedure:
umount <filesystem> on all clients
umount <osts> on all OSSes
e2fsck <ost-device> on all OSSes for all OSTs
tune2fs -O ^has_journal <ost-device> on all

Lustre-discuss Digest, Vol 25, Issue 17

2008 Feb 12

...or the next version of lustre might be the best
>>>>>> thing. I
>>>>>> had upped the timeout a few days back but the next day i had
>>>>>> errors on the MDS box. I have switched it back:
>>>>>>
>>>>>> lctl conf_param nobackup-MDT0000.sys.timeout=300
>>>>>>
>>>>>> I would love to give you that trace but I don't know how to get
>>>>>> it. Is there a debug option to turn on in the clients?
>>>>> You can get that by echo t > /proc/sys...
