Thierry Delaitre
2006-Sep-20 11:29 UTC
[Lustre-discuss] Problems with 1.5.95 - help needed
I've set up Lustre 1.5.95 on SLES10, but I'm experiencing reliability issues when I run 'ls' while writing a large file. The 'ls' segfaults and then the Lustre client hangs forever. I have a very basic setup for testing purposes, composed of 3 partitions on the same HD:

hda3: mdt + mgs
hda5: ost1
hda6: ost2

I see the same issue whether I mount the OST filesystem on the same machine or on a remote machine.

The 1.4 manual says that pre-emption in the kernel should be disabled. Should I say no to the following?

CONFIG_PREEMPT_VOLUNTARY=y

Am I experiencing this because 1.6 is still in beta, or am I doing something wrong? I cannot use 1.4 because only 1.5.95 supports SLES10!

Thierry.

----------------------------------------
Dr Thierry DELAITRE
Systems and Services Manager, CSCS
University of Westminster
115 New Cavendish Street, London W1W 6UW

Tel: 020 7911 5000 ext: 3586
Fax: 020 7911 5089
Mobile short dial code 1788

http://www.cscs.wmin.ac.uk/~delaitt
----------------------------------------

This e-mail and its attachments are intended for the above named only and may be confidential. If they have come to you in error you must not copy or show them to anyone, nor should you take any action based on them, other than to notify the error by replying to the sender.
Hi,

You should indeed answer "n" there, and give it another shot.

- Peter -
Thierry Delaitre
2006-Sep-20 13:39 UTC
[Lustre-discuss] Problems with 1.5.95 - help needed
The following scenario makes it hang too: I successfully write 5 files of 500MB in parallel. Once the files have been written, 'ls' works once, but subsequent 'ls' commands segfault and then hang forever. This scenario is 100% reproducible.

[1]   Done    dd if=/dev/zero of=foo1 bs=1k count=500k
[2]   Done    dd if=/dev/zero of=foo2 bs=1k count=500k
[3]   Done    dd if=/dev/zero of=foo3 bs=1k count=500k
[4]   Done    dd if=/dev/zero of=foo4 bs=1k count=500k
[5]   Done    dd if=/dev/zero of=foo5 bs=1k count=500k
laptop-17:/mnt/lustre/ost # ls
foo1  foo2  foo3  foo4  foo5
laptop-17:/mnt/lustre/ost # ls -l
Segmentation fault
laptop-17:/mnt/lustre/ost # ls -l

Thierry.
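For anyone wanting to repeat this test, here is a minimal sketch of the scenario as a shell function. The function name and its arguments are hypothetical (they are not part of Lustre); the dd options and the file names foo1..foo5 are taken from the transcript. It starts the writers in parallel, waits for all of them, then runs 'ls' once and 'ls -l' twice, as in the session shown.

```shell
#!/bin/sh
# Sketch only: repro_ls_hang is a hypothetical helper, not a Lustre tool.
# Usage: repro_ls_hang DIR [NFILES] [COUNT]
#   DIR    directory on the mounted filesystem to write into
#   NFILES number of parallel writers (default 5, as in the report)
#   COUNT  dd block count with bs=1k (default 500k, i.e. 500MB per file)
repro_ls_hang() {
    dir="$1"
    nfiles="${2:-5}"
    count="${3:-500k}"

    # Start all writers in the background so they run in parallel.
    i=1
    while [ "$i" -le "$nfiles" ]; do
        dd if=/dev/zero of="$dir/foo$i" bs=1k count="$count" 2>/dev/null &
        i=$((i + 1))
    done

    # Block until every background dd has finished.
    wait

    # List the directory the same way as in the reported session.
    # A crash of ls shows up as a non-zero exit status; a hang simply
    # never returns.
    ls "$dir" || return 1
    ls -l "$dir" || return 1
    ls -l "$dir" || return 1
}
```

Calling it as `repro_ls_hang /mnt/lustre/ost` would match the session above; on a healthy filesystem it should return 0 after printing three listings.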
Thierry Delaitre
2006-Sep-20 13:40 UTC
[Lustre-discuss] Problems with 1.5.95 - help needed
Hi Peter,

Thanks, will try.

Thierry.
Thierry Delaitre
2006-Sep-20 15:46 UTC
[Lustre-discuss] Problems with 1.5.95 - help needed
Hi Peter,

I've now recompiled the SLES10 kernel with CONFIG_PREEMPT_VOLUNTARY set to 'n', and Lustre 1.5.95 seems reliable so far, even under heavy load from local and remote clients. Will continue testing!

Thanks,
Thierry
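A quick way to confirm which preemption model a kernel was actually built with is to look at its config. The sketch below is an illustration, not something from this thread: the function name is hypothetical, and it assumes the 2.6-style option names CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_VOLUNTARY and CONFIG_PREEMPT. It reads config text on stdin and prints the model it finds.

```shell
#!/bin/sh
# Sketch only: preempt_model is a hypothetical helper.
# Reads kernel config text on stdin and prints one of:
#   none, voluntary, full, unknown
preempt_model() {
    awk '
        /^CONFIG_PREEMPT_NONE=y/      { m = "none" }
        /^CONFIG_PREEMPT_VOLUNTARY=y/ { m = "voluntary" }
        /^CONFIG_PREEMPT=y/           { m = "full" }
        END { print (m == "" ? "unknown" : m) }
    '
}
```

Where the config lives depends on the system, so the following are assumptions: many distributions install it as /boot/config-$(uname -r), in which case `preempt_model < /boot/config-$(uname -r)` would work, and kernels built with CONFIG_IKCONFIG_PROC expose it as /proc/config.gz, which can be piped in with `zcat /proc/config.gz | preempt_model`. For the setup described in this thread the expected answer after the rebuild would be "none".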