I've been trying out patchless kernels and the attached simple code appears
to trigger a failure in Lustre 1.6.0.1. I couldn't see anything in bugzilla
about it.

Typically I see 4+ open() failures out of 32 on the first run after a
Lustre filesystem is mounted. Often (but not always) the number of failures
decreases to a few or zero on subsequent runs.

e.g. typical output (where no output is success) would be:

% /opt/openmpi/1.2/bin/mpirun --hostfile hosts -np 32 ./open /mnt/testfs/rjh
open of '/mnt/testfs/rjh/blk016.dat' failed on rank 16, hostname 'x15'
open: No such file or directory
open of '/mnt/testfs/rjh/blk018.dat' failed on rank 18, hostname 'x15'
open: No such file or directory
open of '/mnt/testfs/rjh/blk019.dat' failed on rank 19, hostname 'x15'
open: No such file or directory
open of '/mnt/testfs/rjh/blk022.dat' failed on rank 22, hostname 'x16'
open: No such file or directory
open of '/mnt/testfs/rjh/blk020.dat' failed on rank 20, hostname 'x16'
open: No such file or directory
open of '/mnt/testfs/rjh/blk023.dat' failed on rank 23, hostname 'x16'
open: No such file or directory
% /opt/openmpi/1.2/bin/mpirun --hostfile hosts -np 32 ./open /mnt/testfs/rjh
open of '/mnt/testfs/rjh/blk014.dat' failed on rank 14, hostname 'x14'
open: No such file or directory
open of '/mnt/testfs/rjh/blk013.dat' failed on rank 13, hostname 'x14'
open: No such file or directory
open of '/mnt/testfs/rjh/blk015.dat' failed on rank 15, hostname 'x14'
open: No such file or directory
% /opt/openmpi/1.2/bin/mpirun --hostfile hosts -np 32 ./open /mnt/testfs/rjh
% /opt/openmpi/1.2/bin/mpirun --hostfile hosts -np 32 ./open /mnt/testfs/rjh

which is 32 MPI processes across 6 nodes, each attempting to open and close
1 file (32 files total) in 1 directory over o2ib.

If I umount and remount the filesystem then the higher rate of errors
occurs again:

cexec :11-16 umount /mnt/testfs
cexec :11-16 /usr/sbin/lustre_rmmod ; cexec :11-16 /usr/sbin/lustre_rmmod
cexec :11-16 mount -t lustre x17ib@o2ib:/testfs /mnt/testfs

Note that the same failures happen over GigE too, but only on larger tests,
e.g. -np 64 or 128, so the extra speed of IB seems to trigger the bug sooner.

If Lustre kernel rpms (e.g. 2.6.9-42.0.10.EL_lustre-1.6.0.1smp) are used
instead of the patchless kernels, then I don't see any failures - tested
out to -np 512.

Patchless 2.6.19.7 and 2.6.21.5 give failures at about the same rate.
Modules for 2.6.19.7 were built using the standard Lustre 1.6.0.1 tarball,
and 2.6.21.5 modules were built using
http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/lustre/1.6/lustre-1.6.0.1-ql3.tar.bz2
as that's a lot easier to work with than
https://bugzilla.lustre.org/show_bug.cgi?id=11647

Lustre setup is:
  1 OSS node with 2 OSTs, each md raid0 SAS, 2.6.9-42.0.10.EL_lustre-1.6.0.1smp
  1 MDS node with MDT on a 3G ramdisk, same kernel
  6 client nodes
  no lustre striping, lnet debugging at the default
  all nodes are dual dual-core Xeon x86_64 CentOS4.5
  nodes are booting diskless oneSIS

Another data point is that if I rm all the files in the dir then the test
succeeds more often (up until the time the fs is umounted and remounted),
so something about the unlink/create combination might be the problem. e.g.

% cexec :11 rm /mnt/testfs/rjh/'*'
% /opt/openmpi/1.2/bin/mpirun --hostfile hosts -np 32 ./open /mnt/testfs/rjh
% /opt/openmpi/1.2/bin/mpirun --hostfile hosts -np 32 ./open /mnt/testfs/rjh
# the above both succeed.
umount and remount the fs as per above, then:

% /opt/openmpi/1.2/bin/mpirun --hostfile hosts -np 32 ./open /mnt/testfs/rjh
open of '/mnt/testfs/rjh/blk014.dat' failed on rank 14, hostname 'x14'
open: No such file or directory
open of '/mnt/testfs/rjh/blk015.dat' failed on rank 15, hostname 'x14'
open: No such file or directory
open of '/mnt/testfs/rjh/blk010.dat' failed on rank 10, hostname 'x13'
open: No such file or directory
...

Please let me know if you'd like me to re-run anything with a different
setup or try different kernels or something...

cheers,
robin

-------------- next part --------------
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <mpi.h>

int main(int nargs, char** argv)
{
    int myRank;
    char fname[128];
    int fp;
    int mpiErr, closeErr;
    char name[64];

    mpiErr = MPI_Init(&nargs, &argv);
    if ( mpiErr ) perror( "MPI_Init" );
    mpiErr = MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
    if ( mpiErr ) perror( "MPI_Comm_rank" );
    gethostname(name, sizeof(name));

    sprintf(fname, "%s/blk%03d.dat", argv[1], myRank);
    fp = open(fname, (O_RDWR | O_CREAT | O_TRUNC), 0640);
    if ( fp == -1 ) {
        fprintf(stderr, "open of '%s' failed on rank %d, hostname '%s'\n",
                fname, myRank, name);
        perror("open");
    } else {
        closeErr = close(fp);
        if ( closeErr ) {
            fprintf(stderr, "close of '%s' failed on rank %d\n", fname, myRank);
            perror("close");
        }
    }

    mpiErr = MPI_Finalize();
    if ( mpiErr ) perror( "MPI_Finalize" );
    exit(0);
}
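For completeness, the reproducer above can be built with the MPI compiler
wrapper and run exactly as shown in the report; a minimal sketch, assuming
Open MPI's mpicc lives in the same prefix as the mpirun used above and the
attachment is saved as open.c:

  # build the reproducer (mpicc path assumed to match the mpirun prefix)
  /opt/openmpi/1.2/bin/mpicc -o open open.c
  # run it: one file created and closed per rank in the target directory
  /opt/openmpi/1.2/bin/mpirun --hostfile hosts -np 32 ./open /mnt/testfs/rjh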
Hello Robin,

On Wednesday 27 June 2007 14:34:11 Robin Humble wrote:
> I've been trying out patchless kernels and the attached simple code
> appears to trigger a failure in Lustre 1.6.0.1. I couldn't see anything
> in bugzilla about it.
>
> Typically I see 4+ open() failures out of 32 on the first run after a
> Lustre filesystem is mounted. Often (but not always) the number of
> failures decreases to a few or zero on subsequent runs.
>
> e.g. typical output (where no output is success) would be:
> % /opt/openmpi/1.2/bin/mpirun --hostfile hosts -np 32 ./open /mnt/testfs/rjh
> open of '/mnt/testfs/rjh/blk016.dat' failed on rank 16,
[...]
>
> which is 32 MPI processes across 6 nodes, each attempting to open and close
> 1 file (32 files total) in 1 directory over o2ib.
>
> If I umount and remount the filesystem then the higher rate of errors
> occurs again:
> cexec :11-16 umount /mnt/testfs
> cexec :11-16 /usr/sbin/lustre_rmmod ; cexec :11-16 /usr/sbin/lustre_rmmod
> cexec :11-16 mount -t lustre x17ib@o2ib:/testfs /mnt/testfs
>
> Note that the same failures happen over GigE too, but only on larger tests,
> e.g. -np 64 or 128, so the extra speed of IB seems to trigger the bug sooner.
>
> If Lustre kernel rpms (e.g. 2.6.9-42.0.10.EL_lustre-1.6.0.1smp) are used
> instead of the patchless kernels, then I don't see any failures - tested
> out to -np 512.
>
> Patchless 2.6.19.7 and 2.6.21.5 give failures at about the same rate.
> Modules for 2.6.19.7 were built using the standard Lustre 1.6.0.1 tarball,
> and 2.6.21.5 modules were built using
> http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/lustre/1.6/lustre-1.6.0.1-ql3.tar.bz2
> as that's a lot easier to work with than
> https://bugzilla.lustre.org/show_bug.cgi?id=11647

For real MPI jobs you will probably also need flock support, but for this
you will need -ql4 (bugs #12802 and #11880).

Could you please test again with a patched 2.6.20 or 2.6.21? So far we
don't need patchless clients, so I don't test this extensively. I also
didn't test 2.6.21 very much. I think I will skip 2.6.21 entirely and,
after adding sanity tests for the 2.6.22 patches, will test that version
more thoroughly.

[...]

Also, did you see anything in the logs (server and clients)?

> Another data point is that if I rm all the files in the dir then the test
> succeeds more often (up until the time the fs is umounted and remounted),
> so something about the unlink/create combination might be the problem. e.g.

When I run "sanity.sh 76" it will probably show the same thing. Test 76 is
also about unlink/create actions, only that it tests the inode cache. No
idea so far where I need to look...

Cheers,
Bernd

PS: I would test this here, but presently we have too few customer systems
in repair to do these tests with ;)

--
Bernd Schubert
Q-Leap Networks GmbH
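As a side note on the flock point above: flock behaviour is normally
selected per client at mount time; a minimal sketch, assuming the
flock/localflock client mount options documented for Lustre 1.6 (the
filesystem and mount point are the ones used in this thread):

  # coherent, cluster-wide flock semantics (slower)
  mount -t lustre -o flock x17ib@o2ib:/testfs /mnt/testfs
  # or client-local flock semantics only (faster, not coherent across nodes)
  mount -t lustre -o localflock x17ib@o2ib:/testfs /mnt/testfs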
Robin,

can you test the patch from bug 12123 and see whether it fixes your
problem? It looks like the same symptoms.

On Wed, 2007-06-27 at 08:34 -0400, Robin Humble wrote:
> I've been trying out patchless kernels and the attached simple code
> appears to trigger a failure in Lustre 1.6.0.1. I couldn't see anything
> in bugzilla about it.
>
> Typically I see 4+ open() failures out of 32 on the first run after a
> Lustre filesystem is mounted. Often (but not always) the number of
> failures decreases to a few or zero on subsequent runs.
>
> e.g. typical output (where no output is success) would be:
> % /opt/openmpi/1.2/bin/mpirun --hostfile hosts -np 32 ./open /mnt/testfs/rjh
> open of '/mnt/testfs/rjh/blk016.dat' failed on rank 16, hostname 'x15'
> open: No such file or directory

--
Alexey Lyashkov <shadow@clusterfs.com>
Beaver team, Cluster filesystem
Hi Bernd and Alexey,

thanks for the prompt response and suggestions.

On Wed, Jun 27, 2007 at 03:16:37PM +0200, Bernd Schubert wrote:
>> If Lustre kernel rpms (e.g. 2.6.9-42.0.10.EL_lustre-1.6.0.1smp) are used
>> instead of the patchless kernels, then I don't see any failures - tested
>> out to -np 512.
>>
>> Patchless 2.6.19.7 and 2.6.21.5 give failures at about the same rate.
>> Modules for 2.6.19.7 were built using the standard Lustre 1.6.0.1 tarball,
>> and 2.6.21.5 modules were built using
>> http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/lustre/1.6/lustre-1.6.0.1-ql3.tar.bz2
>> as that's a lot easier to work with than
>> https://bugzilla.lustre.org/show_bug.cgi?id=11647
>
> for real MPI jobs you will probably also need flock support, but for this
> you will need -ql4 (bugs #12802 and #11880).

The test code is actually a distilled part of a real MPI job which doesn't
need flock() - the processes are happy just writing to separate files.
Thanks for the info though.

Alexey Lyashkov wrote:
> can you test the patch from bug 12123 and see whether it fixes your
> problem? It looks like the same symptoms.

patchless:
a modified ql4 with the patch from bugzilla 12123(*) gives no joy :-(
Both patchless kernels 2.6.20 and 2.6.21 were tried and the open()
failures still occur with the same symptoms.

Bernd Schubert wrote:
> Could you please test again with a patched 2.6.20 or 2.6.21? So far we
> don't need patchless clients, so I don't test this extensively.

patched:
clients with a patched 2.6.20 kernel (ql4 + 12123, similar to above) have
no problems. I tried 2.6.20 and 2.6.9-42.0.10.EL_lustre-1.6.0.1smp on the
servers and both worked ok with the patched clients.

Patchless 2.6.20 clients with patched 2.6.20 servers have the familiar
open() failures. As always, the best way to trigger this is by umounting
and re-mounting the fs.

> Also, did you see anything in the logs (server and clients)?

There's nothing unusual logged in dmesg or /var/log/messages anywhere that
I can see - just startup and shutdown messages.

cheers,
robin

(*) the patch didn't apply cleanly but was easy to merge
> patchless:
> a modified ql4 with the patch from bugzilla 12123(*) gives no joy :-(
> Both patchless kernels 2.6.20 and 2.6.21 were tried and the open()
> failures still occur with the same symptoms.

Robin,

can you replicate this bug with lnet.debug set to -1, dump the lustre
debug log with "lctl dk > some-file", and file a bug report in bugzilla?

--
Alexey Lyashkov <shadow@clusterfs.com>
Beaver team, Cluster filesystem
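For reference, the capture being asked for amounts to roughly the following
on each client and server node; a minimal sketch, with the dump filename
purely a placeholder ("some-file" above):

  # enable all lustre/lnet debug flags on this node
  sysctl lnet.debug=-1
  # ... reproduce the open() failures ...
  # dump (and clear) the kernel debug buffer to a per-node file
  lctl dk > /tmp/lustre-debug.$(hostname)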
Hi Alexey,

On Fri, Jun 29, 2007 at 12:36:23PM +0300, Alexey Lyashkov wrote:
> can you replicate this bug with lnet.debug set to -1, dump the lustre
> debug log with "lctl dk > some-file", and file a bug report in bugzilla?

no probs.
https://bugzilla.lustre.org/show_bug.cgi?id=12880

cheers,
robin
Hi Robin,

Thanks for submitting the bug, but the client log looks too short - can
you check it? Is the sysctl variable 'lnet.debug' set to -1?

On Mon, 2007-07-02 at 00:31 -0400, Robin Humble wrote:
> Hi Alexey,
>
> On Fri, Jun 29, 2007 at 12:36:23PM +0300, Alexey Lyashkov wrote:
> > can you replicate this bug with lnet.debug set to -1, dump the lustre
> > debug log with "lctl dk > some-file", and file a bug report in bugzilla?
>
> no probs.
> https://bugzilla.lustre.org/show_bug.cgi?id=12880
>
> cheers,
> robin

--
Alexey Lyashkov <shadow@clusterfs.com>
Beaver team, Cluster filesystem
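One common cause of a truncated dump is the debug ring buffer wrapping
before "lctl dk" is run; a hedged sketch of enlarging it before
reproducing, assuming the debug_mb tunable exposed under /proc/sys/lnet in
this Lustre generation (the 256 MB value is only an example):

  sysctl lnet.debug=-1       # full debug flags
  sysctl lnet.debug_mb=256   # enlarge the per-node debug buffer (example size)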
Hi Alexey,

On Mon, Jul 02, 2007 at 12:10:13PM +0300, Alexey Lyashkov wrote:
> Thanks for submitting the bug, but the client log looks too short - can
> you check it? Is the sysctl variable 'lnet.debug' set to -1?

I'm afraid that's all there is :-/ - the nodes are set to full debug.

Bugzilla has logs where I set lnet debug after the fs was set up, but (see
below) I also tried modprobe'ing lnet, setting it to full debug, and then
mkfs'ing and mounting the rest of lustre. It made no difference.

Below is the script I used to build the fs for this test. Let me know if
you can see a problem with it. The 'check lnet debug' step below reports
(same for every node):

lnet.debug = trace inode super ext2 malloc cache info ioctl neterror net warning buffs other dentry nettrace page dlmtrace error emerg ha rpctrace vfstrace reada mmap config console quota sec

cheers,
robin

#!/bin/sh
# resetLustreWithDebug

# tidy
cexec :10-14,16 umount /mnt/testfs
cexec :1 umount /mnt/ost0 /mnt/ost1
cexec :17 umount /mnt/mdt
cexec :1,10-14,16-17 /usr/sbin/lustre_rmmod
cexec :1,10-14,16-17 /usr/sbin/lustre_rmmod

# check
cexec :1,10-14,16-17 'lsmod | wc -l'
cexec :1,10-14,16-17 'mount | grep lustre'

# lnet into debug mode
cexec :1,10-14,16-17 modprobe lnet
cexec :1,10-14,16-17 sysctl lnet.debug=-1

# check
cexec :1,10-14,16-17 sysctl lnet.debug

# mkfs
cexec :17 mkfs.lustre --fsname=testfs --mdt --mgs --reformat /dev/loop0 &
cexec :1 mkfs.lustre --fsname=testfs --ost --mgsnode=x17ib@o2ib --reformat /dev/md0 &
cexec :1 mkfs.lustre --fsname=testfs --ost --mgsnode=x17ib@o2ib --reformat /dev/md1 &
wait

# mount
cexec :17 mount -t lustre /dev/loop0 /mnt/mdt
cexec :1 mount -t lustre /dev/md0 /mnt/ost0
cexec :1 mount -t lustre /dev/md1 /mnt/ost1
cexec :10-14,16 mount -t lustre x17ib@o2ib:/testfs /mnt/testfs

# prep for test
cexec :10 chmod 1777 /mnt/testfs
cexec :10 mkdir /mnt/testfs/rjh
cexec :10 chown rjh.rjh /mnt/testfs/rjh

# then run the test as the user...

# then gather logs with
# ssh x1 lctl dk > dk.x1.oss
# ssh x17 lctl dk > dk.x17.mds
# cexec :10-14,16 lctl dk > dk.nodes