Michael Sternberg
2009-Oct-28 20:38 UTC
[Lustre-discuss] concurrent open() fails sporadically
Greetings,

I'm seeing open() failures when attempting concurrent access in a Lustre fs.

The following Fortran program fails sporadically when run under mpirun, even on the same host. Note that there is no MPI statement; mpirun simply keeps the startup times very close together:

----------------------------------------------------
$ cat test.f
      program test
      open(1, file = 'test.dat', status = 'old')
      close(1)
      write(*,*) "OK"
      end
$ gfortran test.f
$ mpirun -np 8 a.out
 OK
 OK
 OK
 OK
 OK
 OK
 OK
 OK
$ mpirun -np 8 a.out
 OK
 OK
 OK
 OK
 OK
 OK
At line 2 of file test.f
Fortran runtime error: No such file or directory
 OK
----------------------------------------------------

The "status = 'old'" seems to be the trigger. A C version never failed (thus far):

----------------------------------------------------
$ cat test.c
#include <stdio.h>
#include <errno.h>
#include <unistd.h>

int main(void)
{
    FILE *fp = fopen("test.dat", "r");

    if (fp == NULL) {
        perror("test.dat");
        return 1;
    } else {
        char hostname[20];

        gethostname(hostname, sizeof hostname);
        hostname[sizeof hostname - 1] = '\0';  /* may be unterminated if truncated */
        printf("%s: OK\n", hostname);
        fclose(fp);
    }
    return 0;
}
----------------------------------------------------

I'm running kernel 2.6.18-92.1.17.el5_lustre.1.6.7.1smp (Lustre 1.6.7.1) on RHEL 5.3. The error shows up with both gfortran 4.1.2 20080704 (Red Hat 4.1.2-44) and Intel Fortran 10.1 20090817. The data file is about 800K. Nothing from Lustre shows up in syslog on the clients or servers.

The error is quite unexpected for such a basic operation. Where should I look for parameters to tweak?

I have mounted
on the client:
    mds01_ib@o2ib:mds02_ib@o2ib:/sandbox on /sandbox type lustre (rw)
on the MDS:
    /dev/dm-2 on /mnt/mdt-sandbox type lustre (rw)
and on the OSS:
    /dev/dm-2 on /mnt/ost0-sandbox type lustre (rw)

The MGS/MDS sit on the same disk, /dev/dm-1 (which also serves /home).

With best regards,
Michael
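Since mpirun is only acting as a launcher that starts several copies at nearly the same moment, the same start-up pattern can be reproduced without MPI. The sketch below is an illustration, not something from the thread: `count_concurrent_open_failures` is a hypothetical name, and the pipe-close "gate" is just one simple way to release all children together. Each child does a plain read-only open(), like the C test above.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork `nprocs` children that each open `path`
 * read-only at (nearly) the same instant, standing in for `mpirun -np N`
 * as a launcher. Returns how many children failed to open the file,
 * or -1 if the pipe or a fork() could not be created. */
int count_concurrent_open_failures(const char *path, int nprocs)
{
    int gate[2];
    int started = 0;
    int fork_failed = 0;

    if (pipe(gate) != 0)
        return -1;

    for (int i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            fork_failed = 1;
            break;
        }
        if (pid == 0) {
            /* Child: wait at the gate until every write end is closed;
             * read() then returns 0 (EOF) in all children at once. */
            char c;
            close(gate[1]);
            ssize_t r = read(gate[0], &c, 1);
            (void) r;

            int fd = open(path, O_RDONLY);
            if (fd < 0) {
                fprintf(stderr, "child %d: %s: %s\n", i, path, strerror(errno));
                _exit(1);
            }
            close(fd);
            _exit(0);
        }
        started++;
    }

    /* Parent: open the gate by closing both ends of the pipe. */
    close(gate[0]);
    close(gate[1]);

    /* Reap exactly the children we started and count non-zero exits. */
    int failures = 0;
    int status;
    for (int i = 0; i < started; i++) {
        if (wait(&status) < 0)
            break;
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;
    }
    return fork_failed ? -1 : failures;
}
```

Calling `count_concurrent_open_failures("test.dat", 8)` from a small main() on a Lustre client would be roughly equivalent to running the C test under `mpirun -np 8`.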
Brian J. Murrell
2009-Oct-28 20:47 UTC
On Wed, 2009-10-28 at 15:38 -0500, Michael Sternberg wrote:
> I'm seeing open() failures when attempting concurrent access in a
> lustre fs.
>
> The following Fortran program fails sporadically when run under
> mpirun, even on the same host.

Yet...

> A C version never
> failed (thus far):

This might be indicative. Maybe not. Fortran might just be exposing a race condition that the C version is not.

> Nothing from lustre shows up in syslog on the clients or servers.

Ahh. Well, I'd be sceptical that this is a Lustre problem then.

> The error is quite unexpected for such a basic operation. Where
> should I look for parameters to tweak?

There is nothing that needs tweaking to make such a use case work, as you see with your C program. I trust the C program more, as it's programming much closer to the system calls than Fortran would.

What would be ideal is an strace of the Fortran program failing, so that we can see what the system calls did.

b.
David Singleton
2009-Oct-28 21:40 UTC
Michael Sternberg wrote:
> Greetings,
>
> I'm seeing open() failures when attempting concurrent access in a
> lustre fs.
>
> The following Fortran program fails sporadically when run under
> mpirun, even on the same host. Note that there is no MPI statement;
> the mpirun simply keeps the startup times very close together:
>

See https://bugzilla.lustre.org/show_bug.cgi?id=17545

David
Michael Sternberg
2009-Oct-28 23:00 UTC
On Oct 28, 2009, at 15:47 , Brian J. Murrell wrote:

> On Wed, 2009-10-28 at 15:38 -0500, Michael Sternberg wrote:
>> I'm seeing open() failures when attempting concurrent access in a
>> lustre fs.
>> [..]
>> A C version never failed (thus far):
>
> This might be indicative. Maybe not. Fortran might just be exposing a
> race condition that the C version is not.
> [..]
> What would be ideal is an strace of the fortran program failing so that
> we can see what the system calls did.

Great suggestion! It turns out the file in question has mode 0440, but since the open() is not otherwise specified, Fortran first tries to open it read-write, and only then read-only.

I'm using:

    mpirun -np 2 bash -c 'strace -tt ./a.out 2> strace7-$$.err' > strace7.out

Here's a failure case where the first process fails and the second succeeds. The difference is that in the first process the initial open(.., O_RDWR) returns ENOENT (fatal) rather than EACCES (which is retried). If the timestamps can be trusted, the failing open() comes 0.1 ms *after* the succeeding PID's open(.., O_RDONLY).

$ tail -n 15 strace7*
==> strace7-10831.err <==
17:27:42.630621 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
17:27:42.630686 fstat(0, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
17:27:42.630753 open("test.dat", O_RDWR) = -1 ENOENT (No such file or directory)
17:27:42.631044 write(2, "At line ", 8At line ) = 8
17:27:42.631111 write(2, "2", 12) = 1
17:27:42.631171 write(2, " of file ", 9 of file ) = 9
17:27:42.631248 write(2, "test.f", 6test.f) = 6
17:27:42.631322 write(2, "\n", 1
) = 1
17:27:42.631385 write(2, "Fortran runtime error: ", 23Fortran runtime error: ) = 23
17:27:42.631443 write(2, "No such file or directory", 25No such file or directory) = 25
17:27:42.631500 write(2, "\n", 1
) = 1
17:27:42.631563 close(0) = 0
17:27:42.631615 exit_group(2) = ?
==> strace7-10832.err <==
17:27:42.629790 fstat(2, {st_mode=S_IFREG|0664, st_size=5542, ...}) = 0
17:27:42.629984 ioctl(2, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff8a624490) = -1 ENOTTY (Inappropriate ioctl for device)
17:27:42.630076 stat("test.dat", {st_mode=S_IFREG|0440, st_size=805891, ...}) = 0
17:27:42.630163 fstat(2, {st_mode=S_IFREG|0664, st_size=5813, ...}) = 0
17:27:42.630235 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 3), ...}) = 0
17:27:42.630299 fstat(0, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
17:27:42.630364 open("test.dat", O_RDWR) = -1 EACCES (Permission denied)
17:27:42.630648 open("test.dat", O_RDONLY) = 3
17:27:42.630921 fstat(3, {st_mode=S_IFREG|0440, st_size=805891, ...}) = 0
17:27:42.630998 ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff8a623240) = -1 ENOTTY (Inappropriate ioctl for device)
17:27:42.631055 close(3) = 0
17:27:42.631133 write(1, " OK", 3) = 3
17:27:42.631193 write(1, "\n", 1) = 1
17:27:42.631252 close(0) = 0
17:27:42.631331 exit_group(0) = ?

A workaround for my user is to either "chmod u+w datafile" or, more cleanly, be explicit in the Fortran open() by saying ACTION='READ'.

With best regards,
Michael
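The traces suggest the Fortran runtime's open sequence for STATUS='old' without ACTION= is: try O_RDWR, and retry with O_RDONLY only on EACCES. The minimal C sketch below is inferred from the strace output above, not taken from libgfortran's source. It shows why an ENOENT on the first attempt is fatal while EACCES is not. `fortran_style_open` and its `action_read` flag are hypothetical names; `action_read` stands in for ACTION='READ', assumed here to map to a single O_RDONLY open.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical sketch of the open sequence visible in the strace for
 * OPEN(..., STATUS='old') with no ACTION= specifier. Inferred from the
 * trace, not copied from libgfortran.
 *
 * action_read != 0 stands in for ACTION='READ': a single O_RDONLY open.
 * action_read == 0 tries O_RDWR first and falls back to O_RDONLY only
 * when the first attempt fails with EACCES; any other errno, including
 * a spurious ENOENT, is returned to the caller as a hard failure. */
int fortran_style_open(const char *path, int action_read)
{
    if (action_read)
        return open(path, O_RDONLY);

    int fd = open(path, O_RDWR);
    if (fd >= 0)
        return fd;              /* read-write open succeeded */
    if (errno != EACCES)
        return -1;              /* e.g. ENOENT: no retry, errno preserved */

    /* Write permission denied (e.g. mode 0440): retry read-only. */
    return open(path, O_RDONLY);
}
```

For a mode-0440 file, `fortran_style_open(path, 0)` follows the path of PID 10832 (EACCES, then a successful O_RDONLY). If the O_RDWR attempt instead reports ENOENT, as for PID 10831, the sketch gives up immediately, just as the runtime did. With `action_read` set, the O_RDWR probe never happens, which fits the ACTION='READ' workaround.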
Michael Sternberg
2009-Oct-28 23:38 UTC
On Oct 28, 2009, at 16:40 , David Singleton wrote:

> Michael Sternberg wrote:
>> I'm seeing open() failures when attempting concurrent access in a
>> lustre fs.
>>
>> The following Fortran program fails sporadically when run under
>> mpirun, even on the same host. Note that there is no MPI statement;
>> the mpirun simply keeps the startup times very close together:
>
> See https://bugzilla.lustre.org/show_bug.cgi?id=17545

Thanks. I was preparing to file one, as a search for "concurrent open" didn't return a hit.

Good to see: "Landed on 1.8.2".

Michael