Dear list,

on a Debian etch system, I have NFS-mounted a Lustre filesystem. The mount
succeeded without error. However, when I try to access this NFS-mounted
Lustre filesystem, I see the following error on the NFS client system:

# ls -l /alcc/alf1/
total 0
?--------- ? ? ? ? ?            ? /alcc/alf1/admin
# ls -ld /alcc/alf1/admin
ls: /alcc/alf1/admin: Input/output error

Each time I access the NFS-mounted Lustre filesystem, I see a syslog entry
on the client like:

kernel: nfs_stat_to_errno: bad nfs status return value: 45

The Lustre client that acts as the NFS server and the Lustre MDT don't show
any errors in their logs. On the Lustre client, /alcc/alf1 is the mount
point for the Lustre filesystem, and it is exported via NFS:

# ls -ld /alcc/alf1/admin
drwxr-xr-x 3 root root 4096 2009-03-24 10:38 /alcc/alf1/admin
# exportfs -v -r
exporting 192.168.2.0/24:/alcc/alf1

The Lustre client (and servers) are Debian Lenny systems running a vanilla
2.6.22.19 kernel with Lustre 1.6.7 (plus the latest critical patch). All
systems have NSS access to the same user database, although the example
above does not even need it (no root squash on Lustre or NFS).

Any hints?
TIA, Ralf
--
Ralf Utermann
_____________________________________________________________________
Universität Augsburg, Institut für Physik -- EDV-Betreuer
Universitätsstr. 1            D-86135 Augsburg   Phone: +49-821-598-3231
SMTP: Ralf.Utermann at Physik.Uni-Augsburg.DE     Fax: -3411
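For reference, the EIO that ls reports can be tied to the failing system
call with strace (a diagnostic sketch, run on the NFS client; the unknown
NFS status 45 is translated to EIO before it reaches userspace, which is
why the raw value only shows up in the syslog line quoted above):

  # trace only the per-file stat calls that ls issues
  # (use trace=lstat64 instead on 32-bit userland)
  strace -e trace=lstat ls -l /alcc/alf1/ 2>&1 | tail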
Ralf,

I could fix a similar issue on RHEL 5 by updating the /etc/exports entry
as follows:

/lustremnt *(rw,no_root_squash,async)

no_root_squash is the important part.

Thanks,
Anil

On Wed, Apr 22, 2009 at 2:35 PM, Ralf Utermann
<ralf.utermann at physik.uni-augsburg.de> wrote:
[...]
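For reference, a slightly fuller version of that setup (a sketch: the
fsid= and no_subtree_check options are assumptions here, commonly added
when exporting a filesystem without a local block device, and the network
restriction mirrors the export shown in the original report):

  # /etc/exports
  /lustremnt  192.168.2.0/24(rw,no_root_squash,no_subtree_check,fsid=1,async)

  # re-read the exports table, then list what is actually exported
  exportfs -r
  exportfs -v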
anil kumar wrote:
> I could fix a similar issue on RHEL 5 by updating the /etc/exports entry
> as follows:
>
> /lustremnt *(rw,no_root_squash,async)
>
> no_root_squash is the important part.

Thanks, Anil, but no_root_squash is already active; the problem is still
there. It does not matter whether I try root-owned directories or others.

Regards, Ralf
[...]
On Apr 22, 2009 11:05 +0200, Ralf Utermann wrote:
> on a Debian etch system, I have NFS-mounted a Lustre filesystem. The
> mount succeeded without error.

Note that you can also get Debian packages for Lustre...

> when I try to access this NFS-mounted Lustre filesystem, I see the
> following error on the NFS client system:
>
> # ls -l /alcc/alf1/
> total 0
> ?--------- ? ? ? ? ?            ? /alcc/alf1/admin
> # ls -ld /alcc/alf1/admin
> ls: /alcc/alf1/admin: Input/output error
>
> Each time I access the NFS-mounted Lustre filesystem, I see a syslog
> entry on the client like:
>
> kernel: nfs_stat_to_errno: bad nfs status return value: 45

Do you mean "43" instead of "45"? Please see this week's thread on this
same list about the OS/X client + NFS export.

> All systems have NSS access to the same user database, although the
> example above does not even need it (no root squash on Lustre or NFS).

If you are sure the numeric userid is the same on the NFS client and the
MDS, then it may be a different issue. Please run "id" on the client to
list the userid, and "/usr/sbin/l_getgroups -d {uid}" on the MDS to verify
that the UID can be resolved properly.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
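Since nfs_stat_to_errno only prints the number, one way to map 45 (or 43)
to an errno name is to grep the kernel headers (a sketch; the header path
varies by distribution and architecture):

  # prints the #define for errno 45
  grep -w 45 /usr/include/asm-generic/errno.h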
Andreas Dilger wrote:
> On Apr 22, 2009 11:05 +0200, Ralf Utermann wrote:
>> on a Debian etch system, I have NFS-mounted a Lustre filesystem. The
>> mount succeeded without error.
>
> Note that you can also get Debian packages for Lustre...

I use the Debian packages, backported to Lenny. The latest critical patch
was integrated manually.
[...]
>> kernel: nfs_stat_to_errno: bad nfs status return value: 45
>
> Do you mean "43" instead of "45"? Please see this week's thread on this
> same list about the OS/X client + NFS export.

It is really 45, not the 43 from the OS/X thread.

>> All systems have NSS access to the same user database, although the
>> example above does not even need it (no root squash on Lustre or NFS).
>
> If you are sure the numeric userid is the same on the NFS client and the
> MDS, then it may be a different issue. Please run "id" on the client to
> list the userid, and "/usr/sbin/l_getgroups -d {uid}" on the MDS to
> verify that the UID can be resolved properly.

I have the problem with user root as well as with other users. The uid is
definitely the same on the systems:

user root:
  on the NFS client, id gives: uid=0
  on the MDS:
  # /usr/sbin/l_getgroups -d 0
  uid=0 gid=0

another user:
  on the NFS client, id gives: uid=6014
  on the MDS:
  # /usr/sbin/l_getgroups -d 6014
  l_getgroups: _nss_dce_init: initializing NSS/DCE library.
  uid=6014 gid=234

[The NSS/DCE library is our own library connecting NSS on Linux to our
DCE cell.]

Regards, Ralf
Hello!

On Apr 23, 2009, at 4:29 AM, Ralf Utermann wrote:
>>> kernel: nfs_stat_to_errno: bad nfs status return value: 45
>> Do you mean "43" instead of "45"? Please see this week's thread on this
>> same list about the OS/X client + NFS export.
> It is really 45, not the 43 from the OS/X thread.

That's really weird. 45 stands for:

#define EL2NSYNC 45 /* Level 2 not synchronized */

We really do not use this error value (and I don't even have an idea what
it is supposed to mean).

>>> All systems have NSS access to the same user database, although the
>>> example above does not even need it (no root squash on Lustre or NFS).
>> If you are sure the numeric userid is the same on the NFS client and
>> the MDS, then it may be a different issue. Please run "id" on the
>> client to list the userid, and "/usr/sbin/l_getgroups -d {uid}" on
>> the MDS to verify that the UID can be resolved properly.
> I have the problem with user root as well as with other users. The uid
> is definitely the same on the systems:

What might be useful is if you can reproduce this quickly, on as small a
set of Lustre nodes as possible. Remember your current
/proc/sys/lnet/debug value. On the Lustre-client/NFS-server and on the
MDS, echo -1 > /proc/sys/lnet/debug, then do lctl dk > /dev/null (on those
same two nodes). Reproduce the problem and do lctl dk > /tmp/somefile on
both of the nodes again, as soon as possible after the problem was
reproduced. Create a new bugzilla bug and attach the files there.

Thanks.

Bye,
Oleg
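Oleg's procedure, collected into one sequence (a sketch; run it on both
the Lustre-client/NFS-server and the MDS, and substitute your own
reproducer and output paths):

  cat /proc/sys/lnet/debug          # note the current mask so it can be restored
  echo -1 > /proc/sys/lnet/debug    # enable every debug message type
  lctl dk > /dev/null               # drain the kernel debug ring buffer
  # ... reproduce the problem from the NFS client here ...
  lctl dk > /tmp/lustre-debug.txt   # dump the fresh log for the bug report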
Oleg Drokin wrote:
> Hello!
>
> On Apr 23, 2009, at 4:29 AM, Ralf Utermann wrote:
>>>> kernel: nfs_stat_to_errno: bad nfs status return value: 45
>>> Do you mean "43" instead of "45"? Please see this week's thread on
>>> this same list about the OS/X client + NFS export.
>> It is really 45, not the 43 from the OS/X thread.
>
> That's really weird. 45 stands for:
>
> #define EL2NSYNC 45 /* Level 2 not synchronized */
>
> We really do not use this error value (and I don't even have an idea
> what it is supposed to mean).
[...]
> What might be useful is if you can reproduce this quickly, on as small a
> set of Lustre nodes as possible. Remember your current
> /proc/sys/lnet/debug value. On the Lustre-client/NFS-server and on the
> MDS, echo -1 > /proc/sys/lnet/debug, then do lctl dk > /dev/null (on
> those same two nodes). Reproduce the problem and do lctl dk >
> /tmp/somefile on both of the nodes again, as soon as possible after the
> problem was reproduced.

I did this on both the Lustre-client/NFS-server and the MDS. The output of
lctl dk on both is only:

Debug log: 0 lines, 0 kept, 0 dropped, 0 bad.

Regards, Ralf
Hello!

On May 13, 2009, at 7:53 AM, Ralf Utermann wrote:
>> What might be useful is if you can reproduce this quickly, on as small
>> a set of Lustre nodes as possible. Remember your current
>> /proc/sys/lnet/debug value. On the Lustre-client/NFS-server and on the
>> MDS, echo -1 > /proc/sys/lnet/debug, then do lctl dk > /dev/null (on
>> those same two nodes). Reproduce the problem and do lctl dk >
>> /tmp/somefile on both of the nodes again, as soon as possible after
>> the problem was reproduced.
> I did this on both the Lustre-client/NFS-server and the MDS. The output
> of lctl dk on both is only:
> Debug log: 0 lines, 0 kept, 0 dropped, 0 bad.

Either Lustre never got any control at all, and your problem is unrelated
to Lustre and caused by something else in your system, or the logging is
somehow broken. The way to test this is to do ls -la /mnt/lustre (or
whatever your Lustre mount point is) on the NFS server, with the rest of
the instructions unchanged, and then do lctl dk again. If any output
appears, the logging works just fine; if not, double-check what's in your
/proc/sys/lnet/debug.

Can you export any other filesystems from that server?

Bye,
Oleg
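The sanity check as commands, on the Lustre client that acts as the NFS
server (mount point taken from earlier in this thread):

  lctl dk > /dev/null            # start from an empty log
  ls -la /alcc/alf1 > /dev/null  # a purely local Lustre access, no NFS involved
  lctl dk | tail                 # any lines here mean logging itself works
  cat /proc/sys/lnet/debug       # if not, double-check the debug mask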
Oleg Drokin wrote:
[...]
> Either Lustre never got any control at all, and your problem is
> unrelated to Lustre and caused by something else in your system, or the
> logging is somehow broken. The way to test this is to do ls -la
> /mnt/lustre (or whatever your Lustre mount point is) on the NFS server,
> with the rest of the instructions unchanged, and then do lctl dk again.
> If any output appears, the logging works just fine; if not, double-check
> what's in your /proc/sys/lnet/debug.

Nothing appears in the lctl dk output if I do ls -la on the Lustre mount
point on the Lustre client/NFS server. For this system and the MDS:

# cat /proc/sys/lnet/debug
trace inode super ext2 malloc cache info ioctl neterror net warning buffs
other dentry nettrace page dlmtrace error emerg ha rpctrace vfstrace reada
mmap config console quota sec

Do I need to enable anything else to see debug output?

> Can you export any other filesystems from that server?

I can export a local ext3 just fine. There is no problem accessing it on
an NFS client.

Bye, Ralf
Hello!

On May 13, 2009, at 10:48 AM, Ralf Utermann wrote:
> Oleg Drokin wrote:
> [...]
>> Either Lustre never got any control at all, and your problem is
>> unrelated to Lustre and caused by something else in your system, or
>> the logging is somehow broken. The way to test this is to do ls -la
>> /mnt/lustre (or whatever your Lustre mount point is) on the NFS
>> server, with the rest of the instructions unchanged, and then do
>> lctl dk again. If any output appears, the logging works just fine;
>> if not, double-check what's in your /proc/sys/lnet/debug.
> Nothing appears in the lctl dk output if I do ls -la on the Lustre
> mount point on the Lustre client/NFS server.

Hm, that's really strange. I hope you did not build your Lustre with the
--disable-libcfs-* configure options?

Bye,
Oleg
Oleg Drokin wrote:
[...]
> Hm, that's really strange. I hope you did not build your Lustre with
> the --disable-libcfs-* configure options?

How can I check this? The modules were built with the Debian utilities
(m-a build ...). On the systems I have a libcfs module:

# modinfo libcfs
filename:    /lib/modules/2.6.22.19-mylustre-2/kernel/net/lustre/libcfs.ko
license:     GPL
description: Portals v3.1
author:      Peter J. Braam <braam at clusterfs.com>
depends:
vermagic:    2.6.22.19-mylustre-2 SMP mod_unload
parm:        libcfs_subsystem_debug:Lustre kernel debug subsystem mask (int)
parm:        libcfs_debug:Lustre kernel debug mask (int)
parm:        libcfs_debug_mb:Total debug buffer size. (int)
parm:        libcfs_printk:Lustre kernel debug console mask (uint)
parm:        libcfs_console_ratelimit:Lustre kernel debug console ratelimit (0 to disable) (uint)
parm:        libcfs_console_max_delay:Lustre kernel debug console max delay (jiffies) (ulong)
parm:        libcfs_console_min_delay:Lustre kernel debug console min delay (jiffies) (ulong)
parm:        libcfs_console_backoff:Lustre kernel debug console backoff factor (uint)
parm:        libcfs_panic_on_lbug:Lustre kernel panic on LBUG (uint)
parm:        debug_file_path:Path for dumping debug logs, set 'NONE' to prevent log dumping (charp)

Bye, and thanks for your help!
-Ralf
Hello!

On May 14, 2009, at 4:05 AM, Ralf Utermann wrote:
>> Hm, that's really strange. I hope you did not build your Lustre with
>> the --disable-libcfs-* configure options?
> How can I check this? The modules were built with the Debian utilities
> (m-a build ...)

I suppose you can take a look at the build script used and figure out what
configure parameters were used. If you did the build yourself and the
source tree is still there, config.status should list the options.

Bye,
Oleg
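Spelled out as commands (a sketch; it assumes the Lustre source tree used
for the build is still around, and the path is hypothetical):

  cd /path/to/lustre-source
  grep libcfs config.status | head        # shows any --disable-libcfs-* recorded at configure time
  ./configure --help | grep -i libcfs     # lists the libcfs-related switches this tree supports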
Oleg Drokin wrote:
> Hello!
>
> On May 14, 2009, at 4:05 AM, Ralf Utermann wrote:
>>> Hm, that's really strange. I hope you did not build your Lustre with
>>> the --disable-libcfs-* configure options?
>> How can I check this? The modules were built with the Debian utilities
>> (m-a build ...)
>
> I suppose you can take a look at the build script used and figure out
> what configure parameters were used. If you did the build yourself and
> the source tree is still there, config.status should list the options.

You pointed in the right direction: the Debian Lustre packages normally do
not enable libcfs-*, so I rebuilt with the enable-libcfs-* options.

Bye, Ralf
Oleg Drokin wrote:
[...]
> What might be useful is if you can reproduce this quickly, on as small a
> set of Lustre nodes as possible. Remember your current
> /proc/sys/lnet/debug value. On the Lustre-client/NFS-server and on the
> MDS, echo -1 > /proc/sys/lnet/debug, then do lctl dk > /dev/null (on
> those same two nodes). Reproduce the problem and do lctl dk >
> /tmp/somefile on both of the nodes again, as soon as possible after the
> problem was reproduced. Create a new bugzilla bug and attach the files
> there.

Hi Oleg,

so now I am sure to have modules built with libcfs-* enabled (probably the
Debian packages also had it; it is not disabled in the configure call) and
did this test again. However, I still do not get any debug lines after
accessing the NFS-mounted Lustre filesystem. But if I stop and start the
Lustre client, I do get some output from lctl dk, so basically this should
work:

# lctl dk
[...]
00000400:00020000:3:1242386721.185659:0:3986:0:(router_proc.c:1020:lnet_proc_init()) couldn't create proc entry sys/lnet/stats
10000000:02000400:3:1242386723.429628:0:4042:0:(mgc_request.c:910:mgc_import_event()) MGC192.168.2.191 at tcp: Reactivating import
00000080:02000400:7:1242386723.488602:0:4109:0:(llite_lib.c:1101:ll_fill_super()) Client alf1-client has started
Debug log: 8 lines, 8 kept, 0 dropped, 0 bad.

Thanks for your patience, Ralf
Hello!

On May 15, 2009, at 7:39 AM, Ralf Utermann wrote:
> so now I am sure to have modules built with libcfs-* enabled (probably
> the Debian packages also had it; it is not disabled in the configure
> call) and did this test again. However, I still do not get any debug
> lines after accessing the NFS-mounted Lustre filesystem. But if I stop
> and start the Lustre client, I do get some output from lctl dk, so
> basically this should work:
> # lctl dk
> [...]
> Debug log: 8 lines, 8 kept, 0 dropped, 0 bad.

Hm. What's in your /proc/sys/lnet/subsystem_debug, I wonder? If the list
of subsystems there is small, try echo -1 > /proc/sys/lnet/subsystem_debug

Bye,
Oleg
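As commands (the message-type mask and the subsystem mask gate debug
logging independently, so both need to be open for lctl dk to capture
anything):

  cat /proc/sys/lnet/subsystem_debug        # which Lustre subsystems may log at all
  echo -1 > /proc/sys/lnet/subsystem_debug  # enable every subsystem
  echo -1 > /proc/sys/lnet/debug            # keep every message type enabled as well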
Oleg Drokin wrote:
[...]
> Hm. What's in your /proc/sys/lnet/subsystem_debug, I wonder? If the
> list of subsystems there is small, try
> echo -1 > /proc/sys/lnet/subsystem_debug

Hi Oleg,

now I get something in the logs! The -1 on subsystem_debug fills up the
logs now ... I opened bug #19559 and attached log files from the
Lustre-client/NFS-server and the Lustre MDS while running my 'ls -l' test
on the NFS client.

Thanks for your help,
Bye, Ralf