Dear list,

on a Debian etch system, I have NFS-mounted a Lustre filesystem. The mount
succeeded without error. However, when I try to access this NFS-mounted
Lustre filesystem, I see the following error on the NFS client system:

# ls -l /alcc/alf1/
total 0
?--------- ? ? ? ? ?            ? /alcc/alf1/admin
# ls -ld /alcc/alf1/admin
ls: /alcc/alf1/admin: Input/output error

Each time I access the NFS-mounted Lustre filesystem, I see a syslog entry
on the client like:

kernel: nfs_stat_to_errno: bad nfs status return value: 45

The Lustre client that acts as the NFS server and the Lustre MDT don't show
any errors in their logs. On the Lustre client, /alcc/alf1 is the mount
point for the Lustre filesystem, and it is exported via NFS:

# ls -ld /alcc/alf1/admin
drwxr-xr-x 3 root root 4096 2009-03-24 10:38 /alcc/alf1/admin
# exportfs -v -r
exporting 192.168.2.0/24:/alcc/alf1

The Lustre client (and servers) are Debian Lenny systems running a vanilla
2.6.22.19 kernel with Lustre 1.6.7 (plus the latest critical patch). All
systems have NSS access to the same user database, although the example
above does not even need it (no root squash on Lustre or NFS).

Any hints?
TIA, Ralf
--
Ralf Utermann
_____________________________________________________________________
Universität Augsburg, Institut für Physik -- EDV-Betreuer
Universitätsstr. 1            D-86135 Augsburg   Phone: +49-821-598-3231
SMTP: Ralf.Utermann at Physik.Uni-Augsburg.DE     Fax: -3411
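For reference, the EIO that ls reports can be tied to the failing system
call with strace (a diagnostic sketch, run on the NFS client; the unknown
NFS status 45 is translated to EIO before it reaches userspace, which is
why the raw value only shows up in the syslog line quoted above):

  # trace only the per-file stat calls that ls issues
  # (use trace=lstat64 instead on 32-bit userland)
  strace -e trace=lstat ls -l /alcc/alf1/ 2>&1 | tail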
Ralf,

I could fix a similar issue on RHEL 5 by updating the /etc/exports entry
as follows:

/lustremnt *(rw,no_root_squash,async)

no_root_squash is the important part.

Thanks,
Anil

On Wed, Apr 22, 2009 at 2:35 PM, Ralf Utermann
<ralf.utermann at physik.uni-augsburg.de> wrote:
[...]
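For reference, a slightly fuller version of that setup (a sketch: the
fsid= and no_subtree_check options are assumptions here, commonly added
when exporting a filesystem without a local block device, and the network
restriction mirrors the export shown in the original report):

  # /etc/exports
  /lustremnt  192.168.2.0/24(rw,no_root_squash,no_subtree_check,fsid=1,async)

  # re-read the exports table, then list what is actually exported
  exportfs -r
  exportfs -v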
anil kumar wrote:
> I could fix a similar issue on RHEL 5 by updating the /etc/exports entry
> as follows:
>
> /lustremnt *(rw,no_root_squash,async)
>
> no_root_squash is the important part.

Thanks, Anil, but no_root_squash is already active; the problem is still
there. It does not matter whether I try root-owned directories or others.

Regards, Ralf
[...]
On Apr 22, 2009 11:05 +0200, Ralf Utermann wrote:
> on a Debian etch system, I have NFS-mounted a Lustre filesystem. The
> mount succeeded without error.

Note that you can also get Debian packages for Lustre...

> when I try to access this NFS-mounted Lustre filesystem, I see the
> following error on the NFS client system:
>
> # ls -l /alcc/alf1/
> total 0
> ?--------- ? ? ? ? ?            ? /alcc/alf1/admin
> # ls -ld /alcc/alf1/admin
> ls: /alcc/alf1/admin: Input/output error
>
> Each time I access the NFS-mounted Lustre filesystem, I see a syslog
> entry on the client like:
>
> kernel: nfs_stat_to_errno: bad nfs status return value: 45

Do you mean "43" instead of "45"? Please see this week's thread on this
same list about the OS/X client + NFS export.

> All systems have NSS access to the same user database, although the
> example above does not even need it (no root squash on Lustre or NFS).

If you are sure the numeric userid is the same on the NFS client and the
MDS, then it may be a different issue. Please run "id" on the client to
list the userid, and "/usr/sbin/l_getgroups -d {uid}" on the MDS to verify
that the UID can be resolved properly.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
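Since nfs_stat_to_errno only prints the number, one way to map 45 (or 43)
to an errno name is to grep the kernel headers (a sketch; the header path
varies by distribution and architecture):

  # prints the #define for errno 45
  grep -w 45 /usr/include/asm-generic/errno.h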
Andreas Dilger wrote:
> On Apr 22, 2009 11:05 +0200, Ralf Utermann wrote:
>> on a Debian etch system, I have NFS-mounted a Lustre filesystem. The
>> mount succeeded without error.
>
> Note that you can also get Debian packages for Lustre...

I use the Debian packages, backported to Lenny. The latest critical patch
was integrated manually.
[...]
>> kernel: nfs_stat_to_errno: bad nfs status return value: 45
>
> Do you mean "43" instead of "45"? Please see this week's thread on this
> same list about the OS/X client + NFS export.

It is really 45, not the 43 from the OS/X thread.

>> All systems have NSS access to the same user database, although the
>> example above does not even need it (no root squash on Lustre or NFS).
>
> If you are sure the numeric userid is the same on the NFS client and the
> MDS, then it may be a different issue. Please run "id" on the client to
> list the userid, and "/usr/sbin/l_getgroups -d {uid}" on the MDS to
> verify that the UID can be resolved properly.

I have the problem with user root as well as with other users. The uid is
definitely the same on the systems:

user root:
  on the NFS client, id gives: uid=0
  on the MDS:
  # /usr/sbin/l_getgroups -d 0
  uid=0 gid=0

another user:
  on the NFS client, id gives: uid=6014
  on the MDS:
  # /usr/sbin/l_getgroups -d 6014
  l_getgroups: _nss_dce_init: initializing NSS/DCE library.
  uid=6014 gid=234

[The NSS/DCE library is our own library connecting NSS on Linux to our
DCE cell.]

Regards, Ralf
Hello!

On Apr 23, 2009, at 4:29 AM, Ralf Utermann wrote:
>>> kernel: nfs_stat_to_errno: bad nfs status return value: 45
>> Do you mean "43" instead of "45"? Please see this week's thread on this
>> same list about the OS/X client + NFS export.
> It is really 45, not the 43 from the OS/X thread.

That's really weird. 45 stands for:

#define EL2NSYNC 45 /* Level 2 not synchronized */

We really do not use this error value (and I don't even have an idea what
it is supposed to mean).

>>> All systems have NSS access to the same user database, although the
>>> example above does not even need it (no root squash on Lustre or NFS).
>> If you are sure the numeric userid is the same on the NFS client and
>> the MDS, then it may be a different issue. Please run "id" on the
>> client to list the userid, and "/usr/sbin/l_getgroups -d {uid}" on
>> the MDS to verify that the UID can be resolved properly.
> I have the problem with user root as well as with other users. The uid
> is definitely the same on the systems:

What might be useful is if you can reproduce this quickly, on as small a
set of Lustre nodes as possible. Remember your current
/proc/sys/lnet/debug value. On the Lustre-client/NFS-server and on the
MDS, echo -1 > /proc/sys/lnet/debug, then do lctl dk > /dev/null (on those
same two nodes). Reproduce the problem and do lctl dk > /tmp/somefile on
both of the nodes again, as soon as possible after the problem was
reproduced. Create a new bugzilla bug and attach the files there.

Thanks.

Bye,
Oleg
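Oleg's procedure, collected into one sequence (a sketch; run it on both
the Lustre-client/NFS-server and the MDS, and substitute your own
reproducer and output paths):

  cat /proc/sys/lnet/debug          # note the current mask so it can be restored
  echo -1 > /proc/sys/lnet/debug    # enable every debug message type
  lctl dk > /dev/null               # drain the kernel debug ring buffer
  # ... reproduce the problem from the NFS client here ...
  lctl dk > /tmp/lustre-debug.txt   # dump the fresh log for the bug report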
Oleg Drokin wrote:
> Hello!
>
> On Apr 23, 2009, at 4:29 AM, Ralf Utermann wrote:
>>>> kernel: nfs_stat_to_errno: bad nfs status return value: 45
>>> Do you mean "43" instead of "45"? Please see this week's thread on
>>> this same list about the OS/X client + NFS export.
>> It is really 45, not the 43 from the OS/X thread.
>
> That's really weird. 45 stands for:
>
> #define EL2NSYNC 45 /* Level 2 not synchronized */
>
> We really do not use this error value (and I don't even have an idea
> what it is supposed to mean).
[...]
> What might be useful is if you can reproduce this quickly, on as small a
> set of Lustre nodes as possible. Remember your current
> /proc/sys/lnet/debug value. On the Lustre-client/NFS-server and on the
> MDS, echo -1 > /proc/sys/lnet/debug, then do lctl dk > /dev/null (on
> those same two nodes). Reproduce the problem and do lctl dk >
> /tmp/somefile on both of the nodes again, as soon as possible after the
> problem was reproduced.

I did this on both the Lustre-client/NFS-server and the MDS. The output of
lctl dk on both is only:

Debug log: 0 lines, 0 kept, 0 dropped, 0 bad.

Regards, Ralf
Hello!

On May 13, 2009, at 7:53 AM, Ralf Utermann wrote:
>> What might be useful is if you can reproduce this quickly, on as small
>> a set of Lustre nodes as possible. Remember your current
>> /proc/sys/lnet/debug value. On the Lustre-client/NFS-server and on the
>> MDS, echo -1 > /proc/sys/lnet/debug, then do lctl dk > /dev/null (on
>> those same two nodes). Reproduce the problem and do lctl dk >
>> /tmp/somefile on both of the nodes again, as soon as possible after
>> the problem was reproduced.
> I did this on both the Lustre-client/NFS-server and the MDS. The output
> of lctl dk on both is only:
> Debug log: 0 lines, 0 kept, 0 dropped, 0 bad.

Either Lustre never got any control at all, and your problem is unrelated
to Lustre and caused by something else in your system, or the logging is
somehow broken. The way to test this is to do ls -la /mnt/lustre (or
whatever your Lustre mount point is) on the NFS server, with the rest of
the instructions unchanged, and then do lctl dk again. If any output
appears, the logging works just fine; if not, double-check what's in your
/proc/sys/lnet/debug.

Can you export any other filesystems from that server?

Bye,
Oleg
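The sanity check as commands, on the Lustre client that acts as the NFS
server (mount point taken from earlier in this thread):

  lctl dk > /dev/null            # start from an empty log
  ls -la /alcc/alf1 > /dev/null  # a purely local Lustre access, no NFS involved
  lctl dk | tail                 # any lines here mean logging itself works
  cat /proc/sys/lnet/debug       # if not, double-check the debug mask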
Oleg Drokin wrote:
[...]
> Either Lustre never got any control at all, and your problem is
> unrelated to Lustre and caused by something else in your system, or the
> logging is somehow broken. The way to test this is to do ls -la
> /mnt/lustre (or whatever your Lustre mount point is) on the NFS server,
> with the rest of the instructions unchanged, and then do lctl dk again.
> If any output appears, the logging works just fine; if not, double-check
> what's in your /proc/sys/lnet/debug.

Nothing appears in the lctl dk output if I do ls -la on the Lustre mount
point on the Lustre client/NFS server. For this system and the MDS:

# cat /proc/sys/lnet/debug
trace inode super ext2 malloc cache info ioctl neterror net warning buffs
other dentry nettrace page dlmtrace error emerg ha rpctrace vfstrace reada
mmap config console quota sec

Do I need to enable anything else to see debug output?

> Can you export any other filesystems from that server?

I can export a local ext3 just fine. There is no problem accessing it on
an NFS client.

Bye, Ralf
Hello!

On May 13, 2009, at 10:48 AM, Ralf Utermann wrote:
> Oleg Drokin wrote:
> [...]
>> Either Lustre never got any control at all, and your problem is
>> unrelated to Lustre and caused by something else in your system, or
>> the logging is somehow broken. The way to test this is to do ls -la
>> /mnt/lustre (or whatever your Lustre mount point is) on the NFS
>> server, with the rest of the instructions unchanged, and then do
>> lctl dk again. If any output appears, the logging works just fine;
>> if not, double-check what's in your /proc/sys/lnet/debug.
> Nothing appears in the lctl dk output if I do ls -la on the Lustre
> mount point on the Lustre client/NFS server.

Hm, that's really strange. I hope you did not build your Lustre with the
--disable-libcfs-* configure options?

Bye,
Oleg
Oleg Drokin wrote:
[...]
> Hm, that's really strange. I hope you did not build your Lustre with
> the --disable-libcfs-* configure options?

How can I check this? The modules were built with the Debian utilities
(m-a build ...). On the systems I have a libcfs module:

# modinfo libcfs
filename:    /lib/modules/2.6.22.19-mylustre-2/kernel/net/lustre/libcfs.ko
license:     GPL
description: Portals v3.1
author:      Peter J. Braam <braam at clusterfs.com>
depends:
vermagic:    2.6.22.19-mylustre-2 SMP mod_unload
parm:        libcfs_subsystem_debug:Lustre kernel debug subsystem mask (int)
parm:        libcfs_debug:Lustre kernel debug mask (int)
parm:        libcfs_debug_mb:Total debug buffer size. (int)
parm:        libcfs_printk:Lustre kernel debug console mask (uint)
parm:        libcfs_console_ratelimit:Lustre kernel debug console ratelimit (0 to disable) (uint)
parm:        libcfs_console_max_delay:Lustre kernel debug console max delay (jiffies) (ulong)
parm:        libcfs_console_min_delay:Lustre kernel debug console min delay (jiffies) (ulong)
parm:        libcfs_console_backoff:Lustre kernel debug console backoff factor (uint)
parm:        libcfs_panic_on_lbug:Lustre kernel panic on LBUG (uint)
parm:        debug_file_path:Path for dumping debug logs, set 'NONE' to prevent log dumping (charp)

Bye, and thanks for your help!
-Ralf
Hello!

On May 14, 2009, at 4:05 AM, Ralf Utermann wrote:
>> Hm, that's really strange. I hope you did not build your Lustre with
>> the --disable-libcfs-* configure options?
> How can I check this? The modules were built with the Debian utilities
> (m-a build ...)

I suppose you can take a look at the build script used and figure out what
configure parameters were used. If you did the build yourself and the
source tree is still there, config.status should list the options.

Bye,
Oleg
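Spelled out as commands (a sketch; it assumes the Lustre source tree used
for the build is still around, and the path is hypothetical):

  cd /path/to/lustre-source
  grep libcfs config.status | head        # shows any --disable-libcfs-* recorded at configure time
  ./configure --help | grep -i libcfs     # lists the libcfs-related switches this tree supports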
Oleg Drokin wrote:
> Hello!
>
> On May 14, 2009, at 4:05 AM, Ralf Utermann wrote:
>>> Hm, that's really strange. I hope you did not build your Lustre with
>>> the --disable-libcfs-* configure options?
>> How can I check this? The modules were built with the Debian utilities
>> (m-a build ...)
>
> I suppose you can take a look at the build script used and figure out
> what configure parameters were used. If you did the build yourself and
> the source tree is still there, config.status should list the options.

You pointed in the right direction: the Debian Lustre packages normally do
not enable libcfs-*, so I rebuilt with the enable-libcfs-* options.

Bye, Ralf
Oleg Drokin wrote:
[...]
> What might be useful is if you can reproduce this quickly, on as small a
> set of Lustre nodes as possible. Remember your current
> /proc/sys/lnet/debug value. On the Lustre-client/NFS-server and on the
> MDS, echo -1 > /proc/sys/lnet/debug, then do lctl dk > /dev/null (on
> those same two nodes). Reproduce the problem and do lctl dk >
> /tmp/somefile on both of the nodes again, as soon as possible after the
> problem was reproduced. Create a new bugzilla bug and attach the files
> there.

Hi Oleg,

so now I am sure to have modules built with libcfs-* enabled (probably the
Debian packages also had it; it is not disabled in the configure call) and
did this test again. However, I still do not get any debug lines after
accessing the NFS-mounted Lustre filesystem. But if I stop and start the
Lustre client, I do get some output from lctl dk, so basically this should
work:

# lctl dk
[...]
00000400:00020000:3:1242386721.185659:0:3986:0:(router_proc.c:1020:lnet_proc_init()) couldn't create proc entry sys/lnet/stats
10000000:02000400:3:1242386723.429628:0:4042:0:(mgc_request.c:910:mgc_import_event()) MGC192.168.2.191 at tcp: Reactivating import
00000080:02000400:7:1242386723.488602:0:4109:0:(llite_lib.c:1101:ll_fill_super()) Client alf1-client has started
Debug log: 8 lines, 8 kept, 0 dropped, 0 bad.

Thanks for your patience, Ralf
Hello!

On May 15, 2009, at 7:39 AM, Ralf Utermann wrote:
> so now I am sure to have modules built with libcfs-* enabled (probably
> the Debian packages also had it; it is not disabled in the configure
> call) and did this test again. However, I still do not get any debug
> lines after accessing the NFS-mounted Lustre filesystem. But if I stop
> and start the Lustre client, I do get some output from lctl dk, so
> basically this should work:
> # lctl dk
> [...]
> Debug log: 8 lines, 8 kept, 0 dropped, 0 bad.

Hm. What's in your /proc/sys/lnet/subsystem_debug, I wonder? If the list
of subsystems there is small, try echo -1 > /proc/sys/lnet/subsystem_debug

Bye,
Oleg
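As commands (the message-type mask and the subsystem mask gate debug
logging independently, so both need to be open for lctl dk to capture
anything):

  cat /proc/sys/lnet/subsystem_debug        # which Lustre subsystems may log at all
  echo -1 > /proc/sys/lnet/subsystem_debug  # enable every subsystem
  echo -1 > /proc/sys/lnet/debug            # keep every message type enabled as well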
Oleg Drokin wrote:
[...]
> Hm. What's in your /proc/sys/lnet/subsystem_debug, I wonder? If the
> list of subsystems there is small, try
> echo -1 > /proc/sys/lnet/subsystem_debug

Hi Oleg,

now I get something in the logs! The -1 on subsystem_debug fills up the
logs now ... I opened bug #19559 and attached log files from the
Lustre-client/NFS-server and the Lustre MDS while running my 'ls -l' test
on the NFS client.

Thanks for your help,
Bye, Ralf