Ms. Megan Larko
2008-Sep-08 17:17 UTC
[Lustre-discuss] temporarily refusing client connection
Greetings, I was having difficulty using a Lustre disk on Friday of last week. Non-root users were repeatedly getting "Identifier removed" errors. I found an old message on lustre-discuss stating that LNET cannot hold all of the group permission information, so the group permissions either have to exist on the MGS/MDT, or the following command (thanks, Aaron!) must be run to let Lustre continue without the group information locally on the MGS/MDT:

>>tunefs.lustre --param mdt.group_upcall=NONE /dev/sdf
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target: crew8-MDT0000
Index: 0
Lustre FS: crew8
Mount type: ldiskfs
Flags: 0x401 (MDT )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: mgsnode=172.18.0.10@o2ib mdt.group_upcall=/usr/sbin/l_getgroups mds.group_upcall=NONE

Permanent disk data:
Target: crew8-MDT0000
Index: 0
Lustre FS: crew8
Mount type: ldiskfs
Flags: 0x441 (MDT update )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: mgsnode=172.18.0.10@o2ib mdt.group_upcall=/usr/sbin/l_getgroups mds.group_upcall=NONE mdt.group_upcall=NONE

Writing CONFIGS/mountdata

This command ran without errors on my MGS/MDT. I had unmounted the disk on the client before running the above command on the MGS/MDT. I now find that I cannot remount the Lustre disk on the client. The errors are:

[root@crew01 ~]# mount -v -t lustre ic-mds1@o2ib:/crew8 /crew8
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = ic-mds1@o2ib:/crew8
arg[5] = /crew8
source = ic-mds1@o2ib:/crew8 (172.18.0.10@o2ib:/crew8), target = /crew8
options = rw
mounting device 172.18.0.10@o2ib:/crew8 at /crew8, flags=0 options=device=172.18.0.10@o2ib:/crew8

...and it hangs here. The MGS/MDT reports in /var/log/messages:

Sep 8 13:06:45 mds1 kernel: Lustre: crew8-MDT0000: temporarily refusing client connection from 172.18.0.11@o2ib
Sep 8 13:06:45 mds1 kernel: LustreError: 3355:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error (-11) req@ffff81005e27c400 x95936659/t0 o38-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc -11/0

There are no errors on the OSS, and lctl pings over the IB connection return without errors. Why am I not able to mount the Lustre disk on the client? Why is the connection "temporarily refused"? Suggestions appreciated.

megan
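[Editor's note: a minimal sketch of how one might confirm the new setting and basic connectivity, assuming the 1.6-era /proc layout implied by the log messages in this thread; exact paths may differ on other releases.]

    # On the MGS/MDT: show the group upcall currently in effect for this target
    cat /proc/fs/lustre/mds/crew8-MDT0000/group_upcall

    # From the client: verify LNET connectivity to the MDS over o2ib
    lctl ping 172.18.0.10@o2ib

    # From the client: list the NIDs this node is using
    lctl list_nids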
Ms. Megan Larko
2008-Sep-08 19:21 UTC
[Lustre-discuss] temporarily refusing client connection
More info: Looking at the output from my tunefs.lustre, I noticed that the parameter "mdt.group_upcall=/usr/sbin/l_getgroups" was still listed first, prior to my attempted change to "mdt.group_upcall=NONE". So I re-ran the tunefs.lustre command on my unmounted MDT device and re-specified the entire set of options, with the single exception of the "--reformat" option (although I could do that too and re-populate if necessary). New version of my tunefs.lustre:

[root@mds1 ~]# tunefs.lustre --erase-params --writeconf --mgsnode=ic-mds1@o2ib --fsname=crew8 --mdt --param mdt.group_upcall=NONE /dev/sdf
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target: crew8-MDT0000
Index: 0
Lustre FS: crew8
Mount type: ldiskfs
Flags: 0x541 (MDT update writeconf )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: mdt.group_upcall=NONE

Permanent disk data:
Target: crew8-MDT0000
Index: 0
Lustre FS: crew8
Mount type: ldiskfs
Flags: 0x541 (MDT update writeconf )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: mgsnode=172.18.0.10@o2ib mdt.group_upcall=NONE

Writing CONFIGS/mountdata

I now have only the param mdt.group_upcall=NONE. The MGS/MDS /var/log/messages file shows this new setting, but the client machine still hangs while trying to mount this disk:

Sep 8 15:12:21 mds1 kernel: Lustre: Enabling user_xattr
Sep 8 15:12:21 mds1 kernel: Lustre: 25233:0:(mds_fs.c:446:mds_init_server_data()) RECOVERY: service crew8-MDT0000, 1 recoverable clients, last_transno 474506
Sep 8 15:12:21 mds1 kernel: Lustre: MDT crew8-MDT0000 now serving dev (crew8-MDT0000/954e98ed-1063-9476-4689-9a2c9b70577f), but will be in recovery until 1 client reconnect, or if no clients reconnect for 41:40; during that time new clients will not be allowed to connect. Recovery progress can be monitored by watching /proc/fs/lustre/mds/crew8-MDT0000/recovery_status.
Sep 8 15:12:21 mds1 kernel: Lustre: 25233:0:(lproc_mds.c:260:lprocfs_wr_group_upcall()) crew8-MDT0000: group upcall set to NONE
Sep 8 15:12:21 mds1 kernel: Lustre: crew8-MDT0000.mdt: set parameter group_upcall=NONE
Sep 8 15:12:21 mds1 kernel: Lustre: Server crew8-MDT0000 on device /dev/sdf has started
Sep 8 15:13:27 mds1 kernel: Lustre: crew8-MDT0000: temporarily refusing client connection from 172.18.0.11@o2ib
Sep 8 15:13:27 mds1 kernel: LustreError: 3351:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error (-11) req@ffff81002b275400 x95938226/t0 o38-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc -11/0

I am hoping it is just in recovery for a few long moments. My system recoveries are slow by design.

megan
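[Editor's note: the "temporarily refusing client connection" / error -11 (EAGAIN) pair matches the recovery window announced in the MDS log above, so one way to confirm is to watch the recovery status file named in that log and, if the one stale client never reconnects, end recovery by hand. This is a sketch under those assumptions; the device index is illustrative and should be taken from "lctl dl".]

    # On the MDS: follow recovery until the status changes from RECOVERING to COMPLETE
    watch -n 5 cat /proc/fs/lustre/mds/crew8-MDT0000/recovery_status

    # Find the local device index of the MDT
    lctl dl

    # Optionally end recovery early so new clients are admitted immediately
    # (replace N with the MDT's index from "lctl dl")
    lctl --device N abort_recovery

Otherwise, once the 41:40 window in the log expires with no reconnect, recovery should end on its own and the client mount should proceed.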