Ms. Megan Larko
2008-Sep-08 17:17 UTC
[Lustre-discuss] temporarily refusing client connection
Greetings,
I was having difficulty using a Lustre disk on Friday of last week.
Non-root users were repeatedly getting "Identifier Removed" errors. I
found an old message on lustre-discuss which explained that LNET cannot
carry all of the group permission info, so the group information either
has to exist on the MGS/MDT, or the following command can be run
(thanks, Aaron!) to allow Lustre to continue without having the group
permissions locally on the MGS/MDT:
[root@mds1 ~]# tunefs.lustre --param mdt.group_upcall=NONE /dev/sdf
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata
Read previous values:
Target: crew8-MDT0000
Index: 0
Lustre FS: crew8
Mount type: ldiskfs
Flags: 0x401
(MDT )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: mgsnode=172.18.0.10@o2ib mdt.group_upcall=/usr/sbin/l_getgroups mds.group_upcall=NONE
Permanent disk data:
Target: crew8-MDT0000
Index: 0
Lustre FS: crew8
Mount type: ldiskfs
Flags: 0x441
(MDT update )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: mgsnode=172.18.0.10@o2ib mdt.group_upcall=/usr/sbin/l_getgroups mds.group_upcall=NONE mdt.group_upcall=NONE
Writing CONFIGS/mountdata
This command ran without errors on my MGS/MDT. I had unmounted the disk
on the client before I ran the above command on the MGS/MDT. I now find
that I cannot remount the Lustre disk on the client. The errors are:
[root@crew01 ~]# mount -v -t lustre ic-mds1@o2ib:/crew8 /crew8
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = ic-mds1@o2ib:/crew8
arg[5] = /crew8
source = ic-mds1@o2ib:/crew8 (172.18.0.10@o2ib:/crew8), target = /crew8
options = rw
mounting device 172.18.0.10@o2ib:/crew8 at /crew8, flags=0 options=device=172.18.0.10@o2ib:/crew8
...and it hangs here. The MGS/MDT reports in /var/log/messages:
Sep 8 13:06:45 mds1 kernel: Lustre: crew8-MDT0000: temporarily refusing client connection from 172.18.0.11@o2ib
Sep 8 13:06:45 mds1 kernel: LustreError: 3355:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error (-11) req@ffff81005e27c400 x95936659/t0 o38-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc -11/0
There are no errors on the OSS. The lctl pings over the IB connection
return without errors.
Why am I not able to mount the Lustre disk on the client? Why is the
connection "temporarily refused"? Suggestions appreciated.
megan
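P.S. The lctl pings I mentioned were just plain pings of the NIDs from
the mount output above, roughly along these lines (run on the client,
crew01):
[root@crew01 ~]# lctl list_nids                  # confirm the client's own o2ib NID is up
[root@crew01 ~]# lctl ping 172.18.0.10@o2ib      # ping the MGS/MDT NID over IB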
Ms. Megan Larko
2008-Sep-08 19:21 UTC
[Lustre-discuss] temporarily refusing client connection
More info:
I was looking at the output from my tunefs.lustre and noticed that the
old parameter "mdt.group_upcall=/usr/sbin/l_getgroups" was still listed
first, ahead of my attempted change to "mdt.group_upcall=NONE".
So I re-ran the tunefs.lustre command on my unmounted MDT device and
re-specified the entire set of options, with the single exception of
the "--reformat" option (although I could do that too and re-populate
if necessary).
New version of my tunefs.lustre:
[root@mds1 ~]# tunefs.lustre --erase-params --writeconf --mgsnode=ic-mds1@o2ib --fsname=crew8 --mdt --param mdt.group_upcall=NONE /dev/sdf
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata
Read previous values:
Target: crew8-MDT0000
Index: 0
Lustre FS: crew8
Mount type: ldiskfs
Flags: 0x541
(MDT update writeconf )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: mdt.group_upcall=NONE
Permanent disk data:
Target: crew8-MDT0000
Index: 0
Lustre FS: crew8
Mount type: ldiskfs
Flags: 0x541
(MDT update writeconf )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: mgsnode=172.18.0.10@o2ib mdt.group_upcall=NONE
Writing CONFIGS/mountdata
I now have only the param mdt.group_upcall=NONE.
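If I want to double-check what is actually stored on the disk later,
tunefs.lustre can print the values without writing anything; something
like this on the unmounted MDT device (if I am remembering the option
correctly):
[root@mds1 ~]# tunefs.lustre --dryrun /dev/sdf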
The MGS/MDS /var/log/messages file shows the new setting, but the
client machine still hangs trying to mount this disk:
Sep 8 15:12:21 mds1 kernel: Lustre: Enabling user_xattr
Sep 8 15:12:21 mds1 kernel: Lustre: 25233:0:(mds_fs.c:446:mds_init_server_data()) RECOVERY: service crew8-MDT0000, 1 recoverable clients, last_transno 474506
Sep 8 15:12:21 mds1 kernel: Lustre: MDT crew8-MDT0000 now serving dev (crew8-MDT0000/954e98ed-1063-9476-4689-9a2c9b70577f), but will be in recovery until 1 client reconnect, or if no clients reconnect for 41:40; during that time new clients will not be allowed to connect. Recovery progress can be monitored by watching /proc/fs/lustre/mds/crew8-MDT0000/recovery_status.
Sep 8 15:12:21 mds1 kernel: Lustre: 25233:0:(lproc_mds.c:260:lprocfs_wr_group_upcall()) crew8-MDT0000: group upcall set to NONE
Sep 8 15:12:21 mds1 kernel: Lustre: crew8-MDT0000.mdt: set parameter group_upcall=NONE
Sep 8 15:12:21 mds1 kernel: Lustre: Server crew8-MDT0000 on device /dev/sdf has started
Sep 8 15:13:27 mds1 kernel: Lustre: crew8-MDT0000: temporarily refusing client connection from 172.18.0.11@o2ib
Sep 8 15:13:27 mds1 kernel: LustreError: 3351:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error (-11) req@ffff81002b275400 x95938226/t0 o38-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc -11/0
I am hoping it is just in recovery for a few long moments. My system
recoveries are slow by design.
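In the meantime I will keep an eye on the recovery_status file that the
log message points to, with something along these lines on the MDS
(and, if I remember correctly, lctl can cut the recovery window short
if it drags on):
[root@mds1 ~]# watch -n 10 cat /proc/fs/lustre/mds/crew8-MDT0000/recovery_status
[root@mds1 ~]# lctl dl                            # look up the device number for crew8-MDT0000
[root@mds1 ~]# lctl --device <N> abort_recovery   # <N> is that device number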
megan