Ms. Megan Larko
2008-Sep-08 17:17 UTC
[Lustre-discuss] temporarily refusing client connection
Greetings, I was having difficulty using a Lustre disk on Friday of last week. Non-root users were repeatedly getting "Identifier removed" errors. I found an old message on lustre-discuss stating that LNET cannot hold all of the group permission information, so the group permissions either have to exist on the MGS/MDT, or the following command (thanks, Aaron!) must be run to let Lustre continue without the group information locally on the MGS/MDT:

>>tunefs.lustre --param mdt.group_upcall=NONE /dev/sdf
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target: crew8-MDT0000
Index: 0
Lustre FS: crew8
Mount type: ldiskfs
Flags: 0x401 (MDT )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: mgsnode=172.18.0.10@o2ib mdt.group_upcall=/usr/sbin/l_getgroups mds.group_upcall=NONE

Permanent disk data:
Target: crew8-MDT0000
Index: 0
Lustre FS: crew8
Mount type: ldiskfs
Flags: 0x441 (MDT update )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: mgsnode=172.18.0.10@o2ib mdt.group_upcall=/usr/sbin/l_getgroups mds.group_upcall=NONE mdt.group_upcall=NONE

Writing CONFIGS/mountdata

This command ran without errors on my MGS/MDT. I had unmounted the disk on the client before running the above command on the MGS/MDT. I now find that I cannot remount the Lustre disk on the client. The errors are:

[root@crew01 ~]# mount -v -t lustre ic-mds1@o2ib:/crew8 /crew8
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = ic-mds1@o2ib:/crew8
arg[5] = /crew8
source = ic-mds1@o2ib:/crew8 (172.18.0.10@o2ib:/crew8), target = /crew8
options = rw
mounting device 172.18.0.10@o2ib:/crew8 at /crew8, flags=0 options=device=172.18.0.10@o2ib:/crew8

...and it hangs here. The MGS/MDT reports in /var/log/messages:

Sep 8 13:06:45 mds1 kernel: Lustre: crew8-MDT0000: temporarily refusing client connection from 172.18.0.11@o2ib
Sep 8 13:06:45 mds1 kernel: LustreError: 3355:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error (-11) req@ffff81005e27c400 x95936659/t0 o38-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc -11/0

There are no errors on the OSS, and lctl pings over the IB connection return without errors. Why am I not able to mount the Lustre disk on the client? Why is the connection "temporarily refused"? Suggestions appreciated.

megan
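[Editor's note: a minimal sketch of how one might confirm the new setting and basic connectivity, assuming the 1.6-era /proc layout implied by the log messages in this thread; exact paths may differ on other releases.]

    # On the MGS/MDT: show the group upcall currently in effect for this target
    cat /proc/fs/lustre/mds/crew8-MDT0000/group_upcall

    # From the client: verify LNET connectivity to the MDS over o2ib
    lctl ping 172.18.0.10@o2ib

    # From the client: list the NIDs this node is using
    lctl list_nids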
Ms. Megan Larko
2008-Sep-08 19:21 UTC
[Lustre-discuss] temporarily refusing client connection
More info: Looking at the output from my tunefs.lustre, I noticed that the parameter "mdt.group_upcall=/usr/sbin/l_getgroups" was still listed first, prior to my attempted change to "mdt.group_upcall=NONE". So I re-ran the tunefs.lustre command on my unmounted MDT device and re-specified the entire set of options, with the single exception of the "--reformat" option (although I could do that too and re-populate if necessary). New version of my tunefs.lustre:

[root@mds1 ~]# tunefs.lustre --erase-params --writeconf --mgsnode=ic-mds1@o2ib --fsname=crew8 --mdt --param mdt.group_upcall=NONE /dev/sdf
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target: crew8-MDT0000
Index: 0
Lustre FS: crew8
Mount type: ldiskfs
Flags: 0x541 (MDT update writeconf )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: mdt.group_upcall=NONE

Permanent disk data:
Target: crew8-MDT0000
Index: 0
Lustre FS: crew8
Mount type: ldiskfs
Flags: 0x541 (MDT update writeconf )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: mgsnode=172.18.0.10@o2ib mdt.group_upcall=NONE

Writing CONFIGS/mountdata

I now have only the param mdt.group_upcall=NONE. The MGS/MDS /var/log/messages file shows this new setting, but the client machine still hangs while trying to mount this disk:

Sep 8 15:12:21 mds1 kernel: Lustre: Enabling user_xattr
Sep 8 15:12:21 mds1 kernel: Lustre: 25233:0:(mds_fs.c:446:mds_init_server_data()) RECOVERY: service crew8-MDT0000, 1 recoverable clients, last_transno 474506
Sep 8 15:12:21 mds1 kernel: Lustre: MDT crew8-MDT0000 now serving dev (crew8-MDT0000/954e98ed-1063-9476-4689-9a2c9b70577f), but will be in recovery until 1 client reconnect, or if no clients reconnect for 41:40; during that time new clients will not be allowed to connect. Recovery progress can be monitored by watching /proc/fs/lustre/mds/crew8-MDT0000/recovery_status.
Sep 8 15:12:21 mds1 kernel: Lustre: 25233:0:(lproc_mds.c:260:lprocfs_wr_group_upcall()) crew8-MDT0000: group upcall set to NONE
Sep 8 15:12:21 mds1 kernel: Lustre: crew8-MDT0000.mdt: set parameter group_upcall=NONE
Sep 8 15:12:21 mds1 kernel: Lustre: Server crew8-MDT0000 on device /dev/sdf has started
Sep 8 15:13:27 mds1 kernel: Lustre: crew8-MDT0000: temporarily refusing client connection from 172.18.0.11@o2ib
Sep 8 15:13:27 mds1 kernel: LustreError: 3351:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error (-11) req@ffff81002b275400 x95938226/t0 o38-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc -11/0

I am hoping it is just in recovery for a few long moments. My system recoveries are slow by design.

megan
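[Editor's note: the "temporarily refusing client connection" / error -11 (EAGAIN) pair matches the recovery window announced in the MDS log above, so one way to confirm is to watch the recovery status file named in that log and, if the one stale client never reconnects, end recovery by hand. This is a sketch under those assumptions; the device index is illustrative and should be taken from "lctl dl".]

    # On the MDS: follow recovery until the status changes from RECOVERING to COMPLETE
    watch -n 5 cat /proc/fs/lustre/mds/crew8-MDT0000/recovery_status

    # Find the local device index of the MDT
    lctl dl

    # Optionally end recovery early so new clients are admitted immediately
    # (replace N with the MDT's index from "lctl dl")
    lctl --device N abort_recovery

Otherwise, once the 41:40 window in the log expires with no reconnect, recovery should end on its own and the client mount should proceed.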