Michael D. Seymour
2009-Apr-24 15:53 UTC
[Lustre-discuss] Problems adding new OSS to existing Lustre filesystem -- Refusing connection, No matching NI
Hi,

We are having a problem adding a new OSS (roc06, 10.5.203.6) to an existing Lustre file system (raid-cita) on the 10.5 network. SELinux and iptables are disabled. It is a multi-homed OSS on the 10.4 and 10.5 networks.

When the file system is mounted, clients try to connect to it via the 10.4 network, even though everything is set up to use the 10.5 network. The clients do not see the new space on the file system either: df shows 23T as opposed to the > 27T it should show. lfs quota hangs as well.

We did suffer some problems with the MDS filesystem, which was fscked; the kernel was downgraded to the Lustre 1.6.6 build and the filesystem remounted.

Many messages like this exist in /var/log/messages on the new OSS:

Apr 24 10:01:07 roc06 kernel: LustreError: 120-3: Refusing connection from 10.4.1.52 for 10.4.203.6@tcp: No matching NI

On the multi-homed client 10.4.1.52:

[root@tpb52-chroot ~]# uname -a; cat /etc/redhat-release
Linux tpb52 2.6.18-92.1.17.el5_lustre.1.6.7smp #1 SMP Mon Feb 9 19:56:55 MST 2009 x86_64 x86_64 x86_64 GNU/Linux
CentOS release 5 (Final)
[root@tpb52-chroot ~]# df -h /mnt/raid-cita/
Filesystem              Size  Used Avail Use% Mounted on
10.5.203.250@tcp:/roc    23T   11T   12T  47% /mnt/raid-cita
[root@tpb52-chroot ~]# lctl list_nids
10.5.2.12@tcp
[root@tpb52-chroot ~]# grep lnet /etc/modprobe.conf
options lnet networks=tcp0(eth1)
[root@tpb52-chroot ~]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:15:C5:EC:FA:8C
          inet addr:10.5.2.12  Bcast:10.5.255.255  Mask:255.255.0.0

On the OSS roc06:

[root@roc06 lustre]# uname -a; cat /etc/redhat-release
Linux roc06 2.6.18-92.1.17.el5_lustre.1.6.7.1smp #1 SMP Mon Apr 13 16:13:00 MDT 2009 x86_64 x86_64 x86_64 GNU/Linux
CentOS release 5.3 (Final)
[root@roc06 lustre]# lctl list_nids
10.5.203.6@tcp
[root@roc06 ~]# grep lnet /etc/modprobe.conf
options lnet networks=tcp0(eth1)
[root@roc06 ~]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:22:19:05:90:F2
          inet addr:10.5.203.6  Bcast:10.5.255.255  Mask:255.255.0.0

The OSS was formatted with the following:

mkfs.lustre --verbose --reformat --fsname=roc --ost --mgsnode=10.5.203.250@tcp0 --mkfsoptions="-m 0 -E stride=32" /dev/md2

I believe this was done before "options lnet networks=tcp0(eth1)" was included in modprobe.conf.

[root@roc06 ~]# tunefs.lustre --print /dev/md2
Permanent disk data:
Target:     roc-OST0005
Index:      5
Lustre FS:  roc
Mount type: ldiskfs
Flags:      0x402 (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.5.203.250@tcp ost.quota_type=u

For comparison, the OSS roc05:

[root@roc05 ~]# uname -a; cat /etc/redhat-release
Linux roc05 2.6.18-92.1.17.el5_lustre.1.6.7smp #1 SMP Mon Feb 9 19:56:55 MST 2009 x86_64 x86_64 x86_64 GNU/Linux
CentOS release 5 (Final)
[root@roc05 ~]# lctl list_nids
10.5.203.5@tcp
[root@roc05 ~]# grep lnet /etc/modprobe.conf
options lnet networks=tcp0(eth1)
[root@roc05 ~]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:1C:23:D5:F5:4F
          inet addr:10.5.203.5  Bcast:10.5.255.255  Mask:255.255.0.0
[root@roc05 ~]# tunefs.lustre --print /dev/md2
Permanent disk data:
Target:     roc-OST0004
Index:      4
Lustre FS:  roc
Mount type: ldiskfs
Flags:      0x402 (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.5.203.250@tcp ost.quota_type=u

On the MDS (rocpile):

[root@rocpile ~]# uname -a; cat /etc/redhat-release
Linux rocpile 2.6.18-92.1.10.el5_lustre.1.6.6smp #1 SMP Tue Aug 26 12:16:17 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
CentOS release 5 (Final)
[root@rocpile ~]# lctl list_nids
10.5.203.250@tcp
[root@rocpile ~]# grep lnet /etc/modprobe.conf
options lnet networks=tcp(eth1)
[root@rocpile ~]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:15:C5:EC:F6:88
          inet addr:10.5.203.250  Bcast:10.5.255.255  Mask:255.255.0.0

Any suggestions?

Thanks,
Mike

--
Michael D. Seymour                      Phone: 416-978-1776
Scientific Computing Support            Fax: 416-978-3921
Canadian Institute for Theoretical Astrophysics, University of Toronto
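
P.S. To make the mismatch in the log message concrete, here is a minimal Python sketch. It is not part of Lustre, and the names `target_nid` and `is_local` are hypothetical helpers made up for this illustration. It assumes the "No matching NI" message means the NID the peer asked for is not one of the NIDs configured on the receiving node, extracts that NID from the log line, and compares it with what `lctl list_nids` reported on roc06.

```python
import re

# Pattern for the LNet refusal line quoted in this post. The list archive
# renders "@" as " at ", so both spellings are accepted.
REFUSAL = re.compile(
    r"Refusing connection from (?P<peer>\S+) "
    r"for (?P<addr>\S+?)(?:@| at )(?P<net>\w+): No matching NI"
)


def target_nid(log_line):
    """Return the NID the peer tried to reach, or None if the line does not match."""
    match = REFUSAL.search(log_line)
    if match is None:
        return None
    return f"{match.group('addr')}@{match.group('net')}"


def is_local(nid, local_nids):
    """Plain string comparison against the NIDs printed by `lctl list_nids`."""
    return nid in local_nids


if __name__ == "__main__":
    line = (
        "Apr 24 10:01:07 roc06 kernel: LustreError: 120-3: Refusing connection "
        "from 10.4.1.52 for 10.4.203.6@tcp: No matching NI"
    )
    roc06_nids = ["10.5.203.6@tcp"]  # copied from `lctl list_nids` on roc06
    wanted = target_nid(line)
    print(wanted, "configured on roc06:", is_local(wanted, roc06_nids))
```

With the values from this post, the requested NID is on the 10.4 network while roc06 only has a 10.5 NID configured, so the comparison fails. That suggests the 10.4 address for this OST is recorded somewhere in the file system configuration, though the sketch cannot say where.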