Hi,

I have a new MGS/MDS that I would like to start. It is another of the same CentOS 5, kernel 2.6.18-53.1.13.el5, lustre-1.6.4.3smp as my other boxes. Initially I had an IP number that was used elsewhere in our group. I changed it using the tunefs.lustre command below for the new MDT.

[root@mds2 ~]# tunefs.lustre --erase-params --writeconf --mgsnode=ic-mds2@o2ib /dev/sdd1
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target:     crew8-MDTffff
Index:      unassigned
Lustre FS:  crew8
Mount type: ldiskfs
Flags:      0x71
            (MDT needs_index first_time update )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: mgsnode=172.18.0.9@o2ib

Permanent disk data:
Target:     crew8-MDTffff
Index:      unassigned
Lustre FS:  crew8
Mount type: ldiskfs
Flags:      0x171
            (MDT needs_index first_time update writeconf )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: mgsnode=172.18.0.16@o2ib

Writing CONFIGS/mountdata

Next I try to mount this new MDT onto the system....

[root@mds2 ~]# mount -t lustre /dev/sdd1 /srv/lustre/mds/crew8-MDT0000
mount.lustre: mount /dev/sdd1 at /srv/lustre/mds/crew8-MDT0000 failed: Input/output error
Is the MGS running?

Ummm--- yeah, I thought the MGS is running.

[root@mds2 ~]# tail /var/log/messages
Sep 4 16:28:08 mds2 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Sep 4 16:28:13 mds2 kernel: LustreError: 3526:0:(client.c:975:ptlrpc_expire_one_request()) @@@ timeout (sent at 1220560088, 5s ago) req@ffff81042f109000 x3/t0 o250->MGS@MGC172.18.0.16@o2ib_0:26 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/-22
Sep 4 16:28:13 mds2 kernel: LustreError: 3797:0:(obd_mount.c:954:server_register_target()) registration with the MGS failed (-5)
Sep 4 16:28:13 mds2 kernel: LustreError: 3797:0:(obd_mount.c:1054:server_start_targets()) Required registration failed for crew8-MDTffff: -5
Sep 4 16:28:13 mds2 kernel: LustreError: 15f-b: Communication error with the MGS. Is the MGS running?
Sep 4 16:28:13 mds2 kernel: LustreError: 3797:0:(obd_mount.c:1570:server_fill_super()) Unable to start targets: -5
Sep 4 16:28:13 mds2 kernel: LustreError: 3797:0:(obd_mount.c:1368:server_put_super()) no obd crew8-MDTffff
Sep 4 16:28:13 mds2 kernel: LustreError: 3797:0:(obd_mount.c:119:server_deregister_mount()) crew8-MDTffff not registered
Sep 4 16:28:13 mds2 kernel: Lustre: server umount crew8-MDTffff complete
Sep 4 16:28:13 mds2 kernel: LustreError: 3797:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount (-5)

The o2ib network is up. It is ping-able via bash and lctl. I can get to it from itself and from other computers on this local subnet.

[root@mds2 ~]# lctl
lctl > ping 172.18.0.16@o2ib
12345-0@lo
12345-172.18.0.16@o2ib
lctl > ping 172.18.0.15@o2ib
12345-0@lo
12345-172.18.0.15@o2ib
lctl > quit

On this net there are no firewalls, as the computers are using only non-routable IP numbers. So there is not a firewall issue of which I am aware...

[root@mds2 ~]# iptables -L
-bash: iptables: command not found

The only oddity I have found is that the modules in my working MGS/MDS are used more than the modules in my new MGS/MDT.
Correctly functioning MGS/MDT:

[root@mds1 ~]# lsmod | grep mgs
mgs       181512  1
mgc        86744  2 mgs
ptlrpc    659512  8 osc,mds,mgs,mgc,lustre,lov,lquota,mdc
obdclass  542200  13 osc,mds,fsfilt_ldiskfs,mgs,mgc,lustre,lov,lquota,mdc,ptlrpc
lvfs       84712  12 osc,mds,fsfilt_ldiskfs,mgs,mgc,lustre,lov,lquota,mdc,ptlrpc,obdclass
libcfs    183128  14 osc,mds,fsfilt_ldiskfs,mgs,mgc,lustre,lov,lquota,mdc,ko2iblnd,ptlrpc,obdclass,lnet,lvfs
[root@mds1 ~]# lsmod | grep osc
osc       172136  11
ptlrpc    659512  8 osc,mds,mgs,mgc,lustre,lov,lquota,mdc
obdclass  542200  13 osc,mds,fsfilt_ldiskfs,mgs,mgc,lustre,lov,lquota,mdc,ptlrpc
lvfs       84712  12 osc,mds,fsfilt_ldiskfs,mgs,mgc,lustre,lov,lquota,mdc,ptlrpc,obdclass
libcfs    183128  14 osc,mds,fsfilt_ldiskfs,mgs,mgc,lustre,lov,lquota,mdc,ko2iblnd,ptlrpc,obdclass,lnet,lvfs
[root@mds1 ~]# lsmod | grep lnet
lnet      255656  4 lustre,ko2iblnd,ptlrpc,obdclass
libcfs    183128  14 osc,mds,fsfilt_ldiskfs,mgs,mgc,lustre,lov,lquota,mdc,ko2iblnd,ptlrpc,obdclass,lnet,lvfs

Failing MGS/MDT:

[root@mds2 ~]# lsmod | grep mgs
mgs       181512  0
mgc        86744  1 mgs
ptlrpc    659512  8 osc,lustre,lov,mdc,mds,lquota,mgs,mgc
obdclass  542200  10 osc,lustre,lov,mdc,fsfilt_ldiskfs,mds,lquota,mgs,mgc,ptlrpc
lvfs       84712  12 osc,lustre,lov,mdc,fsfilt_ldiskfs,mds,lquota,mgs,mgc,ptlrpc,obdclass
libcfs    183128  14 osc,lustre,lov,mdc,fsfilt_ldiskfs,mds,lquota,mgs,mgc,ko2iblnd,ptlrpc,obdclass,lnet,lvfs
[root@mds2 ~]# lsmod | grep osc
osc       172136  0
ptlrpc    659512  8 osc,lustre,lov,mdc,mds,lquota,mgs,mgc
obdclass  542200  10 osc,lustre,lov,mdc,fsfilt_ldiskfs,mds,lquota,mgs,mgc,ptlrpc
lvfs       84712  12 osc,lustre,lov,mdc,fsfilt_ldiskfs,mds,lquota,mgs,mgc,ptlrpc,obdclass
libcfs    183128  14 osc,lustre,lov,mdc,fsfilt_ldiskfs,mds,lquota,mgs,mgc,ko2iblnd,ptlrpc,obdclass,lnet,lvfs
[root@mds2 ~]# lsmod | grep lnet
lnet      255656  4 lustre,ko2iblnd,ptlrpc,obdclass
libcfs    183128  14 osc,lustre,lov,mdc,fsfilt_ldiskfs,mds,lquota,mgs,mgc,ko2iblnd,ptlrpc,obdclass,lnet,lvfs

The failing MGS/MDT has a use count of 0 next to mgs and not 1 like the working MGS/MDT. The osc module shows 11 on the working box and 0 on the non-working box. The lnet entries are the same, as are most of the other module comparisons. Am I missing something at the module mgs/mgc/osc level? Or are those counts just indicating that the modules are actually in use on my good MGS/MDT?

Even with IB cabling aside (I'm working on the MGS/MDS itself), why can I not mount a new MDT? Why do I see the message "Is the MGS running?" when I am actually on the MGS/MDS itself?

Also, I receive the same result if I attempt to mount an OST on an OSS which is pointing to this new MGS/MDT. The OST won't even mount locally on the OSS without successful communication with its associated MGS/MDT.

Any and all suggestions gratefully appreciated.

megan
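[Editorial note: a minimal diagnostic sketch, not taken from the thread, that can confirm whether any MGS is actually running on mds2 and which NIDs LNET is using. It assumes the Lustre 1.6 userspace tools and that LNET is configured via /etc/modprobe.conf:

[root@mds2 ~]# lctl list_nids                          # NIDs this node advertises on LNET
[root@mds2 ~]# lctl dl                                 # Lustre obd devices currently set up on this node
[root@mds2 ~]# grep "options lnet" /etc/modprobe.conf  # e.g. options lnet networks="o2ib(ib0)"

With the MDT unmounted, an empty "lctl dl" means no MGS (or any other Lustre target) is running locally, which would explain the registration timeout in the log above.]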
Does the new MDS actually have an MGS running? FYI, you only need one MGS per Lustre setup. In the commands you issued it doesn't look like you actually set up an MGS on the host "mds2". Can you run an "lctl dl" on mds2 and send the output?

On Sep 4, 2008, at 4:54 PM, Ms. Megan Larko wrote:
> Hi,
>
> I have a new MGS/MDS that I would like to start. [...]
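[Editorial note: a rough sketch of one way to provide a local MGS on mds2, with a hypothetical spare device /dev/sdX and mount point /srv/lustre/mgs (adjust to the real hardware). The MGS must be formatted and mounted before the MDT can register with it:

# format a small spare partition as a standalone MGS
[root@mds2 ~]# mkfs.lustre --mgs /dev/sdX
[root@mds2 ~]# mkdir -p /srv/lustre/mgs
[root@mds2 ~]# mount -t lustre /dev/sdX /srv/lustre/mgs
# with the MGS up, the MDT (whose mgsnode points at this node) should be able to register
[root@mds2 ~]# mount -t lustre /dev/sdd1 /srv/lustre/mds/crew8-MDT0000]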
On Sep 05, 2008 11:11 -0400, Aaron Knister wrote:
> Does the new MDS actually have an MGS running? FYI, you only need one
> MGS per Lustre setup. In the commands you issued it doesn't look like
> you actually set up an MGS on the host "mds2". Can you run an "lctl
> dl" on mds2 and send the output?

There are tradeoffs between having a single MGS for multiple filesystems, and having one MGS per filesystem (assuming different MDS nodes). In general, there isn't much benefit to sharing an MGS between multiple MDS nodes, and the drawback is that it is a single point of failure, so you may as well have one per MDS.

> On Sep 4, 2008, at 4:54 PM, Ms. Megan Larko wrote:
> > I have a new MGS/MDS that I would like to start. [...]
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
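[Editorial note: to illustrate the two layouts being compared, a hypothetical mkfs.lustre sketch; the fsname and device are reused from the thread, and --reformat wipes any existing data on the target, so this is illustration only:

# combined MGS + MDT on one device, i.e. one MGS per filesystem/MDS:
mkfs.lustre --fsname=crew8 --mgs --mdt --reformat /dev/sdd1

# MDT only, registering with a shared MGS running on another node:
mkfs.lustre --fsname=crew8 --mdt --mgsnode=172.18.0.16@o2ib --reformat /dev/sdd1

The first form needs no separate MGS device and keeps each filesystem independent; the second shares one MGS across filesystems but makes that MGS a single point of failure, as noted above.]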