Canon, Richard Shane
2008-Mar-07 17:17 UTC
[Lustre-discuss] Multihomed question: want Lustre over IB and Ethernet
Chris,

Perhaps you need to perform some write_conf-like command. I'm not sure if this is needed in 1.6 or not.

Shane

----- Original Message -----
From: lustre-discuss-bounces at lists.lustre.org <lustre-discuss-bounces at lists.lustre.org>
To: lustre-discuss <lustre-discuss at lists.lustre.org>
Sent: Fri Mar 07 12:03:17 2008
Subject: Re: [Lustre-discuss] Multihomed question: want Lustre over IB and Ethernet

On Fri, Mar 7, 2008 at 9:39 AM, Craig Prescott <prescott at hpc.ufl.edu> wrote:
>
> I think your client modprobe.conf lnet option
> should be this:
>
> options lnet networks=o2ib(ib0)
>
> (not 'o2ib0').

It still seems to want the TCP connection:

Lustre: Added LNI 36.122.255.1@o2ib [8/64]
Lustre: Lustre Client File System; info@clusterfs.com
LustreError: 11043:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found for 36.121.255.201@tcp
LustreError: 11043:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find peer 36.121.255.201@tcp!
LustreError: 11043:0:(ldlm_lib.c:312:client_obd_setup()) can't add initial connection
LustreError: 11043:0:(obd_config.c:325:class_setup()) setup ddnlfs-MDT0000-mdc-0000010430934400 failed (-2)
LustreError: 11043:0:(obd_config.c:1062:class_config_llog_handler()) Err -2 on cfg command:
LustreError: 11141:0:(connection.c:142:ptlrpc_put_connection()) NULL connection
Lustre: cmd=cf003 0:ddnlfs-MDT0000-mdc 1:ddnlfs-MDT0000_UUID 2:36.121.255.201@tcp
LustreError: 15c-8: MGC36.122.255.201@o2ib: The configuration from log 'ddnlfs-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 11043:0:(llite_lib.c:1021:ll_fill_super()) Unable to process log: -2
LustreError: 11043:0:(obd_config.c:392:class_cleanup()) Device 2 not setup
Lustre: client 0000010430934400 umount complete
LustreError: 11043:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount (-2)

>
> Another thing to try, if that doesn't work lctl
> ping your MDS/MGS/OSS nids, like so:
>
> lctl ping 36.122.255.201@o2ib

Before and after the change it looks the same:

# lctl ping 36.122.255.201@o2ib
12345-0@lo
12345-36.122.255.201@o2ib
12345-36.121.255.201@tcp

If I change my modprobe.conf to look as on the MDS/OSS's:

options lnet networks=o2ib0(ib0),tcp0(eth0)

Then, mount just specifying o2ib:

# mount -t lustre 36.122.255.201@o2ib:/ddnlfs /lfs

It works, but both ko2iblnd and ksocklnd are loaded.

The dmesg output is:

Lustre: OBD class driver, info@clusterfs.com
Lustre Version: 1.6.4.2
Build Version: 1.6.4.2-19691231190000-PRISTINE-.usr.src.linux-2.6.9-67.0.4.EL-Lustre-1.6.4.2
Lustre: Added LNI 36.122.255.1@o2ib [8/64]
Lustre: Added LNI 36.121.255.1@tcp [8/256]
Lustre: Accept secure, port 988
Lustre: Lustre Client File System; info@clusterfs.com
Lustre: ddnlfs-clilov-000001042f8b7c00.lov: set parameter stripesize=2M
Lustre: Client ddnlfs-client has started

Can I be certain it'll use IB for LFS on this client?

Thanks,

Chris

>
> Cheers,
> Craig
>
> Chris Worley wrote:
> > More issues. Now, on the clients.
> >
> > The MDT/MGS/OST's are all up and mounted, showing:
> >
> > # lctl list_nids
> > 36.122.255.201@o2ib
> > 36.121.255.201@tcp
> >
> > Now, when I go to mount on the IB-based clients, I get:
> >
> > # mount -t lustre 36.122.255.201@o2ib:/ddnlfs /lfs
> > mount.lustre: mount 36.122.255.201@o2ib:/ddnlfs at /lfs failed: No such file or directory
> > Is the MGS specification correct?
> > Is the filesystem name correct?
> > If upgrading, is the copied client log valid? (see upgrade docs)
> >
> > The modprobe.conf contains:
> >
> > options lnet networks=o2ib0(ib0)
> >
> > And lctl looks good:
> >
> > # lctl list_nids
> > 36.122.255.1@o2ib
> >
> > But dmesg shows that it wants to go over the 36.121.x.x (tcp) network
> > (36.12[12].255.201 is the MGS/MDS server):
> >
> > LustreError: 10001:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found for 36.121.255.201@tcp
> > LustreError: 10001:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find peer 36.121.255.201@tcp!
> > LustreError: 10001:0:(ldlm_lib.c:312:client_obd_setup()) can't add initial connection
> > LustreError: 9836:0:(connection.c:142:ptlrpc_put_connection()) NULL connection
> > LustreError: 10001:0:(obd_config.c:325:class_setup()) setup ddnlfs-MDT0000-mdc-0000010430913c00 failed (-2)
> > LustreError: 10001:0:(obd_config.c:1062:class_config_llog_handler()) Err -2 on cfg command:
> > Lustre: cmd=cf003 0:ddnlfs-MDT0000-mdc 1:ddnlfs-MDT0000_UUID 2:36.121.255.201@tcp
> > LustreError: 15c-8: MGC36.122.255.201@o2ib: The configuration from log 'ddnlfs-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
> > LustreError: 10001:0:(llite_lib.c:1021:ll_fill_super()) Unable to process log: -2
> > LustreError: 10001:0:(obd_config.c:392:class_cleanup()) Device 2 not setup
> > Lustre: client 0000010430913c00 umount complete
> > LustreError: 10001:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount (-2)
> >
> > Note that this setup works fine in the non-multihomed setup, so I
> > don't think ko2iblnd is to blame (the setup on the clients hasn't
> > changed at all).
> >
> > What am I doing wrong?
> >
> > Thanks,
> >
> > Chris
> >
> > On Fri, Mar 7, 2008 at 7:41 AM, Chris Worley <worleys at gmail.com> wrote:
> >> I changed my modprobe.conf to look exactly as yours, and it worked. I
> >> hadn't been using all the quotes until the doc said to... but they may
> >> have indeed been the problem.
> >>
> >> Thanks!
> >>
> >> Chris
> >>
> >> On Fri, Mar 7, 2008 at 3:40 AM, Charles Taylor <taylor at hpc.ufl.edu> wrote:
> >> >
> >> > Do "lctl list_nids" on your mds and oss's. They should look
> >> > something like this.
> >> >
> >> > [root@hpcmds ~]# lctl list_nids
> >> > 10.13.24.40@o2ib
> >> > 10.13.16.40@tcp
> >> >
> >> > Then your clients should have a nid on one or the other.
> >> >
> >> > Check your dmesg output after loading lnet. The complaints are
> >> > pretty useful. Your modprobe.conf line looks correct although we
> >> > found we did not need all the quoting so you should check that as
> >> > well. Ours looks like...
> >> >
> >> > options lnet networks=o2ib(ib0),tcp(eth0)
> >> >
> >> > My guess is that it either cannot find or does not like your ko2iblnd
> >> > module.
> >> >
> >> > ct
> >> >
> >> > On Mar 7, 2008, at 12:46 AM, Chris Worley wrote:
> >> >
> >> > > Most everything is over IB, but I have a few systems I'd like to mount
> >> > > the Lustre fs over GigE.
> >> > >
> >> > > I think I've followed the Multihomed instructions correctly, in:
> >> > >
> >> > > http://dlc.sun.com/pdf/820-3681/820-3681.pdf
> >> > >
> >> > > My /etc/modprobe.conf on mds/mgs/oss servers (which all have both
> >> > > Ethernet and IB) includes:
> >> > >
> >> > > options lnet 'networks="tcp0(eth0),o2ib0(ib0)"'
> >> > >
> >> > > I make and mount the mdt with (which has both IB and Ethernet, subnet
> >> > > 36.122.x.x is IB, 36.121.x.x is Ethernet):
> >> > >
> >> > > # mkfs.lustre --mdt --mgs --mgsnode="36.122.255.201@o2ib0,36.121.255.201@tcp0" <...> /dev/md0
> >> > > # mount -t lustre /dev/md0 /lfs/mdtb
> >> > >
> >> > > But, at this point, the ksocklnd module is loaded rather than the
> >> > > ko2iblnd module!
> >> > >
> >> > > On the OSS, I make the fs w/ the same "mgsnode", but, when I try to
> >> > > mount it, it correctly uses the IB interface, but can't contact the
> >> > > MDS:
> >> > >
> >> > > LustreError: 27520:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found for MGC36.122.255.201@o2ib_0
> >> > > LustreError: 27520:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find peer MGC36.122.255.201@o2ib_0!
> >> > > LustreError: 27520:0:(ldlm_lib.c:312:client_obd_setup()) can't add initial connection
> >> > > LustreError: 17126:0:(connection.c:142:ptlrpc_put_connection()) NULL connection
> >> > > LustreError: 27520:0:(obd_config.c:325:class_setup()) setup MGC36.122.255.201@o2ib failed (-2)
> >> > > LustreError: 27520:0:(obd_mount.c:454:lustre_start_simple()) MGC36.122.255.201@o2ib setup error -2
> >> > > LustreError: 27520:0:(obd_mount.c:1368:server_put_super()) no obd ddnlfs-OSTffff
> >> > > LustreError: 27520:0:(obd_mount.c:119:server_deregister_mount()) ddnlfs-OSTffff not registered
> >> > >
> >> > > It too has loaded the ksocklnd module, and not the ko2iblnd module. I
> >> > > guess that both modules should be loaded in a multihomed case?
> >> > >
> >> > > What am I doing wrong?
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Chris

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss at lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
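The failures quoted above come down to whether the client's lnet "networks" option names a network that the server NID in the configuration log lives on. A minimal sketch of that comparison follows, assuming NIDs are written as address@network and that a trailing 0 is optional (the thread's own lctl list_nids output prints "o2ib" for a node configured with "o2ib0"). The function names nid_net, option_nets and shares_net are hypothetical helpers, not Lustre commands.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Lustre) for comparing LNet network names.

# Print the network part of a NID, e.g. "36.122.255.201@o2ib0" -> "o2ib".
# Assumption: a single trailing 0 is dropped, so "o2ib0" and "o2ib" compare equal.
nid_net() {
    net=${1##*@}
    case $net in
        *[!0-9]0) net=${net%0} ;;
    esac
    printf '%s\n' "$net"
}

# Print the networks named in an lnet "networks=" value, one per line,
# e.g. "o2ib0(ib0),tcp0(eth0)" -> "o2ib" then "tcp".
option_nets() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r entry; do
        # Strip the "(interface)" suffix, then reuse nid_net for normalization.
        nid_net "@${entry%%(*}"
    done
}

# Succeed if the networks option ($1) includes the network of the NID ($2).
shares_net() {
    want=$(nid_net "$2")
    option_nets "$1" | grep -Fqx "$want"
}
```

For example, shares_net 'o2ib(ib0)' '36.121.255.201@tcp' fails, which matches the "No NID found for 36.121.255.201@tcp" message on the IB-only client. This only compares names; it says nothing about why the configuration log hands the client a tcp NID in the first place.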
Charles Taylor
2008-Mar-07 17:39 UTC
[Lustre-discuss] Multihomed question: want Lustre over IB and Ethernet
Make sure the client can lctl ping the MDS and OSS o2ib nids. Then make sure of the same between the OSSs and the MDS/MGS. If all that seems fine, I would start to wonder if I made a mistake in specifying the nids when formatting the OSTs.

ct

On Mar 7, 2008, at 12:17 PM, Canon, Richard Shane wrote:
>
> Chris,
>
> Perhaps you need to perform some write_conf like command. I'm not
> sure if this is needed in 1.6 or not.
>
> Shane
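The ping check suggested in this message can be scripted so that every server NID is tried in one pass. The following is a rough sketch only: ping_nids and the PING_CMD override are hypothetical names introduced here, and it assumes "lctl ping" exits non-zero when the peer cannot be reached.

```shell
#!/bin/sh
# Ping each NID given as an argument and report which ones fail.
# PING_CMD is a hypothetical override so the loop can be dry-run on a
# machine without Lustre; by default the real "lctl ping" is used.
ping_nids() {
    failed=0
    for nid in "$@"; do
        # Unquoted on purpose: "lctl ping" must split into command and argument.
        if ${PING_CMD:-lctl ping} "$nid" >/dev/null 2>&1; then
            echo "ok   $nid"
        else
            echo "FAIL $nid"
            failed=$((failed + 1))
        fi
    done
    # Return success only if every NID answered.
    [ "$failed" -eq 0 ]
}
```

On a client this might be run as ping_nids 36.122.255.201@o2ib followed by the OSS NIDs, and then repeated on each OSS against the MDS/MGS NID.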
Chris Worley
2008-Mar-07 18:09 UTC
[Lustre-discuss] Multihomed question: want Lustre over IB and Ethernet
On Fri, Mar 7, 2008 at 10:39 AM, Charles Taylor <taylor at hpc.ufl.edu> wrote:
> Make sure the client can lctl ping the MDS and OSS o2ib nids.

I'm not sure what the output should look like, but the IPoIB addresses of the MDS and OSS nodes are 36.122.255.20[1234], and the ping output from the client looks like:

# lctl ping 36.122.255.201@o2ib
12345-0@lo
12345-36.122.255.201@o2ib
12345-36.121.255.201@tcp
# lctl ping 36.122.255.202@o2ib
12345-0@lo
12345-36.122.255.202@o2ib
12345-36.121.255.202@tcp
# lctl ping 36.122.255.203@o2ib
12345-0@lo
12345-36.122.255.203@o2ib
12345-36.121.255.203@tcp
# lctl ping 36.122.255.204@o2ib
12345-0@lo
12345-36.122.255.204@o2ib
12345-36.121.255.204@tcp

> Then
> make sure of the same between the OSSs and the MDS/MGS.

Looks the same from the MDS/OSS's:

# pdsh -w io[1-4] "lctl ping 36.122.255.201@o2ib;lctl ping 36.122.255.202@o2ib;lctl ping 36.122.255.203@o2ib;lctl ping 36.122.255.204@o2ib" | dshbak -c
----------------
io[1-4]
----------------
12345-0@lo
12345-36.122.255.201@o2ib
12345-36.121.255.201@tcp
12345-0@lo
12345-36.122.255.202@o2ib
12345-36.121.255.202@tcp
12345-0@lo
12345-36.122.255.203@o2ib
12345-36.121.255.203@tcp
12345-0@lo
12345-36.122.255.204@o2ib
12345-36.121.255.204@tcp

> If all that
> seems fine, I would start to wonder if I made a mistake in specifying
> the nids when formatting the OSTs.

The MDS formatting looked like:

mkfs.lustre --mdt --mgs --mgsnode="36.122.255.201@o2ib0,36.121.255.201@tcp0" \
    --fsname=ddnlfs --param sys.timeout=40 --param lov.stripesize=2M \
    --stripe-count-hint=8 /dev/md0

The OST's formatting looked like:

for i in a b c d
do
    mkfs.lustre --ost --mgsnode="36.122.255.201@o2ib0,36.121.255.201@tcp0" \
        --fsname=ddnlfs --param sys.timeout=40 --param lov.stripesize=2M \
        --reformat /dev/sd"$i" &
done

So far, my benchmark results look like everybody is using IB... I do worry whether I'll be able to mount the file system on an Ethernet-only system (I don't have one yet... I'll try to test with an IB-capable client, but that could easily generate a false positive).

Thanks!

Chris
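For the Ethernet-only worry, one cheap check is whether each server advertises a tcp NID at all, which the lctl ping output quoted in this message already shows. A small sketch for reading that output follows, assuming each line has the form 12345-address@network; advertised_nets and peer_has_net are hypothetical names, and a positive result only confirms that the NID is advertised, not that a tcp-only client will mount.

```shell
#!/bin/sh
# Read "lctl ping" output on stdin and print the networks the peer advertises,
# one per line, skipping the loopback entry ("12345-0@lo").
advertised_nets() {
    sed -n 's/^[0-9]*-.*@\(.*\)$/\1/p' | grep -v '^lo$'
}

# Succeed if the ping output on stdin lists a NID on the given network ($1).
peer_has_net() {
    advertised_nets | grep -Fqx "$1"
}
```

A possible use is lctl ping 36.122.255.201@o2ib | peer_has_net tcp, run once per server before an Ethernet-only client is available for a real mount test.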