Thomas Zeiser
2012-Feb-16 11:21 UTC
[Lustre-discuss] accessing different Lustre filesystems using different network interconnects
Hi,

we have two clusters (let's call them A and B), each with a "local" InfiniBand network and a Lustre filesystem. Both clusters can see each other via GBit Ethernet but not via InfiniBand.

We'd now like to mount Lustre-A on login node A via InfiniBand (o2ib) and, in addition, Lustre-B via GBit Ethernet (eth0). (And vice versa on the login node of B.) However, even if the initial connection from login-A to Lustre-B is made via the eth NID, login-A gets the IB NID back from the MDS and subsequently tries to reach the MDS/OSS via InfiniBand; but there is no physical connection between the two InfiniBand networks.

Any ideas/suggestions? Other than also accessing Lustre-A from login-A only via GBit Ethernet, i.e. loading lnet without support for o2ib.

Thanks for your help,

Thomas Zeiser
-- 
Erlangen Regional Computing Center (RRZE)
University of Erlangen-Nuremberg, Germany
Jonathan Buch
2012-Feb-16 11:50 UTC
[Lustre-discuss] accessing different Lustre filesystems using different network interconnects
Hello Mr. Zeiser,

without knowledge about your current lnet setup:

3 networks - 2 InfiniBand, 1 Ethernet

    cluster-a: o2ib0
    cluster-b: o2ib1
    conn-ab:   tcp0

    # /etc/modprobe.d/lustre
    options lnet ip2nets="tcp0 192.168.0.[1,2]; o2ib0 10.0.1.*; o2ib1 10.0.2.*"
    options lnet routes="tcp0 1 10.0.1.1@o2ib0; tcp0 1 10.0.2.1@o2ib1; o2ib0 1 192.168.0.1; o2ib1 1 192.168.0.2; "

A `lctl list_nids` on login-a:

    192.168.0.1@tcp0
    10.0.1.1@o2ib0

And the other gateway of course gets o2ib1 and the second tcp0 address. The gateways then act as Lustre routers, and you can even mount the other Lustre on the client nodes; the single modprobe file can be distributed to all Lustre servers/clients.

    mount -t lustre 192.168.0.1@tcp0:10.0.1.1@o2ib0:/lustre-a /lustre-a

I _think_ this should work. Of course it depends on how you set up your current networks: if both cluster A and cluster B use o2ib0 as their network, I think you need to rename one (which I guess would involve an ugly unmount of the full Lustre system and --writeconf, as suggested by the manual).

I hope this covers what you were asking.

Greetings,
Jonathan Buch

-- 
Jonathan Buch, B.Sc.
Karlsruhe University of Applied Sciences
Institute of Materials and Processes (IMP)
Moltkestrasse 30
D-76133 Karlsruhe
Germany
E-mail: jonathan.buch at hs-karlsruhe.de
Phone: +49 721 925 1415
Fax: +49 721 925 1503
http://www.iaf.hs-karlsruhe.de/ice
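To illustrate why a single modprobe file can be shared by every node, here is a minimal Python sketch of how an ip2nets string selects networks from a node's own IP addresses. It is a simplified model, not LNet's parser: it assumes the InfiniBand networks are written o2ib0 and o2ib1 (o2ib being the LNet driver name for InfiniBand), handles only `*`, plain numbers, and `[a,b]` / `[a-b]` fields, and ignores interface specifications. The function names are made up for this example.

```python
# Simplified model of ip2nets matching (an illustration, not LNet's code).
# Assumption: rules have the form "<net> <ip-pattern>", separated by ';'.

IP2NETS = "tcp0 192.168.0.[1,2]; o2ib0 10.0.1.*; o2ib1 10.0.2.*"


def field_matches(pattern, octet):
    """Match one dotted-quad field against '*', 'N', '[a,b]' or '[a-b]'."""
    if pattern == "*":
        return True
    if pattern.startswith("[") and pattern.endswith("]"):
        value = int(octet)
        for item in pattern[1:-1].split(","):
            low, _, high = item.partition("-")
            if int(low) <= value <= int(high or low):
                return True
        return False
    return pattern == octet


def ip_matches(pattern, ip):
    """True if every field of the IP address matches the pattern field."""
    p_fields = pattern.split(".")
    ip_fields = ip.split(".")
    return len(p_fields) == len(ip_fields) and all(
        field_matches(p, o) for p, o in zip(p_fields, ip_fields)
    )


def nids_for(local_ips, ip2nets=IP2NETS):
    """Return the NIDs a node with these local IPs would get under ip2nets."""
    nids = []
    for rule in ip2nets.split(";"):
        parts = rule.split()
        if len(parts) != 2:
            continue  # skip empty or unsupported rules
        net, pattern = parts
        for ip in local_ips:
            if ip_matches(pattern, ip):
                nids.append(f"{ip}@{net}")
    return nids


if __name__ == "__main__":
    # login-a has one address on each of its two networks
    print(nids_for(["192.168.0.1", "10.0.1.1"]))
```

With login-a's two addresses the model yields the same two NIDs as the `lctl list_nids` output quoted in the message, while a node that only has a 10.0.1.x address ends up on o2ib0 alone.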
Thomas Zeiser
2012-Feb-17 16:07 UTC
[Lustre-discuss] accessing different Lustre filesystems using different network interconnects
Hello Mr. Buch,

On Thu, Feb 16, 2012 at 12:50:56PM +0100, Jonathan Buch wrote:
> Hello Mr. Zeiser,
>
> without knowledge about your current lnet setup:
>
> 3 networks - 2 infiniband, 1 eth
>
> cluster-a: o2ib0
> cluster-b: o2ib1
> conn-ab: tcp0
>
> # /etc/modprobe.d/lustre
> options lnet ip2nets="tcp0 192.168.0.[1,2]; o2ib0 10.0.1.*; o2ib1 10.0.2.*"
> options lnet routes="tcp0 1 10.0.1.1@o2ib0; tcp0 1 10.0.2.1@o2ib1; o2ib0 1 192.168.0.1; o2ib1 1 192.168.0.2; "
>
> A `lctl list_nids` on login-a:
> 192.168.0.1@tcp0
> 10.0.1.1@o2ib0
>
> And the other gateway of course gets the o2ib1 and second tcp0.
> The gateways get to be lustre routers then and you can even mount the

I hoped to go without lnet routers ...

> other lustre on the client nodes - the single modprobe file can be
> distributed on all lustre servers/clients.
>
> mount -t lustre 192.168.0.1@tcp0:10.0.1.1@o2ib0:/lustre-a /lustre-a
>
> I _think_ this should work (of course it depends on how you set up
> your current networks, if both cluster a and b have o2ib0 as network
> I think you need to rename one

That's probably the important point; currently, both (independent) installations of course default to o2ib0 :-(

> (which I guess would involve an
> ugly unmount of the full lustre system and --writeconf, as suggested
> by the manual)).

So, let's wait for the next downtime ...

> I hope this covers what you were asking.

Exactly; thanks!

Thomas Zeiser

> Greetings,
>
> Jonathan Buch
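The point about both installations using the same InfiniBand network name can be sketched with a small Python model. It is a deliberate simplification, not LNet's actual selection logic: it assumes a client simply uses the first server NID whose network name it has configured locally. The server addresses and the function names are hypothetical and chosen only for illustration.

```python
# Simplified model (an assumption, not LNet's real algorithm): the client
# treats any server NID on a locally configured network name as directly
# reachable and uses the first such NID.

def split_nid(nid):
    """Split 'address@network' into (address, network)."""
    address, _, network = nid.partition("@")
    return address, network


def pick_server_nid(client_networks, server_nids):
    """Return the first server NID on a network the client has configured."""
    for nid in server_nids:
        if split_nid(nid)[1] in client_networks:
            return nid
    return None


# login-A is attached to its own InfiniBand fabric and to Ethernet.
LOGIN_A_NETWORKS = ["o2ib0", "tcp0"]

# Hypothetical NIDs of the Lustre-B MDS while both clusters use o2ib0.
MDS_B_SAME_NAME = ["10.0.2.5@o2ib0", "192.168.0.12@tcp0"]

# The same hypothetical MDS after cluster B's fabric is renamed to o2ib1.
MDS_B_RENAMED = ["10.0.2.5@o2ib1", "192.168.0.12@tcp0"]

if __name__ == "__main__":
    # Name collision: the IB NID looks local, although the fabrics are separate.
    print(pick_server_nid(LOGIN_A_NETWORKS, MDS_B_SAME_NAME))
    # After the rename only the Ethernet NID is on a shared network.
    print(pick_server_nid(LOGIN_A_NETWORKS, MDS_B_RENAMED))
```

In this model the name collision makes login-A pick an InfiniBand NID it cannot physically reach, whereas distinct network names leave tcp0 as the only shared network, without any LNet router involved.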