Thomas Roth
2011-Jun-14 16:23 UTC
[Lustre-discuss] Mount 2 clusters, different networks - LNET tcp1-tcp2-o2ib
Hi all,

I'd like to mount two Lustre filesystems on one client. Issues with more than one MGS set aside, the point here is that one of them is an Infiniband cluster, the other is ethernet-based. And my client is on the ethernet.
I have managed to mount the o2ib-fs by setting up an LNET router, but now this client's LNET doesn't know how to reach the ethernet-fs.

So the basic modprobe.conf reads
> options lnet networks=tcp1(eth0) routes="o2ib LNET-Router-IP@tcp1"
This mounts the MGS on the o2ib network.

What do I have to add to get to the MGS on the tcp network?

Meanwhile I have studied more posts here and came up with
> options lnet networks=tcp1(eth0),tcp2(eth0:0) routes="o2ib LNET-Router-IP@tcp1; tcp Default-Gateway-IP@tcp2"

Doesn't work either, but I see in the log of the (tcp-)MGS:
> LustreError: 120-3: Refusing connection from Client-IP for MGS-IP@tcp2: No matching NI

Something's getting through ...

Any ideas?

Regards,
Thomas
Michael Shuey
2011-Jun-14 17:00 UTC
[Lustre-discuss] Mount 2 clusters, different networks - LNET tcp1-tcp2-o2ib
Is your ethernet FS in tcp1, or tcp0? Your config bits indicate the client is in tcp1 - do the servers agree?

--
Mike Shuey
Thomas Roth
2011-Jun-14 17:26 UTC
[Lustre-discuss] Mount 2 clusters, different networks - LNET tcp1-tcp2-o2ib
Hm, the ethernet FS is in tcp0 - MGS says its nids are MGS-IP@tcp. So not surprising it refuses that connection.
On the other hand,
> options lnet networks=tcp1(eth0),tcp(eth0:0) routes="o2ib LNET-Router-IP@tcp1; tcp Default-Gateway-IP@tcp"
results in
> Can't create route to tcp via Gateway-IP@tcp

Cheers,
Thomas
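The refusal comes down to network names: "tcp" is shorthand for "tcp0", so a request addressed to MGS-IP@tcp2 finds no matching interface on an MGS whose only NID is MGS-IP@tcp. A minimal Python sketch of that comparison, for illustration only: the helper names normalize_net, split_nid and has_matching_ni and the sample address are made up here and are not part of Lustre.

```python
def normalize_net(net: str) -> str:
    """Give a network name an explicit number: 'tcp' -> 'tcp0', 'o2ib' -> 'o2ib0'."""
    base = net.rstrip("0123456789")       # 'tcp2' -> 'tcp', 'o2ib' stays 'o2ib'
    number = net[len(base):] or "0"       # a missing number means network 0
    return f"{base}{int(number)}"


def split_nid(nid: str) -> tuple[str, str]:
    """Split 'address@network' into (address, normalized network)."""
    address, sep, net = nid.partition("@")
    if not sep or not address or not net:
        raise ValueError(f"not a NID of the form address@network: {nid!r}")
    return address, normalize_net(net)


def has_matching_ni(target_nid: str, local_nids: list[str]) -> bool:
    """True if the NID a peer asks for is one of the node's own NIDs."""
    return split_nid(target_nid) in {split_nid(nid) for nid in local_nids}


if __name__ == "__main__":
    mgs_nids = ["192.0.2.10@tcp"]         # hypothetical MGS address
    for wanted in ("192.0.2.10@tcp0", "192.0.2.10@tcp2"):
        print(wanted, has_matching_ni(wanted, mgs_nids))
```

The sketch compares address and network together, which is why a client that is only configured for tcp1 or tcp2 cannot address a server that lives on tcp0.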
Michael Shuey
2011-Jun-14 18:04 UTC
[Lustre-discuss] Mount 2 clusters, different networks - LNET tcp1-tcp2-o2ib
That may be because your gateway doesn't have an interface on tcp (aka tcp0). I suspect you want to keep your ethernet clients in tcp0, your IB clients in o2ib0, and your router in both. Personally, I find it easiest to just give different module options on each system (rather than try ip2nets stuff).

On the ether clients, I'd try:

options lnet networks=tcp0(eth0) routes="o2ib0 LNET-router-eth_IP@tcp0" dead_router_check_interval=300

On IB clients:

options lnet networks=o2ib0(ib0) routes="tcp0 LNET-router-IB_IP@o2ib0" dead_router_check_interval=300

then on the router:

options lnet networks=tcp0(eth0),o2ib0(ib0) forwarding=enabled accept_timeout=15

Obviously, your file servers will need to have lnet options similar to the clients:

options lnet networks=tcp0(eth0) routes="o2ib0 LNET-router-eth_IP@tcp0" dead_router_check_interval=300
options lnet networks=o2ib0(ib0) routes="tcp0 LNET-router-IB_IP@o2ib0" dead_router_check_interval=300

That's just a guess, your mileage may vary, etc., but I think it's close to what you want. Note that you really want the dead_router_check_interval if you're using lnet routers. Without that parameter, the lustre client will automatically mark a router as failed when it's unavailable but will not check to see if it ever comes back. With this param, it checks every 300 seconds (and re-enables it if found).

Hope this helps.

--
Mike Shuey
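The per-role options in this message differ only in the local network, the interface, and the route through the router. A rough Python sketch that assembles the client and router lines from those pieces: the function names and the 192.0.2.x router addresses are hypothetical, the parameter values are simply the ones suggested in the message, and the check assumes that a gateway NID has to be on the node's own network.

```python
def client_options(local_net: str, interface: str, remote_net: str,
                   router_nid: str, check_interval: int = 300) -> str:
    """Build the 'options lnet' line for a client or server on one network."""
    # Assumption: the gateway is reached directly, so its NID must name the
    # network this node is on (e.g. ...@tcp0 for a node in tcp0).
    if not router_nid.endswith("@" + local_net):
        raise ValueError(f"router NID {router_nid!r} is not on {local_net}")
    return (f"options lnet networks={local_net}({interface}) "
            f'routes="{remote_net} {router_nid}" '
            f"dead_router_check_interval={check_interval}")


def router_options(networks: list[tuple[str, str]],
                   accept_timeout: int = 15) -> str:
    """Build the 'options lnet' line for a router attached to every network."""
    nets = ",".join(f"{net}({iface})" for net, iface in networks)
    return (f"options lnet networks={nets} "
            f"forwarding=enabled accept_timeout={accept_timeout}")


if __name__ == "__main__":
    # Hypothetical router addresses on the ethernet and Infiniband sides.
    print(client_options("tcp0", "eth0", "o2ib0", "192.0.2.1@tcp0"))
    print(client_options("o2ib0", "ib0", "tcp0", "192.0.2.2@o2ib0"))
    print(router_options([("tcp0", "eth0"), ("o2ib0", "ib0")]))
```

Generating the lines from one place keeps the network names consistent across clients, servers and router, which is where this thread's configuration went wrong.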
Thomas Roth
2011-Jun-14 18:26 UTC
[Lustre-discuss] Mount 2 clusters, different networks - LNET tcp1-tcp2-o2ib
Thanks, Michael. I'll certainly put in the check_interval, that will be needed.

However, what I tried was to have an ethernet client that mounts one FS via the LNET router (Infiniband-FS behind it) and simultaneously mounts the other FS, which is on tcp0 - via its default route. So actually I don't have any IB clients (except for the LNET routers).
Probably I messed up the tcpX-network names.

Cheers,
Thomas

--
--------------------------------------------------------------------
Thomas Roth
Department: Informationstechnologie
Location: SB3 1.262
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1
64291 Darmstadt
www.gsi.de
Thomas Roth
2011-Jun-14 18:44 UTC
[Lustre-discuss] Mount 2 clusters, different networks - LNET tcp1-tcp2-o2ib - solved?
Hi all,

this seems to work with the correct IPs and correct network names ;-[]

I now have the following modprobe on my ethernet client:
> options lnet networks=tcp1(eth0),tcp0(eth0:0) routes="o2ib LNET-Router@tcp1; tcp Default-Route@tcp1"

With these options, loading the modules gives me
> Jun 14 20:12:55 kernel: Lustre: Added LNI 10.12.70.183@tcp1 [8/256/0/180]
> Jun 14 20:12:55 kernel: Lustre: Added LNI 10.12.0.21@tcp [8/256/0/180]
which are the IPs of eth0 and eth0:0.

Now I still wonder why the alias interface eth0:0 is necessary (if left out, the whole endeavor fails). The routes= statement seems to say: "If you have data for tcp, use the Default-Router-IP and go via the interface that is on network tcp1".
Oh well, I should probably take some networking lectures...

Regards,
Thomas
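One way to read the final option line is to split it into its parts: networks= lists the local networks with their interfaces, and each routes= entry names a destination network followed by the gateway NID to use. The Python sketch below handles only this simple two-field form and ignores the optional extras of the real syntax, such as hop counts and address ranges; parse_networks and parse_routes are illustrative names, not Lustre tools.

```python
def parse_networks(value: str) -> dict[str, str]:
    """'tcp1(eth0),tcp0(eth0:0)' -> {'tcp1': 'eth0', 'tcp0': 'eth0:0'}"""
    networks = {}
    for item in value.split(","):
        net, _, rest = item.strip().partition("(")
        networks[net] = rest.rstrip(")")   # empty string if no interface given
    return networks


def parse_routes(value: str) -> list[tuple[str, str, str]]:
    """Split 'dest gateway@net; ...' into (dest_net, gateway, via_net) tuples."""
    routes = []
    for entry in value.split(";"):
        fields = entry.split()
        if not fields:
            continue                       # tolerate a trailing ';'
        if len(fields) != 2:
            raise ValueError(f"unsupported route entry: {entry.strip()!r}")
        dest_net, gateway_nid = fields
        gateway, _, via_net = gateway_nid.partition("@")
        routes.append((dest_net, gateway, via_net))
    return routes


if __name__ == "__main__":
    networks = parse_networks("tcp1(eth0),tcp0(eth0:0)")
    routes = parse_routes("o2ib LNET-Router@tcp1; tcp Default-Route@tcp1")
    for dest_net, gateway, via_net in routes:
        iface = networks.get(via_net, "no local interface")
        print(f"traffic for {dest_net}: send to {gateway} on {via_net} ({iface})")
```

Read this way, both route entries leave through the tcp1 interface (eth0), which matches the paraphrase of the routes= statement given in the message.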