Dardo D Kleiner - CONTRACTOR
2009-Nov-13 20:34 UTC
[Lustre-discuss] Multi-Rail Configurations on a Multi-Port IB HCA
Mellanox ConnectX MT25418, two ports, each connected to a separate IB fabric - ib0 and ib1 have distinct IP subnets, each connected to a separate Lustre router.

ibstat:
CA 'mlx4_0'
    CA type: MT25418
    Number of ports: 2
    Firmware version: 2.7.0
    Hardware version: a0
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 20
        Base lid: 302
        LMC: 0
        SM lid: 2
        Capability mask: 0x02510868
    Port 2:
        State: Active
        Physical state: LinkUp
        Rate: 20
        Base lid: 5
        LMC: 0
        SM lid: 1
        Capability mask: 0x02510868

ip ad ls:
4: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast state UP qlen 4096
    inet xxx.xxx.182.130/26 brd xxx.xxx.182.191 scope global ib0
5: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast state UP qlen 4096
    inet xxx.xxx.182.194/26 brd xxx.xxx.182.255 scope global ib1

/etc/modprobe.d/lustre:
options lnet \
    ip2nets=" \
        o2ib1 xxx.xxx.[176-177].[0-255];
        o2ib3(ib0) xxx.xxx.182.[128-191];
        o2ib4(ib1) xxx.xxx.182.[192-255]"
    routes=" \
        o2ib1 xxx.xxx.182.129@o2ib3,xxx.xxx.182.193@o2ib4"

dmesg:
.
.
Lustre: Listener bound to ib0:xxx.xxx.182.130:987:mlx4_0
.
.

Why don't I also get "Listener bound to ib1:xxx.xxx.182.194:987:mlx4_0"?

- Dardo
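For readers following along, here is a minimal Python sketch of which networks the ip2nets rules in this message would be expected to select for a host with these two addresses. It is a simplified model and not LNet's actual parser: only literal fields and [lo-hi] ranges are handled, the masked "xxx" octets are compared as literal text, and all function names are hypothetical.

```python
import re

# Hypothetical helpers: a simplified model of ip2nets matching, not LNet code.
RANGE = re.compile(r"^\[(\d+)-(\d+)\]$")


def field_matches(pattern: str, value: str) -> bool:
    """Match one dotted field against a literal or a [lo-hi] range."""
    m = RANGE.match(pattern)
    if m:
        return value.isdigit() and int(m.group(1)) <= int(value) <= int(m.group(2))
    return pattern == value  # literal field, including the masked "xxx"


def address_matches(pattern: str, address: str) -> bool:
    """Match a full dotted address against a dotted pattern, field by field."""
    p, a = pattern.split("."), address.split(".")
    return len(p) == len(a) and all(field_matches(x, y) for x, y in zip(p, a))


def expected_networks(ip2nets: str, local_addresses: list[str]) -> list[str]:
    """Return the net specs whose patterns match any local address."""
    nets = []
    for rule in ip2nets.split(";"):
        parts = rule.split()
        if len(parts) < 2:
            continue  # skip empty or malformed rules
        net, patterns = parts[0], parts[1:]
        if any(address_matches(p, a) for p in patterns for a in local_addresses):
            nets.append(net)
    return nets


# Rules and addresses copied from the message, with the masking left in place.
IP2NETS = (
    "o2ib1 xxx.xxx.[176-177].[0-255]; "
    "o2ib3(ib0) xxx.xxx.182.[128-191]; "
    "o2ib4(ib1) xxx.xxx.182.[192-255]"
)
LOCAL = ["xxx.xxx.182.130", "xxx.xxx.182.194"]

print(expected_networks(IP2NETS, LOCAL))
```

Under this model both o2ib3(ib0) and o2ib4(ib1) match, which is why a second "Listener bound" line for ib1 would be the expected outcome.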
Isaac Huang
2009-Nov-15 01:13 UTC
[Lustre-discuss] Multi-Rail Configurations on a Multi-Port IB HCA
On Fri, Nov 13, 2009 at 03:34:14PM -0500, Dardo D Kleiner - CONTRACTOR wrote:
> Mellanox ConnectX MT25418, two ports, each connected to a separate
> IB fabric - ib0 and ib1 have distinct IP subnets, each connected
> to a separate Lustre router.
> ......
> ip ad ls:
> 4: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast state UP qlen 4096
>     inet xxx.xxx.182.130/26 brd xxx.xxx.182.191 scope global ib0
> 5: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast state UP qlen 4096
>     inet xxx.xxx.182.194/26 brd xxx.xxx.182.255 scope global ib1
>
> /etc/modprobe.d/lustre:
> options lnet \
>     ip2nets=" \
>         o2ib1 xxx.xxx.[176-177].[0-255];
>         o2ib3(ib0) xxx.xxx.182.[128-191];
>         o2ib4(ib1) xxx.xxx.182.[192-255]"
>     routes=" \
>         o2ib1 xxx.xxx.182.129@o2ib3,xxx.xxx.182.193@o2ib4"
>
> dmesg:
> .
> .
> Lustre: Listener bound to ib0:xxx.xxx.182.130:987:mlx4_0
> .
> .
>
> Why don't I also get "Listener bound to ib1:xxx.xxx.182.194:987:mlx4_0"?

What did 'lctl list_nids' show? It looked like only one NI was initialized.

Isaac
Dardo D Kleiner - CONTRACTOR
2009-Nov-15 04:03 UTC
[Lustre-discuss] Multi-Rail Configurations on a Multi-Port IB HCA
Isaac Huang wrote:
> On Fri, Nov 13, 2009 at 03:34:14PM -0500, Dardo D Kleiner - CONTRACTOR wrote:
>> ......
>> Why don't I also get "Listener bound to ib1:xxx.xxx.182.194:987:mlx4_0"?
>
> What did 'lctl list_nids' show? It looked like only one NI was
> initialized.

Only the one o2ib3 NID was listed; I did check that. So it's your belief that I should have two distinct NIDs here? Should I be able to route over multiple lnets? On systems that have two HCAs I certainly do see multiple NIDs; this is the first system I've configured with one HCA that has two ports...

The filesystem wouldn't mount with this configuration, obviously. One other bit of information is that it also wouldn't work if I only specified o2ib4(ib1), without the o2ib3(ib0) line (though now I realize I didn't try setting the ko2iblnd ipif_name to ib1 in that test). It does work if I only have the o2ib3 lnet definition.

- Dardo
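The check Isaac asks about can be sketched in Python as a comparison between the NIDs one expects and the text printed by 'lctl list_nids'. This is only an illustration: the helper name is hypothetical, the sample text is a stand-in for the result reported above (only the o2ib3 NID listed), and it assumes the command prints one NID per line in address@network form.

```python
def missing_nids(list_nids_output: str, expected: list[str]) -> list[str]:
    """Return the expected NIDs that do not appear in the captured output."""
    present = {line.strip() for line in list_nids_output.splitlines() if line.strip()}
    return [nid for nid in expected if nid not in present]


# Hypothetical capture reflecting what the thread reports: one NID only.
sample_output = "xxx.xxx.182.130@o2ib3\n"

# One NID per configured interface, built from the addresses in the thread.
expected = ["xxx.xxx.182.130@o2ib3", "xxx.xxx.182.194@o2ib4"]

print(missing_nids(sample_output, expected))
```

An empty result would mean both network interfaces were initialized; here the o2ib4 NID is reported as missing.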
Dardo D Kleiner - CONTRACTOR
2009-Nov-16 21:38 UTC
[Lustre-discuss] Multi-Rail Configurations on a Multi-Port IB HCA
Stand down. Don't know what was wrong with my configuration at first, but it does instantiate the two NIDs on the host with multiple ports on a single HCA. Unfortunately,

LustreError: 17771:0:(router.c:464:lnet_check_routes()) Routes to o2ib1 via xxx.xxx.182.193@o2ib4 and xxx.xxx.182.129@o2ib3 not supported

So I couldn't have done what I wanted to anyway; the answer to my question below, "Should I be able to route over multiple lnets?", is clearly no...

- Dardo

Dardo D Kleiner - CONTRACTOR wrote:
> Isaac Huang wrote:
>> ......
>> What did 'lctl list_nids' show? It looked like only one NI was
>> initialized.
>
> Only the one o2ib3 NID was listed; I did check that. So it's your belief that
> I should have two distinct NIDs here? Should I be able to route over multiple
> lnets? On systems that have two HCAs I certainly do see multiple NIDs; this
> is the first system I've configured with one HCA that has two ports...
> ......
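The restriction behind this error can be illustrated with a short Python sketch that flags any remote network reached through gateways on more than one local network. It is modeled only on the wording of the error message and is not taken from router.c; the function name and the (remote net, gateway NID) tuple layout are assumptions made for the example.

```python
from collections import defaultdict


def conflicting_routes(routes: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Return remote nets whose gateways sit on more than one local network.

    Each route is (remote_net, gateway_nid), with the gateway NID written
    as address@network.
    """
    gateways_by_net = defaultdict(list)
    for remote_net, gateway_nid in routes:
        gateways_by_net[remote_net].append(gateway_nid)

    conflicts = {}
    for remote_net, gateways in gateways_by_net.items():
        # The local network is the part of the NID after the last "@".
        local_nets = {gw.rsplit("@", 1)[1] for gw in gateways}
        if len(local_nets) > 1:
            conflicts[remote_net] = gateways
    return conflicts


# The two routes named in the error message.
ROUTES = [
    ("o2ib1", "xxx.xxx.182.129@o2ib3"),
    ("o2ib1", "xxx.xxx.182.193@o2ib4"),
]

print(conflicting_routes(ROUTES))
```

With both gateways on the same local network the result would be empty, which matches the observation that the configuration works when only the o2ib3 definition is present.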
Isaac Huang
2009-Nov-17 00:03 UTC
[Lustre-discuss] Multi-Rail Configurations on a Multi-Port IB HCA
On Mon, Nov 16, 2009 at 04:38:03PM -0500, Dardo D Kleiner - CONTRACTOR wrote:
> Stand down. Don't know what was wrong with my configuration at first,
> but it does instantiate the two NIDs on the host with multiple ports
> on a single HCA. Unfortunately,
>
> LustreError: 17771:0:(router.c:464:lnet_check_routes()) Routes to o2ib1 via xxx.xxx.182.193@o2ib4 and xxx.xxx.182.129@o2ib3 not supported

In fact, this limitation could be lifted. The reason it was there was that upper layers would rely on the source NID in lnet messages to identify clients - i.e., it was assumed that messages from the same client would carry the same source NID in lnet message headers.

It seems that it's becoming an annoyance as multi-rail configurations grow more popular.

> So I couldn't have done what I wanted to anyway, the answer to my
> question below "Should I be able to route over multiple lnets?" is
> clearly no...
>
> - Dardo
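The assumption Isaac describes can be shown with a toy Python model of a server-side table that identifies clients by source NID alone. It is not Lustre code; the class and method names are hypothetical, and the two NIDs are the ones this host would present on o2ib3 and o2ib4.

```python
class ClientTable:
    """Toy table that treats every distinct source NID as a distinct client."""

    def __init__(self) -> None:
        self.clients: dict[str, int] = {}

    def on_message(self, source_nid: str) -> int:
        """Return the client id for a message, allocating one for a new NID."""
        if source_nid not in self.clients:
            self.clients[source_nid] = len(self.clients) + 1
        return self.clients[source_nid]


table = ClientTable()

# The same host sends once through each of its two network interfaces.
first = table.on_message("xxx.xxx.182.130@o2ib3")
second = table.on_message("xxx.xxx.182.194@o2ib4")

print(first, second, len(table.clients))
```

In this model one physical host ends up recorded as two clients, which is the kind of mismatch the route check guards against.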
Dardo D Kleiner - CONTRACTOR
2009-Nov-17 01:01 UTC
[Lustre-discuss] Multi-Rail Configurations on a Multi-Port IB HCA
So are you suggesting I could just comment out the check in router.c?

Isaac Huang wrote:
> On Mon, Nov 16, 2009 at 04:38:03PM -0500, Dardo D Kleiner - CONTRACTOR wrote:
>> ......
>> LustreError: 17771:0:(router.c:464:lnet_check_routes()) Routes to o2ib1 via xxx.xxx.182.193@o2ib4 and xxx.xxx.182.129@o2ib3 not supported
>
> In fact, this limitation could be lifted. The reason it was there was
> that upper layers would rely on source NID in lnet messages to
> identify clients - i.e., it was assumed that messages from a same
> client would carry a same source NID in lnet message headers.
>
> It seems that it's becoming an annoyance as multi-rail configurations
> grow more popular.
> ......
Isaac Huang
2009-Nov-17 01:06 UTC
[Lustre-discuss] Multi-Rail Configurations on a Multi-Port IB HCA
On Mon, Nov 16, 2009 at 08:01:12PM -0500, Dardo D Kleiner - CONTRACTOR wrote:
> So are you suggesting I could just comment out the check in router.c?

That's enough for lnet but Lustre changes must also be made.

Isaac
Dardo D Kleiner - CONTRACTOR
2009-Nov-17 01:10 UTC
[Lustre-discuss] Multi-Rail Configurations on a Multi-Port IB HCA
In the next hour - before SC'09 opens the doors? ;)

Isaac Huang wrote:
> On Mon, Nov 16, 2009 at 08:01:12PM -0500, Dardo D Kleiner - CONTRACTOR wrote:
>> So are you suggesting I could just comment out the check in router.c?
>
> That's enough for lnet but Lustre changes must also be made.
>
> Isaac