We have added a few IB nodes to our cluster (about 70 out of 600 nodes). What would it take to have Lustre go over IB for those nodes and over TCP for the rest of the hosts? I know we could use a router machine, but we could only provide maybe 2 GigE ports, which would be poor.

So my questions:

Would only the OSSes need HCAs, or does the MDS need HCAs also? It would be nice to have MDS traffic over TCP (fast enough for this user) and IO over IB.

How does Lustre figure out the preferred path? How can we have the nodes figure out "if I have IB, talk to the OSSes over IB, else use TCP"?

Anything else to look out for?

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985
On Fri, 2008-10-10 at 11:08 -0400, Brock Palen wrote:
> We have added a few IB nodes to our cluster (about 70 out of 600 nodes).
> What would it take to have lustre go over IB as well as tcp for the
> rest of the hosts?

So I'm assuming that at least some of these IB nodes are servers (i.e. OSS) then.

> would only the oss need HCA's? or does the MDS need to have hca's
> also?

No. There is no requirement that the MDS use IB just because (some) OSSes use it.

> It would be nice to have MDS traffic over TCP (fast enough for
> this user) and IO over IB.

Fair enough.

> How does lustre figure out the preferred path?

An LNET node with multiple paths to another LNET node chooses the "best" path. How that decision is made, I'm not so sure, but I tend to think that o2iblnd will be preferred over socklnd.

> How can we have the
> nodes figure out IF I have IB talk to oss's over IB else use TCP?

Assuming you get the configuration right on the nodes, LNET will just do that using its "best path" algorithm.

b.
On Oct 10, 2008, at 2:45 PM, Brian J. Murrell wrote:
> So I'm assuming that at least some of these IB nodes are servers (i.e.
> OSS) then.

Not right now; I asked because we were thinking about it.

>> would only the oss need HCA's? or does the MDS need to have hca's
>> also?
>
> No. There is no requirement that the MDS use IB just because (some)
> OSSes use it.

Really? So LNET does the best-path selection, and it is not part of Lustre itself. If we only hook up some of the OSSes by IB, is there a way to have the IO of a user (who is a user of IB) prefer the IB-connected OSSes?

If that is not possible now, I think some of the patches announced for 1.8 or 2.0 had the ability to select an OSS for only given users. Am I correct?
On Fri, 2008-10-10 at 14:56 -0400, Brock Palen wrote:
> Not right now, the question was because we were thinking about it

OK. In any case, I guess the point I was making is that some servers would need IB as well as the clients, or it would be pointless. Just to be absolutely clear.

> Really?

LNET configuration/routing is not (yet) one of my strong points, but I'm fairly sure, yes.

> So given that lnet does the best path and it is not part of
> lustre itself.
> So if we only hook some of the OSS by IB, is there a way to have a
> user (who is a user of IB) IO prefer the IB connected OSS's.

If you have a client which is connected to multiple networks (i.e. IB and TCP), LNET will use them both. You might have to poke LNET to do so using module parameters, but I think it will use both automatically.

Regardless, once LNET has its list of interfaces and networks, it routes requests according to their destination. If a target (either an MDS or OSS) only has a TCP path, that path will be used. If a target has more than one path, the "best" path will be chosen. I tend to think o2iblnd trumps socklnd.

Maybe if there is an LNET engineer reading he can give you more details on how this best path is chosen.

> If that is not possible now, I think some of the patches announced
> that are for 1.8 or 2.0 had the ability to select a OSS for only
> given users. Am I correct?

I think you are talking about OST pools. I'm not sure which release that is targeted for.

b.
On Oct 10, 2008 15:09 -0400, Brian J. Murrell wrote:
> Regardless, once LNET has its list of interfaces and networks it routes
> requests accordingly depending on their destination. If a target
> (either an MDS or OSS) only has a TCP path that path will be used. If a
> target has more than one path the "best" path will be chosen. I tend to
> think o2iblnd trumps socklnd.
>
> Maybe if there is an LNET engineer reading he can give you more details
> on how this best path is chosen.

I'm no LNET expert, but my understanding is:

LNET will pick the network with the fewest LNET network hops (TCP network hops are not a consideration), and if they are equal then it will (AFAIK) pick the first network listed in the "networks" module option as the "best" network and use that exclusively until it fails.

That means that for clients with IB, you put the o2iblnd interface first and the TCP interface second in the lnet "networks" module option. I think this is discussed in the Lustre manual already.

> > If that is not possible now, I think some of the patches announced
> > that are for 1.8 or 2.0 had the ability to select a OSS for only
> > given users. Am I correct?
>
> I think you are talking about OST pools. I'm not sure which release
> that is targeted for.

While OST pools are going to be included in 1.8.0, there is no policy engine yet, so "pick a pool for UID X" doesn't exist yet.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
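As a rough illustration of the ordering advice above, here is a minimal Python sketch that generates the "networks" line for modprobe.conf, with the IB network listed first on IB clients. The helper name lnet_networks_option and the interface names ib0 and eth0 are assumptions made for this example, not anything taken from the thread; check the Lustre manual for the exact option syntax on your release.

```python
def lnet_networks_option(networks):
    """Build an 'options lnet networks=...' line for modprobe.conf.

    `networks` is an ordered list of (lnet_network, interface) pairs.
    The order is preserved on purpose: per the message above, the first
    network listed is the one expected to be preferred.
    """
    if not networks:
        raise ValueError("at least one network is required")
    spec = ",".join(f"{net}({iface})" for net, iface in networks)
    return f"options lnet networks={spec}"


# Hypothetical interface names: ib0 for the HCA, eth0 for Ethernet.
ib_client = lnet_networks_option([("o2ib0", "ib0"), ("tcp0", "eth0")])
tcp_client = lnet_networks_option([("tcp0", "eth0")])

print(ib_client)
print(tcp_client)
```

The 530 or so TCP-only hosts would get the second form, and the IB nodes the first, so a single generator can serve both node classes.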
On Oct 12, 2008, at 1:36 AM, Andreas Dilger wrote:
> LNET will pick the network with the fewest LNET network hops (TCP
> network hops are not a consideration), and if they are equal then it
> will (AFAIK) pick the first network listed in the "networks" module
> option as the "best" network and use that exclusively until it fails.

Currently we don't put any Lustre modules in modprobe.conf; Lustre loads the correct modules when mounting the filesystem. We do this to keep our loads simple, as we have several.

It would be nice if LNET picked IB without being told, similar to the way OpenMPI uses network weights to decide which to try first.
On Sun, Oct 12, 2008 at 10:15:01AM -0400, Brock Palen wrote:
> ......
> Currently we don't put any lustre modules in modprobe.conf, lustre
> loads the correct modules when mounting the filesystem. We do this
> to keep our loads simple as we have several.

When nothing has been specified, LNet by default loads ksocklnd, which in turn by default uses the first usable interface returned by SIOCGIFCONF.

> It would be nice if LNET picked IB without being told. Similar to
> the way OpenMPI has network weights of which to try first.

No such automatic mechanism exists. The LNet NIs can only be specified statically via module options ('networks' or 'ip2nets').

As to choice of path for multi-homed LNet, the decision is solely based on the destination NID. If the NID belongs to a local network (e.g. 10.0.0.1@o2ib0 is on my local network @o2ib0 if I have a NI in @o2ib0 too), traffic would go through the local NI. If the NID is on a remote network (e.g. 3@ptl0 if I don't have a NI in @ptl0), a router would be picked out among available routes based on load already queued on routers, and the local NI to that router would be used for outgoing traffic (e.g. the NI in @tcp0 would be used if 192.168.0.1@tcp0 is the router chosen).

In other words, the LNet path from a multi-homed client to a multi-homed server is determined by the server NID. For example, if both the client and the server are on @tcp0 and @o2ib0, the client would choose the IB network if the server NID is in @o2ib0, and the TCP network otherwise. The server NID used by Lustre clients should somehow come from the MGS, but I'm not sure about it. LNet has no knowledge about whether a peer is multi-homed, so it couldn't figure out that the IB network is a better path to reach a peer in @tcp.

Isaac
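The selection rule described above can be sketched in a few lines of Python. This is only a toy model of the description in this message, not LNet code: the Router class, its queued field, and the function names are hypothetical, and "load already queued" is reduced to a single integer.

```python
from dataclasses import dataclass


@dataclass
class Router:
    nid: str     # router NID, e.g. "192.168.0.1@tcp0"
    queued: int  # stand-in for the load already queued on this router


def network_of(nid):
    """Return the network part of a NID such as '10.0.0.1@o2ib0'."""
    addr, sep, net = nid.rpartition("@")
    if not sep or not addr or not net:
        raise ValueError(f"not a NID: {nid!r}")
    return net


def choose_local_ni(dest_nid, local_nis, routes):
    """Return (local network, router NID or None) used to reach dest_nid.

    local_nis: set of networks this node has a NI on.
    routes:    dict mapping a remote network to a list of Router objects.
    """
    net = network_of(dest_nid)
    if net in local_nis:
        # Destination is on a local network: use the NI on that network.
        return net, None
    routers = routes.get(net)
    if not routers:
        raise LookupError(f"no route to network {net}")
    # Remote network: pick the least-loaded router, then use the local NI
    # that sits on that router's network.
    router = min(routers, key=lambda r: r.queued)
    return network_of(router.nid), router.nid


local_nis = {"o2ib0", "tcp0"}
routes = {"ptl0": [Router("192.168.0.1@tcp0", 3), Router("192.168.0.2@tcp0", 7)]}

via_ib = choose_local_ni("10.0.0.1@o2ib0", local_nis, routes)
via_tcp = choose_local_ni("10.0.0.1@tcp0", local_nis, routes)
via_router = choose_local_ni("3@ptl0", local_nis, routes)
```

Note that via_tcp stays on TCP even though this client also has an IB NI: the model, like the description above, looks only at the destination NID and knows nothing about the peer being multi-homed.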
On Sun, Oct 12, 2008 at 10:15:01AM -0400, Brock Palen wrote:
>> ......
>> Currently we don't put any lustre modules in modprobe.conf, lustre
>> loads the correct modules when mounting the filesystem. We do this
>> to keep our loads simple as we have several.
>
> When nothing has been specified, LNet by default loads the ksocklnd,
> which in turn by default uses the 1st usable interface returned by
> SIOCGIFCONF.
>
>> It would be nice if LNET picked IB without being told. Similar to
>> the way OpenMPI has network weights of which to try first.
>
> No such automatic mechanism exists. The LNet NIs can only be specified
> statically via module options ('networks' or 'ip2nets').
>
> As to choice of path for multi-homed LNet, the decision is solely
> based on the destination NID. If the NID belongs to a local network
> (e.g. 10.0.0.1@o2ib0 is on my local network @o2ib0 if I have a NI in
> @o2ib0 too), traffic would go through the local NI. If the NID is on a
> remote network (e.g. 3@ptl0 if I don't have a NI in @ptl0), a router
> would be picked out among available routes based on load already
> queued on routers, and the local NI to that router would be used for
> outgoing traffic (e.g. the NI in @tcp0 would be used if
> 192.168.0.1@tcp0 is the router chosen).

So if I had an MGS/MDT with 1 NIC with, say, 4 NIDs on different subnets, and specified tcp0 at subnet1, tcp1 at subnet2, etc. in modprobe.conf, could I then, on an OSS with 4 NICs with 1 IP on each subnet per NIC, format each OST with parameters to force it to use a certain local NIC to connect to the MGS/MDT?

> In other words, the LNet path from a multi-homed client to a
> multi-homed server is determined by the server NID. For example, if
> both the client and the server are on @tcp0 and @o2ib0, the client
> would choose IB network if the server NID is in @o2ib0, and TCP
> network otherwise. The server NID used by Lustre clients should
> somehow come from the MGS but I'm not sure about it. LNet has no
> knowledge about whether a peer is multi-homed, so it couldn't figure
> out that the IB network is a better path to reach a peer in @tcp.

I assume I could do the same with clients to reach the servers, and force the clients to use certain NICs as well. Correct?

> Isaac

Robert