Eric Barton
2007-Dec-21 17:14 UTC
[Lustre-discuss] FW: faking IB multi-rail with multihomed clients
Guys,

For those of you not party to the original email exchange, this is about how we can aggregate bandwidth across both rails of a dual-rail IB cluster using current Lustre/LNET (i.e. before we have implemented transparent LNET support for failover and bandwidth aggregation across multiple networks).

The following 2 points are fundamental - everything below is a direct consequence...

1. LNET is perfectly happy with multiple rails, but it doesn't load balance over them - the rail actually used for any communication is determined by the peer NID.

2. Lustre always uses the same NID to talk to a given server from a given node. It chooses the NID (a) with the fewest hops (to minimize routing) and (b) appearing first in the "networks" or "ip2nets" LNET configuration strings.

Now consider a 2-rail IB cluster running the OFA stack (i.e. OFED) with the following IPoIB address assignments...

             ib0                   ib1
  Servers    192.168.0.*           192.168.1.*
  Clients    192.168.[2-127].*     192.168.[128-253].*

...here are some different configurations you could create...

A. I've got many more clients than servers in my cluster. I don't care if an individual client can't get 2 rails of bandwidth because the servers are the actual bottleneck...

  ip2nets="o2ib0(ib0),o2ib1(ib1) 192.168.[0-1].* #all servers;\
           o2ib0(ib0) 192.168.[2-253].[0-252/2]#even clients;\
           o2ib1(ib1) 192.168.[2-253].[1-253/2]#odd clients"

This configuration gives every server 2 NIDs, one on each network - and statically load balances clients between the rails.

B. A single client must get 2 rails worth of bandwidth and I don't care if the max aggregate bandwidth is only (# servers) * (1 rail)...

  ip2nets="o2ib0(ib0) 192.168.[0-1].[0-252/2]#even servers;\
           o2ib1(ib1) 192.168.[0-1].[1-253/2]#odd servers;\
           o2ib0(ib0),o2ib1(ib1) 192.168.[2-253].* #clients"

This configuration gives every server a single NID on one rail or the other. Clients have a NID on both rails.

C. I don't care how many hoops I have to jump through, but I really want all my clients and all my servers to use both rails...

  ip2nets="o2ib0(ib0),o2ib2(ib1) 192.168.[0-1].[0-252/2] #even servers;\
           o2ib1(ib0),o2ib3(ib1) 192.168.[0-1].[1-253/2] #odd servers;\
           o2ib0(ib0),o2ib3(ib1) 192.168.[2-253].[0-252/2]#even clients;\
           o2ib1(ib0),o2ib2(ib1) 192.168.[2-253].[1-253/2]#odd clients"

This configuration includes 2 additional "fake" o2ib networks to work around Lustre's simplistic NID selection algorithm. It connects "even" clients to "even" servers with o2ib0 on rail0 and to "odd" servers with o2ib3 on rail1. Similarly it connects "odd" clients to "odd" servers with o2ib1 on rail0 and to "even" servers with o2ib2 on rail1.

Hope this demystifies things :)

Cheers,
Eric
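As a rough illustration of why configuration C spreads traffic over both rails, here is a minimal Python sketch of the selection rule from point 2. It is a simplification under stated assumptions: every network is taken to be directly reachable (so the fewest-hops criterion never matters), and the client simply uses the first network in its own list that the server also has. The `CONFIG_C` table, the role names and `pick_rail` are hypothetical names made up for this example; none of them are LNET or Lustre APIs.

```python
# Hypothetical model of configuration C: each role lists its (network, interface)
# pairs in the order they appear in the ip2nets string.
CONFIG_C = {
    "even_server": [("o2ib0", "ib0"), ("o2ib2", "ib1")],
    "odd_server":  [("o2ib1", "ib0"), ("o2ib3", "ib1")],
    "even_client": [("o2ib0", "ib0"), ("o2ib3", "ib1")],
    "odd_client":  [("o2ib1", "ib0"), ("o2ib2", "ib1")],
}


def pick_rail(client, server, config=CONFIG_C):
    """Return the (network, interface) the client would use to reach the server.

    Assumption: the first network in the client's list that the server also
    sits on wins; routing and hop counts are ignored.
    """
    server_nets = {net for net, _ in config[server]}
    for net, iface in config[client]:
        if net in server_nets:
            return net, iface
    return None  # no common network, so no direct path in this model


if __name__ == "__main__":
    for client in ("even_client", "odd_client"):
        for server in ("even_server", "odd_server"):
            print(client, "->", server, pick_rail(client, server))
```

In this model each client shares exactly one network with each server, so there is never a choice to make: every client reaches one class of server over ib0 and the other class over ib1, which is the pairing described for configuration C.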