Ralf Utermann
2009-Jul-07 13:44 UTC
[Lustre-discuss] Bonded client interfaces and 10GbE server
Dear list,

we have a setup of OSSes and some clients with a dual Gigabit trunk (miimon=100 mode=802.3ad xmit_hash_policy=layer3+4). If the clients stripe over targets on different OSSes, they see dual-link bandwidth. If, however, they stripe over targets on the same OSS, they only get the bandwidth of one link.

If I attached the OSS with a single 10GbE link, could a client then use its second link when striping over targets on the same OSS?

Regards, Ralf

-- 
Ralf Utermann
Universität Augsburg, Institut für Physik -- EDV-Betreuer
Universitätsstr. 1, D-86135 Augsburg
Phone: +49-821-598-3231  Fax: -3411
SMTP: Ralf.Utermann at Physik.Uni-Augsburg.DE
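For reference, the trunk described above corresponds to a bonding setup along these lines (a sketch only; the bond device name, config-file location, and Red Hat-style layout are assumptions, and the switch ports must be configured for LACP as well):

```
# /etc/modprobe.conf -- bonding options matching the trunk described above
alias bond0 bonding
options bond0 miimon=100 mode=802.3ad xmit_hash_policy=layer3+4

# eth0/eth1 would then be enslaved to bond0, e.g. via
# MASTER=bond0 SLAVE=yes in their ifcfg files.
```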
Isaac Huang
2009-Jul-07 15:44 UTC
[Lustre-discuss] Bonded client interfaces and 10GbE server
On Tue, Jul 07, 2009 at 03:44:32PM +0200, Ralf Utermann wrote:
> Dear list,
>
> we have setup of OSS and some clients with a dual Gigabit
> trunk (miimon=100 mode=802.3ad xmit_hash_policy=layer3+4).

If I understand it correctly, xmit_hash_policy=layer3+4 would not allow a single TCP connection to span multiple slaves.

> If the clients stripe over targets on different OSS, they see
> a dual link bandwidth. If however, they stripe over targets on
> the same OSS, they only get the bandwidth of one link.

Each client creates three TCP connections to an OSS: one for exchanging small control messages, one for incoming bulk messages, and one for outgoing bulk messages. The control connection can be ignored for bandwidth considerations. When you're reading, only the incoming bulk connection on the client is in use, and when writing, only the outgoing bulk connection. Therefore, for reads or writes to the same server, any client utilizes only one of its slaves. I'd believe you'd probably see better aggregate bandwidth when doing reads and writes simultaneously - the incoming and outgoing bulk connections should have different source ports and therefore should be using different slaves.

> If I would attach the OSS with a single 10GbE link, could
> a client then use the second link, when striping over targets
> on same OSS?

There's a rather complex way of static configuration to allow for better overall bandwidth (though between any single client and server there's still one link in use):
http://manual.lustre.org/manual/LustreManual16_HTML/MoreComplicatedConfigurations.html#50401393_pgfId-1287958

Thanks,
Isaac
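The "different source ports, different slaves" point can be illustrated with the layer3+4 transmit hash described in the kernel's Documentation/networking/bonding.txt. This is a sketch only: the addresses and ports below are made up, and the formula is the documented one for unfragmented TCP/UDP traffic.

```python
import ipaddress

def layer3_4_hash(src_ip: str, dst_ip: str, sport: int, dport: int, n_slaves: int) -> int:
    """Slave index per the layer3+4 formula in bonding.txt:
    ((sport XOR dport) XOR ((src_ip XOR dst_ip) AND 0xffff)) mod n_slaves."""
    ip_term = (int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))) & 0xffff
    return ((sport ^ dport) ^ ip_term) % n_slaves

# Two connections between the same client and OSS, differing only in
# source port, can hash onto different slaves of a two-link bond,
# while any single connection always maps to exactly one slave:
print(layer3_4_hash("192.168.1.10", "192.168.1.20", 1021, 988, 2))  # -> 1
print(layer3_4_hash("192.168.1.10", "192.168.1.20", 1022, 988, 2))  # -> 0
```

This is why simultaneous reads and writes (two bulk connections with different source ports) can spread across both slaves, while a lone read or write stays on one link.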
Isaac Huang
2009-Jul-07 17:43 UTC
[Lustre-discuss] Bonded client interfaces and 10GbE server
On Tue, Jul 07, 2009 at 11:44:39AM -0400, Isaac Huang wrote:
> ......
> > If I would attach the OSS with a single 10GbE link, could
> > a client then use the second link, when striping over targets
> > on same OSS?
>
> There's a rather complex way of static configuration to allow for
> better overall bandwidth (though between any single client and server
> there's still one link in use):
> http://manual.lustre.org/manual/LustreManual16_HTML/MoreComplicatedConfigurations.html#50401393_pgfId-1287958

As an alternative, you might try ksocklnd bonding on clients and servers, e.g.:

options ksocklnd networks="tcp0(eth0, eth1)"

Then ksocklnd would create two sets of connections (control, bulk in, and bulk out) and balance traffic over them. The downside is that it might take a long time for ksocklnd to notice a downed NIC and avoid it, and when the downed NIC comes back to life later, ksocklnd might not be able to use it again.

Two more gotchas for ksocklnd bonding:

1. IP routing ultimately determines the outgoing interface and must be configured properly. For example, if both eth0 and eth1 of clients and servers belong to the same IP subnet, all outgoing packets might be sent by the same NIC because the destination IP addresses, though different, belong to the same destination IP network.

2. All incoming messages might arrive on the same NIC. Please refer to linux-*/Documentation/networking/ip-sysctl.txt for arp_ignore.

Thanks,
Isaac
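A client-side sketch of the two gotchas above, assuming both NICs sit in one subnet (interface names and the arp_ignore value are illustrative; ip-sysctl.txt documents the exact semantics):

```
# modprobe.conf: both NICs in one LNET network, as in the example above
options ksocklnd networks="tcp0(eth0, eth1)"

# Gotcha 1: check that routing actually spreads outgoing traffic over
# both interfaces rather than sending everything via one NIC:
ip route show

# Gotcha 2: answer ARP only on the interface that owns the target IP,
# so incoming traffic is not all drawn to a single NIC (see arp_ignore
# in Documentation/networking/ip-sysctl.txt):
sysctl -w net.ipv4.conf.eth0.arp_ignore=1
sysctl -w net.ipv4.conf.eth1.arp_ignore=1
```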
Klaus Steden
2009-Jul-07 22:00 UTC
[Lustre-discuss] Bonded client interfaces and 10GbE server
Hi Ralf,

The 802.3ad specification does not permit striping a single data path across multiple links, i.e. a single TCP/UDP conversation takes place over a lone physical interface; the TCP/IP stack does not split it apart so it can use multiple paths.

If you use a single 10GigE link instead of multiple GigE links (assuming you have a fast-performing NIC that supports RDMA), you would see Gbit+ throughput for a single conversation. However, both peers would need to be using 10GigE NICs.

LACP bonding only provides more aggregate bandwidth over a given link; it does not double (triple, quadruple, etc.) the bandwidth available to a single thread of communication without some application-specific or hardware-specific optimizations.

hth,
Klaus

On 7/7/09 6:44 AM, "Ralf Utermann" <ralf.utermann at physik.uni-augsburg.de> etched on stone tablets:

> Dear list,
>
> we have setup of OSS and some clients with a dual Gigabit
> trunk (miimon=100 mode=802.3ad xmit_hash_policy=layer3+4).
> If the clients stripe over targets on different OSS, they see
> a dual link bandwidth. If however, they stripe over targets on
> the same OSS, they only get the bandwidth of one link.
>
> If I would attach the OSS with a single 10GbE link, could
> a client then use the second link, when striping over targets
> on same OSS?
>
> Regards, Ralf