Michael Ruepp
2009-May-07 12:50 UTC
[Lustre-discuss] tcp network load balancing understanding lustre 1.8
Hi there,

I have configured a simple TCP Lustre 1.8 setup with one MDS (one NIC) and two OSSes (four NICs per OSS). As in the 1.6 documentation, the multihomed section is a little bit unclear to me.

I give every NID an IP in the same subnet, eg: 10.111.20.35-38 - oss0 and 10.111.20.39-42 - oss1.

Do I have to make modprobe.conf.local look like this to force Lustre to use all four interfaces in parallel:

options lnet networks=tcp0(eth0,eth1,eth2,eth3)

Because on page 138 the 1.8 manual says:

"Note - In the case of TCP-only clients, the first available non-loopback IP interface is used for tcp0 since the interfaces are not specified."

Or do I have to specify it like this:

options lnet networks=tcp

Because on page 112 the Lustre 1.6 manual says:

"Note - In the case of TCP-only clients, all available IP interfaces are used for tcp0 since the interfaces are not specified. If there is more than one, the IP of the first one found is used to construct the tcp0 ID."

Which is the opposite of the 1.8 manual.

My goal is to let Lustre utilize all four Gb links in parallel. And my Lustre clients are equipped with two Gb links, which should be utilized by the clients as well (eth0, eth1).

Or is bonding the better solution in terms of performance?

Thanks very much for input,

Michael Ruepp
Schwarzfilm AG
Isaac Huang
2009-May-07 19:57 UTC
[Lustre-discuss] tcp network load balancing understanding lustre 1.8
On Thu, May 07, 2009 at 02:50:13PM +0200, Michael Ruepp wrote:
> Hi there,
> ......
> I give every NID an IP in the same subnet, eg: 10.111.20.35-38 - oss0
> and 10.111.20.39-42 - oss1
>
> Do I have to make modprobe.conf.local look like this to force lustre
> to use all four interfaces in parallel:
>
> options lnet networks=tcp0(eth0,eth1,eth2,eth3)
>
> Because on page 138 the 1.8 manual says:
> "Note - In the case of TCP-only clients, the first available
> non-loopback IP interface is used for tcp0 since the interfaces
> are not specified."

Correct.

> or do I have to specify it like this:
>
> options lnet networks=tcp
>
> Because on page 112 the lustre 1.6 manual says:
> "Note - In the case of TCP-only clients, all available IP interfaces
> are used for tcp0 ......"

Wrong. It needs to be updated as well, Sheila?

> ......
> My goal is to let lustre utilize all four Gb links in parallel. And my
> Lustre clients are equipped with two Gb links which should be utilized
> by the lustre clients as well (eth0, eth1)
>
> Or is bonding the better solution in terms of performance?

I don't have any performance comparisons between the two approaches, but I'd suggest going with Linux bonding instead (let's call the tcp0(eth0,...,ethN) approach Lustre bonding), because:

1. With Lustre bonding it's rather tricky to get routing right, especially when all NICs reside in the same IP subnet. The Lustre tcp network driver, as its name suggests, works at the TCP layer, and the decision as to which outgoing interface to use depends on Linux IP-layer routing. When all NICs live in the same IP subnet, it's very possible that all outgoing packets would go through the interface of the first route in the Linux routing table, unless some tweaking has been done to also take source IPs into account. Incoming packets could also come in via unexpected NICs, depending on your settings in /proc/sys/net/ipv4/conf/*/arp_ignore and your Ethernet topology.

2. Linux bonding does a good job of detecting link status via either the ARP monitor or the MII monitor, but no such mechanism exists in Lustre bonding.

In fact, Lustre bonding is an officially obsoleted feature, if I remember correctly.

Thanks,
Isaac
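As an illustration of the source-routing tweak in point 1 above, per-NIC policy routing on Linux might look like the following sketch (the interface names, table numbers, and addresses are assumptions for this example, not taken from the thread):

# Give traffic sourced from each NIC's address its own routing table,
# so replies leave via the interface that owns that address:
ip route add 10.111.20.0/24 dev eth1 src 10.111.20.36 table 101
ip rule add from 10.111.20.36 table 101
# ...repeat with a separate table for eth2 and eth3.
# The ARP behaviour Isaac mentions is controlled by these sysctls:
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2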
Klaus Steden
2009-May-07 22:02 UTC
[Lustre-discuss] tcp network load balancing understanding lustre 1.8
Hi Michael,

Just want to throw my two cents in with Isaac's posting, as I spent a great deal of time working with these kinds of features over the course of the last two years.

In my experience with Lustre 1.6, in the case where multiple NICs were available, Lustre will default to using the first one exclusively until it detects a failure, and then switches over to the next available. It will also not distinguish between different NIC types; i.e. IB, GigE, etc. will be picked based on discovery order, not speed or some other metric.

I didn't even touch Lustre bonding, because as you both remark, it's a little convoluted. I spent a lot of time experimenting with Lustre over 802.3ad (LACP) aggregated links using the Linux bonding driver, and my OSS nodes produced very respectable to very good numbers. Across a pair of OSS nodes, each with 2 x GigE NICs, I was able to sustain ~350 MB/s write speed when running sandbox tests. So it appears that although the LACP driver doesn't balance a single connection across multiple links (i.e. a 2 x GigE LACP bond doesn't give you 2 Gbit throughput for a single network I/O), the Lustre implementation somehow manages to squeeze more data through the pipe.

To get it set up, simply configure NIC bonding of whatever flavour suits your needs on the OSS nodes, and then assign 'bond0' to your tcp networks, something like this:

options lnet networks=tcp0(bond0)

and you should be off to the races.

hth,
Klaus
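A minimal sketch of the 802.3ad setup Klaus describes, for a RHEL-style system (the bonding-driver options shown are standard Linux bonding, but the addresses and file contents are assumptions, not Klaus's actual configuration):

# /etc/modprobe.conf - load the bonding driver in LACP mode
alias bond0 bonding
options bond0 mode=802.3ad miimon=100

# /etc/sysconfig/network-scripts/ifcfg-bond0 - example address
DEVICE=bond0
IPADDR=10.111.20.35
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# then point LNET at the bond:
options lnet networks=tcp0(bond0)

Note that for mode 802.3ad the switch ports must also be configured as an LACP aggregate.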
Isaac Huang
2009-May-08 00:20 UTC
[Lustre-discuss] tcp network load balancing understanding lustre 1.8
On Thu, May 07, 2009 at 03:02:49PM -0700, Klaus Steden wrote:
> ......
> it appears that although the LACP driver doesn't balance a
> connection across multiple links (i.e. a 2 GigE LACP bond doesn't
> give you 2 Gbit throughput for a single network I/O), the Lustre
> implementation somehow manages to squeeze more data through the pipe.

Probably because the Lustre TCP driver creates multiple connections between two endpoints, for different types of data.

Thanks,
Isaac
Mag Gam
2009-May-09 15:07 UTC
[Lustre-discuss] tcp network load balancing understanding lustre 1.8
I second the responses. Go with native OS bonding, Linux in this case. Makes life so much easier...

Good luck
Arden Wiebe
2009-May-09 16:18 UTC
[Lustre-discuss] tcp network load balancing understanding lustre 1.8
Michael,

This might help answer some questions: http://ioio.ca/Lustre-tcp-bonding/OST2.png shows my mostly untuned OSS and OSTs pulling 400+ MiB/s over TCP bonding provided by the kernel, complete with a cat of the modprobe.conf file. You have the other links I've sent you, but the picture above is relevant to your questions.

Arden
Andreas Dilger
2009-May-09 18:31 UTC
[Lustre-discuss] tcp network load balancing understanding lustre 1.8
On May 09, 2009 09:18 -0700, Arden Wiebe wrote:
> This might help answer some questions:
> http://ioio.ca/Lustre-tcp-bonding/OST2.png shows my mostly untuned
> OSS and OSTs pulling 400+ MiB/s over TCP bonding provided by the
> kernel, complete with a cat of the modprobe.conf file.

Arden, thanks for sharing this info. Any chance you could post it to wiki.lustre.org? It would seem there is one bit of info missing somewhere: how does bond0 know which interfaces to use?

Also, another oddity: the network monitor is showing 450 MiB/s received, yet the disk is showing only about 170 MiB/s going to the disk. Either something is wacky with the monitoring (e.g. it is counting Received for both the eth* networks AND bond0), or Lustre is doing something very weird and retransmitting the bulk data like crazy (seems unlikely).

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Arden Wiebe
2009-May-10 04:15 UTC
[Lustre-discuss] tcp network load balancing understanding lustre 1.8
Bond0 knows which interfaces to utilize because all the others, eth0-5, are designated as slaves in their configuration files (see the sketch below). The manual is fairly clear on that.

In the screenshot, the memory used in GNOME System Monitor is at 452.4 MiB of 7.8 GiB, and the sustained bandwidth to the OSS and OST is 404.2 MiB/s, which corresponds roughly to what collectl is showing for KBWrite for Disks. Collectl shows a few different results for Disks, Network, and Lustre OST, and I believe it is measuring the other OST on the network at around 170 MiB/s, if you view the other screenshot for OST1 or lustrethree.

In the screenshots: Lustreone=MGS, Lustretwo=MDT, Lustrethree=OSS+raid10 target, Lustrefour=OSS+raid10 target.

To help clarify the entire network and the stress testing I did with all the clients I could give it, see www.ioio.ca/Lustre-tcp-bonding/images/html and www.ioio.ca/Lustre-tcp-bonding/Lustre-notes/images.html

Proper benchmarking would be nice, though, as I just hit it with everything I could and it lived, so I was happy. I found the manual to be lacking in benchmarking, and I really wanted to make nice graphs of it all but failed to do so with iozone for some reason.

I'll be taking a run at upgrading everything to 1.8 in the coming week or so, and when I do I'll grab some new screenshots and post the relevant items to the wiki. Otherwise, if someone else wants to post the existing screenshots, you're welcome to use them, as they do detail a ground-up build. Apparently 1.8 is great with small files now, so it should work even better with www.oil-gas.ca/phpsysinfo and www.linuxguru.ca/phpsysinfo
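For reference, a minimal sketch of one such slave configuration file on a RHEL-style system (the MASTER/SLAVE layout is standard for the Linux bonding driver; the exact contents of Arden's files are not shown in the thread):

# /etc/sysconfig/network-scripts/ifcfg-eth0 (one per slave, eth0-eth5)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none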
Mag Gam
2009-May-10 12:48 UTC
[Lustre-discuss] tcp network load balancing understanding lustre 1.8
Thanks for the screenshot, Arden.

What is the maximum number of slaves you can have on a bonded interface?
Arden Wiebe
2009-May-10 13:12 UTC
[Lustre-discuss] tcp network load balancing understanding lustre 1.8
Mag, you're welcome. From the first page returned by a search for Linux bonding, it states:

How many bonding devices can I have?

    There is no limit.

How many slaves can a bonding device have?

    This is limited only by the number of network interfaces Linux supports and/or the number of network cards you can place in your system.
Kevin Van Maren
2009-May-10 14:04 UTC
[Lustre-discuss] tcp network load balancing understanding lustre 1.8
On May 10, 2009, at 7:12 AM, Arden Wiebe <albert682 at yahoo.com> wrote:
> How many slaves can a bonding device have?
>
> This is limited only by the number of network interfaces Linux
> supports and/or the number of network cards you can place in your
> system.

In practice, most configurations are limited to the (typical) maximum of 4 or 8 supported by the switch you are using.
Christopher J. Walker
2009-May-10 14:07 UTC
[Lustre-discuss] tcp network load balancing understanding lustre 1.8
Arden Wiebe wrote:
> Proper benchmarking would be nice, though, as I just hit it with
> everything I could and it lived, so I was happy. I found the manual
> to be lacking in benchmarking, and I really wanted to make nice
> graphs of it all but failed to do so with iozone for some reason.

I too have been trying to benchmark a Lustre filesystem with iozone 3.321. Sometimes it works, and sometimes it hangs. I turned on debugging, and ran a test with 2 clients on each of 40 machines. In the output, I get lines like:

loop: R_STAT_DATA for client 9

For 79 clients, there are two of these messages in the output, and for one of them only 1.

I've had a brief skim of the source code, and I think that the problem is that iozone uses UDP packets to communicate. On a heavily loaded network, one of these is bound to get lost. Presumably iozone doesn't have the right retry strategy. The iozone author has suggested using a different network for the timing packets, but I don't think I can justify the time or expense involved in building one purely to do some benchmarking.

Chris

PS On a machine with 2 bonded Gigabit Ethernet cards, I found I needed two iozone threads to get the available bandwidth. One iozone thread seemed to get the bandwidth from one card only.
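For context, a local throughput-mode iozone run of the sort Chris describes in the PS might look like this sketch (sizes and paths are placeholders, not his actual command; his distributed runs would additionally pass a client list via -+m):

# two local threads, 1 GB per file, 1 MB records, write (0) and read (1) tests
iozone -t 2 -s 1g -r 1m -i 0 -i 1 -F /mnt/lustre/f1 /mnt/lustre/f2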
Brian J. Murrell
2009-May-10 15:00 UTC
[Lustre-discuss] tcp network load balancing understanding lustre 1.8
On Sun, 2009-05-10 at 15:07 +0100, Christopher J. Walker wrote:
> I've had a brief skim of the source code, and I think that the problem
> is that iozone uses UDP packets to communicate. On a heavily loaded
> network, one of these is bound to get lost. Presumably iozone doesn't
> have the right retry strategy.

Why not use a benchmark that uses an established MPI library (such as MPICH or LAM, which can run their message-passing infrastructure over a TCP transport such as rsh or ssh)? IOR is one such benchmark.

Of course, if your network is really so loaded as to be dropping UDP packets, then that will probably impact the latency of the MPI messages. I'm not sure whether that will have a meaningful impact on IOR or not; I tend to think the messaging is quite low volume, so perhaps not. In any case, it can add another data point to your debugging efforts to help prove or disprove your hypothesis.

b.
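A sketch of the kind of IOR run Brian suggests (the task count, machine file, sizes, and mount point are placeholders; the option names are standard IOR, but check your build):

# 8 MPI tasks, file-per-process, 1 MiB transfers into 1 GiB files
mpirun -np 8 -machinefile clients.txt \
    ./IOR -w -r -F -t 1m -b 1g -o /mnt/lustre/ior.testfile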
Klaus Steden
2009-May-12 22:50 UTC
[Lustre-discuss] tcp network load balancing understanding lustre 1.8
On 5/10/09 6:12 AM, "Arden Wiebe" <albert682 at yahoo.com> etched on stone tablets:
> How many slaves can a bonding device have?
>
> This is limited only by the number of network interfaces Linux
> supports and/or the number of network cards you can place in your
> system.

If memory serves, the LACP spec allows for a maximum of 8 devices within an aggregate group. I don't know if the ALB and TLB modes of the Linux bonding implementation enforce any limit, though.

Klaus