Erich Focht wrote:
> Hi,
>
> does anything speak against using trunked gigabit ethernet links for OSSes?
> Should I expect the bandwidth to scale with the number of ethernet ports? I'm
> thinking of trying 2 or 4 ethernet ports in parallel with either the bonding
> driver or something similar (or is there any other way to get more bandwidth
> out of an OSS?). Any experience reports on gigabit trunking with Lustre would
> be very much appreciated.

I use bonded gigabit links and it works fine. I use a switch that
supports 802.3ad Link Aggregation, and the bonding ethernet driver in
Linux. From the perspective of the software you have one interface, so
there's no confusion.

This is also a form of fault-tolerance; your operation will continue
despite a switch port or NIC or cable failure.

I haven't tried four ports, but I can saturate 2 gigabit links from a
Lustre OST if the data is in memory.

-jwb
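A minimal sketch of the kind of 802.3ad bond Jeffrey describes, for anyone
wanting to try the same thing; the config file, interface names, and address
below are placeholders rather than his actual setup:

    # /etc/modules.conf (2.4) or /etc/modprobe.conf (2.6): load the bonding
    # driver in 802.3ad link-aggregation mode with link monitoring
    alias bond0 bonding
    options bonding mode=802.3ad miimon=100

    # Bring up the bond and enslave both GigE ports; the corresponding switch
    # ports must also be configured as an 802.3ad aggregate.
    ifconfig bond0 192.168.1.10 netmask 255.255.255.0 up
    ifenslave bond0 eth0 eth1

From Lustre's point of view only bond0 exists, which is why no socknal
configuration changes are needed with this approach.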
On Wed, 2005-06-01 at 13:04 +0200, Erich Focht wrote:
> Hi Jeffrey,
>
> thanks for your reply. So you basically say that using the Linux bonding
> driver with 2 ports on the OSS is working for you. Great! You use it in
> production? Only on OSS or also on clients? Any trouble, ever?

My installation is still an experimental toy. Lustre isn't developed to
the point where I can use it for my applications. So if I say something
"works", that means I tried it, and it appeared to have all the expected
attributes of a working system.

The bonding driver in Linux is still somewhat flaky. I used it with dual
tg3 NICs (the kind that are built in on many mainboards) and a D-Link
switch that supports 802.3ad. The maximum throughput between any two
hosts on the same switch was 1 gigabit, but I got the full 2 gigabits
when more than two hosts were communicating (2 clients and one OSS, for
example).

I always get a kernel oops when I down the bond0 interface, but I don't
get that with newer kernels (2.6.12). Just one more reason why I wish
Lustre tracked kernel.org changes instead of SuSE changes...

-jwb
Erich,

I use Lustre's socknal bonding and it seems to work quite well for me. You
just use multiple --hostaddr entries when defining your OSS nodes. For
example:

${LMC} -m $CONFIG --add net --node oss1a --nid oss1a --nettype tcp --hostaddr 172.17.17.112/255.255.0.0 --hostaddr 172.17.17.113/255.255.0.0

You then need to get the clients to connect to both ports by using "lctl
--net tcp add_peer oss1a 172.17.17.112 988" on the clients. In my example
I have both IPs in the same subnet. Using different subnets simplifies the
routing setup somewhat.

I'm interested to see that Jeffrey is using bonding successfully. Last
time I tried it I got very poor performance numbers - looks like it's
time to try it again! If it works as well as Jeffrey suggests I'll go
with that instead, as it's a much simpler configuration.

I wouldn't think using 4 GigE ports will help too much - the extra
overhead caused by the interrupts will probably offset the extra
bandwidth. I tried a 4-port card once and got worse performance than
when using a 2-port card... I think 2 ports is the sweet spot. It's
different if you use a single 10GigE card...

Regards,

Daire

> Hi Jeffrey,
>
> thanks for your reply. So you basically say that using the Linux bonding
> driver with 2 ports on the OSS is working for you. Great! You use it in
> production? Only on OSS or also on clients? Any trouble, ever?
>
> I was wondering whether any of the optimisations I read about in Lustre
> presentations (e.g. zero-copy?) could have an impact on the usage of bonding
> drivers with Lustre. Whether 4 trunked ethernet ports can be saturated or not
> is a secondary question right now; so far I'd like to know whether there are
> any potential dangers when using channel bonding with Lustre. Any comment from
> ClusterFS developers?
>
> Thanks,
> best regards,
> Erich
>
> On Monday 30 May 2005 19:49, Jeffrey Baker wrote:
> > Erich Focht wrote:
> > > Hi,
> > >
> > > does anything speak against using trunked gigabit ethernet links for
> > > OSSes? Should I expect the bandwidth to scale with the number of ethernet
> > > ports? I'm thinking of trying 2 or 4 ethernet ports in parallel with
> > > either the bonding driver or something similar (or is there any other way
> > > to get more bandwidth out of an OSS?). Any experience reports on gigabit
> > > trunking with Lustre would be very much appreciated.
> >
> > I use bonded gigabit links and it works fine. I use a switch that
> > supports 802.3ad Link Aggregation, and the bonding ethernet driver in
> > Linux. From the perspective of the software you have one interface, so
> > there's no confusion.
> >
> > This is also a form of fault-tolerance; your operation will continue
> > despite a switch port or NIC or cable failure.
> >
> > I haven't tried four ports, but I can saturate 2 gigabit links from a
> > Lustre OST if the data is in memory.
> >
> > -jwb
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss@lists.clusterfs.com
> https://lists.clusterfs.com/mailman/listinfo/lustre-discuss
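A sketch of the client side Daire describes, with one add_peer entry per
--hostaddr interface; the second address is taken from his lmc example, and
988 is the default acceptor port:

    # on each client: tell the tcp socknal that oss1a is reachable on both NICs
    lctl --net tcp add_peer oss1a 172.17.17.112 988
    lctl --net tcp add_peer oss1a 172.17.17.113 988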
On Wed, 1 Jun 2005, Daire Byrne wrote:

> I use Lustre's socknal bonding and it seems to work quite well for me. You
> just use multiple --hostaddr entries when defining your OSS nodes. For
> example:
>
> ${LMC} -m $CONFIG --add net --node oss1a --nid oss1a --nettype tcp --hostaddr 172.17.17.112/255.255.0.0 --hostaddr 172.17.17.113/255.255.0.0
>
> You then need to get the clients to connect to both ports by using "lctl
> --net tcp add_peer oss1a 172.17.17.112 988" on the clients. In my example
> I have both IPs in the same subnet. Using different subnets simplifies the
> routing setup somewhat.

Actually, this should be done automagically by lconf or zeroconf, but
last time I checked they got it wrong.

The following patch was needed to get lconf to do the right thing (it's
a simple bug; I haven't bothered to report it to bugzilla since we have
been busy with broken hardware and there seems to be no response to bug
reports anyway):

--------------------8<------------------------------------
diff -wru ../dist/lustre/utils/lconf ./lustre/utils/lconf
--- ../dist/lustre/utils/lconf  Tue Apr 12 10:59:24 2005
+++ ./lustre/utils/lconf        Fri May 13 11:02:27 2005
@@ -1315,7 +1315,7 @@
         lctl.network(self.net_type, self.nid)
         if self.net_type == 'tcp':
             sys_tweak_socknal()
-            for hostaddr in self.db.get_hostaddr():
+            for hostaddr in self.hostaddr:
                 ip = string.split(hostaddr, '/')[0]
                 if len(string.split(hostaddr, '/')) == 2:
                     netmask = string.split(hostaddr, '/')[1]
@@ -1373,7 +1373,7 @@
         if node_is_router():
             self.disconnect_peer_gateways()
         if self.net_type == 'tcp':
-            for hostaddr in self.db.get_hostaddr():
+            for hostaddr in self.hostaddr:
                 ip = string.split(hostaddr, '/')[0]
                 lctl.del_interface(self.net_type, ip)
--------------------8<------------------------------------

Since this exposes up-until-now not widely used functionality, expect
things to break for some people :)

zeroconf is totally broken wrt this.

As stated, using different subnets simplifies the routing setup. Simply
plugging it in, Linux will "helpfully" make sure all your traffic goes
over one interface, since it answers all requests on the same subnet
with the same MAC address. lconf/zeroconf should probably fix this
automagically too. I have a perl script to do this somewhere if you're
interested.

> I'm interested to see that Jeffrey is using bonding successfully. Last
> time I tried it I got very poor performance numbers - looks like it's
> time to try it again! If it works as well as Jeffrey suggests I'll go
> with that instead, as it's a much simpler configuration.

We had issues with the bonding driver, but we weren't able to do any
serious testing before we got sidestepped by hardware issues.

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se      |     nikke@acc.umu.se
---------------------------------------------------------------------------
 Whattaya mean I can't logon to an active Node?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Actually, this should be done automagically by lconf or zeroconf, but
> last time I checked they got it wrong.
>
> The following patch was needed to get lconf to do the right thing (it's
> a simple bug; I haven't bothered to report it to bugzilla since we have
> been busy with broken hardware and there seems to be no response to bug
> reports anyway):

I use zeroconf (autofs) to mount lustre, so the lconf stuff doesn't help
me much. What I did was add all the "lctl --net tcp add_peer" entries
into my modules.conf so that they get set up automatically when the
Lustre modules are loaded.

As far as I am aware Lustre tries to match ports on the same subnet
automatically. If I have a server with eth0 on 172.17 and eth1 on 172.18
subnets and a client with a single port on 172.17, then the client will
connect to eth0 on the server. If I have another client with a single
port on 172.18, then it will connect to eth1. If I have a dual-port
client with eth0 on 172.17 and eth1 on 172.18, then it will connect to
both ports of the server. So if you can split your machines using subnets
you should get pretty good load-balancing.

> As stated, using different subnets simplifies the routing setup. Simply
> plugging it in, Linux will "helpfully" make sure all your traffic goes
> over one interface, since it answers all requests on the same subnet
> with the same MAC address.
>
> lconf/zeroconf should probably fix this automagically too. I have a
> perl script to do this somewhere if you're interested.

On the servers I used "source routing" to get around this. Seems to work
okay. The only problem is that it doesn't work across subnets. Having
machines on two subnets isn't really an option for us, hence the awkward
routing. Here's what I added to /etc/rc.local on one of our OSS/OST
servers:

#Lustre 2-port setup
ip route add 172.17.0.0/16 via 172.17.17.112 table 1
ip route add 0/0 via 172.17.0.3 table 1
ip rule add from 172.17.17.112 lookup 1
ip route flush cache

Where 172.17.17.112 is eth0 on the server. eth1 (the last device started)
will by default do all the routing to the subnet unless I use the above
"hack".

> We had issues with the bonding driver, but we weren't able to do any
> serious testing before we got sidestepped by hardware issues.

I'll try and give the bonding stuff another go tomorrow. It would
simplify our setup and do away with the "lctl --net tcp add_peer" and
"source-routing" hacks. Will let you know how I get on.

Daire
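A sketch of how the modules.conf approach could look, using 2.4-style modutils
syntax; the module name (ksocknal), the lctl path, and the peer entries are
assumptions to adapt to your own setup:

    # /etc/modules.conf: run the add_peer setup whenever the socknal module loads
    post-install ksocknal /usr/sbin/lctl --net tcp add_peer oss1a 172.17.17.112 988; /usr/sbin/lctl --net tcp add_peer oss1a 172.17.17.113 988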
Hi Jeffrey,

thanks for your reply. So you basically say that using the Linux bonding
driver with 2 ports on the OSS is working for you. Great! You use it in
production? Only on OSS or also on clients? Any trouble, ever?

I was wondering whether any of the optimisations I read about in Lustre
presentations (e.g. zero-copy?) could have an impact on the usage of
bonding drivers with Lustre. Whether 4 trunked ethernet ports can be
saturated or not is a secondary question right now; so far I'd like to
know whether there are any potential dangers when using channel bonding
with Lustre. Any comment from ClusterFS developers?

Thanks,
best regards,
Erich

On Monday 30 May 2005 19:49, Jeffrey Baker wrote:
> Erich Focht wrote:
> > Hi,
> >
> > does anything speak against using trunked gigabit ethernet links for
> > OSSes? Should I expect the bandwidth to scale with the number of ethernet
> > ports? I'm thinking of trying 2 or 4 ethernet ports in parallel with
> > either the bonding driver or something similar (or is there any other way
> > to get more bandwidth out of an OSS?). Any experience reports on gigabit
> > trunking with Lustre would be very much appreciated.
>
> I use bonded gigabit links and it works fine. I use a switch that
> supports 802.3ad Link Aggregation, and the bonding ethernet driver in
> Linux. From the perspective of the software you have one interface, so
> there's no confusion.
>
> This is also a form of fault-tolerance; your operation will continue
> despite a switch port or NIC or cable failure.
>
> I haven't tried four ports, but I can saturate 2 gigabit links from a
> Lustre OST if the data is in memory.
>
> -jwb
Hi,

does anything speak against using trunked gigabit ethernet links for
OSSes? Should I expect the bandwidth to scale with the number of ethernet
ports? I'm thinking of trying 2 or 4 ethernet ports in parallel with
either the bonding driver or something similar (or is there any other way
to get more bandwidth out of an OSS?). Any experience reports on gigabit
trunking with Lustre would be very much appreciated.

Regards,
Erich