Erich Focht wrote:
> Hi,
>
> does anything speak against using trunked gigabit ethernet links for OSSes?
> Should I expect the bandwidth to scale with the number of ethernet ports? I'm
> thinking of trying 2 or 4 ethernet ports in parallel with either the bonding
> driver or something similar (or is there any other way to get more bandwidth
> out of an OSS?). Any experience reports on gigabit trunking with Lustre would
> be very much appreciated.

I use bonded gigabit links and it works fine. I use a switch that
supports 802.3ad Link Aggregation, and the bonding ethernet driver in
Linux. From the perspective of the software you have one interface, so
there's no confusion.

This is also a form of fault-tolerance; your operation will continue
despite a switch port or NIC or cable failure.

I haven't tried four ports, but I can saturate 2 gigabit links from a
Lustre OST if the data is in memory.

-jwb
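A minimal sketch of the kind of 802.3ad bond Jeffrey describes, for anyone
wanting to try the same thing; the config file, interface names, and address
below are placeholders rather than his actual setup:

    # /etc/modules.conf (2.4) or /etc/modprobe.conf (2.6): load the bonding
    # driver in 802.3ad link-aggregation mode with link monitoring
    alias bond0 bonding
    options bonding mode=802.3ad miimon=100

    # Bring up the bond and enslave both GigE ports; the corresponding switch
    # ports must also be configured as an 802.3ad aggregate.
    ifconfig bond0 192.168.1.10 netmask 255.255.255.0 up
    ifenslave bond0 eth0 eth1

From Lustre's point of view only bond0 exists, which is why no socknal
configuration changes are needed with this approach.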
On Wed, 2005-06-01 at 13:04 +0200, Erich Focht wrote:
> Hi Jeffrey,
>
> thanks for your reply. So you basically say that using the Linux bonding
> driver with 2 ports on the OSS is working for you. Great! You use it in
> production? Only on OSS or also on clients? Any trouble, ever?

My installation is still an experimental toy. Lustre isn't developed to
the point where I can use it for my applications. So if I say something
"works", that means I tried it, and it appeared to have all the expected
attributes of a working system.

The bonding driver in Linux is still somewhat flaky. I used it with dual
tg3 NICs (the kind that are built in on many mainboards) and a D-Link
switch that supports 802.3ad. The maximum throughput between any two
hosts on the same switch was 1 gigabit, but I got the full 2 gigabits
when more than two hosts were communicating (2 clients and one OSS, for
example).

I always get a kernel oops when I down the bond0 interface, but I don't
get that with newer kernels (2.6.12). Just one more reason why I wish
Lustre tracked kernel.org changes instead of SuSE changes...

-jwb
Erich,

I use Lustre's socknal bonding and it seems to work quite well for me. You
just use multiple --hostaddr entries when defining your OSS nodes. For
example:

${LMC} -m $CONFIG --add net --node oss1a --nid oss1a --nettype tcp --hostaddr 172.17.17.112/255.255.0.0 --hostaddr 172.17.17.113/255.255.0.0

You then need to get the clients to connect to both ports by using "lctl
--net tcp add_peer oss1a 172.17.17.112 988" on the clients. In my example
I have both IPs in the same subnet. Using different subnets simplifies the
routing setup somewhat.

I'm interested to see that Jeffrey is using bonding successfully. Last
time I tried it I got very poor performance numbers - looks like it's
time to try it again! If it works as well as Jeffrey suggests I'll go
with that instead, as it's a much simpler configuration.

I wouldn't think using 4 GigE ports will help too much - the extra
overhead caused by the interrupts will probably offset the extra
bandwidth. I tried a 4-port card once and got worse performance than
when using a 2-port card... I think 2 ports is the sweet spot. It's
different if you use a single 10GigE card...

Regards,

Daire

> Hi Jeffrey,
>
> thanks for your reply. So you basically say that using the Linux bonding
> driver with 2 ports on the OSS is working for you. Great! You use it in
> production? Only on OSS or also on clients? Any trouble, ever?
>
> I was wondering whether any of the optimisations I read about in Lustre
> presentations (e.g. zero-copy?) could have an impact on the usage of bonding
> drivers with Lustre. Whether 4 trunked ethernet ports can be saturated or not
> is a secondary question right now; so far I'd like to know whether there are
> any potential dangers when using channel bonding with Lustre. Any comment from
> ClusterFS developers?
>
> Thanks,
> best regards,
> Erich
>
> On Monday 30 May 2005 19:49, Jeffrey Baker wrote:
> > Erich Focht wrote:
> > > Hi,
> > >
> > > does anything speak against using trunked gigabit ethernet links for
> > > OSSes? Should I expect the bandwidth to scale with the number of ethernet
> > > ports? I'm thinking of trying 2 or 4 ethernet ports in parallel with
> > > either the bonding driver or something similar (or is there any other way
> > > to get more bandwidth out of an OSS?). Any experience reports on gigabit
> > > trunking with Lustre would be very much appreciated.
> >
> > I use bonded gigabit links and it works fine. I use a switch that
> > supports 802.3ad Link Aggregation, and the bonding ethernet driver in
> > Linux. From the perspective of the software you have one interface, so
> > there's no confusion.
> >
> > This is also a form of fault-tolerance; your operation will continue
> > despite a switch port or NIC or cable failure.
> >
> > I haven't tried four ports, but I can saturate 2 gigabit links from a
> > Lustre OST if the data is in memory.
> >
> > -jwb
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss@lists.clusterfs.com
> https://lists.clusterfs.com/mailman/listinfo/lustre-discuss
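A sketch of the client side Daire describes, with one add_peer entry per
--hostaddr interface; the second address is taken from his lmc example, and
988 is the default acceptor port:

    # on each client: tell the tcp socknal that oss1a is reachable on both NICs
    lctl --net tcp add_peer oss1a 172.17.17.112 988
    lctl --net tcp add_peer oss1a 172.17.17.113 988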
On Wed, 1 Jun 2005, Daire Byrne wrote:

> I use Lustre's socknal bonding and it seems to work quite well for me. You
> just use multiple --hostaddr entries when defining your OSS nodes. For
> example:
>
> ${LMC} -m $CONFIG --add net --node oss1a --nid oss1a --nettype tcp --hostaddr 172.17.17.112/255.255.0.0 --hostaddr 172.17.17.113/255.255.0.0
>
> You then need to get the clients to connect to both ports by using "lctl
> --net tcp add_peer oss1a 172.17.17.112 988" on the clients. In my example
> I have both IPs in the same subnet. Using different subnets simplifies the
> routing setup somewhat.

Actually, this should be done automagically by lconf or zeroconf, but
last time I checked they got it wrong.

The following patch was needed to get lconf to do the right thing (it's
a simple bug; I haven't bothered to report it to bugzilla since we have
been busy with broken hardware and there seems to be no response to bug
reports anyway):

--------------------8<------------------------------------
diff -wru ../dist/lustre/utils/lconf ./lustre/utils/lconf
--- ../dist/lustre/utils/lconf  Tue Apr 12 10:59:24 2005
+++ ./lustre/utils/lconf        Fri May 13 11:02:27 2005
@@ -1315,7 +1315,7 @@
         lctl.network(self.net_type, self.nid)
         if self.net_type == 'tcp':
             sys_tweak_socknal()
-            for hostaddr in self.db.get_hostaddr():
+            for hostaddr in self.hostaddr:
                 ip = string.split(hostaddr, '/')[0]
                 if len(string.split(hostaddr, '/')) == 2:
                     netmask = string.split(hostaddr, '/')[1]
@@ -1373,7 +1373,7 @@
         if node_is_router():
             self.disconnect_peer_gateways()
         if self.net_type == 'tcp':
-            for hostaddr in self.db.get_hostaddr():
+            for hostaddr in self.hostaddr:
                 ip = string.split(hostaddr, '/')[0]
                 lctl.del_interface(self.net_type, ip)
--------------------8<------------------------------------

Since this exposes up-until-now not widely used functionality, expect
things to break for some people :)

zeroconf is totally broken wrt this.

As stated, using different subnets simplifies the routing setup. Simply
plugging it in, Linux will "helpfully" make sure all your traffic goes
over one interface, since it answers all requests on the same subnet
with the same MAC address. lconf/zeroconf should probably fix this
automagically too. I have a perl script to do this somewhere if you're
interested.

> I'm interested to see that Jeffrey is using bonding successfully. Last
> time I tried it I got very poor performance numbers - looks like it's
> time to try it again! If it works as well as Jeffrey suggests I'll go
> with that instead, as it's a much simpler configuration.

We had issues with the bonding driver, but we weren't able to do any
serious testing before we got sidestepped by hardware issues.

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se      |     nikke@acc.umu.se
---------------------------------------------------------------------------
 Whattaya mean I can't logon to an active Node?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Actually, this should be done automagically by lconf or zeroconf, but
> last time I checked they got it wrong.
>
> The following patch was needed to get lconf to do the right thing (it's
> a simple bug; I haven't bothered to report it to bugzilla since we have
> been busy with broken hardware and there seems to be no response to bug
> reports anyway):

I use zeroconf (autofs) to mount lustre, so the lconf stuff doesn't help
me much. What I did was add all the "lctl --net tcp add_peer" entries
into my modules.conf so that they get set up automatically when the
Lustre modules are loaded.

As far as I am aware Lustre tries to match ports on the same subnet
automatically. If I have a server with eth0 on 172.17 and eth1 on 172.18
subnets and a client with a single port on 172.17, then the client will
connect to eth0 on the server. If I have another client with a single
port on 172.18, then it will connect to eth1. If I have a dual-port
client with eth0 on 172.17 and eth1 on 172.18, then it will connect to
both ports of the server. So if you can split your machines using subnets
you should get pretty good load-balancing.

> As stated, using different subnets simplifies the routing setup. Simply
> plugging it in, Linux will "helpfully" make sure all your traffic goes
> over one interface, since it answers all requests on the same subnet
> with the same MAC address.
>
> lconf/zeroconf should probably fix this automagically too. I have a
> perl script to do this somewhere if you're interested.

On the servers I used "source routing" to get around this. Seems to work
okay. The only problem is that it doesn't work across subnets. Having
machines on two subnets isn't really an option for us, hence the awkward
routing. Here's what I added to /etc/rc.local on one of our OSS/OST
servers:

#Lustre 2-port setup
ip route add 172.17.0.0/16 via 172.17.17.112 table 1
ip route add 0/0 via 172.17.0.3 table 1
ip rule add from 172.17.17.112 lookup 1
ip route flush cache

Where 172.17.17.112 is eth0 on the server. eth1 (the last device started)
will by default do all the routing to the subnet unless I use the above
"hack".

> We had issues with the bonding driver, but we weren't able to do any
> serious testing before we got sidestepped by hardware issues.

I'll try and give the bonding stuff another go tomorrow. It would
simplify our setup and do away with the "lctl --net tcp add_peer" and
"source-routing" hacks. Will let you know how I get on.

Daire
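A sketch of how the modules.conf approach could look, using 2.4-style modutils
syntax; the module name (ksocknal), the lctl path, and the peer entries are
assumptions to adapt to your own setup:

    # /etc/modules.conf: run the add_peer setup whenever the socknal module loads
    post-install ksocknal /usr/sbin/lctl --net tcp add_peer oss1a 172.17.17.112 988; /usr/sbin/lctl --net tcp add_peer oss1a 172.17.17.113 988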
Hi Jeffrey,

thanks for your reply. So you basically say that using the Linux bonding
driver with 2 ports on the OSS is working for you. Great! You use it in
production? Only on OSS or also on clients? Any trouble, ever?

I was wondering whether any of the optimisations I read about in Lustre
presentations (e.g. zero-copy?) could have an impact on the usage of
bonding drivers with Lustre. Whether 4 trunked ethernet ports can be
saturated or not is a secondary question right now; so far I'd like to
know whether there are any potential dangers when using channel bonding
with Lustre. Any comment from ClusterFS developers?

Thanks,
best regards,
Erich

On Monday 30 May 2005 19:49, Jeffrey Baker wrote:
> Erich Focht wrote:
> > Hi,
> >
> > does anything speak against using trunked gigabit ethernet links for
> > OSSes? Should I expect the bandwidth to scale with the number of ethernet
> > ports? I'm thinking of trying 2 or 4 ethernet ports in parallel with
> > either the bonding driver or something similar (or is there any other way
> > to get more bandwidth out of an OSS?). Any experience reports on gigabit
> > trunking with Lustre would be very much appreciated.
>
> I use bonded gigabit links and it works fine. I use a switch that
> supports 802.3ad Link Aggregation, and the bonding ethernet driver in
> Linux. From the perspective of the software you have one interface, so
> there's no confusion.
>
> This is also a form of fault-tolerance; your operation will continue
> despite a switch port or NIC or cable failure.
>
> I haven't tried four ports, but I can saturate 2 gigabit links from a
> Lustre OST if the data is in memory.
>
> -jwb
Hi,

does anything speak against using trunked gigabit ethernet links for
OSSes? Should I expect the bandwidth to scale with the number of ethernet
ports? I'm thinking of trying 2 or 4 ethernet ports in parallel with
either the bonding driver or something similar (or is there any other way
to get more bandwidth out of an OSS?). Any experience reports on gigabit
trunking with Lustre would be very much appreciated.

Regards,
Erich