Hi All,
Just wondering if someone can give us some insight into the logic that
ksocklnd uses to decide which connections to make.
There's not much in the Lustre Operations Manual about it, but the
impression I get from reading around is that if we have:
options lnet networks=tcp0(eth0,eth1)
on all of our dual-connected hosts, they will load-balance by making
multiple connections between clients and servers. Indeed, they do.
However, I would have expected 4 TCP connection bundles (i.e. 12
connections, eth[0,1]<->eth[0,1]), but we actually get two (i.e. 6
connections, eth0<->eth0 and eth1<->eth1). How does Lustre know which
combinations to use?
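To make the expected and observed counts concrete, here is a minimal sketch that only reproduces the arithmetic above. It assumes 3 TCP connections per bundle (inferred from "4 bundles = 12 connections") and says nothing about how ksocklnd actually chooses which interface pairs to use; the names `full_mesh`, `observed` and `connection_count` are purely illustrative.

```python
from itertools import product

# Assumption inferred from the numbers in this post:
# 4 bundles = 12 connections, i.e. 3 TCP connections per bundle.
CONNS_PER_BUNDLE = 3

client_ifaces = ["eth0", "eth1"]
server_ifaces = ["eth0", "eth1"]

# What I expected: every client interface paired with every server interface.
full_mesh = list(product(client_ifaces, server_ifaces))

# What we actually observe: only like-named interfaces are paired.
observed = [(c, s) for c, s in full_mesh if c == s]


def connection_count(pairs):
    """Total TCP connections for a list of interface-pair bundles."""
    return len(pairs) * CONNS_PER_BUNDLE


if __name__ == "__main__":
    print(len(full_mesh), connection_count(full_mesh))  # 4 bundles, 12 connections
    print(len(observed), connection_count(observed))    # 2 bundles, 6 connections
```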
Some important points about our setup:
- This is a shared network segment 172.16.0.0/16
- Three switches (LeftSwitch<->TopSwitch<->RightSwitch)
- All dual-connected hosts are connected to both LeftSwitch and RightSwitch
- Client network interfaces are 172.16.4.x/16 (eth0, LeftSwitch) and
172.16.5.x/16 (eth1, RightSwitch)
- OSS/MDS network interfaces are 172.16.0.x/16 (eth0, LeftSwitch) and
172.16.1.x/16 (eth1, RightSwitch)
- To get good routing, we have static routes configured as follows on all
dual-connected machines:
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
172.16.4.0      0.0.0.0         255.255.255.0   U         0 0          0 eth0
172.16.5.0      0.0.0.0         255.255.255.0   U         0 0          0 eth1
172.16.0.0      0.0.0.0         255.255.255.0   U         0 0          0 eth0
172.16.1.0      0.0.0.0         255.255.255.0   U         0 0          0 eth1
(i.e. traffic between clients and servers shouldn't traverse TopSwitch,
notwithstanding occasional ugly ARP issues.)
So Lustre has done the right thing in connecting eth0<->eth0 and
eth1<->eth1 in this case. But how does it know? Does the client connect
to both server addresses and throw away any connections originating from
the same address? Is there some check of the return path?
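For reference, the kernel side of this is ordinary longest-prefix matching. The sketch below models only the four static routes in the table in this post (it ignores any /16 connected routes the interface netmasks would add) and shows which interface a dual-connected host would send from for a given destination. The server and client addresses used in the checks (172.16.0.10, 172.16.1.10, 172.16.5.20) are hypothetical examples, and this does not describe how ksocklnd itself picks peers.

```python
import ipaddress

# Static routes from the routing table in this post: (destination, interface).
ROUTES = [
    (ipaddress.ip_network("172.16.4.0/24"), "eth0"),
    (ipaddress.ip_network("172.16.5.0/24"), "eth1"),
    (ipaddress.ip_network("172.16.0.0/24"), "eth0"),
    (ipaddress.ip_network("172.16.1.0/24"), "eth1"),
]


def egress_interface(dest):
    """Return the outgoing interface chosen by longest-prefix match, or None."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, iface) for net, iface in ROUTES if addr in net]
    if not matches:
        return None
    # The most specific (longest) prefix wins.
    _, iface = max(matches, key=lambda m: m[0].prefixlen)
    return iface


if __name__ == "__main__":
    for dest in ("172.16.0.10", "172.16.1.10", "172.16.5.20"):
        print(dest, "->", egress_interface(dest))
```

Traffic to a server's eth0 address leaves via eth0 and traffic to its eth1 address leaves via eth1, which matches the eth0<->eth0 / eth1<->eth1 pairing we see.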
My motive here is that I also have a set of singly-connected machines, and
I want their traffic balanced across both server networks (singly-connected
machines come in via TopSwitch). Right now, these clients all connect to
the eth0 address (172.16.0.x) on all OSSes and the MDS. All the traffic
goes via LeftSwitch, so my peak bandwidth to a single OSS is 1 Gbit/s,
even though the disks are capable of more than that. What if my
singly-connected client had a 10GbE NIC? Or if there were lots of 1GbE
singly-connected clients? It seems that we are getting it wrong in these
cases.
I understand the issues with routing return traffic from the OSS, and I am
happy (and planning) to configure source-based routing on the server nodes,
but I only want to go to the effort once I understand whatever black magic
ksocklnd uses to decide which connections it should make! If I go ahead
and configure the source routing, will we end up with two connections being
made from a client with a single IP?
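As a rough model of the server-side source-based routing I have in mind (not of ksocklnd's behaviour), the sketch below evaluates policy rules in order, the way Linux `ip rule` does: a rule matching the source address selects a routing table, and if that table has no route the lookup falls through to the next rule. The table name `via_eth1`, the routes in it, and the singly-connected client address 172.16.8.20 are all hypothetical; the point is only that replies sourced from the OSS's eth1 address would leave via eth1 instead of eth0.

```python
import ipaddress


def net(s):
    return ipaddress.ip_network(s)


# Hypothetical routing tables on an OSS. "main" mirrors the static routes in
# this post plus an assumed /16 connected route on eth0; "via_eth1" is made up.
TABLES = {
    "main": [
        (net("172.16.4.0/24"), "eth0"),
        (net("172.16.5.0/24"), "eth1"),
        (net("172.16.0.0/24"), "eth0"),
        (net("172.16.1.0/24"), "eth1"),
        (net("172.16.0.0/16"), "eth0"),
    ],
    "via_eth1": [(net("172.16.0.0/16"), "eth1")],
}

# Policy rules, checked in order: (source prefix, or None for "any", table).
RULES = [
    (net("172.16.1.0/24"), "via_eth1"),
    (None, "main"),
]


def lookup(table, dest):
    """Longest-prefix match within one table; None if nothing matches."""
    addr = ipaddress.ip_address(dest)
    matches = [(n, iface) for n, iface in TABLES[table] if addr in n]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def egress_interface(src, dest):
    """Apply the rules in order; fall through when a table has no route."""
    src_addr = ipaddress.ip_address(src)
    for prefix, table in RULES:
        if prefix is None or src_addr in prefix:
            iface = lookup(table, dest)
            if iface is not None:
                return iface
    return None


if __name__ == "__main__":
    client = "172.16.8.20"  # hypothetical singly-connected client
    print(egress_interface("172.16.0.10", client))  # reply from eth0 address
    print(egress_interface("172.16.1.10", client))  # reply from eth1 address
```

Whether ksocklnd would then open a second connection to the eth1 address is exactly the part this model cannot answer.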
Thanks for your help,
Tim