I use a LEAF Bering distribution which is 2.4.18 kernel based. I wanted to
experiment using it for link load balancing and redundancy and ran into some
hitches. Pointers would be welcome and helpful.

I set up a single machine with 2 ethernet interfaces as per the network
schematic below.

             +----------+ A.B.64.175/26
             |          |-------- eth0 --------------- gw A.B.64.134
             | LEAF Box |
             |          |-------- eth1 --------------- gw A.B.65.129
             +----------+ A.B.65.131/29

I have a third ethernet port that I can configure as a 192.168.1.1 local LAN
interface. A.B.64.134 is a land link (approx 400ms latency) while A.B.65.129
is a satellite link (approx 750ms latency). The latency was found by changing
the default route and pinging the same target IP.

[Case 1]
I first wanted to check whether failover from one interface to the other
works, using the metric field in the routes to set route priority. The
commands I gave and the outputs are as under:

# ip ad flush dev eth0
# ip ad flush dev eth1
# ip ad flush dev eth2
# ip ad add dev eth0 A.B.64.175/26
# ip ad add dev eth1 A.B.65.131/29
# ip ro add default via A.B.64.134 metric 1
# ip ro add default via A.B.65.129 metric 2
# ping W.X.Y.Z

[Result 1]
Ping responds with packet replies if both links are connected. If I
disconnect the ethernet cable from eth0, the ping still goes through. If I
connect the cable on eth0 and disconnect eth1, ping stops. If I reconnect
eth1, ping resumes with the ICMP packet count at a much larger number than
when it stopped, with the difference in packets shown as lost.

I thought that by looking at ping latency, I could make out which link is
being used. Latency was always 750ms.

My surmise: The originating IP for the ping is taken as A.B.65.131. Thus
replies do not land up if eth1 is not connected, even though packets go out
of eth0. If eth1 was connected, it was used as the preferred route since the
originating IP was from this subnet.

[Question 1]
Am I wrong? Is my interpretation of metrics wrong?

[Case 2]
I removed the default routes and added a multipath route using commands as
under:

# ip ro del default
# ip ro del default
# ip ro add default nexthop via A.B.64.134 dev eth0 weight 1 \
             nexthop via A.B.65.129 dev eth1 weight 1

[Result 2]
Pinging here had the same results as in [Result 1]. I expected each ping
packet to have a different latency, switching between 450 and 750ms. It did
not happen. Latency was 750ms consistently.

[Case 3]
The above weights go by flows and not packets. Maybe a single ping is
treated as one flow. I changed the multipath route to include equalize using
commands as under:

# ip ro del default
# ip ro add default equalize nexthop via A.B.64.134 dev eth0 weight 1 \
             nexthop via A.B.65.129 dev eth1 weight 1

[Result 3]
Same as [Result 1] and [Result 2]. At least here I should have got latencies
switching between 450 and 750ms for alternating ICMP requests.

[Questions]
1. Is this method of testing correct?
2. Are there any other utilities/methodologies that I can use to test this
   better? (see the sketch after this message)
3. Is expecting load balancing/redundancy to happen for a single flow wrong?

TIA
Mohan
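[One way to check which link each echo request actually leaves on, rather
than inferring it from latency alone, is to watch both interfaces while the
ping runs. A rough sketch, assuming tcpdump and an iputils-style ping (one
that accepts -I to set the source address) are available on the box; W.X.Y.Z
is the same test target as above:

# tcpdump -n -i eth0 icmp        (first console: outbound ICMP on the land link)
# tcpdump -n -i eth1 icmp        (second console: same for the satellite link)
# ping -I A.B.64.175 W.X.Y.Z     (pin the source address to eth0's subnet)

tcpdump's output shows both the interface in use and the source address of
each echo request, which would confirm or refute the surmise about
A.B.65.131 directly; the pinned ping should also keep working with eth1
unplugged if eth0's path is healthy.]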
S Mohan wrote:

> [Question 1]
> Am I wrong? Is my interpretation of metrics wrong?

Yes and yes, but it's a common mistake. Packets will still use the
lower-metric route as long as that route exists. Until the kernel knows the
route/interface is dead and removes it, your metric 2 default route won't do
anything.

> [Result 2]
> Pinging here had the same results as in [Result 1]. I expected each ping
> packet to have a different latency, switching between 450 and 750ms. It
> did not happen. Latency was 750ms consistently.
>
> [Case 3]
> The above weights go by flows and not packets. Maybe a single ping is
> treated as one flow. I changed the multipath route to include equalize
> using commands as under:

Does anyone know the difference between using the equalize option and not
using it? I haven't been able to work it out.

> [Result 3]
> Same as [Result 1] and [Result 2]. At least here I should have got
> latencies switching between 450 and 750ms for alternating ICMP requests.

Probably not.

> [Questions]
> 1. Is this method of testing correct?
> 2. Are there any other utilities/methodologies that I can use to test
>    this better?
> 3. Is expecting load balancing/redundancy to happen for a single flow
>    wrong?

You're better off testing with machines behind the LEAF machine. Doing load
balancing/multipath stuff the way you are doesn't work particularly well for
a single machine, especially the router itself.

Using the equalize route method, you should start to see packets alternate
between the two links. If all your requests are going out from the same
machine, you may see it use one link more than the other; if you have
numerous machines behind the router, it should eventually balance out.

Also, you'll probably need source-based routing for the return paths, unless
your ISPs are doing tricky things (a sketch follows below the sig).

The equalize route method also doesn't give real failover. If one link goes
down, every second packet/connection/flow/blah will still try to go out over
it. If you want real failover with load balancing, perhaps you should look
here: http://www.ssi.bg/~ja/#multigw

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Damion de Soto - Software Engineer     email: damion@snapgear.com
SnapGear --- ph: +61 7 3435 2809  |  Custom Embedded Solutions
             fax: +61 7 3891 3630 |  and Security Appliances
                                     web: http://www.snapgear.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
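[To flesh out the source-based return routing mentioned above: a minimal
sketch along the lines of the LARTC HOWTO's multiple-uplinks section,
assuming two extra routing tables (arbitrarily named T1 and T2 here) have
been added to /etc/iproute2/rt_tables:

# ip route add default via A.B.64.134 dev eth0 table T1
# ip route add default via A.B.65.129 dev eth1 table T2
# ip rule add from A.B.64.175 table T1
# ip rule add from A.B.65.131 table T2
# ip route flush cache

The rules make every packet carrying an uplink's source address leave via
that uplink's gateway, whichever nexthop the multipath default route would
otherwise have picked, which is what lets the replies find their way back.]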