Hello all!

I applied the http://www.ssi.bg/~ja/routes-2.6.8-10.diff patch to kernel 2.6.8.1 and it works fine, or almost fine. It does the load balancing well, but when one link drops it keeps trying to use it.

The end of http://www.ssi.bg/~ja/nano.txt says to ping gateway 1 and gateway 2 so that the kernel knows whether each route is working. My Linux box reaches its two uplinks through one dedicated link and one ADSL modem, so I tried the following:

1) Remove the Ethernet cable from the Linux NIC: the patch worked well and traffic went only to the link that was still working.
2) Remove the telephone line from the ADSL modem (or the external Ethernet cable from the dedicated link's switch): the patch didn't work and kept trying to send traffic to the dropped link.

So I think this happens because Linux can still ping the switch (or ADSL modem) and therefore thinks the link is good.

Have you had this problem? Any hints?

Thank you!

Tom Lobato
On 1/19/07, Tom Lobato <tomlobato@gmail.com> wrote:
> So I think this happens because Linux can still ping the
> switch (or ADSL modem) and therefore thinks the link is good.
>
> Have you had this problem? Any hints?

My experience has been mixed. The patch worked very well in many cases, but in some it detected a failure only when the first-hop gateway was down, not when one of the subsequent hops failed. So, as you mentioned, because it can still ping the switch or modem, it thinks the link is good.

You can write a script that keeps running in the background and checks whether the links are up. If any of the links is down, it changes the default route and provides failover.

--
Manish Kathuria
Tux Technologies
http://www.tuxtechnologies.co.in/
On 01/19/07 12:45, Manish Kathuria wrote:
> You can make a script which will keep on running in the background
> and check if the links are up or not, and if any of the links is
> down, it can change the default route and provide a failover.

I have been tasked with writing such a script. In my scenario I'm taking it a bit further, though: I am planning to have my script test the actual service that I'm trying to connect to, i.e. connect to port 80 and request a page. I have to go this route because I've had sporadic MTU issues on one of our (primary) paths. The provider is supposed to be repairing the problem, but I need a solution before that can happen.

I am planning to write a small daemon, probably in Perl, that will run the tests. What I don't have is a good way to alter the routing tables, short of shelling out and running ip directly. Does anyone know of another way to alter the routing tables / rules without calling a shell command?

Grant. . . .
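A service-level check like the one Grant describes could be sketched in bash roughly as follows. This is only an illustration, not Grant's daemon: the function name check_http, the CURL_BIN override and the example URL are assumptions, and binding to an interface only selects the right uplink if source-based policy rules (as in nano.txt) are in place. Fetching a real page sends and receives full-sized packets, which is why it can catch an MTU problem that small pings would pass.

```bash
# Sketch only: check_http and CURL_BIN are hypothetical names, not from the thread.

# check_http IFACE URL [TIMEOUT]
# Returns 0 if an HTTP request sent out through IFACE completes
# successfully within TIMEOUT seconds (default 10).
check_http() {
    local iface=$1 url=$2 timeout=${3:-10}
    # --interface binds the request to one uplink;
    # --fail turns HTTP errors (status >= 400) into a non-zero exit status;
    # --max-time bounds the whole transfer so a stalled path counts as a failure.
    "${CURL_BIN:-curl}" --silent --fail --output /dev/null \
        --interface "$iface" --max-time "$timeout" "$url"
}

# Example (hypothetical interface and URL):
#   check_http eth1 http://www.example.com/ 10 && echo "eth1 serves HTTP"
```

The exit status can be used wherever the ping result is used in a failover loop.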
Hi! Thank you.

Manish Kathuria wrote:
> You can make a script which will keep on running in the background
> and check if the links are up or not, and if any of the links is
> down, it can change the default route and provide a failover.

Oh yes, actually I had already written such scripts before I knew about this patch, using the information in "4.2. Routing for multiple uplinks/providers" of the Adv-Routing-HOWTO. Facing this problem, I think the best solution is to use them again.

Does anybody know whether work is in progress to solve this? Is there any plan to include this patch in the mainline kernel? How likely is that?

Tom Lobato
On 1/20/07, Grant Taylor <gtaylor@riverviewtech.net> wrote:
> I have been tasked with writing such a script. In my scenario, I'm
> taking it a bit further though. I am planning on having my script test
> the actual service that I'm trying to connect to, i.e. connect to port
> 80 and request a page.

The method I have adopted is a shell script that pings a popular remote site (for example www.yahoo.com or www.google.com) through each of the interfaces every 10 seconds. If no reply is received on one of the links for 4 consecutive tries, the default multipath route is replaced by a single default gateway. Requiring 4 failures avoids very frequent failovers. In the other direction, the link is treated as live as soon as a single ping reply is received, and the multipath route is activated again.

--
Manish Kathuria
Tux Technologies
http://www.tuxtechnologies.co.in/
On 1/27/07, Geoff Dornan <geoff@cmcnetworks.net> wrote:
> Hi
>
> Can you post your script please?
>
> Cheers
> geoff

The script is appended. It assumes that you have followed the steps described in nano.txt, with or without applying the patches. Though it looks very simplistic, it is working well at a number of locations.
#!/bin/bash -x

# Host pinged through each uplink to test end-to-end connectivity
TESTIP=www.yahoo.com
CHECK=0
# Per-link state: 1 = up, 0 = down
ISPA=1
ISPB=1
# Routing state: 1 = both links (multipath), 2 = link A only, 3 = link B only
LINKSTATUS=1
# Consecutive failed pings per link
COUNTA=0
COUNTB=0
EXTIF1=eth1
EXTIF2=eth2
GW1=172.16.1.1
GW2=192.168.1.1
# Multipath weights
W1=1
W2=1

while : ; do

    # Probe link A; four consecutive failures mark it down
    ping -I $EXTIF1 -c 1 $TESTIP > /dev/null 2>&1
    RETVAL=$?
    if [ $RETVAL -ne 0 ]; then
        COUNTA=$((COUNTA + 1))
    else
        COUNTA=0
    fi

    if [ $COUNTA -ge 4 ]; then
        ISPA=0
    else
        ISPA=1
    fi

    # Probe link B the same way
    ping -I $EXTIF2 -c 1 $TESTIP > /dev/null 2>&1
    RETVAL=$?
    if [ $RETVAL -ne 0 ]; then
        COUNTB=$((COUNTB + 1))
    else
        COUNTB=0
    fi

    if [ $COUNTB -ge 4 ]; then
        ISPB=0
    else
        ISPB=1
    fi

    # If both links are down, keep the current routing state
    NEWSTATUS=$LINKSTATUS
    if [ $ISPA -eq 1 ]; then
        if [ $ISPB -eq 1 ]; then
            NEWSTATUS=1
        elif [ $ISPB -eq 0 ]; then
            NEWSTATUS=2
        fi
    elif [ $ISPA -eq 0 ]; then
        if [ $ISPB -eq 1 ]; then
            NEWSTATUS=3
        fi
    fi

    # Change routes only when the state changes
    case $LINKSTATUS in

    1)  if [ $NEWSTATUS -eq 2 ]; then
            ip route replace default via $GW1 dev $EXTIF1
        elif [ $NEWSTATUS -eq 3 ]; then
            ip route replace default via $GW2 dev $EXTIF2
        fi;;

    2)  if [ $NEWSTATUS -eq 1 ]; then
            # Both links are back: drop the single default, restore multipath
            ip route del default
            ip route replace default table 222 proto static \
                nexthop via $GW1 dev $EXTIF1 weight $W1 \
                nexthop via $GW2 dev $EXTIF2 weight $W2
        elif [ $NEWSTATUS -eq 3 ]; then
            ip route replace default via $GW2 dev $EXTIF2
        fi;;

    3)  if [ $NEWSTATUS -eq 1 ]; then
            ip route del default
            ip route replace default table 222 proto static \
                nexthop via $GW1 dev $EXTIF1 weight $W1 \
                nexthop via $GW2 dev $EXTIF2 weight $W2
        elif [ $NEWSTATUS -eq 2 ]; then
            ip route replace default via $GW1 dev $EXTIF1
        fi;;

    *)  echo;;

    esac

    LINKSTATUS=$NEWSTATUS
    sleep 10
done

Let me know if you can think of any improvements or modifications.

--
Manish Kathuria
Tux Technologies
http://www.tuxtechnologies.co.in/
Hi!

Manish Kathuria wrote:
> The default multipath route is replaced by a single default gateway if
> reply is not received for 4 consecutive tries from one of the links.
> This is to avoid very frequent failovers. However, the link is treated
> as live as soon as a ping reply is received and the multipath route
> is activated.

Right now I am using these ping options:

    ping -n -w 10 -c 2 -I $lnk1_dev $lnk1_pingtarget

But with them I get some false negatives. Can you show which ping options you use?

Tom Lobato
_______________________________________________
LARTC mailing list
LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/cgi-bin/mailman/listinfo/lartc
On 1/28/07, Tom Lobato <tomlobato@gmail.com> wrote:
> Now I'm using the ping options:
>
>     ping -n -w 10 -c 2 -I $lnk1_dev $lnk1_pingtarget
>
> But so I'm getting some false negatives. Can you show what ping
> options you use?

Please see the script posted earlier. A simple ping command with the following options is repeated every 10 seconds in an endless loop:

    ping -I $EXTIF1 -c 1 $TESTIP > /dev/null 2>&1

--
Manish Kathuria
Tux Technologies
http://www.tuxtechnologies.co.in/
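One possible source of the false negatives, offered as a guess rather than a diagnosis: as far as the iputils ping man page describes it, when both -c and -w are given, ping exits non-zero if fewer than the requested number of replies arrive before the deadline, so "-c 2 -w 10" reports failure unless two replies come back within 10 seconds. A minimal sketch that instead counts replies from ping's summary line and accepts a single one is shown below. The names replies_received, link_alive and PING_BIN are hypothetical, and the parsing assumes a summary line of the form "N packets transmitted, M received, ...".

```bash
# Sketch only: replies_received, link_alive and PING_BIN are hypothetical names.

# replies_received: read ping output on stdin and print the number of
# replies taken from the summary line (prints nothing if there is none).
replies_received() {
    sed -n 's/.*, \([0-9][0-9]*\) \(packets \)\{0,1\}received.*/\1/p'
}

# link_alive DEV TARGET
# Sends 3 probes through DEV, waiting up to 2 s for each reply, and
# returns 0 if at least one reply came back.
link_alive() {
    local dev=$1 target=$2 n
    n=$("${PING_BIN:-ping}" -n -c 3 -W 2 -I "$dev" "$target" 2>/dev/null | replies_received)
    [ "${n:-0}" -ge 1 ]
}

# Example (hypothetical device and address):
#   link_alive eth1 192.0.2.1 || echo "no reply through eth1"
```

Tolerating lost probes inside one check is a different trade-off from the four-consecutive-failures rule in the posted script; the two can be combined.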
Hi!

Thank you for the script. I'm trying it.

I made a simple modification and would like to hear opinions. So far I have only added one more TESTIP, so I ping a different IP for each link. I also use IP addresses instead of names, and I ping each provider's DNS server. I did this because pinging external sites (yahoo, google) is too slow here, especially when the link is under heavy load, so I'm afraid a ping could fail and the script would "think" the link is down.

Also, to avoid false 'link down' results, have you tried raising the number of failed pings (4) required before the route is replaced? What do you think?

I appreciate suggestions.

PS: although the change is so simple, if someone wants to see it, tell me and I will send it by mail.

Tom Lobato
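One way to reconcile a nearby target (fast, but it may stay reachable while the provider's upstream is down) with a distant one (slower, but a real end-to-end test) is to give each link a short list of targets and treat the link as up if any of them answers. The sketch below is only an illustration of that idea: any_target_alive, probe_ping and PROBE are hypothetical names, and the addresses in the example are documentation placeholders, not real test hosts.

```bash
# Sketch only: any_target_alive, probe_ping and PROBE are hypothetical names.

# probe_ping DEV TARGET: one ping through DEV, waiting up to 3 s for the reply.
probe_ping() {
    ping -n -c 1 -W 3 -I "$1" "$2" > /dev/null 2>&1
}

# any_target_alive DEV TARGET...
# Tries the targets in order and returns 0 as soon as one answers through DEV;
# returns 1 only if every target failed.
any_target_alive() {
    local dev=$1 target
    shift
    for target in "$@"; do
        if "${PROBE:-probe_ping}" "$dev" "$target"; then
            return 0
        fi
    done
    return 1
}

# Example with placeholder addresses (replace with real per-link targets):
#   any_target_alive eth1 192.0.2.53 198.51.100.80
#   RETVAL=$?
```

The resulting RETVAL can feed the same consecutive-failure counters as the single ping in the posted script.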
On 2/8/07, Tom Lobato <tomlobato@gmail.com> wrote:
> Until now, I just added one more TESTIP, so I'm pinging one IP for each link.
> Also I'm using the IP instead of the name, and used the DNS IP of each provider
> for the ping. I made this because the ping to external sites (yahoo, google) is too slow
> here, mainly when the link is under heavy load. So I'm afraid it can try ping
> without success and "think" the link is down.

I used a popular external site because connectivity from your location to the provider's DNS may be fine while the provider's link to the rest of the Internet is down. In that case you get a successful ping reply even though the link isn't working in any real sense. I also preferred a name over an IP address because several IP addresses can be associated with the site name, and they can change too. But I don't see anything wrong with your approach.

What do you mean by slow? I don't think ping reply time should be an issue; we are concerned with success. Obviously, the ping should not time out. The ping reply times I get here for sites like www.yahoo.com and www.google.com are to the tune of 300 ms. You can increase the pin

> Also, for don't get falses 'link down', did you tried to increase the number of 4
> ping fails before replace the route? What do you think about?

Four consecutive ping failures mean that the link has been down for somewhere between 40 and 50 seconds, which I think is a sufficient interval before carrying out a failover. But you can increase it depending on your requirements. When restoring the link, the script doesn't wait that long.

> PS: although alteration be so simple, if someone want to see, tell me and I send a mail.
> Tom Lobato

It would be great to see your final script.

--
Manish Kathuria
Tux Technologies
http://www.tuxtechnologies.co.in/
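To make the failure threshold adjustable, as suggested above, the counting rule from the posted script can be pulled into one small function. This is a sketch, not a change Manish made: FAIL_LIMIT and update_link are hypothetical names, and the posted script hard-codes the value 4.

```bash
# Sketch only: FAIL_LIMIT and update_link are hypothetical names.
FAIL_LIMIT=${FAIL_LIMIT:-4}   # consecutive failures before a link counts as down

# update_link COUNT PROBE_RC
# Prints "NEWCOUNT STATE": the counter is incremented on a failed probe and
# reset to 0 on success; STATE is 0 (down) once NEWCOUNT reaches FAIL_LIMIT,
# otherwise 1 (up). A single success therefore restores the link at once.
update_link() {
    local count=$1 rc=$2
    if [ "$rc" -ne 0 ]; then
        count=$((count + 1))
    else
        count=0
    fi
    if [ "$count" -ge "$FAIL_LIMIT" ]; then
        echo "$count 0"
    else
        echo "$count 1"
    fi
}
```

Inside the loop it could be used as `read -r COUNTA ISPA <<< "$(update_link "$COUNTA" "$RETVAL")"`. Raising FAIL_LIMIT lengthens the time before a failover roughly in proportion, since each extra failure costs one more pass through the loop.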
Manish Kathuria wrote:
> I just used a popular external site because it may happen that
> connectivity from your location to the provider's DNS is there but the
> provider's link with the rest of the internet is down so even if you
> get a successful ping reply, the link isn't working in the real sense.

OK, I noticed that my DNS server blocks pings (!), so I'm also using a site now.

> Also, I preferred using a name instead of IP address because there
> could be multiple IP addresses associated with the site name and they
> can change too. But I don't see anything wrong in your approach. What
> do you mean by slow ? I don't think ping reply time should be an
> issue. We are more concerned with the success. Obviously, it should
> not time out.

I agree, but here "slow == timeout" =) I suspect the ADSL modem is the problem. I have two dynamic-IP links: ADSL/PPPoE at 400 kbps and cable modem/DHCP at 4 Mbps.

Anyway, I changed my mind and will connect the links directly to the Linux box (no routers). The drawback is not having a fixed IP/GW/MASK/NET, but the advantages are needing no routers and no port forwarding in routers: a more self-sufficient solution. So I'm using your script as a base (although I had written another, I liked yours), and making the DHCP and PPPoE scripts create files with the connection info, from which it reads the data to set up the load balancing. If anyone else wants it, tell me and I will send it by mail.
I know I could apply the patches and these scripts would be much simpler, but the patch does not detect a failure if it is beyond the gateway.

> The ping reply times I get here for sites like www.yahoo.com and
> www.google.com are to the tune of 300 ms.

Here, with no Internet use from the local network, I get ~150 ms for both. So it really appears I have another problem, not ping delay. Maybe there is too much load on the ADSL link, although I set weight 10 for the cable link and 1 for the ADSL.

Tom Lobato
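Tom's scripts were not posted, so the following is only a guess at the shape of the approach he describes: the DHCP and PPPoE hooks write one small file per link, and the load-balancing script builds its multipath route from whatever files exist. LINK_DIR, save_link and multipath_cmd are hypothetical names. The hook details mentioned in the comments (pppd passing the interface name and remote address to ip-up, dhclient exposing the interface and routers to its hooks) are stated from memory and should be checked against the local pppd and DHCP client documentation. Table 222 and the 10:1 weights come from the thread.

```bash
# Sketch only: LINK_DIR, save_link and multipath_cmd are hypothetical names.
LINK_DIR=${LINK_DIR:-/var/run/links}   # assumed location for per-link files

# save_link NAME DEV GATEWAY WEIGHT
# Meant to be called from a connection hook, e.g. a pppd ip-up script
# (interface and remote address arrive as arguments) or a DHCP client hook
# (interface and routers arrive as variables).
save_link() {
    mkdir -p "$LINK_DIR" || return 1
    printf 'DEV=%s\nGW=%s\nWEIGHT=%s\n' "$2" "$3" "$4" > "$LINK_DIR/$1.conf"
}

# multipath_cmd: print (not run) the ip command for a weighted multipath
# default route covering every saved link, in file-name order.
multipath_cmd() {
    local f DEV GW WEIGHT cmd="ip route replace default table 222 proto static"
    for f in "$LINK_DIR"/*.conf; do
        [ -r "$f" ] || continue
        . "$f"   # sets the local DEV, GW and WEIGHT for this link
        cmd="$cmd nexthop via $GW dev $DEV weight $WEIGHT"
    done
    echo "$cmd"
}

# Example (placeholder addresses):
#   save_link adsl ppp0 10.0.0.1 1
#   save_link cable eth2 192.168.1.1 10
#   multipath_cmd
```

Printing the command instead of running it keeps the sketch safe to try; the output can be inspected and then passed to a shell once it looks right.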