I have a firewall with two redundant internet connections coming in (eth0 and eth1) and an intranet behind eth2.

What I am trying to do is have data off of eth2 split evenly between eth0 and eth1, and if one interface goes down, to fully utilize the other.

What I'm trying to do is have all data from eth0 be passed on to eth2 (unless it's stopped by the firewall), same with eth1, and all data from eth2 be split evenly between eth0 and eth1.

Currently I have the following routes and rules to accomplish this:

    ip route add 10.0.0.0/8 via GATEWAY0 table 1 proto static
    ip route add 10.0.0.0/8 via GATEWAY1 table 2 proto static

    ip route add default table default scope global \
        nexthop via GATEWAY0 dev eth0 weight 1 \
        nexthop via GATEWAY1 dev eth1 weight 1

    ip rule add pref 1500 iif eth0 table 1
    ip rule add pref 1501 iif eth1 table 2
    ip rule add pref 100 iif eth2 table default

This does NOT work properly.

From localhost, everything works perfectly. I can bring up and down interfaces and everything works properly and transparently.

But, from the intranet, everything stops. With a different default route:

    ip route add default via GATEWAY0 dev eth0 table default

everything is fine from both localhost and the intranet. Same with GATEWAY1 eth1.

Can anyone offer advice on how to resolve this problem?

The only way I can think of so far is a remarkably simple but stupid hack, where I just ping -I eth0 GATEWAY0 and ping -I eth1 GATEWAY1 every thirty seconds or so and switch default routes if an interface is down. This obviously does not solve the problem, nor allow bandwidth to be shared across both lines.

Any help would be greatly appreciated.

Seth J. Blank
Systems Operations
Capital Market Services, LLC

_______________________________________________
LARTC mailing list / LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/

Sorry, I really wasn't paying attention when I wrote this (i.e. I've had no sleep).

I have the routing tables working properly for the internal network. What I need to do is have the routing tables update the gateways when a line is down.

i.e.

    intranet ----- firewall ----- router1 ----- internet
                            \---- router2 ----- internet

Currently, I have the gateway from the firewall being nexthops between router1 and router2. This works fine. But what I need to do is have the firewall check the links between router1/2 and the internet and switch gateways if a line is down.

What I want to do, but can't figure out how to, is send out a packet through router1 and see if it gets an arbitrary number of hops (probably 3) out. If not, switch the default route to use the other gateway. This needs to be done for both gateways, and there also needs to be a route to restore the gateways when the line goes back up.

Any help would be greatly appreciated.

Thanks so much,

Seth J. Blank
Systems Operations
Capital Market Services, LLC

Hello Seth,

In your message dated 13 October 2003 (18:24:08) you wrote:

SJB> Sorry, I really wasn't paying attention when I wrote this (i.e. I've had
SJB> no sleep).

SJB> I have the routing tables working properly for the internal network.
SJB> What I need to do is have the routing tables update the gateways when a
SJB> line is down.

SJB> i.e. intranet ----- firewall ----- router1 ----- internet
SJB>                               \---- router2 ----- internet

SJB> Currently, I have the gateway from the firewall being nexthops between
SJB> router1 and router2. This works fine. But what I need to do is have the
SJB> firewall check the links between router1/2 and the internet and switch
SJB> gateways if a line is down.

SJB> What I want to do, but can't figure out how to, is send out a packet
SJB> through router1 and see if it gets an arbitrary number of hops (probably
SJB> 3) out. If not, switch the default route to use the other gateway. This
SJB> needs to be done for both gateways, and there also needs to be a route
SJB> to restore the gateways when the line goes back up.

I have a load balancing setup for 3 uplinks (3 different providers and technologies) with failover, set up following the Nano-HOWTO at http://www.ssi.bg/~ja/ (carefully done by the book; take any shortcut and it's gone).

When you need to check whether the net is reachable over either of the links, just ping some machines outside (a set would be nice), forcing the output address to be one or the other, and decide whether you need to change the normal multihop gateway to a single-hop one via link 1 or 2. This should work with nano, because it preserves the output address and thus preserves the routes. Works for me (after some sleepless nights and tons of coffee :). I can pull the plug out and nothing bad happens (only the traffic shaping needs some correction).

[cut the rest]

--
Regards,
Robert

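The check Robert describes (ping a set of outside hosts from each uplink's address, then fall back from the multi-nexthop default to a single gateway) might be sketched roughly as below. This is only an illustration under assumptions, not his script: the function names `link_alive` and `route_for_state`, the routing table number 221, the equal weights, and all addresses used with them are hypothetical.

```shell
#!/bin/bash
# Sketch only: function names, table number and weights are illustrative assumptions.

# link_alive SRC_IP HOST...
# Succeeds if at least one HOST answers a single ping sent from SRC_IP.
link_alive() {
    local src=$1; shift
    local host
    for host in "$@"; do
        ping -c 1 -W 2 -I "$src" "$host" >/dev/null 2>&1 && return 0
    done
    return 1
}

# route_for_state STATE1 STATE2 GW1 IF1 GW2 IF2
# Prints the "ip route replace" command that matches the two link
# states ("up" or "down"). It only prints; nothing is changed here.
route_for_state() {
    local s1=$1 s2=$2 gw1=$3 if1=$4 gw2=$5 if2=$6
    local base="ip route replace default table 221 proto static"
    case "$s1/$s2" in
        up/up)   echo "$base nexthop via $gw1 dev $if1 weight 1 nexthop via $gw2 dev $if2 weight 1" ;;
        up/down) echo "$base via $gw1 dev $if1" ;;
        down/up) echo "$base via $gw2 dev $if2" ;;
        *)       return 1 ;;   # both links look dead: leave the routes alone
    esac
}
```

A periodic job could set each state with something like `link_alive "$IPE1" HOST_A HOST_B && s1=up || s1=down`, then run the printed command as root followed by `ip route flush cache`. Printing the command instead of executing it keeps the decision logic easy to test without touching the routing tables.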
Robert Kurjata wrote:

> I have a load balancing setup for 3 uplinks (3 different providers and
> technologies) w/failover set with http://www.ssi.bg/~ja/ Nano-HOWTO
> (carefully done By-The-Book - any shortcut and it's gone).

I have finished implementing this step by step, and things still do not appear to be working.

During the testing phase, I have two problems (output which differs from what the howto says I should get).

1) When I run "ip route list table main", only the proper entries for NWE1/NME1 and NWE2/NME2 come up, not the one for NWI/NMI.

2) "ip route get from (IPE1|IPE2) to 204.152.189.113" both return "network unreachable"

All the other output matches exactly.

My only thoughts are that I've swapped an IP or two somewhere, but I've been over the script a ton of times already, and nothing presents itself to me.

Any help or troubleshooting hints would be greatly appreciated.

Seth J. Blank
Systems Operations
Capital Market Services, LLC

"Seth J. Blank" wrote:> I have finished implementing this step by step, and things still do not > appear to be working. > > During the testing phase, I have two problems (output which differs from > what the howto says I should get). > 1) When I run "ip route list table main", only the proper entries for > NWE1/NME1 and NWE2/NME2 come up, not the one for NWI/NMI.The placement within the script of the line: ip route del default table main is probably what is killing this. When the interface is brought up, an entry is made into the main table. You''re purging that entry. So arrange things so that IFI comes up AFTER the del The lo device should be in main also. Another possibility is that the original main table did not have the IFI entry when you ran ip rule add prio ### table main The final thing that comes to mind is that you did not even execute ip link set $IFI up ip addr flush dev $IFI ip addr add $IPI/$NMI brd + dev $IFI (or ip addr add $IPI/$NMI brd $BRDI dev $IFI) # this is the line that populates main> 2) "ip route get from (IPE1|IPE2) to 204.152.189.113" both return > "network unreachable"IS the network reachable?!! My $0.25 is on IFI1 being dead. Try ping -c1 -I eth# 204.152.189.113 where "#" is set to the first interface.> Any help or troubleshooting hints would be greatly appreciated. > > Seth J. Blankbuck _______________________________________________ LARTC mailing list / LARTC@mailman.ds9a.nl http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/
Yeah, I figured out the problem (stupid mistake on my end) and everything is working now.

With one exception. If I pull the cat5 out of eth0 (external interface 1) then everything just hangs. No connections can be made, etc. Pulling the cat5 out of eth1 (external interface 2) has no effect. The connection stays like this until eth0 is plugged back in (it picks back up immediately).

What this suggests to me is that, even though I'm using the two nexthops, all the data is trying to go over eth0, and nothing is being sent over eth1.

... And I just confirmed this with iptraf.

So the question now is, why aren't the nexthops working? I patched the kernel, followed the nano howto precisely, and can use both interfaces just fine (ping -I eth0/1, etc.). If I set the default route to either eth0 or eth1, everything works fine. But with the nexthops, it does not appear as if the load is being balanced.

Here is my table:

    default proto static
        nexthop via GW1 dev eth0 weight 1
        nexthop via GW2 dev eth1 weight 1

Any thoughts?

Thanks a ton for all your help so far,
Seth

Another weird piece of information to add.

If I ifdown eth0, everything starts being routed over eth1. But if I just yank the cord out of eth0, the system sits there trying to route over eth0. This persists for much longer than the 60 seconds it should take, max, for the kernel to update the routing tables.

And it's still confusing me why the traffic isn't being split evenly between eth0 and eth1 (iptraf shows everything going over eth0, no traffic at all on eth1).

Thank you all so much for your help,
Seth

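One likely reason for the difference: ifdown takes the interface administratively down, so its routes stop being used, while a pulled cable leaves the interface up as far as the routing code is concerned. A link-state check could look roughly like the sketch below. It assumes a kernel that exposes /sys/class/net/<interface>/carrier, which may not exist on the kernels discussed in this thread; the function name `has_carrier` and its optional second argument are hypothetical.

```shell
#!/bin/bash
# Sketch only. Assumes the kernel provides /sys/class/net/<if>/carrier;
# has_carrier and the SYSFS_ROOT override are illustrative, not an existing tool.

# has_carrier IFACE [SYSFS_ROOT]
# Succeeds when the interface reports link (carrier file contains 1).
# Fails when the cable is out, when the file is missing, or when it
# cannot be read (for example because the interface is down).
has_carrier() {
    local iface=$1 root=${2:-/sys/class/net}
    local state
    state=$(cat "$root/$iface/carrier" 2>/dev/null) || return 1
    [ "$state" = "1" ]
}
```

For example, `has_carrier eth0 || echo "eth0 has no link"` could feed a failover decision. This only detects a dead cable between the firewall and the first router, not an outage further upstream.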
Hi Seth,

I can't think of anything more to do than post my working script for load balancing over two links (it was for three links, and I hope I didn't remove too much). It has been done strictly by the rules in the Nano-HOWTO and works. The main part is the PING section at the end. This ensures that the kernel sees dead gateways and recovers. But of course it WILL NOT work without some kernel patching (dead gateway detection, static routes; just use a Jumbo Patch from http://www.ssi.bg/~ja/ ).

A final word: the routers don't even have to respond to pings. They need to respond to ARP requests. This stuff doesn't work properly for PPP or PPPoE connections, as they usually are NoARP.

I also have some shaping done with TC/CBQ on both links.

VERY IMPORTANT: all the testing is USELESS if you have fewer than 40-50 users doing lots of requests to different sites, as routes are just cached in the kernel. In my system, even with 10-20 users, balancing is usually poor, improving greatly with the number of users, until the difference between links drops to about 10%.

Hopefully I will get some free time to write a step-by-step howto, because it took me some time to understand the thing.

Hope this helps someone. Greetings to the list.

---------------------------cut here------------------------------------------

#!/bin/bash
# This script is done by : Robert Kurjata Sep, 2003.
# feel free to use it in any useful way

# CONFIGURATION
IP=/sbin/ip
PING=/bin/ping

#--------------- LINK PART -----------------
# EXTIFn - interface name
# EXTIPn - outgoing IP
# EXTMn - netmask length (bits)
# EXTGWn - outgoing gateway
#-------------------------------------------
# (the address values below were not preserved in the archived message)

# LINK 1
EXTIF1=eth2
EXTIP1=
EXTM1=
EXTGW1=

# LINK 2
EXTIF2=eth1
EXTIP2=
EXTM2=
EXTGW2=

#ROUTING PART
# removing old rules and routes

echo "removing old rules"
${IP} rule del prio 50 table main
${IP} rule del prio 201 from ${EXTIP1}/${EXTM1} table 201
${IP} rule del prio 202 from ${EXTIP2}/${EXTM2} table 202
${IP} rule del prio 221 table 221
echo "flushing tables"
${IP} route flush table 201
${IP} route flush table 202
${IP} route flush table 221
echo "removing tables"
${IP} route del table 201
${IP} route del table 202
${IP} route del table 221

# setting new rules
echo "Setting new routing rules"

# main table w/o default gateway here
${IP} rule add prio 50 table main
${IP} route del default table main

# identified routes here
${IP} rule add prio 201 from ${EXTIP1}/${EXTM1} table 201
${IP} rule add prio 202 from ${EXTIP2}/${EXTM2} table 202

${IP} route add default via ${EXTGW1} dev ${EXTIF1} src ${EXTIP1} proto static table 201
${IP} route append prohibit default table 201 metric 1 proto static

${IP} route add default via ${EXTGW2} dev ${EXTIF2} src ${EXTIP2} proto static table 202
${IP} route append prohibit default table 202 metric 1 proto static

# multipath
${IP} rule add prio 221 table 221

${IP} route add default table 221 proto static \
    nexthop via ${EXTGW1} dev ${EXTIF1} weight 2 \
    nexthop via ${EXTGW2} dev ${EXTIF2} weight 3

${IP} route flush cache

# PING section: touch both gateways once a minute so the kernel
# notices dead gateways and recovers
while : ; do
    ${PING} -c 1 ${EXTGW1}
    ${PING} -c 1 ${EXTGW2}
    sleep 60
done

---------------------------cut here------------------------------------------

--
Regards,
Robert Kurjata

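To judge whether balancing is actually happening, the transmitted byte counters of the two uplinks can be compared against what the weights suggest (weights 2 and 3 would correspond to roughly 40% and 60% of routed flows, not an exact byte split). The sketch below is an assumption-laden illustration: `tx_bytes` and `share_percent` are hypothetical helper names, and the parsing relies on the usual /proc/net/dev column layout.

```shell
#!/bin/bash
# Sketch only: tx_bytes and share_percent are hypothetical helpers.

# tx_bytes IFACE [FILE]
# Prints the transmitted byte counter for IFACE from /proc/net/dev
# (or from FILE, which must use the same layout).
tx_bytes() {
    local iface=$1 file=${2:-/proc/net/dev}
    # Turn "eth0:" into "eth0 " so the name is always its own field;
    # after that, field 10 is the transmit byte counter.
    awk -v ifc="$iface" '{ sub(/:/, " ") } $1 == ifc { print $10 }' "$file"
}

# share_percent A B
# Prints A as an integer percentage of A+B; fails if both are zero.
share_percent() {
    local a=$1 b=$2
    [ $((a + b)) -gt 0 ] || return 1
    echo $(( a * 100 / (a + b) ))
}
```

Something like `share_percent "$(tx_bytes eth2)" "$(tx_bytes eth1)"` gives the first link's share since boot. Sampling twice and subtracting would give the share over an interval, which is more useful on a long-running firewall.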
Thanks Robert, that's almost exactly what I had (I didn't have ip route flush cache).

The problem is, everything is routing fine, and the data is being split evenly over eth0 and eth1, but as soon as I pull the cable out of eth0 (pulling it out of eth1 doesn't seem to matter) the connection goes out and the routes never recover until I plug the cable back in (at which point things start flowing perfectly again without any prompting from me).

On the other hand, if I ifdown eth0, the routes switch over silently. As soon as I bring eth0 back up, data's going over both eth0 and eth1 again.

In other words, things are working almost exactly as they should be, but when the cat5 comes out, things just die.

Someone suggested that I use mii-tool and just ifdown eth0 if it's out, and that might work, but I'd really rather have a solution done solely within routing tables if possible.

The other reason I want to do this from the routing tables is because I expect any problems to be further down the line than the cable into the firewall. The network will be set up like this:

    intranet eth2 --- firewall --- eth0 --- router1 --- internet
                               \-- eth1 --- router2 --- internet

When the connection from router1 to the internet goes down, I need the firewall to stop sending data over eth0 and commit fully to eth1. When that link comes back up, I need the routes restored. Same for the other way around.

The way I was thinking of doing this was by sending out an ICMP packet (say, to google.com) over each interface with a TTL of 3, and if it didn't come back, change the route.

But both the nano howto and the dead gateway detection howto seem to say that the routes as I have them (and you put them) should be able to handle this problem already. My problem is that it obviously doesn't. If it did, pulling the cable out of eth0 wouldn't cause such an issue.

So I guess what I'm asking is, does anyone have any suggestions about how to troubleshoot this problem?

Thanks so much everyone,
Seth
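The TTL-limited probe Seth describes could be sketched as below. With a TTL of 3 the echo request normally never reaches a distant target; instead the router three hops out answers with "Time to live exceeded", which is the evidence that the path is alive that far. This is a sketch under assumptions: `classify_probe` and `probe_link` are hypothetical names, the string matching assumes iputils-style ping output, and it only works if the routers on the path send ICMP time-exceeded messages and nothing filters them.

```shell
#!/bin/bash
# Sketch only: function names are hypothetical; output matching
# assumes iputils ping message wording.

# classify_probe
# Reads ping output on stdin and prints one of:
#   path-ok  a router answered "Time to live exceeded"
#   reached  the target itself replied (it is closer than the TTL)
#   dead     nothing usable came back
classify_probe() {
    local out
    out=$(cat)
    case "$out" in
        *"Time to live exceeded"*) echo "path-ok" ;;
        *"bytes from"*)            echo "reached" ;;
        *)                         echo "dead" ;;
    esac
}

# probe_link SRC TARGET [TTL]
# Sends one TTL-limited ping from SRC (a local address or interface
# name) towards TARGET and classifies the result. Default TTL is 3.
probe_link() {
    local src=$1 target=$2 ttl=${3:-3}
    ping -n -c 1 -W 2 -t "$ttl" -I "$src" "$target" 2>&1 | classify_probe
}
```

Calling `probe_link "$IPE1" SOME_OUTSIDE_HOST` once per uplink and treating "dead" on several consecutive runs as a failure would avoid flapping on a single lost packet. Using the uplink's source address, as Robert suggested, lets the source-based rules pick the matching gateway.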