(I know that what I'm wanting to do can be done, but for some reason I can not get it to work for the life of me. I think I have been staring at it too long and too closely.)

I have two different internet connections from two cooperating ISPs. I also have a small block of 8 globally routable IPs that both ISPs will route to me via the world-facing globally routable IPs that I have with them. I.e. ISP A has a route to 75.19.28.7/29 via 12.34.56.78 and ISP B has a route to 75.19.28.7/29 via 87.65.43.21.

I want to use one ISP as the primary default gateway and the other ISP as a backup default gateway. That is to say, I do *NOT* want load balancing, just redundancy, in this situation.

I do *NOT* need to use NAT because I have globally routable IP addresses on *ALL* interfaces.

I.e.
eth0: 75.19.28.6 (DMZ)
eth1: 12.34.56.78 (ISP A)
eth2: 87.65.43.21 (ISP B)

I want this router to use the default gateway for ISP A of 12.34.56.254 and only use the default gateway of ISP B 78.65.43.1 if the default gateway of ISP A can not be reached.

If I set up the interfaces with their IPs and subnets, set up multiple default routes with varying metrics (for priority), and test by taking an interface down, things work. However, this is not a realistic test because the interface will never physically go down.

For the sake of discussion, let one link be a DSL modem and the other link be a cable modem. Each of the links is an external modem that uses an ethernet cable to connect in to the router. Thus no matter what the state of the link coming in to my facility is, the link on the Linux router will always be up, because the ethernet between the router and the modems sitting on the next shelf down will always be up.

I need a way for the Linux kernel to try to use a default gateway and switch to another one if it does not see any traffic.

Any help that any one could offer will be greatly appreciated.

Thanks in advance,

Grant. . . .
Use a ping script which pings some IP every minute or so. Ping can bind to a specific interface:

    ping -c 1 -w 1 -I eth1 $SOME_IP
    ping -c 1 -w 1 -I eth2 $SOME_IP

Check the return values of those pings and change your default routes based on the results. This is the basic idea. You can add many other things to this: more IPs, more counts, a different time interval... (Better to use IPs than domain names, so that a failing DNS query won't break the check.)

> -----Original Message-----
> From: lartc-bounces@mailman.ds9a.nl [mailto:lartc-bounces@mailman.ds9a.nl]
> On Behalf Of Grant Taylor
> Sent: Thursday, June 21, 2007 3:06 PM
> To: Mail List - Linux Advanced Routing and Traffic Control
> Subject: [LARTC] Redundant internet connections.
>
> [snip]
>
> _______________________________________________
> LARTC mailing list
> LARTC@mailman.ds9a.nl
> http://mailman.ds9a.nl/cgi-bin/mailman/listinfo/lartc
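The ping idea above can be sketched roughly as follows. This is only a sketch under stated assumptions: the gateway addresses and interface names are the ones from the original post, `PROBE_IP` is a hypothetical far-side target (a documentation address here, to be replaced), and the function names are made up for illustration. It also assumes the router already has some route towards the probe target through each interface (for example a host route per ISP), since a ping bound to eth2 still needs a route that leaves via eth2.

```shell
#!/bin/sh
# Addresses and interfaces are taken from the original post.
GW_A=12.34.56.254                 # primary gateway (ISP A, eth1)
GW_B=78.65.43.1                   # backup gateway (ISP B, eth2), as written in the post
PROBE_IP=${PROBE_IP:-192.0.2.1}   # hypothetical probe target, replace with a real host

# probe IFACE: succeed if one echo reply comes back through IFACE within 1 second.
probe() {
    ping -n -q -c 1 -w 1 -I "$1" "$PROBE_IP" >/dev/null 2>&1
}

# choose_gateway: print "GATEWAY IFACE" for the link that should carry the
# default route, always preferring ISP A. Prints nothing if both probes fail.
choose_gateway() {
    if probe eth1; then
        echo "$GW_A eth1"
    elif probe eth2; then
        echo "$GW_B eth2"
    fi
}
```

A cron job could call `choose_gateway` and, when it prints something, hand the two fields to `ip route replace default via GATEWAY dev IFACE`. When it prints nothing, both links look dead and it is probably safer to leave the routing table alone.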
On 06/21/07 02:46, Russell Stuart wrote:
> Well, it may be that you are connected to the modem by Ethernet, but
> that doesn't mean you can't arrange to know if the link is up or
> down.

If you are familiar with Cisco, there is a physical link and a protocol link. I'm ending up with a (physical link) Up / (protocol link) Down scenario, which can not be detected by Linux's device state.

> For DSL, you can run PPPoE on your Linux box. That way you will know
> when your link is down because the PPPoE connection dies, taking all
> routes with it. I do this. It works. In the case of a cable modem
> you can request a short dhcp-lease-time (see the option of that name
> in dhcp-options(5)) which achieves the same thing. This is by far
> the best solution because it reacts quickly, and altering of the
> routing table happens automagically as the links go up and down.

Ugh! Besides the fact that this is not possible (in my scenario), it is, in my opinion, EXTREMELY sub-optimal. Don't even get me started on PPPoE. There is also the fact that the DHCP leases would have to be sub-minute in length to even remotely come close to working for this.

> Assuming this isn't possible for some reason the only other way to do
> this is manually. Ie, you monitor the link somehow. There are any
> number of ways you can do this. One nice way is use Nagios to
> monitor the link. This is nice because Nagios can do things when the
> link goes down and comes back up again - like altering your routing
> table. Nagios is also good because it allows for some hysteresis, ie
> waiting for a few failed pings before taking action. And it can
> report what happened by SMS or whatever. There are a lot of Nagios
> type monitoring systems out there, maybe you use one. Failing that, a
> home baked shell script will work just as well. It would just use say:
>     ping -n -q -c 1 -w 120 -i 20 -I a.d.d.r next.hop.addr
> in a continuous loop to verify the link is up.

Double Ugh!
Why do I need to implement a daemon to do this when just about every other OS that I work with will purportedly do this itself? Linux can purportedly do this too, supposedly with Dead Gateway Detection and / or Equal Cost Multipath Routing or some combination thereof. No, I feel like there is a way to do this; I'm just overlooking it. If I do need to go back to this method, I'll completely re-design what needs to be done or switch to a different router OS (Free/Net BSD?) to do this.

> Finally, be careful in how you set up your routing. You want to
> avoid asymmetric routing, and that will happen by default when
> someone connects to your backup link unless you take special steps to
> avoid it.

Actually, asymmetric routes are what I want to use in the event traffic does come in on the backup route while the primary is up and running. Keep in mind that no one will be connecting to any of the IP addresses assigned to the router (save for router management) but rather to the globally routable IP addresses in the DMZ behind said router.

Grant. . . .
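The hysteresis mentioned in the quoted text (wait for a few failed pings before acting) can be illustrated with a small state function. This is a sketch only: the function name, the state names "primary" / "backup" and the two limits are assumptions made for the example, not something from Nagios or from anyone's actual script.

```shell
#!/bin/sh
# update_state CURRENT STREAK RESULT FAIL_LIMIT OK_LIMIT
#   CURRENT : "primary" or "backup" (the gateway currently in use)
#   STREAK  : consecutive probe results so far that contradict CURRENT
#   RESULT  : "ok" or "fail", the latest probe of the primary link
# Prints "NEWSTATE NEWSTREAK". The state only flips after FAIL_LIMIT
# consecutive failures (primary -> backup) or OK_LIMIT consecutive
# successes (backup -> primary); any contrary result resets the streak.
update_state() {
    cur=$1 streak=$2 result=$3 fail_limit=$4 ok_limit=$5
    case "$cur:$result" in
        primary:fail) streak=$((streak + 1)); limit=$fail_limit; other=backup ;;
        backup:ok)    streak=$((streak + 1)); limit=$ok_limit;   other=primary ;;
        *)            echo "$cur 0"; return ;;
    esac
    if [ "$streak" -ge "$limit" ]; then
        echo "$other 0"
    else
        echo "$cur $streak"
    fi
}
```

For example, `update_state primary 2 fail 3 5` prints `backup 0`: the third failure in a row triggers the switch. Using a larger limit for switching back than for failing over keeps a flapping primary from bouncing the default route.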
Grant Taylor wrote:
> I need a way for the Linux kernel to try to use a default gateway and
> switch to another one if it does not see any traffic.

I don't know about any working in-kernel solutions, but you can do it trivially with netfilter and a cronjob:

* In netfilter do this:

    -t mangle -N ispA
    -t mangle -A ispA -j RETURN
    -t mangle -N ispB
    -t mangle -A ispB -j RETURN
    -t mangle -A PREROUTING -i $ifA -s ! a.a.a.a/aa -j ispA
    -t mangle -A PREROUTING -i $ifB -s ! b.b.b.b/bb -j ispB

  where a.a.a.a and b.b.b.b are subnets describing your first 1 - 2 hops, so traffic from your upstream router will not count.

* Then make a cron job that runs this every minute:

    iptables -t mangle -vnxZL isp[AB]

  and looks for the first number on the third line. If it is not 0, the link is alive; otherwise change the routing tables accordingly.

Of course you can have up to 1 minute of downtime, but that does not look so bad IMO.

HTH

Peter
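The cron side of the recipe above could look roughly like this. It is a sketch, not Peter's actual script: the function names are invented, and it relies on the usual `iptables -vnxL` layout (chain header, column header, then one line per rule with the packet count in the first field). Note also that newer iptables releases prefer the negation written as `! -s a.a.a.a/aa` rather than `-s ! a.a.a.a/aa`.

```shell
#!/bin/sh
# packet_count: read the output of `iptables -t mangle -vnxL CHAIN` on stdin
# and print the packet counter of the chain's first rule, i.e. the first
# number on the third line. Fails if there is no third line.
packet_count() {
    awk 'NR == 3 { print $1; found = 1 } END { if (!found) exit 1 }'
}

# link_state CHAIN: print "up" if the chain counted any packets since the
# last run, "down" otherwise. -Z zeroes the counters after listing them, so
# each run only sees the traffic of the last interval. Needs root.
link_state() {
    n=$(iptables -t mangle -vnxZL "$1" | packet_count) || return 1
    if [ "$n" -gt 0 ]; then
        echo up
    else
        echo down
    fi
}
```

Run once a minute, `link_state ispA` and `link_state ispB` give the cron job what it needs to decide whether to move the default route. An idle but healthy link will also report "down", so in practice some background traffic (or a ping) is needed to keep the counters moving.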
On 06/21/07 10:35, Peter Rabbitson wrote:
> I don't know about any working in-kernel solutions, but you can do it
> trivially with netfilter and a cronjob:

<snip>

If I understand what you are proposing correctly, it looks like you are jumping to a sub-chain used only for counting traffic. If the counters show traffic, you are saying that traffic is flowing across the link and thus the link must be up and functional. Right?

If the link is not up and functional, then take action to not use that link.

I'm also not clearly understanding how matching the source IP will work on either link, considering that both links will have the capability to pass traffic for the same globally routable DMZ subnet. Though I think this could be mitigated by altering the rules to count packets going out or coming in an interface rather than based on source / destination IP.

> Of course you can have up to 1 minute of downtime, but it does not look
> so bad IMO.

One minute may or may not be bad. I know that it is a long time (when you are trying to ssh), but automatic failover is better than manual. And the one minute will probably be much faster than manual failover.

Grant. . . .
Grant Taylor wrote:
> On 06/21/07 10:35, Peter Rabbitson wrote:
>> I don't know about any working in-kernel solutions, but you can do it
>> trivially with netfilter and a cronjob:
>
> <snip>
>
> If I understand what you are proposing correctly, it looks like you are
> jumping to a sub-chain used only for counting traffic. If the
> counters show traffic, you are saying that traffic is flowing across the
> link and thus the link must be up and functional. Right?

Almost correct.

> If the link is not up and functional then take action to not use that link.

This is not something I do automatically in netfilter - it is a responsibility of the cron job.

> I'm also not clearly understanding how matching the source IP will work
> on either link considering that both links will have the capability to
> pass traffic for the same globally routable DMZ subnet. Though I think
> this could be mitigated by altering the rules to count packets going out
> or coming in an interface rather than based on source / destination IP.

I am counting only incoming traffic (the -i flag). The source matching is there only for the following reason. Consider:

    You ->1-> Uplink router ->2-> Internet

If hop 2 is down, then the uplink router might send you back ICMP messages saying that whatever destination you are trying to reach is unreachable. This will count as traffic from the internet, whereas in fact it isn't. This is why you need to exclude (thus the _!_ in -s) the immediate uplink hops, and count incoming traffic (whatever it might be) from the "far side" of the internet only.
On 06/21/07 11:00, Peter Rabbitson wrote:
> This is not something I do automatically in netfilter - it is a
> responsibility of the cron job.

*nod*

> I am counting only incoming traffic (the -i flag). The source matching
> is there only for the following reason: consider
>
> You ->1-> Uplink router ->2-> Internet
>
> If hop 2 is down, then the uplink router might send you back ICMP
> messages that whatever destination you are trying to reach is
> unreachable. This will count as traffic from the internet, whereas in
> fact it isn't. This is why you need to exclude (thus the _!_ in -s) the
> immediate uplink hops, and count incoming traffic (whatever it might
> be) from the "far side" of the internet only.

Ah, here is part of the problem.

                                   ( eth1 ) --- (DSL Modem) / DSL Gateway
Server --- (DMZ) --- (Linux Router)
                                   ( eth2 ) --- (Cable Modem) / Cable Gateway

Note: Globally routable DMZ is connected to eth0.

Traffic will be to / from servers in the DMZ and clients on the internet at large.

My "Linux Router" (above) *IS* the system that would send the ICMP ... unreachable message. So, there is not an upstream router to look for traffic from.

I suppose that I could match traffic coming in eth1 or eth2, but I would have to be careful about the source / destination. However, the very existence of inbound traffic means that the link is up for at least inbound traffic. I also need to know that I can send traffic too. I've had situations where the traffic would come in but not go out (Do NOT ask how or why!).

I suppose such monitoring will work, but I still feel like there is a better solution out there.

There is also the fact that I am wanting to use one route unless it is down and then use the backup. If the primary route is up and traffic comes in the backup, it is to go back out the primary.

Grant. . . .
Grant Taylor wrote:
> On 06/21/07 11:00, Peter Rabbitson wrote:
> Ah, here is part of the problem.
>
>                                    ( eth1 ) --- (DSL Modem) / DSL Gateway
> Server --- (DMZ) --- (Linux Router)
>                                    ( eth2 ) --- (Cable Modem) / Cable Gateway
>
> Note: Globally routable DMZ is connected to eth0.
>
> Traffic will be to / from servers in the DMZ and clients on the internet
> at large.
>
> My "Linux Router" (above) *IS* the system that would send the ICMP ...
> unreachable message. So, there is not an upstream router to look for
> traffic from.
>
> I suppose that I could match traffic coming in eth1 or eth2, but I would
> have to be careful about the source / destination. However the very
> existence of inbound traffic means that the link is up for at least
> inbound traffic. However I also need to know that I can send traffic
> too.

You are misunderstanding how ICMP works. The modems themselves are hops, and the thing they connect to is another hop. Just look at the first several entries of a traceroute to any destination, and you will see what I mean. If you still do not believe me, pull the ISP side cable from the modem while still having your router connected to it, and try to ping somewhere. Look at the source of the destination unreachable message: it will come from the modem, not from the Linux box.

> I've had situations where the traffic would come in but not go out
> (Do NOT ask how or why!).

This would be a problem with your router configuration. It is virtually impossible to have an upstream problem that would cause this. It either works both ways or does not work at all.

> I suppose such monitoring will work, but I still feel like there is a
> better solution out there.

I thought so too, but it seems that the only thing that comes close (and still does not cut it) are the DGD patches. And (this is my personal opinion) the fact that they have not been included in the kernel for such a long time indicates there is something fishy about them.
I myself am using a different approach, as I am doing load balancing as well. A script sends ICMP ping packets with large payloads to several destinations and computes the mean RTT. Then the ratio of both RTTs is used to assign link weights. When no pings come back, one of the weights will be 0, and effectively no routing will be performed through this link.

> There is also the fact that I am wanting to use one route unless it is
> down and then use the backup. If the primary route is up and traffic
> comes in the backup, it is to go back out the primary.

Nothing above prevents you from doing this, although it is a bad idea. Of course if you know what you are doing and still want to do it - it's your system :)
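The RTT-to-weight step described above is not spelled out in the message, so the following is only one plausible reading of it, with every name and the rounding rule being assumptions: each link gets a share of a fixed total that is inversely related to its mean RTT, and a link with no replies (passed as "-") gets weight 0.

```shell
#!/bin/sh
# link_weights RTT_A RTT_B [TOTAL]
#   RTT_A, RTT_B : mean round-trip times in ms, or "-" if no pings came back
#   TOTAL        : sum of the two weights (default 10)
# Prints "WEIGHT_A WEIGHT_B". The faster link (smaller RTT) gets the larger
# weight; a silent link gets 0.
link_weights() {
    awk -v a="$1" -v b="$2" -v total="${3:-10}" 'BEGIN {
        if (a == "-" && b == "-") { print "0 0"; exit }
        if (a == "-") { print 0, total; exit }
        if (b == "-") { print total, 0; exit }
        wa = int(total * b / (a + b) + 0.5)   # share of A grows as B gets slower
        wb = total - wa
        print wa, wb
    }'
}
```

With mean RTTs of 20 ms and 60 ms, `link_weights 20 60` prints `8 2`. A weight of 0 would then mean leaving that nexthop out of the multipath route altogether rather than installing it with zero weight.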
On 06/21/07 11:47, Peter Rabbitson wrote:
> You are misunderstanding how ICMP works. The modems themselves are hops,
> and the thing they connect to is another hop. Just look at the first
> several entries of a traceroute to any destination, and you will see
> what I mean. If you still do not believe me - pull the ISP side cable
> from the modem, while still having your router connected to it, and try
> to do a ping to somewhere. Look at the source of the dest. unreachable
> message - it will come from the modem, not from the linux box.

Um, if you are using bridging modems (like I am) you are incorrect. If you are using modem router combos, yes.

Every single install where I have used bridging modems between the Linux router and the ISP acts the same way. If I have a workstation behind a Linux router (that is doing basic NATing) connected to a bridging DSL / cable modem and I unplug the phone line or the coax cable from the modem, it is the Linux box that sends the ICMP message, NOT the modem. This is as expected too. The bridging modems bridge the traffic from the ethernet to the DSL / cable side, which is in turn bridged from DSL / cable back to a network interface at the ISP. Thus there is one broadcast domain between the Linux router and the ISP's router, and there is no IP device between the Linux router and the ISP router to send an ICMP message back.

Now, again, if you are dealing with modem router combos, I'll grant you what you say, but not on bridging modems.

> This would be a problem with your router configuration. It is virtually
> impossible to have an upstream problem that would cause this. It either
> works both ways or does not at all.

No, it was not a fault with my router. It was a faulty radio in a (W)ISP's core network. Completely out of my control.
When the ISP replaced the piece of equipment in their core (not even on the link to me) things started working correctly again.

> I thought so too, but it seems that the only thing that comes close (and
> still does not cut it) are the DGD patches. And (this is my personal
> opinion) the fact they have not been included in the kernel for such a
> long time, indicates there is something fishy about them.

I agree that something is not quite right about the DGD patches. Though, I've applied them to 2.6.21.5 and did not have any more luck with them, so I'm not sure that there is much use for them. However, I think that my DGD tests, and the failures in them, were related to my config not being right.

> I myself am using a different approach as I am doing load balancing as
> well. A script sends icmp ping packets with large payloads to several
> destinations and computes the mean rtt. Then the ratio of both rtts is
> used to assign link weights. When no pings come back one of the weights
> will be 0, and effectively no routing will be performed through this link.

*nod* I am presently using dual load balanced SDSL circuits with automated (OSPF) failover at my office. This is working out VERY well. However, the questions I'm asking have to do with a project for a different client.

> Nothing above prevents you from doing this, although it is a bad idea.
> Of course if you know what you are doing and still want to do it - it's
> your system :)

The contracts for the connections dictate that one is only used as a backup. If the primary is up, any and all outbound traffic is to go out over it. So, if traffic comes in over the backup, returning outbound traffic is to go out the primary. Seeing as how the DMZ behind this router is globally routable, I'm not worried about issues with asymmetric routes. There are asymmetric routes in the core all the time.
In my opinion, it is only at the edge where you NAT that you have to maintain IP addresses and thus have to be very careful to avoid asymmetric routes. Also, seeing as how both circuits are an ethernet connection that can carry a frame size / MTU of 1500 bytes, I don't see the problems that would be introduced by encapsulated traffic like PPPoE on one link versus the other link. In short, I'm willing to listen to problems with the asymmetric routes, but I have yet to hear anything that concerns me or even chafes me a little.

Grant. . . .
Grant Taylor wrote:
> Now, again, if you are dealing with modem router combos, I'll grant you
> what you say, but not on bridging modems.

*nod* I had several cases when my ISP had problems like the one you describe below, so the first 2 hops were pingable but nothing outside. This is why I suggested the entire ISP subnet exclusion, just to be on the safe side.

>> This would be a problem with your router configuration. It is
>> virtually impossible to have an upstream problem that would cause
>> this. It either works both ways or does not at all.
>
> No, it was not a fault with my router. It was a faulty radio in a
> (W)ISP's core network. Completely out of my control. When the ISP
> replaced the piece of equipment in their core (not even on the link to
> me) things started working correctly again.

I got to give you this one. Murphy at work.

> *nod* I am presently using dual load balanced SDSL circuits with
> automated (OSPF) failover at my office. This is working out VERY well.
> However the questions I'm asking have to do with a project for a
> different client.

No contest here either. It's just rather rare for a small scale end-user to be able to get access to IGPs.

> asymmetric routes. Also, seeing as how both circuits are an ethernet
> connection that can carry a frame size / MTU of 1500 bytes, I don't see
> the problems that would be introduced by encapsulated traffic like PPPoE
> for one link versus the other link. In short, I'm willing to listen to
> problems with the asymmetric routes, but I have yet to hear anything
> that concerns me or even chafes me a little.

I misread the part about the stuff behind the router being routable. There is nothing wrong with asymmetric routing in this case. However, you bring up an interesting point about MTU, only to dismiss it right there. I think you will have a problem with the default MTU of 1500 being combined with the effective MTU of PPPoE links being 1492.
Too many systems in this day and age have PMTU discovery enabled, and you know what the current state of ICMP messaging on the net is.

Peter
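The 1500 versus 1492 figures above come from simple header arithmetic, sketched here with made-up function names. It assumes plain IPv4 and TCP headers without options (20 bytes each) and the usual PPPoE overhead of a 6-byte PPPoE header plus a 2-byte PPP protocol field.

```shell
#!/bin/sh
# pppoe_mtu ETH_MTU: MTU left for IP once PPPoE is carried inside an
# ethernet payload (6-byte PPPoE header + 2-byte PPP protocol field).
pppoe_mtu() {
    echo $(( $1 - 8 ))
}

# tcp_mss MTU: largest TCP payload per packet for IPv4 without options
# (20-byte IP header + 20-byte TCP header).
tcp_mss() {
    echo $(( $1 - 40 ))
}
```

So `pppoe_mtu 1500` gives 1492, and the matching segment sizes are `tcp_mss 1500` = 1460 on raw ethernet against `tcp_mss 1492` = 1452 over PPPoE. When ICMP "fragmentation needed" messages are filtered somewhere along the path, PMTU discovery can not learn the smaller value, which is the failure mode being hinted at. One commonly used workaround is clamping the MSS on the router with the iptables TCPMSS target.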
On 06/21/07 12:37, Peter Rabbitson wrote:
> *nod* I had several cases when my ISP had problems like the one you
> describe below, so the first 2 hops were pingable but nothing outside.
> This is why I suggested the entire ISP subnet exclusion, just to be on
> the safe side.

*nod*

> I got to give you this one. Murphy at work.

Ya, Murphy and I go back a long way. I can usually tell when I'm on the right track to solving a problem. If I'm about to beat something, I start having other little problems, i.e. batteries in equipment going out, not having the proper patch cord (straight-through versus crossover), not having the proper user name and / or password for equipment, etc. I've gotten to the point that I rather like seeing such speed bumps because I have noticed that they are usually an indication that I'm at least going in the right direction.

> No contest here either. It's just rather rare for a small scale end-user
> to be able to get access to IGPs.

Well, just because OSPF is what is used does not mean that I have access to the IGP. To make things work, I'm having to have my ISP co-locate a piece of their equipment at my facility so they are using the IGP within their administrative domain. I pick up from the single ethernet interface out of said equipment. It's just a political / administrative paradigm shift, but it does allow the circuits to do what I want them to do, and rather nicely at that, I might add.

> I misread the part about the stuff behind the router being routable.
> There is nothing wrong with asymmetric routing in this case. However you
> bring up an interesting point about MTU, only to dismiss it right there.
> I think you will have a problem with the default MTU of 1500 being
> combined with the effective MTU of PPPoE links being 1492. Too many
> systems in this day and age have PMTU discovery enabled, and you know
> what is the current state of ICMP messaging on the net.

*nod* I figured that the globally routable DMZ IPs point was not sinking in, so I tried re-stating it differently to see if it would make it. ;)

Both of my links use statically assigned IP addresses on the raw ethernet interfaces. Thus there is no encapsulation (MTU) overhead to worry about, i.e. no PPPoE. Seeing as how I'm running MTUs of 1500 out my interfaces to the world and at least that or larger in to the ISP (the ATM links have 4470, set for something else some time previously), I don't think MTU issues will be on my end.

Incidentally, this is one of the reasons that I try to avoid PPPoE if I can. Well, MTU and the fact that our local incumbent phone company, as an ISP, likes to tear down the PPPoE connections after less than 60 seconds of inactivity *WITHOUT* notifying the client end. Thus our only reliable recourse is to tear down the connection on the client end before the ILEC does, so that we know the state and can re-establish it on demand when needed.

Grant. . . .
On Thu, Jun 21, 2007 at 05:35:13PM +0200, Peter Rabbitson wrote:
> Grant Taylor wrote:
> > I need a way for the Linux kernel to try to use a default gateway and
> > switch to another one if it does not see any traffic.

Should something like this work?

    default proto static metric 5
        nexthop via 58.173.108.1 dev vlan2 weight 10
        nexthop via 10.20.20.106 dev ppp0 weight 20

and then let the DGD detect dead gateways and drop the relevant route about.
On 06/21/07 16:01, Alex Samad wrote:
> should something like this work
>
> default proto static metric 5
>     nexthop via 58.173.108.1 dev vlan2 weight 10
>     nexthop via 10.20.20.106 dev ppp0 weight 20
>
> and then let the dgd detect dead gateways and drop the relevant route
> about.

Doesn't this use "Equal Cost Multi Path" (ECMP) routing?

If so, how does this take in to account that I do not want any of the traffic to run over the backup connection unless the primary is down?

It is my understanding that the weights of an ECMP route are for a fraction of the traffic. I.e. 10/30 and 20/30 of the traffic will use each of the routes.

(Note: I state 10/30 and 20/30 because the man page indicates that 10/30 does not equal 1/3. Namely because the kernel creates an in memory route for each weight for each route. Thus if you use a weight of 10, there will be 10 routes in memory.)

Grant. . . .
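To make the 10/30 and 20/30 figures concrete, here is a small sketch that treats the nexthop weights purely as proportions. The function name is made up, and it says nothing about how the kernel actually stores or balances the routes; it only shows why a weighted multipath route always sends some share of traffic over the backup link.

```shell
#!/bin/sh
# ecmp_shares WEIGHT_A WEIGHT_B: print the nominal share of traffic each
# nexthop would receive if the weights are read as simple proportions.
ecmp_shares() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        t = a + b
        printf "%.1f%% %.1f%%\n", 100 * a / t, 100 * b / t
    }'
}
```

`ecmp_shares 10 20` prints `33.3% 66.7%`. No choice of positive weights brings the second share to zero, which is why a multipath route does not fit a strict primary / backup requirement.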
On Thu, Jun 21, 2007 at 04:24:19PM -0500, Grant Taylor wrote:
> On 06/21/07 16:01, Alex Samad wrote:
> > should something like this work
> >
> > default proto static metric 5
> >     nexthop via 58.173.108.1 dev vlan2 weight 10
> >     nexthop via 10.20.20.106 dev ppp0 weight 20
> >
> > and then let the dgd detect dead gateways and drop the relevant route
> > about.
>
> Doesn't this use "Equal Cost Multi Path" (ECMP) routing?

Sorry, yep. Just woken up, reading and answering whilst eating breakfast.

OK then, why not:

    default via preferred path
    default via backup path metric 100
On 06/21/07 17:18, Alex Samad wrote:
> sorry yep, just woken up, reading and answering whilst eating breakfast

*nod*

> okay then why not
>
> default via preferred path
> default via backup path metric 100

I've done that with a metric of 0/1, and 1/2. The problem that I'm seeing is that the system will never try to use the second metric. It's as if the system will never go to a next higher metric if it does not receive an error while trying to use a lower metric.

Grant. . . .
On Thu, Jun 21, 2007 at 05:23:23PM -0500, Grant Taylor wrote:
> I've done that with a metric of 0/1, and 1/2. The problem that I'm
> seeing is that the system will never try to use the second metric. It's
> as if the system will never go to a next higher metric if it does not
> receive an error while trying to use a lower metric.

Strange. I am running OpenWrt on a Linksys WRT54GS with 1 cable and 1 ADSL link. I load balance (I also have Julian's patches applied - it's 2.4.30). When the routing notices the link is dead (so, if I do an ip li.), it marks the routes as dead and stops using them; once the interface is brought down, the routes disappear.

I haven't followed the DGD threads, but I seem to remember it having some problem with upstream detection.

You talked about getting OSPF routing for this. Is this from the ISPs, inbound as well as outbound? Wouldn't OSPF handle link state as well? (It's been a while since I looked at OSPF.)
Ok, after more testing and trying things that others have suggested, I've made some headway. Or at least what I think is some headway. This is not an answer, just data that I have gathered along the way to help others that are trying to help me.

I have determined that either I can not get the DGD patches (routes-2.6.21-15.diff) off of Julian's site to work the way that I think they should, or I'm using the wrong patch from there, or said patch does not work. I don't know which, and I can't really say one way or the other.

If I compile a stock 2.6.21.5 kernel (plus a patch to see my VMWare LSI SCSI card, which should make no difference in routing) without ECMP or any advanced routing, I can get the system to fail over to the next route after a period of time if the first is down. I do this by adding the two alternate routes with the same metric, in the reverse of the order that I want to use them. I.e. if I have the routes a.b.c.d (preferred) and z.y.x.w (backup), and I add the backup route and then the preferred route, it will fail over after a while. If I set /proc/sys/net/ipv4/route/gc_timeout to 10 seconds, the system will fall back to the backup route in about 120 seconds. I'm still playing with numbers in the /proc tree. The problem with this method is that I have yet to get it to start re-using the primary route when it becomes available again.

If I use the previously mentioned DGD patch, the system will just try to cache the route for something like 245 days. I'm still wondering if I am applying the correct patch. This happens with or without ECMP compiled in to the kernel.

Grant. . . .
On 06/21/07 17:30, Alex Samad wrote:
> Strange I am running openwrt on a linksys wr54gs with 1 cable and 1
> adsl. I load balance, (also have julian patches applied - its
> 2.4.30), when the routing notices the link is dead, so if i do a ip
> li. then it marks the routes as dead and stops using them, once the
> interface is brought down the routes disappear

I do not want load balancing. Rather, I want to use one link and only use the second if the first is down.

> I haven't followed the dgd threads, but I seem to remember it having
> some problem with upstream detection.

*nod* I'm getting that consensus.

> You talked about getting OSPF routing for this, is this from the
> ISP's inbound as well as outbound. Wouldn't OSPF handle link state as
> well ? (it been a while since I looked at OSPF)

The OSPF was for a different project / different installation.

Grant. . . .
On Thursday 21 June 2007 18:02, Grant Taylor wrote:
> On 06/21/07 11:47, Peter Rabbitson wrote:
> > You are misunderstanding how ICMP works. The modems themselves are hops,
> > and the thing they connect to is another hop. Just look at the first
> > several entries of a traceroute to any destination, and you will see
> > what I mean. If you still do not believe me - pull the ISP side cable
> > from the modem, while still having your router connected to it, and try
> > to do a ping to somewhere. Look at the source of the dest. unreachable
> > message - it will come from the modem, not from the linux box.
>
> Um, if you are using bridging modems (like I am) you are incorrect.

This is absolutely the way to do it with ADSL.

Using a modem in bridged mode minimizes the responsibility of the modem/router, which is a potentially unstable device. Let the stable Linux box do the work (routing + NAT) and get the public IP, and firewall the Linux box itself with iptables. This is the most flexible and stable way to go.

Cheers
Gustavo

--
Angulo Sólido - Tecnologias de Informação
http://angulosolido.pt
(Off thread topic.)

On 06/22/07 06:54, Gustavo Homem wrote:
> This is absolutely the way to do it with ADSL.

I could not agree more.

> Using a modem in bridged mode minimizes the responsibility of the
> modem/router which is a potentially unstable device. Let the stable
> Linux box do the work (routing+nat) and get the public IP. And
> firewall the Linux box itself with iptables. This is the most
> flexible and stable way to go.

*nod* About the only thing that I'm looking at doing differently at my house is to use the Thomson SpeedTouch 330 USB ADSL modem to put the ATM stack on the Linux box itself. This way the Linux kernel will handle the bridging and buffering, versus an external device that has arbitrary pauses waiting for buffers to fill prior to transmitting data.

My preliminary tests with the ATM stack on Linux show a speed increase over the external bridging modem too. :) My tests show that Linux / Windows think the raw ATM with bridging circuit will get close to 1.6 Mbps, while the bridged devices get closer to 1.5 Mbps. I also see lower latency between the device connected to the DSL and the upstream gateway, by about 3 - 5 ms.

Grant. . . .
On Friday 22 June 2007 15:22, Grant Taylor wrote:
> (Off thread topic.)
>
> On 06/22/07 06:54, Gustavo Homem wrote:
> > This is absolutely the way to do it with ADSL.
>
> I could not agree more.
>
> > Using a modem in bridged mode minimizes the responsibility of the
> > modem/router which is a potentially unstable device. Let the stable
> > Linux box do the work (routing+nat) and get the public IP. And
> > firewall the Linux box itself with iptables. This is the most
> > flexible and stable way to go.
>
> *nod* About the only thing that I'm looking at doing differently at my
> house is to use the Thompson USB SpeedTouch (330) USB ADSL modem to put
> the ATM stack on the Linux box its self.

I've done this, but I think it's unreliable for professional use. The USB modems are non-standard, so if one burns out you can't exchange it for a different one without feasible but time-consuming tweaking (I have tried more than one USB device...).

Even for Ethernet bridging devices I only use models which are delivered by ISPs (rather than retail shop devices), to guarantee they were tested for stability:

POTS: http://www.huawei.com/products/terminal/products/view.do?id=87

ISDN: http://www.acbs-dsl-store.com/contenu/Articles/Article.asp?PdtNum=DSLGP628LP

These models run forever in bridged mode. The second one accepts multiple PPPoE clients on different ports.

> This way the Linux kernel will
> handle the bridging and buffering verses an external device that has
> arbitrary pauses waiting for buffers to fill prior to transmitting data.
>
> My preliminary tests with the ATM stack on Linux show a speed increase
> over the external bridging modem too. :) My tests show that Linux /

That's to be expected, since using PPPoA instead of PPPoEoA reduces the overhead. But I don't know of a standard PPPoA setup.
But if we want QoS working, we can't use the full line capacity anyway.

> Windows think the raw ATM with bridging circuit will get close to 1.6
> Mbps while the bridged devices get closer to 1.5 Mbps. I also see a
> lower latency between the device connected to the DSL and the upstream
> gateway by a factor of 3 - 5 ms.

Even if that happens, it would hardly compensate for the risk of lower reliability.

Cheers
Gustavo

--
Angulo Sólido - Tecnologias de Informação
http://angulosolido.pt
On 06/22/07 09:57, Gustavo Homem wrote:
> I've done this, but I think it's unreliable for professional use. The
> USB modems are non-standard so if one burns you can't exchange it for
> a different one without feasible but time consuming tweaking (tried
> more than one USB devices...).
>
> Even for Ethernet bridging devices I only use models which are
> delivered by ISPs (rather than retail shop devices), to guarantee they
> were tested for stability:
>
> POTS: http://www.huawei.com/products/terminal/products/view.do?id=87
>
> ISDN: http://www.acbs-dsl-store.com/contenu/Articles/Article.asp?PdtNum=DSLGP628LP
>
> These models run forever in bridged mode. The second one accepts
> multiple PPPoE clients on different ports.
>
> That's expectable since using PPPoA instead of PPPoEoA, reduces the
> overhead. But I don't know a standard PPPoA setup.
>
> But if we want QoS working, we can't use the full line capability
> anyway.
>
> Even if that happens, it would hardly compensate the risk of lower
> reliability.

All very valid points and things to consider. However, for a home environment / non-critical environment, it provides a lot of potential.

Grant. . . .
On 06/21/07 17:35, Grant Taylor wrote:
> The problem with this method is that I have yet to get it to start
> re-using the primary route when it becomes available again.

After doing some more testing and investigation, I think I know why the system appears not to be using the primary route.

My test / lab setup consists of a Linux router with two subnets bound to one interface (eth0 and eth0:1) and my (VMWare) test Linux system with two ethernet interfaces bridged to the local LAN, with one subnet on each interface. I have two (as far as Linux is concerned) physical interfaces so that I can have TX / RX counters for each interface to see which way the traffic is going out.

This worked fine to have the system fall from the primary down to the secondary route when the primary route went away. However, I never saw the traffic from the test Linux system go back to the interface for the primary route. After some investigation, I think this is because the same MAC address is used for both the primary and secondary routes, seeing as how both addresses are on the same physical interface on my Linux router.

So, to test this, I took down the primary route and let the test Linux box fall back to the backup route, which it did. Then I brought the primary route back on line and waited. As expected, the traffic did not start using the primary route, presumably because the MAC addresses for routes are cached with an association to a device. So, while the system was pinging out to the world with the primary route brought back up, I cleared entries from the local test Linux box's ARP cache and, all of a sudden, traffic started going out the correct interface.

So, now I think that the method of having two equal cost (metric) routes on the box will work. I'm now going to test a setup where the two routes have different MAC addresses, to see if the traffic does indeed start using the proper route again (seeing as how there should not be any confusion with MAC addresses).

Grant. . . .
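Clearing the ARP cache as described above could be done along these lines. This is a sketch under assumptions: it uses the iproute2 "ip neigh" commands rather than whatever tool was actually used in the test, the gateway 192.0.2.1 and interface eth1 are placeholders, and the run helper with its APPLY switch is hypothetical, so nothing is changed unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch only: drop cached neighbour (ARP) entries so the kernel re-resolves
# the gateway's MAC address. Dry run by default; set APPLY=1 (as root) to run.

run() {
    # Hypothetical helper: print the command unless APPLY=1 is set.
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

clear_gateway_arp() {
    # $1 = gateway IP, $2 = interface facing that gateway.
    run ip neigh del "$1" dev "$2"
}

flush_interface_arp() {
    # $1 = interface; removes every cached neighbour entry on it.
    run ip neigh flush dev "$1"
}

clear_gateway_arp 192.0.2.1 eth1   # placeholder gateway and interface
```

Deleting only the gateway's entry is the narrower option; flushing the whole interface is the blunt one and briefly forces every neighbour on that link to be resolved again.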
On 06/22/07 13:57, Grant Taylor wrote:
> I'm now going to test where the two routes are different MAC
> addresses to see if the traffic does indeed start using the proper
> rout again.

Ok, I have done it and it is working.

The short answer is that all you need to do to have backup routes is to enter them in reverse order. You do not need any special kernel options, kernel patches, special ip rules, or anything else. All you need to do is enter the routes in the reverse of the order in which you want them to be used.

For example, suppose I have several internet connections, each with its own default gateway (three are shown here). Obviously the default gateways must not be on the same subnet.

   GW1:  A.B.C.D
   GW2:  Z.Y.X.W
   GW3:  K.L.M.N

   route add default gw K.L.M.N
   route add default gw Z.Y.X.W
   route add default gw A.B.C.D

Note: All the above routes have the same metric (default of 0).

I do not know why you have to add the routes in reverse. I have just noticed that route adds each new route as the highest priority entry in the routing table: the table is filled from the top, not the bottom. So add them in the reverse order.

In my current test environment I have two identical VMWare virtual machines (a literal copy from one to the other) whose configuration I have modified and tested. I'll try to depict it below:

             ( ISP 1  ) --- ... --- ( ISP 1) --- ( Internet )
             (        )                 |
   (DMZ) --- ( Router )          ( Peering Link)
             (        )                 |
             ( ISP 2  ) --- ... --- ( ISP 2) --- ( Internet )

In this scenario, the DMZ IP address space is from ISP 1. ISP 1 has a route to the DMZ via the ISP 1 IP address on my local Linux router. ISP 1 has a secondary route to the DMZ via the IP address on ISP 2's router over the peering link. ISP 2 has a route to the DMZ via the ISP 2 IP address on my local Linux router.

The link between my local Linux router and ISP 1 is a high speed wireless link. The link between my local Linux router and ISP 2 is a lower speed ADSL link.
The ADSL link from ISP 2 is *ONLY* used for backup access in case my local Linux router is unable to communicate with ISP 1's router. Thus, if for some reason traffic does come in to my ISP 2 IP address, it is to go back out the ISP 1 link, which means asymmetric routing.

I appreciate all the suggestions that everyone submitted while trying to help resolve this issue. In the end it turned out that everything that was needed is already in the stock / vanilla kernel.org kernel. All I had to do was be smart enough to use it.

Some points to help others with this issue if they ever need it:
 - Equal Cost Multi Path (a.k.a. E.C.M.P.) routing is NOT needed.
 - NO ip rule(s) were needed to pull this off.
 - NO additional routing tables were needed to pull this off.
 - NO patches (i.e. Julian's Dead Gateway Detection patch) were needed to pull this off.
 - NO special scripts were needed to monitor and / or modify the routing table(s). (Note: This is applicable to my scenario, see below.)

With regards to the monitoring of routing tables, I did not need to do anything special, i.e. no ping or arping was needed. I think this was because when my primary route went down I would start using the secondary route, and the returning traffic would always try to use the primary and fall back to the secondary route. When the primary route did come back up, the inbound traffic would come in the primary interface / route, thus incrementing the counters in my kernel and making the kernel aware that the primary route was indeed back up so it could switch back to it.

Note: In my test, I was manually taking the interface down on one VM and subsequently bringing it back up and restoring the route(s) across it. In my opinion, this interface fiddling on the upstream end is not automatic, but it is outside of the scope of the client end falling back to a backup route.
If I were trying to do this between two systems where the link in the middle (between intermediary switches) went down, I believe I would have to do some sort of heartbeat across the link. In this case, I would probably use (read: try) arping first and then switch to something else if that did not work.

Grant. . . .
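A heartbeat of the kind mentioned above might be sketched as follows. Everything here is an assumption rather than something tested in this thread: the gateways 192.0.2.1 and 198.51.100.1 and the interfaces eth1 and eth2 are placeholders, the arping flags are those of the iputils version (-q quiet, -c count, -w deadline in seconds, -I interface), an exit status of 0 is taken to mean that a reply was received, and PROBE_CMD and RUN_MONITOR are hypothetical switches that exist only to make the sketch easy to try out without touching the routing table.

```shell
#!/bin/sh
# Sketch only: probe the preferred gateway with arping and fall back to the
# backup gateway when it stops answering. Nothing runs unless RUN_MONITOR=1.

PRIMARY_GW="192.0.2.1";   PRIMARY_IF="eth1"   # placeholder values
BACKUP_GW="198.51.100.1"; BACKUP_IF="eth2"    # placeholder values

gateway_alive() {
    # $1 = interface, $2 = gateway IP.
    # PROBE_CMD is a hypothetical override for trying the logic without arping.
    ${PROBE_CMD:-arping} -q -c 1 -w 1 -I "$1" "$2" >/dev/null 2>&1
}

choose_gateway() {
    # Print "gateway interface" for the first gateway that answers;
    # return non-zero when neither one does.
    if gateway_alive "$PRIMARY_IF" "$PRIMARY_GW"; then
        echo "$PRIMARY_GW $PRIMARY_IF"
    elif gateway_alive "$BACKUP_IF" "$BACKUP_GW"; then
        echo "$BACKUP_GW $BACKUP_IF"
    else
        return 1
    fi
}

if [ "${RUN_MONITOR:-0}" = "1" ]; then
    # Needs root. Re-checks once a minute and points the default route at
    # whichever gateway answered; leaves the table alone if none did.
    while :; do
        if choice="$(choose_gateway)"; then
            set -- $choice
            ip route replace default via "$1" dev "$2"
        fi
        sleep 60
    done
fi
```

Because the primary gateway is always probed first, the default route moves back to it as soon as it answers again, which is the fail-back behaviour the thread was looking for.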