thr3ads.net - LARTC - dead router detection [Nov 2007]

Guillermo Gómez (Gomix)

2007-Nov-06 12:39 UTC

dead router detection

Hi all

I would like to know what happens with a dead router in a multipath
configuration like the one presented at
http://lartc.org/howto/lartc.rpdb.multiple-links.html
Do I need to monitor dead routers and reconfigure?

Guillermo

Grant Taylor

2007-Nov-06 18:00 UTC

Re: dead router detection

On 11/06/07 06:39, Guillermo Gómez (Gomix) wrote:
> I would like to know what happens with a dead router in a multipath
> configuration like the one presented
> http://lartc.org/howto/lartc.rpdb.multiple-links.html
>
> Do i need to monitor dead routers and reconfigure ?

Dead Gateway Detection (a.k.a. DGD) built into stock Linux kernels will
detect the death of immediately connected gateways. DGD will only
work with gateways on the same subnet, not beyond other gateways. DGD
running on 'Client' below will detect the death of 'Router A' or
'Router B' but not 'Router C' nor 'Router D'. For 'Client' to be aware
of the death of 'Router C' or 'Router D' a routing protocol will need
to be used.

                 +----------+       +----------+
             +---+ Router A +---+---+ Router C +---
+--------+   |   +----------+   |   +----------+
| Client +---+                  |
+--------+   |   +----------+   |   +----------+
             +---+ Router B +---+---+ Router D +---
                 +----------+       +----------+

DGD is used for the Linux kernel to detect when a given router is
unreachable and to fail over to the next available route. For this to
work 'Client' would have to have the following two routes in place.

route add default gw <Router A> metric <N>
route add default gw <Router B> metric <N>

DGD will detect the failure of one gateway (route) and fall back to the
next available gateway (route).
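
For anyone using iproute2 instead of the older route command, the two
routes above might look roughly like the sketch below. The gateway
addresses (192.0.2.1, 192.0.2.2), the metric value and the function
names are placeholders made up for illustration, not values from this
thread. Both routes share one metric, mirroring the <N> in the commands
above, and the second one uses "append" because, as far as I know, a
plain "add" of a second route with the same prefix and metric is
rejected as a duplicate. The script only prints the commands unless
DRY_RUN=0 is set, since changing routes needs root.

```shell
#!/bin/sh
# Sketch: two default routes with the same metric, one per directly
# connected gateway. Addresses and metric are placeholder assumptions.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; only execute it when DRY_RUN=0 (needs root).
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

add_default_routes() {
    # $1 = gateway A, $2 = gateway B, $3 = metric shared by both routes
    run ip route add default via "$1" metric "$3"
    # "append" rather than "add" so the second route with the same
    # prefix and metric is not treated as a duplicate.
    run ip route append default via "$2" metric "$3"
}

add_default_routes 192.0.2.1 192.0.2.2 10
```

Run as is, it just shows what would be executed; check the result
afterwards with "ip route show".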

One point of interest is that DGD purportedly only works with default
routes, not routes to specific destinations. I have not personally
tested this so I cannot say for sure.

I have tested the following scenario with stock Linux kernels and had
success.

On each system I had two routes set up to the network bound to the
opposing system's dummy0, one via the opposing system's eth0 interface
and one via its eth1 interface. So each system had two routes to the
opposing dummy0 network.

I ran pings from one system's dummy0 interface to the other system's
dummy0 interface. I then disconnected the ethernet cable from one of
the systems' eth interfaces. Within 60 seconds the system that I did
not disconnect the cable on would realize that the gateway was dead and
drop back to the one remaining gateway.
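
One way to watch a failover like this while the pings run is to ask the
kernel which interface it would currently pick for the far dummy0
address. The following is only a sketch: the destination 10.0.2.1 and
the function names are made up, and it assumes the usual
"... via <gw> dev <iface> ..." layout of "ip route get" output.

```shell
#!/bin/sh
# Sketch: report which interface the kernel would use right now to reach
# a destination, by parsing `ip route get`. Destination is a placeholder.
IP="${IP:-ip}"

route_dev() {
    # Reads `ip route get` output on stdin, prints the word after "dev".
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

current_dev() {
    # $1 = destination address
    $IP route get "$1" 2>/dev/null | route_dev
}

# Example: print the chosen interface once a second while pulling a cable.
#   while :; do echo "$(date +%T) $(current_dev 10.0.2.1)"; sleep 1; done
```

The interface name changing in that loop would be the moment the kernel
gave up on the dead gateway.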

If I plugged the ethernet cable back in and manually restored the config
on the system that I unplugged the cable from (when the interface went
down the kernel removed its configuration) the system would send traffic
back to the other system using both interfaces.

So, say I unplugged the cable from eth0 on B. A would realize that the
route that used B:eth0 as the gateway was dead and so A would stop using
that route. B would know immediately that replies needed to go back to
A over eth1, because its own eth0 interface was down and so it already
knew that it could not reach eth0 on A.

Once I plugged the cable back in to eth0 on B and re-configured the IP
address and routes back to A (again the kernel removed the interface
config and routes when it saw the physical link was dead) B immediately
started using both routes again. A allowed the traffic to come back in
eth0 while still sending the traffic out eth1. After about 45 - 60
seconds of live traffic on eth0 the kernel on A decided that the gateway
was back alive and started using the route again.

When I ran this test I was trying to make sure that A would work
without any regard to B. B was under someone else's control and as such
how it behaved was not my worry. I found that A would detect the lack
of the ability to reach B via one route or the other and start using
the remaining route(s) as it should.

I did not need to run any sort of traffic monitoring on either eth0 or
eth1 on A because I was able to rely on incoming traffic from both
routes to increment the kernel's packet counters that are used by the
DGD algorithm. However, if I were implementing both sides of this
setup, I would have needed to periodically do something like an arping
to both eth0 and eth1 on B to make sure that they were both alive. More
specifically, I would need the arping to see if the routes had been
resurrected. The kernel watches packet counters to see when a route
dies. However, once a route has died, there is no normal traffic to
start incrementing the counters when the route comes back to life.
Thus I would need to create that traffic via arping.
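
A periodic probe along those lines could be sketched as below. It
assumes the iputils version of arping (-c count, -w deadline in seconds,
-I interface), which normally needs root. The interface:address pairs
and the function names are placeholders for illustration only.

```shell
#!/bin/sh
# Sketch: send an ARP probe out each interface so there is some traffic
# on a path that may have come back. Assumes iputils arping options.
ARPING="${ARPING:-arping}"

probe() {
    # $1 = local interface, $2 = address to probe on that link.
    # Succeeds if arping reports a reply.
    $ARPING -c 1 -w 2 -I "$1" "$2" >/dev/null 2>&1
}

probe_all() {
    # Arguments are "interface:address" pairs; prints one line per pair.
    for pair in "$@"; do
        iface="${pair%%:*}"
        addr="${pair#*:}"
        if probe "$iface" "$addr"; then
            echo "$iface $addr alive"
        else
            echo "$iface $addr no-reply"
        fi
    done
}

# Example (placeholder addresses), repeated every 30 seconds:
#   while :; do probe_all eth0:192.0.2.1 eth1:198.51.100.1; sleep 30; done
```

The status lines are only for the operator; the point of the probe is
the traffic itself, which gives the kernel something to count on a path
that has come back.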

There is another issue that you need to be aware of with "Routing for
multiple uplinks/providers". Namely, when you use "Routing for multiple
uplinks/providers" you have multiple external IP addresses that remote
systems see you coming from. When you are coming from multiple external
IP addresses, you cannot shift traffic from one route to the other(s)
without breaking where the connection appears to be from.

Grant. . . .
