thr3ads.net - LARTC - Non-traditional Failover Query [Feb 2004]


Jonathan Day

2004-Feb-23 20:17 UTC

Non-traditional Failover Query

Hi,

Partly because I never like straightforward solutions,
I am looking to implement a non-standard failover
system that owes its origins to mixing RAID 5 with
some beer.

The idea is to have machines A, B and C, configured as
follows:

1) Any given process is running on TWO machines at the
same time. If a process or machine fails, then a new
backup is started on the third machine. There is thus
a rotation around the nodes.

2) Packets destined for any of the machines should be
received by ALL of the machines.

3) If the primary process is on A, then replies from B
and C should still be generated, but be transparently
dropped. Likewise, if the primary process is on B or
C.

Let's say you have an Apache process running on A and
B. B is shadowing A on everything. It's at the same
point, has the same connections established, etc.
Failover becomes merely "ungagging" B.
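
The role rotation this implies can be sketched in a few lines of Python. It only illustrates the bookkeeping, not the packet handling. The class name Cluster and the methods gagged and fail are hypothetical, and having the failed node rejoin as the spare is an assumption not stated above.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Tracks which node answers (primary), which shadows it (mirror),
    and which is held in reserve (spare)."""
    primary: str
    mirror: str
    spare: str

    def gagged(self):
        # Every node except the primary has its replies dropped.
        return {self.mirror, self.spare}

    def fail(self, node):
        """Rotate roles after `node` fails (assumed to rejoin as the spare)."""
        if node == self.primary:
            # "Ungag" the mirror and start a new shadow on the old spare.
            self.primary, self.mirror, self.spare = self.mirror, self.spare, node
        elif node == self.mirror:
            # The primary keeps answering; the spare becomes the new mirror.
            self.mirror, self.spare = self.spare, node
        elif node != self.spare:
            raise ValueError(f"unknown node {node!r}")
        # A failing spare changes no roles.


c = Cluster("A", "B", "C")
c.fail("A")
print(c.primary, c.mirror, c.spare)  # B C A
```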

How is this a LARTC problem?

Uhhh... because this approach is seriously abusing the
entire networking stack. We essentially have all three
machines running with an identical MAC and IP address
visible to the outside. The "distinct" address is
purely internal.

What this requires is a way of tricking all three
machines into believing that they are the sole
recipients. This keeps the stacks in a uniform state,
which means we can fail-over the connections without
having to either checkpoint or copy stateful
information, both of which get ugly when you start
talking about lots of information.

This leads to the second network-related problem. If
you have two identical machines starting from
identical states, and processing identical streams,
then they should end up in identical states - ie:
crashed.

This is easily fixed. If A is the machine you are
starting all the processes on, and B is your "mirror",
then C needs to take up the excess load.

In other words, A+C is a cluster, and B+C is a second
cluster. Processes migrated to C from A or B aren't
mirrored. (This is akin to RAID 5's partial backup.)

So, the three machines need to be seen from four
distinct views:

A) From outside, a single machine is visible.

B) From the HA perspective, there is one primary
machine, one mirror machine and one spare

C) From the load-balance perspective, there are two
overlapping clusters.

D) From the LAN perspective, there are three distinct,
uniquely-addressable machines.

Using Linux's advanced networking, BPF and netfilter
layers, connection mirroring for HA/load-balancing
purposes should be straight-forward. Much more so than
using the "wedge" concept proposed by MIT. Because you
need send nothing more than flags between the
machines, it should also be less expensive on the LAN
and processor.

So... how would you go about doing this?


_______________________________________________
LARTC mailing list / LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/

Daniel Chemko

2004-Feb-23 20:56 UTC

RE: Non-traditional Failover Query

This seems like a band-aid front-end solution to a back-end session
tracking problem, unless I am misreading your request.
> 1) Any given process is running on TWO machines at the
> same time. If a process or machine fails, then a new
> backup is started on the third machine. There is thus
> a rotation around the nodes.
Linux Heartbeat
> 2) Packets destined for any of the machines should be
> received by ALL of the machines.
IPT #1
I am not sure if there's already an iptables module that does this, but
you can expect something like that to go in there, since that's where
most really out-there networking stuff goes (connection tracking is a
different problem).
> 3) If the primary process is on A, then replies from B
> and C should still be generated, but be transparently
> dropped. Likewise, if the primary process is on B or
> C.
IPT #2
This again would have to be a very elaborate iptables module that also
interacts with a userspace app which keeps track of which backend
process is really alive, and which ones are just spinning.
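
As a rough sketch of the userspace side, assume the gag is nothing more than a DROP rule on the OUTPUT chain for the service's source port. That is a stock iptables rule, not the custom module described above. The function name gag_command and port 80 are hypothetical, and the commands are only printed, never executed.

```python
def gag_command(port, gagged):
    """Build the iptables argv that gags a node (append a DROP rule for
    replies leaving `port`) or ungags it (delete that same rule)."""
    action = "-A" if gagged else "-D"
    return ["iptables", action, "OUTPUT",
            "-p", "tcp", "--sport", str(port), "-j", "DROP"]


# On a shadowing node: silence replies from the mirrored service.
print(" ".join(gag_command(80, gagged=True)))
# iptables -A OUTPUT -p tcp --sport 80 -j DROP

# On failover: the new primary removes the rule and starts answering.
print(" ".join(gag_command(80, gagged=False)))
# iptables -D OUTPUT -p tcp --sport 80 -j DROP
```

Note that a rule like this only hides the replies from the wire. The gagged node's own TCP stack never sees them acknowledged, so it will keep retransmitting.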
> Uhhh... because this approach is seriously abusing the
> entire networking stack. We essentially have all three
> machines running with an identical MAC and IP address
> visible to the outside. The "distinct" address is
> purely internal.
iptables -j SNAT --help
> What this requires is a way of tricking all three
> machines into believing that they are the sole
> recipients. This keeps the stacks in a uniform state,
> which means we can fail-over the connections without
> having to either checkpoint or copy stateful
> information, both of which get ugly when you start
> talking about lots of information.
The iptables module described at IPT #1
> This leads to the second network-related problem. If
> you have two identical machines starting from
> identical states, and processing identical streams,
> then they should end up in identical states - ie:
> crashed.
This can't be guaranteed. You would hope that two identical machines
doing exactly the same thing would end up more or less the same, but that
also depends on external hosts servicing the two machines identically and
within a reasonable time differential.
> This is easily fixed. If A is the machine you are
> starting all the processes on, and B is your "mirror",
> then C needs to take up the excess load.
> 
> In other words, A+C is a cluster, and B+C is a second
> cluster. Processes migrated to C from A or B aren't
> mirrored. (This is akin to RAID 5's partial backup.)
You kind of lost me here. 
> B) From the HA perspective, there is one primary
> machine, one mirror machine and one spare
You are trying to mix two HA paradigms together, hot and cold standby. I
doubt this is possible with most current HA software, but prove me
wrong. I'd like to know!
> C) From the load-balance perspective, there are two
> overlapping clusters.
This isn't quite true, since the load balancer does not split the load;
it just discriminates as to whose response is relevant.

(PS: When the primary goes belly up, how do you plan on doing a takeover
AFTER the secondary has sent out packets that are effectively dropped?
It seems that the TCP retries would all be sent and dropped, and the
session would time out.)

Having two machines simultaneously servicing the same events seems a
waste. I would say that stateful hot standby is your best bet. Make it a
3-way hot standby if the HA software supports it. This of course means
that the server software you're using has to be HA-aware.
> Using Linux's advanced networking, BPF and netfilter
> layers, connection mirroring for HA/load-balancing
> purposes should be straight-forward. Much more so than
> using the "wedge" concept proposed by MIT. Because you
> need send nothing more than flags between the
> machines, it should also be less expensive on the LAN
> and processor.
Trust me, THIS IS NOT EASY!

sufcrusher

2004-Feb-24 11:39 UTC

Re: Non-traditional Failover Query

I don't see how you can keep all TCP/IP aspects of this in sync when
mirroring all packets to 2 servers. I'm sure sequence numbers would very
quickly get out of sync, causing one of the machines to drop the
packets. And what happens if a packet to one of the servers is
dropped?
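
The sync problem shows up as early as the handshake. Each TCP stack picks its own initial sequence number (ISN), and the client can only acknowledge the one it actually saw. Here is a toy model of that in Python, with no real sockets; the Server class and its methods are hypothetical names for illustration only.

```python
import random

MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


class Server:
    def __init__(self, name, rng):
        self.name = name
        # Each stack chooses its initial sequence number independently.
        self.isn = rng.randrange(MOD)

    def syn_ack(self):
        # The sequence number carried in this server's SYN-ACK.
        return self.isn

    def accepts_ack(self, ack):
        # The handshake completes only if the client acknowledges ISN + 1.
        return ack == (self.isn + 1) % MOD


rng = random.Random(2004)
a, b = Server("A", rng), Server("B", rng)

# The client only ever sees A's SYN-ACK, because B's reply is dropped.
client_ack = (a.syn_ack() + 1) % MOD

print("A accepts:", a.accepts_ack(client_ack))
print("B accepts:", b.accepts_ack(client_ack))
```

A accepts the acknowledgement; B accepts it only in the vanishingly unlikely case that it happened to pick the same ISN. Any mirroring scheme would have to force both stacks onto the same sequence numbers, or rewrite them in flight.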

Assuming we're talking about a machine Y (in front of machines A, B and
C) that does all this iptables stuff, why not set it up as a proxy server?
Servers A, B and C have normal MACs and IPs (non-public if you want).
Server Y has a public IP. All client connections come in on Y, and then Y
makes its own requests to A and B (using separate TCP connections) and
sends the quickest response back to the client, ignoring the slower response
(if it ever comes). This proxy could even do a little caching where
possible.
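
A minimal sketch of that "quickest response wins" logic in Python, assuming each backend can be modelled as a callable that takes a request and returns a response. The names fastest_response, backend_a and backend_b are hypothetical; a real proxy on Y would open its own TCP connections to A and B instead of calling local functions.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fastest_response(backends, request):
    """Send `request` to every backend in parallel and return the first
    successful response; slower or failed backends are ignored."""
    pool = ThreadPoolExecutor(max_workers=len(backends))
    futures = [pool.submit(backend, request) for backend in backends]
    try:
        for future in as_completed(futures):
            if future.exception() is None:
                return future.result()
        raise RuntimeError("all backends failed")
    finally:
        # Do not block on the slower backends; their results are discarded.
        pool.shutdown(wait=False)


def backend_a(request):
    time.sleep(0.2)   # simulated slow server
    return "A:" + request


def backend_b(request):
    time.sleep(0.01)  # simulated fast server
    return "B:" + request


print(fastest_response([backend_a, backend_b], "GET /"))  # B:GET /
```

Every request is executed on both backends, so this only suits requests that are safe to run twice.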

Google a little and you'll find real-life examples of such setups.


rubens@etica.net

2004-Feb-24 17:49 UTC

Re: Non-traditional Failover Query

Take a look at
http://www.ultramonkey.org/papers/lvs_jan_2004/stuff/lvs_jan_2004.pdf

With the active-active load-balancing technique described, it becomes a
matter of implementing your policy with the ipt_saru rules.


Rubens



Daniel Chemko

2004-Feb-24 18:48 UTC

RE: Non-traditional Failover Query

They have done the brain work, but I don't think saru has actually been
implemented.

rubens@etica.net wrote:
> Take a look at
> http://www.ultramonkey.org/papers/lvs_jan_2004/stuff/lvs_jan_2004.pdf
> 
> With the active-active load-balancing technique described, it becomes
> a matter of implementing your policy with the ipt_saru rules. 
> 
> 
> Rubens
> 
