Hi,

Partly because I never like straightforward solutions, I am looking to implement a non-standard failover system that owes its origins to mixing RAID 5 with some beer.

The idea is to have machines A, B and C, configured as follows:

1) Any given process is running on TWO machines at the same time. If a process or machine fails, then a new backup is started on the third machine. There is thus a rotation around the nodes.

2) Packets destined for any of the machines should be received by ALL of the machines.

3) If the primary process is on A, then replies from B and C should still be generated, but be transparently dropped. Likewise, if the primary process is on B or C.

Let's say you have an Apache process running on A and B. B is shadowing A on everything. It's at the same point, has the same connections established, etc. Failover becomes merely "ungagging" B.

How is this a LARTC problem?

Uhhh... because this approach is seriously abusing the entire networking stack. We essentially have all three machines running with an identical MAC and IP address visible to the outside. The "distinct" address is purely internal.

What this requires is a way of tricking all three machines into believing that they are the sole recipients. This keeps the stacks in a uniform state, which means we can fail-over the connections without having to either checkpoint or copy stateful information, both of which get ugly when you start talking about lots of information.

This leads to the second network-related problem. If you have two identical machines starting from identical states, and processing identical streams, then they should end up in identical states - ie: crashed.

This is easily fixed. If A is the machine you are starting all the processes on, and B is your "mirror", then C needs to take up the excess load.

In other words, A+C is a cluster, and B+C is a second cluster. Processes migrated to C from A or B aren't mirrored. (This is akin to RAID 5's partial backup.)

So, the three machines need to be seen from four distinct views:

A) From outside, a single machine is visible.

B) From the HA perspective, there is one primary machine, one mirror machine and one spare.

C) From the load-balance perspective, there are two overlapping clusters.

D) From the LAN perspective, there are three distinct, uniquely-addressable machines.

Using Linux' advanced networking, BPF and netfilter layers, connection mirroring for HA/load-balancing purposes should be straight-forward. Much more so than using the "wedge" concept proposed by MIT. Because you need send nothing more than flags between the machines, it should also be less expensive on the LAN and processor.

So... how would you go about doing this?

_______________________________________________
LARTC mailing list / LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/
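The rotation in points 1) and 3) can be sketched as a small state machine. This is only an illustration in Python, not something proposed in the message itself: the `Roles` class, its `fail` and `may_transmit` methods, and the rule that a failed node rejoins as the spare are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Roles:
    """Primary/mirror/spare assignment for one mirrored process (hypothetical)."""
    primary: str = "A"
    mirror: str = "B"
    spare: str = "C"

    def fail(self, node: str) -> None:
        """Rotate roles after `node` fails.

        Assumption: the failed node is repaired and rejoins as the spare.
        """
        if node == self.primary:
            # "Ungag" the mirror; the old spare starts a new backup copy.
            self.primary, self.mirror, self.spare = self.mirror, self.spare, node
        elif node == self.mirror:
            # Primary keeps talking; the old spare becomes the new mirror.
            self.mirror, self.spare = self.spare, node
        elif node != self.spare:
            raise ValueError(f"unknown node: {node}")
        # A failing spare runs no copy of the process, so nothing changes.

    def may_transmit(self, node: str) -> bool:
        """Only the primary's replies leave the box; all others are gagged."""
        return node == self.primary
```

Starting from the defaults, a failure of A leaves B as primary, C as mirror and A as spare, which matches the "rotation around the nodes" in point 1). It says nothing about how the stacks are kept in step, which is the hard part of the question.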
This seems like a band-aid front-end solution to a back-end session tracking problem, unless I am misreading your request.

> 1) Any given process is running on TWO machines at the
> same time. If a process or machine fails, then a new
> backup is started on the third machine. There is thus
> a rotation around the nodes.

Linux Heartbeat

> 2) Packets destined for any of the machines should be
> received by ALL of the machines.

IPT #1
I am not sure if there's already an iptables module that does this, but you can expect something like that to go into there, since that's where most really out-there networking stuff goes (connection tracking is a different problem).

> 3) If the primary process is on A, then replies from B
> and C should still be generated, but be transparently
> dropped. Likewise, if the primary process is on B or
> C.

IPT #2
This again would have to be a very elaborate iptables module that also interacts with a userspace app which keeps track of which backend process is really alive, and which ones are just spinning.

> Uhhh... because this approach is seriously abusing the
> entire networking stack. We essentially have all three
> machines running with an identical MAC and IP address
> visible to the outside. The "distinct" address is
> purely internal.

iptables -j SNAT --help

> What this requires is a way of tricking all three
> machines into believing that they are the sole
> recipients. This keeps the stacks in a uniform state,
> which means we can fail-over the connections without
> having to either checkpoint or copy stateful
> information, both of which get ugly when you start
> talking about lots of information.

The iptables module described at IPT #1.

> This leads to the second network-related problem. If
> you have two identical machines starting from
> identical states, and processing identical streams,
> then they should end up in identical states - ie:
> crashed.

This can't be guaranteed. You would hope that two identical machines doing exactly the same thing would be more or less the same, but it also depends on external hosts servicing the two machines identically and within a reasonable time differential.

> This is easily fixed. If A is the machine you are
> starting all the processes on, and B is your "mirror",
> then C needs to take up the excess load.
>
> In other words, A+C is a cluster, and B+C is a second
> cluster. Processes migrated to C from A or B aren't
> mirrored. (This is akin to RAID 5's partial backup.)

You kind of lost me here.

> B) From the HA perspective, there is one primary
> machine, one mirror machine and one spare

You are trying to mix two HA paradigms together, hot and cold standby. I doubt this is possible with most current HA software, but prove me wrong. I'd like to know!

> C) From the load-balance perspective, there are two
> overlapping clusters.

This isn't quite true, since the load-balancer does not split the load; it just discriminates as to whose response is relevant.

(PS: When the primary goes belly up, how do you plan on doing a takeover AFTER the secondary has sent out packets that are effectively dropped? It seems that TCP retries would all be sent and dropped and the session would time out.)

Having two machines simultaneously servicing events seems a waste. I would say that stateful hot standby is your best bet. Make it a 3-way hot standby if the HA software supports it. This of course means that the server software you're using has to be HA aware.

> Using Linux' advanced networking, BPF and netfilter
> layers, connection mirroring for HA/load-balancing
> purposes should be straight-forward. Much more so than
> using the "wedge" concept proposed by MIT. Because you
> need send nothing more than flags between the
> machines, it should also be less expensive on the LAN
> and processor.

Trust me, THIS IS NOT EASY!
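The userspace piece mentioned under "IPT #2", something that keeps track of which backend is really alive and therefore whose replies should be let through, might look roughly like the sketch below. It is plain Python for illustration and does not use the Linux-HA Heartbeat API; `LivenessTracker`, `beat`, `alive` and `speaker` are hypothetical names, and the 3-second timeout is an arbitrary assumption.

```python
import time
from typing import Callable, Dict, List, Optional


class LivenessTracker:
    """Remembers the last heartbeat per node (hypothetical helper)."""

    def __init__(self, timeout: float = 3.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout      # seconds of silence before a node counts as dead
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen: Dict[str, float] = {}

    def beat(self, node: str) -> None:
        """Record a heartbeat from `node` at the current time."""
        self.last_seen[node] = self.clock()

    def alive(self) -> List[str]:
        """Nodes whose last heartbeat is no older than the timeout."""
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items()
                      if now - t <= self.timeout)

    def speaker(self, preference: List[str]) -> Optional[str]:
        """First live node in preference order: the one allowed to reply."""
        live = set(self.alive())
        for node in preference:
            if node in live:
                return node
        return None
```

A packet filter would then drop outbound replies on every node except the one returned by `speaker(["A", "B", "C"])`. The sketch does not address the takeover problem raised in the PS: replies the secondary generated and dropped before it was ungagged are still lost.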
I don't see how you can keep all TCP/IP aspects of this in sync when mirroring all packets to 2 servers. I'm sure packet numbers would very quickly start getting out of sync, causing one of the machines to drop the packets. And what happens if there's a dropped packet to one of the servers?

Assuming we're talking about a machine Y (in front of machines A, B and C) that does all this iptables stuff, why not set it up as a proxy server? Servers A, B and C have normal MACs and IPs (non-public if you want). Server Y has a public IP. All client connections come in on Y, and then Y makes its own requests to A and B (using separate TCP connections) and sends the quickest response back to the client, ignoring the slower response (if it ever comes). This proxy could even do a little caching where possible.

Google a little and you'll find real-life examples of such setups.

----- Original Message -----
From: "Daniel Chemko" <dchemko@smgtec.com>
To: "Jonathan Day" <imipak@yahoo.com>; <lartc@mailman.ds9a.nl>
Sent: Monday, February 23, 2004 9:56 PM
Subject: RE: [LARTC] Non-traditional Failover Query

This seems like a band-aid front-end solution to a back-end session tracking problem, unless I am misreading your request.
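The "send the quickest response back, ignore the slower one" behaviour of proxy Y can be sketched without any real networking. In this Python illustration the backends are plain callables standing in for separate TCP connections to A and B; `first_response` and the 5-second default timeout are assumptions made for the example, not part of any existing proxy.

```python
import concurrent.futures as cf


def first_response(request, backends, timeout=5.0):
    """Send `request` to every backend in parallel; return the first good reply.

    `backends` is a list of callables taking the request and returning a
    response (hypothetical stand-ins for connections to A and B).
    Raises concurrent.futures.TimeoutError if backends are still pending
    when `timeout` seconds have passed, and RuntimeError if all of them fail.
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(backends))
    futures = [pool.submit(backend, request) for backend in backends]
    try:
        # as_completed yields futures in the order they finish.
        for fut in cf.as_completed(futures, timeout=timeout):
            if fut.exception() is None:
                return fut.result()      # quickest successful answer wins
        raise RuntimeError("all backends failed")
    finally:
        # Do not wait for the slower backends; their replies are ignored.
        pool.shutdown(wait=False)
```

Because Y owns the client-facing TCP connection, A and B never need matching sequence numbers, which sidesteps the synchronisation worry above. The cost is that Y becomes a single point of failure unless it is itself made redundant.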
Take a look at
http://www.ultramonkey.org/papers/lvs_jan_2004/stuff/lvs_jan_2004.pdf

With the active-active load-balancing technique described, it becomes a matter of implementing your policy with the ipt_saru rules.

Rubens

On Mon, 23 Feb 2004, Jonathan Day wrote:

> So... how would you go about doing this?
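One way to picture an active-active setup in which every node sees every packet is to let each node decide locally, from a hash of the connection, whether the packet is its own. The Python sketch below shows only that general idea under stated assumptions; it is not the ipt_saru implementation, and `owner`, `should_accept` and the use of CRC32 over the source address and port are choices made for the example.

```python
import zlib
from typing import Iterable, List


def owner(src_ip: str, src_port: int, nodes: List[str]) -> str:
    """Map a connection to exactly one node with a deterministic hash."""
    key = f"{src_ip}:{src_port}".encode()
    return nodes[zlib.crc32(key) % len(nodes)]


def should_accept(me: str, src_ip: str, src_port: int,
                  live_nodes: Iterable[str]) -> bool:
    """True if node `me` is responsible for this connection.

    Every node runs the same function over the same sorted list of live
    nodes, so exactly one of them accepts and the others drop the packet.
    """
    return owner(src_ip, src_port, sorted(live_nodes)) == me
```

For the mirrored scheme in the quoted question, the policy would differ: two nodes would accept each connection and only one would be allowed to answer. Note also that a plain modulo reshuffles most connections whenever the set of live nodes changes, so established sessions would break on failover unless ownership is assigned more carefully.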
They have done the brain work, but I don't think saru has actually been implemented.

rubens@etica.net wrote:

> Take a look at
> http://www.ultramonkey.org/papers/lvs_jan_2004/stuff/lvs_jan_2004.pdf
>
> With the active-active load-balancing technique described, it becomes
> a matter of implementing your policy with the ipt_saru rules.
>
> Rubens