I am developing a load-balancing router, but I have a question about failover.
The following diagram shows my test environment and scripts.

-------------------------------------------------------------------
Environment Setting

               PC1(192.168.10.2)
                      |
                    (LAN)
                      |
              PC2-eth2(192.168.10.1)
            +                       +
PC2-eth0(111.111.111.2)    PC2-eth1(222.222.222.2)
            |                       |
         (WAN1)                  (WAN2)
            |                       |
PC3-eth0(111.111.111.1)    PC3-eth1(222.222.222.1)
            +                       +
              PC3-eth2(172.16.0.1)

PC2: Linux kernel 2.6.21
PC2: iptables 1.3.7
-------------------------------------------------------------------
Iptables rules:

iptables -t nat -A POSTROUTING -o eth0 -j SNAT --to 111.111.111.2
iptables -t nat -A POSTROUTING -o eth1 -j SNAT --to 222.222.222.2

# table 101
ip route flush table 101
ip route add 192.168.10.0/24 dev eth2 table 101
ip route add default via 111.111.111.1 dev eth0 table 101

# table 102
ip route flush table 102
ip route add 192.168.10.0/24 dev eth2 table 102
ip route add default via 222.222.222.1 dev eth1 table 102

ip rule del fwmark 1 table 101
ip rule del fwmark 2 table 102
ip rule add fwmark 1 table 101
ip rule add fwmark 2 table 102

iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
iptables -t mangle -A PREROUTING -m state --state NEW -m statistic --mode nth --every 2 --packet 1 -j MARK --set-mark 1
iptables -t mangle -A PREROUTING -m state --state NEW -m statistic --mode nth --every 2 --packet 2 -j MARK --set-mark 2
iptables -t mangle -A POSTROUTING -j CONNMARK --save-mark
-----------------------------------------------------------------------------
Test Sequence:
1. Run "ping 172.16.0.1 -t" on PC1.
2. I capture packets on WAN1 and WAN2; it works fine. The ICMP request/response alternates between WAN1 and WAN2.
3. I unplug WAN1. Only the packets on WAN1 should be lost, but WAN2 should still work, right? I should see "ping Time Out" and "ping OK" alternating on PC1.
4. But both connections break. PC1 always shows "ping Time Out".
5. After capturing the packets on WAN1 and WAN2,
I saw weird behavior: the source IP of the packets on WAN2 is 111.111.111.2, but it should be 222.222.222.2. That is why WAN2 breaks.
-----------------------------------------------------------------------------
Could you give me a suggestion? Thanks.

_______________________________________________
LARTC mailing list
LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/cgi-bin/mailman/listinfo/lartc
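[Editorial note on step 5: one possible explanation of the stale source IP is that the nat table only chooses a mapping for the first packet of a tracked connection; after that, conntrack replays the same SNAT mapping for the connection's whole lifetime. If the long-running ping is tracked as a single ICMP "connection" (same ICMP id), it will keep the 111.111.111.2 source even after the mark steers it out eth1. A hedged way to test this hypothesis is to clear the tracked connections after unplugging WAN1; the `conntrack` tool from conntrack-tools is assumed here and may not be installed:]

```shell
# Flush the connection-tracking table so the next packets are re-evaluated
# by the mangle marking rules and get a fresh SNAT mapping.
# (Requires conntrack-tools; alternatively, reload the ip_conntrack module.)
conntrack -F
```

If the ping recovers on WAN2 after the flush, the problem is the remembered NAT binding rather than the routing rules.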
On 06/24/07 22:07, John Chang wrote:
> iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
> iptables -t mangle -A PREROUTING -m state --state NEW -m statistic
> --mode nth --every 2 --packet 1 -j MARK --set-mark 1
> iptables -t mangle -A PREROUTING -m state --state NEW -m statistic
> --mode nth --every 2 --packet 2 -j MARK --set-mark 2
> iptables -t mangle -A POSTROUTING -j CONNMARK --save-mark

I don't think these rules are going to do what you anticipate. They alternate which route is used based on the sequential entry of packets into the router. Consider any transaction that takes more than one packet: the connection will be sent out both routes, each with a different source IP address, so the two packets are no longer associated with each other, which breaks your connection.

> 2. I capture packets on WAN1 and WAN2, it works fine.
> The ICMP request/response would come out on WAN1 and WAN2 sequentially.

(See the above comment.)

> 3. I unplug WAN1. Only the packets on WAN1 will be lost, but WAN2 should
> work, right?
> I should see "ping Time Out" and "ping OK" on PC1 alternately.

*IF* the rules do work, yes, this is what you should see.

> 4. But both connections break. It is always "ping Time Out" on PC1.

*nod*

> 5. After capturing the packets on WAN1 and WAN2, I saw weird behavior.
> The source IP of the packets on WAN2 is 111.111.111.2,
> but it should be 222.222.222.2.
> That is why WAN2 breaks.

I don't know what to say here, other than something is not working right.

> Could you give me a suggestion?
> Thanks.

Do not use this method to load balance. Look into Equal Cost Multi-Path (a.k.a. ECMP) routing and specify multiple default gateways in one route command. The kernel should load balance across the multiple default gateways for you while maintaining connections.

Grant. . . .
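[Editorial note: the multipath syntax Grant refers to looks roughly like this, using the poster's gateways; a sketch, assuming CONFIG_IP_ROUTE_MULTIPATH is enabled in the kernel:]

```shell
# One default route with two next hops. The kernel picks a nexthop per
# route lookup (so, in practice, per cached destination), and `weight`
# sets the relative share each link receives.
ip route replace default scope global \
    nexthop via 111.111.111.1 dev eth0 weight 1 \
    nexthop via 222.222.222.1 dev eth1 weight 1
```

With this in place, the per-interface SNAT rules from the original post still rewrite the source address to match whichever link a connection is routed out of.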
John Chang wrote:
> [snip: environment diagram and routing tables]
>
> iptables -t nat -A POSTROUTING -o eth0 -j SNAT --to 111.111.111.2
> iptables -t nat -A POSTROUTING -o eth1 -j SNAT --to 222.222.222.2
>
> iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
> iptables -t mangle -A PREROUTING -m state --state NEW -m statistic
> --mode nth --every 2 --packet 1 -j MARK --set-mark 1
> iptables -t mangle -A PREROUTING -m state --state NEW -m statistic
> --mode nth --every 2 --packet 2 -j MARK --set-mark 2
> iptables -t mangle -A POSTROUTING -j CONNMARK --save-mark

Well ...
I am not sure about it, but you may try to do it this way:

iptables -t nat -A POSTROUTING -o ! eth2 -m mark --mark 1 -j SNAT --to 111.111.111.2
iptables -t nat -A POSTROUTING -o ! eth2 -m mark --mark 2 -j SNAT --to 222.222.222.2

iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
iptables -t mangle -A PREROUTING -m state --state NEW -m statistic --mode nth --every 2 --packet 1 -j MARK --set-mark 1
iptables -t mangle -A PREROUTING -m state --state NEW -m statistic --mode nth --every 2 --packet 2 -j MARK --set-mark 2
iptables -t mangle -A POSTROUTING -j CONNMARK --save-mark

This is done without using iproute. There is another solution, but it works only with kernels up to 2.6.10:

iptables -t nat -A POSTROUTING -o ! eth2 -j SNAT --to 111.111.111.2,222.222.222.2

".... For those kernels, if you specify more than one source address, either via an address range or multiple --to-source options, a simple round-robin (one after another in cycle) takes place between these addresses. Later kernels (>= 2.6.11-rc1) don't have the ability to NAT to multiple ranges anymore. ..."
Grant Taylor wrote:
>> Could you give me a suggestion?
>> Thanks.
>
> Do not use this method to load balance. Look into Equal Cost Multi-Path
> (a.k.a. ECMP) routing and specify multiple default gateways in one
> route command. The kernel should load balance across the multiple
> default gateways for you while maintaining connections.

This is bad, bad advice in this day and age. If there are not enough users, route caching will kill him. Here is a recent discussion of this:

http://marc.info/?l=lartc&m=117912699505681&w=2

HTH
Peter

P.S. I am not insisting that netfilter is superior in this regard; I am simply expressing common requirements and looking into ways of achieving them. If someone can point me to how to do this with kernel routes, I am all ears, since I recognize that the netfilter solution is not very elegant, although it works.
Thanks for your advice. Currently my test scripts make both WAN connections break when I unplug one WAN connection, so I cannot implement the failover mechanism. My original idea was to mark all packets as 1 when connection WAN2 breaks, or to mark all packets as 2 when connection WAN1 breaks. But now one connection breaking makes both connections break, and I cannot identify which connection broke. It is weird. ><"
On 06/26/07 01:46, Peter Rabbitson wrote:
> This is bad, bad advice in this day and age.

I think that is a bit of a bold statement. You are free to have your opinion on what is better for you, as am I.

> If there are not enough users route caching will kill him. Here is a
> recent discussion of this:
> http://marc.info/?l=lartc&m=117912699505681&w=2

Um, I just read this discussion and I have a few issues with it.

First and foremost: it did not cover the reason "... route caching will kill ..." to my satisfaction, as you indicated it would.

Second: it relies on user-space processes to alter and maintain things. If for some reason these processes do not run, or do not run in a timely manner, things may not function correctly.

Third: you are altering the way a running kernel operates from user space, rather than letting the kernel maintain itself.

Fourth: Occam's razor dictates the use of the simpler and equally effective (the equality is debatable) method to achieve the same result.

Though the method you cite has potential, I think there is just as much room for improvement in it as there is in the method I suggested. Each method has its pros and cons.

> P.S. I am not insisting that netfilter is superior in this regard, I
> am simply expressing common requirements and looking into ways of
> achieving them. If someone can point me to how to do this with
> kernel routes - I am all ears, since I recognize that the netfilter
> solution is not very elegant, although it works.

By your own statement, you are indicating that both methods leave something to be desired.

Grant. . . .
Try this algorithm:

MANGLE:
1 - restore mark
2 - accept mark 1, accept mark 2
3 - random mark 1 or 2
4 - save mark

NAT:
5 - SNAT per interface

Att,
Patrick Brandão

----- Original Message -----
From: "Grant Taylor" <gtaylor@riverviewtech.net>
To: "Mail List - Linux Advanced Routing and Traffic Control" <lartc@mailman.ds9a.nl>
Sent: Tuesday, June 26, 2007 11:37 AM
Subject: Re: [LARTC] Load Balance and SNAT problem.
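[Editorial note: Patrick's five steps might be rendered in iptables roughly as follows. This is a sketch: `-m statistic --mode random` stands in for his "random" step (it may not be available on iptables 1.3.7), and RETURN implements "accept" of an already-marked packet.]

```shell
# MANGLE: restore any existing connmark, keep it if present,
# otherwise pick a link at random and remember the choice.
iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark        # 1 - restore mark
iptables -t mangle -A PREROUTING -m mark --mark 1 -j RETURN        # 2 - accept mark 1
iptables -t mangle -A PREROUTING -m mark --mark 2 -j RETURN        #     accept mark 2
iptables -t mangle -A PREROUTING -m statistic --mode random \
    --probability 0.5 -j MARK --set-mark 1                         # 3 - random mark 1 or 2
iptables -t mangle -A PREROUTING -m mark ! --mark 1 -j MARK --set-mark 2
iptables -t mangle -A POSTROUTING -j CONNMARK --save-mark          # 4 - save mark

# NAT: SNAT per outgoing interface, as in the original setup.
iptables -t nat -A POSTROUTING -o eth0 -j SNAT --to 111.111.111.2  # 5 - SNAT per interface
iptables -t nat -A POSTROUTING -o eth1 -j SNAT --to 222.222.222.2
```

The RETURN rules mean only fresh, unmarked packets reach the random step, so an established connection keeps whichever mark (and link) it was first given.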
Grant Taylor wrote:
> First and foremost: it did not cover the reason "... route caching will
> kill ..." to my satisfaction, as you indicated it would.

Can you elaborate on this? My only issue with the kernel route balancing is that route caching cannot be disabled entirely, so traffic to the same site will leave via the same channel, regardless of whether the other channel is empty or not. I know that it is technically possible (the kernel option CONFIG_IP_ROUTE_MULTIPATH_RANDOM), but it will work only for globally routable addresses, while breaking NAT badly.

The reason I made my bold, as you call it, statement is that 90% of the time when someone is doing NAT, it is for a tightly joined group with similar interests - hence a lot of traffic duplication. For instance, if every user listens to the same online radio station - how would you work around it?

Let me know your thoughts

Peter
Hi,

I have load balancing working on a Linux server, balancing between two providers with, obviously, two different IPs (the customer is not an Autonomous System). It works very well, except with some sites that establish a session and then redirect the session to another server. These sessions are usually based on information like cookies and the client IP address, and therefore you must reach the destination with the same IP address (that's why the routing cache is there). But when the "session" is redirected to another destination server - another destination IP - sometimes the connection goes through the other link, and so arrives at the destination with another source IP, and then the session becomes invalid. I can't see anything Linux (or anything else) could do to deal with this, since it's a new destination IP.

Does anyone know something that could solve this kind of problem?

:: Sorry for the bad English.

--
André Guimarães
Databras Informática
Matriz RJ - 55 (21) 2518-2363
Filial ES - 55 (27) 3233-0098
http://www.databras.com.br
On 6/26/2007 12:44 PM, Peter Rabbitson wrote:
> Can you elaborate on this? My only issue with the kernel route
> balancing is that route caching can not be disabled entirely, so
> traffic to the same site will leave via the same channel, regardless
> if the other channel is empty or not. I know that it is technically
> possible (kernel option CONFIG_IP_ROUTE_MULTIPATH_RANDOM), but it
> will work only for globally routable addresses, while breaking NAT
> badly.

This is a very good point that was not made in the referenced message, and I have no rebuttal to it. It is the type of point I was hoping to see before but did not. My response is that you have a good point - something that, in my opinion, should be addressed by the kernel at some point.

> The reason I made my bold, as you call it, statement, is because 90%
> of the time when someone is doing NAT, it is for a tightly joined
> group, with similar interests - hence a lot of traffic duplication.
> For instance if every user listens to the same online radiostation -
> how would you work around it?

I don't know whether the 90% you cite is accurate or not, but if you are even remotely in the ballpark, you have a good point. I have been around environments with nearly 1000 computers and very little similarity between the people using them. I think this really depends on where NAT is used and how. If you are talking about many-to-one NAT, I would agree with you; if you are talking about many-to-many NAT, I'll disagree. The scenarios you are thinking of would best be described as small office / home office (a.k.a. SOHO), which definitely qualifies under what you are saying. However, there are a LOT of uses of NAT outside of SOHOs. That said, given the prevalence of SOHOs doing NAT, I am willing to bet that you are correct.
But this is why there are different types of solutions to this problem for them.

> Let me know your thoughts

With regard to streaming radio, I personally believe it should be multicast, so that it can be streamed in once and have multiple recipients hear it. Or there should be some sort of proxy that downloads it and passes it back to multiple clients. Of course, this is beyond the scope of this discussion and applies to larger environments, outside of the SOHOs that I think you are referring to.

Grant. . . .
On 6/26/2007 3:01 PM, Andre Guimarães wrote:
> Does anyone know something that could solve this kind of problem?

I would like to see some control over how the cache matches, i.e. a netmask for the destination IP - something like caching for matches on a /24 or the like.

Grant. . . .
(Sorry, I'm not sure, but the answer does impact this discussion.)

On 6/26/2007 12:44 PM, Peter Rabbitson wrote:
> so traffic to the same site will leave via the same channel,
> regardless if the other channel is empty or not.

Is the caching per route or per source IP? I'm guessing that it is per route decision, such that any and all clients will use the same cached route, thus not using the additional interfaces. Or is this a clear and concise reason why load balancing via netfilter would be a better approach?

Grant. . . .
On 6/26/2007 9:03 PM, Mohan Sundaram wrote:
> The caching would be per destination IP - so it is likely all clients
> will use the same route and thus interface.

This could be a problem. I was taking the caching to be remembering which route was chosen, and believing it to be associated with a specific source IP address. I can see this being a very large issue when trying to do load balancing.

In light of this information, I think better could be done in netfilter. However, if there ever were a way to have route selection per source IP in the kernel, I would be more interested in that.

I wonder if route selection caching would differ between routing tables - in other words, use a different routing table for a different (set of) clients, giving one cached routing decision per routing table, which could differ per table.

Grant. . . .
The caching is per destination and source IP - and per TOS, fwmark, and input interface too, if present. Routing with netfilter does not solve the cache problem anyway: the cache will still be present, and it will be consulted before the routing tables are hit. In my opinion, routing in netfilter gives more flexibility in dynamically choosing weights and such, while multipath routing gives a bit more IP persistence. Both solutions work pretty well; there are die-hard fans of both approaches, and recent LARTC archives have a lot of discussion on it.
On 6/26/2007 9:14 PM, Mohan Sundaram wrote:
> I remember that route balancing has an option to perform per packet
> balancing and not per connection. If that were to work, then route
> cache would not be used IMHO.

Interesting. Do you have any idea where I can get more information about this?

> Per packet balancing is normally not done as it would break
> connections, especially in a NAT'ted scenario.

Keep in mind that NATing is not the only place load balancing is used. I call to mind my recent thread "Redundant internet connections" (http://mailman.ds9a.nl/pipermail/lartc/2007q2/021015.html), where I had globally routable IP addresses inside the DMZ. I could have used per-packet load balancing without a problem, except for the fact that I specifically wanted not to use the backup connection unless the primary was down.

Grant. . . .
On 6/26/2007 9:22 PM, Salim S I wrote:
> The caching is per destination and source ip. TOS, fwmark and input
> interface too, if present.

Is the caching done on the combination of source and destination, or on source alone, or on destination alone? If it is the former, then as long as the source IP is different, you could potentially have different cached route choices for different workstations within a company.

Grant. . . .
Well, this is the relevant code in my kernel (2.4.27):

	for (rth = rt_hash_table[hash].chain; rth; rth = rth->u.rt_next) {
		if (rth->key.dst == key->dst &&
		    rth->key.src == key->src &&
		    rth->key.iif == 0 &&
		    rth->key.oif == key->oif &&
#ifdef CONFIG_IP_ROUTE_FWMARK
		    rth->key.fwmark == key->fwmark &&
#endif
		    !((rth->key.tos ^ key->tos) &
		      (IPTOS_RT_MASK | RTO_ONLINK)))
On 6/26/2007 10:07 PM, Salim S I wrote:
> for (rth = rt_hash_table[hash].chain; rth; rth = rth->u.rt_next) {
> 	if (rth->key.dst == key->dst &&
> 	    rth->key.src == key->src &&
> 	    rth->key.iif == 0 &&
> 	    rth->key.oif == key->oif &&
> #ifdef CONFIG_IP_ROUTE_FWMARK
> 	    rth->key.fwmark == key->fwmark &&
> #endif
> 	    !((rth->key.tos ^ key->tos) &
> 	      (IPTOS_RT_MASK | RTO_ONLINK)))

I'm no C programmer, but it looks like the source, destination, in interface, and out interface are all part of the conditional, leading us to believe that caching might be per combination of all of the above.

Grant. . . .
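[Editorial note: the comparison above can be modeled as a lookup keyed on the whole tuple. The small Python sketch below (field names mirror the kernel's rt_key; the round-robin nexthop chooser is purely illustrative) shows the consequence Grant is asking about: two clients with different source IPs get independent cache entries, so they can be balanced to different links, while a repeat lookup for the same (dst, src) pair reuses its cached choice.]

```python
# Model of the 2.4 route-cache lookup: an entry matches only if every
# field of the key tuple (dst, src, iif, oif, fwmark, tos) matches.
import itertools
from collections import namedtuple

RtKey = namedtuple("RtKey", "dst src iif oif fwmark tos")

cache = {}  # RtKey -> chosen next hop

def route_lookup(key, choose_nexthop):
    """Return the cached next hop for this exact key, computing and caching one on a miss."""
    if key not in cache:
        cache[key] = choose_nexthop(key)
    return cache[key]

# Illustrative balancer: alternate links on each cache miss.
links = itertools.cycle(["eth0", "eth1"])
pick = lambda key: next(links)

# Two workstations talk to the same destination: different src fields
# mean different cache entries, so each lands on a different link.
a = route_lookup(RtKey("172.16.0.1", "192.168.10.2", 0, 0, 0, 0), pick)
b = route_lookup(RtKey("172.16.0.1", "192.168.10.3", 0, 0, 0, 0), pick)
c = route_lookup(RtKey("172.16.0.1", "192.168.10.2", 0, 0, 0, 0), pick)

print(a, b, c)  # the repeated (dst, src) pair reuses its cached entry
```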
Salim S I wrote:
> The caching is per destination and source ip. TOS, fwmark and input
> interface too, if present.

Interesting... It definitely did not work in my scenario, though. I am going to test this again in the near future, and if you are right I will rest my case.

> Routing with netfilter does not solve cache problems anyway, cache will
> still be present, and it will be consulted before routing tables are
> hit.

This is true for locally generated traffic only. Any incoming/forwarded traffic can be controlled in PREROUTING, so the cache is never consulted.

> Both solutions work pretty well; there are die-hard fans for both of the
> above approaches. Recent archives of lartc have lot of discussions on
> it.

I am actually simply jealous that some people apparently get it to work in-kernel, and I can't seem to. My requirements are pretty simple:

o As transparent as possible DGD (dead gateway detection), able to detect 2nd- and 3rd-hop failures
o Robust load balancing - connections are distributed over all available links, regardless of source and destination, with the possibility of assigning relative channel priorities
o NAT compatible - link hopping is not an option; traffic with a specific SRC/DST must stay where it started
> This is true for locally generated traffic only. Any incoming/forwarded
> traffic can be controlled in PREROUTING, so the cache is never
> consulted.

The cache will still be consulted, in ip_route_input - that path handles input and forwarded traffic. Only if there is no matching entry will the routing tables be employed. If you look in the cache, you can see routes cached for the same destination through both WAN interfaces (well, in my case I can see them...), but their fwmarks are different, as is evident from ip_conntrack.
On 6/27/2007 12:54 AM, Peter Rabbitson wrote:
> I am actually simply jealous that some people apparently get it to
> work in-kernel, and I can't seem to.

Ah, so the truth comes out. ;)

> My requirements are pretty simple:
> o As transparent as possible DGD, able to detect 2nd and 3rd hop
> failures

Think about what you just asked for. "Dead Gateway Detection" is meant to detect dead (upstream) (default) gateway(s); it is not meant to detect dead routes beyond your gateway(s). To do the latter you will need some sort of utility to monitor things for you - i.e. you will not be able to get the kernel to detect that a gateway is good for some things but not for others. Actually, if you stop to think about it, this is beyond the scope of what the kernel should do; it is more the scope of a routing protocol and / or a route management daemon.

In short, use something to test reachability to destinations, and use ip rules to choose routing tables accordingly - i.e. have a default routing table that tries to use any / all interfaces and routes, and alternative routing tables that try fewer interfaces / routes.

> o Robust load balancing - connections are distributed over all
> available links, regardless of source and destination, with the
> possibility of assigning relative channel priorities

I think this is close to being possible, depending on your scenario (NAT or not) and a few other things. It was my understanding that equal cost multi path routing was supposed to accomplish this very thing - i.e. if you had globally routable IP addresses behind the router, you could send traffic out either link, hopefully in such a fashion as to fully utilize all links. ECMP does include weight options to assign ratios to routes.
However, after the discussion in this thread, I question whether ECMP will actually do this or not.

> o NAT compatible - link hopping is not an option, traffic with a
> specific SRC/DST must stay where it started.

I think this is the simpler requirement - simpler than the "robust load balancing" above. In my opinion, this should be achieved first, and then extended toward the above. What you have proposed with load balancing via netfilter should be able to achieve it without any problems - or at least I would think so.

Grant. . . .
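[Editorial note: Grant's "test reachability and switch ip rules" suggestion could be scripted roughly as below. This is a hedged sketch reusing the poster's tables 101/102 and fwmarks 1/2; the probe hosts are placeholders, and a real setup would probe addresses beyond the first hop to catch 2nd/3rd-hop failures.]

```shell
#!/bin/sh
# Probe a host through each WAN interface. If the probe fails, delete
# the ip rule that steers marked traffic into that link's routing
# table, so only the surviving rule (and table) is consulted.
check_link() {
    mark=$1; table=$2; iface=$3; probe=$4
    if ping -c 2 -W 2 -I "$iface" "$probe" >/dev/null 2>&1; then
        # Link looks alive: make sure its rule exists.
        ip rule list | grep -q "fwmark .*lookup $table" || \
            ip rule add fwmark "$mark" table "$table"
    else
        # Link looks dead: stop steering connections into it.
        ip rule del fwmark "$mark" table "$table" 2>/dev/null
    fi
}

# Run from cron or a loop; probe hosts here are the test gateways.
check_link 1 101 eth0 111.111.111.1
check_link 2 102 eth1 222.222.222.1
```

This is exactly the user-space maintenance Grant's second and third objections describe: if the script stops running, the rules go stale.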
Grant Taylor wrote:
> On 6/27/2007 12:54 AM, Peter Rabbitson wrote:
>> I am actually simply jealous that some people apparently get it to
>> work in-kernel, and I can't seem to.
>
> Ah, so the truth comes out. ;)

Hehe

> In short, use something to test reachability to destinations, and use
> ip rules to choose routing tables accordingly.

This is the most fragile part of my current setup. And DGD based on packet counts is, IMO, an extremely simple thing to do - I discussed it with you recently. If something like this were present in-kernel, the world would be a better place.

> It was my understanding that equal cost multi path routing was supposed
> to accomplish this very thing.
> if you had globally routable IP addresses behind the router, you could
> send traffic out either link, hopefully in such a fashion as to fully
> utilize all links. ECMP does include weight options to assign ratios
> to routes.

For globally routable addresses it doesn't really matter, because you usually cannot detect the problem (things still work).

> What you have proposed with load balancing via netfilter should be able
> to achieve this without any problems. Or at least I would think so.

It has actually worked in production for quite some time now. But as I said before, it is ugly and fragile.

I understand that we are coming from different environments, but I still think my figure of 90% is rather accurate. If you can afford not to do NAT, most likely you also have access to the ISP's dynamic routing protocols, and the entire discussion becomes pointless. On the contrary, if you run NAT, most likely you are a poor-man's ISP or smaller, running off two consumer DSL links, and all of the above applies. Either way, I rest my case here, as we are comparing apples to dinosaurs and have gone too far OT :)

Peter
On 6/27/2007 1:58 AM, Peter Rabbitson wrote:
> And DGD based on packet counts is, IMO, an extremely simple thing to
> do - I discussed it with you recently.

(If I recall correctly, and / or have re-read the appropriate thread correctly:) what you were talking about was pinging (of sorts - be it ICMP, testing connections, sending layer-7 traffic, etc.) destinations beyond your upstream gateway. Correct?

> If something like this were present in-kernel, the world would be a
> better place.

I agree that if the kernel could handle this, the world would be a better place. However, I think it silly to expect the kernel to do it.

Well, let me take a moment to be sure we are thinking the same thing. You want the kernel to be able to realize that one route through a given default gateway is no good for a given destination, and to use a different default gateway, even though the kernel can reach other destinations through the first one? In other words, if the kernel cannot reach microsoft.com through ISP1, it should use ISP2, despite the fact that it can reach google.com through ISP1?

Grant. . . .
On 6/26/2007 9:14 PM, Mohan Sundaram wrote:
> I remember that route balancing has an option to perform per packet
> balancing and not per connection. If that were to work, then route
> cache would not be used IMHO. Per packet balancing is normally not
> done as it would break connections, especially in a NAT'ted scenario.

To quote the man page for ip, it looks like the balancing is not per packet as you indicate, but rather per flow:

"""equalize - allow packet by packet randomization on multipath routes. Without this modifier, the route will be frozen to one selected nexthop, so that load splitting will only occur on per-flow base. equalize only works if the kernel is patched."""

Grant. . . .
On 6/27/2007 2:50 AM, Mohan Sundaram wrote:
> """equalize - allow packet by packet randomization on multipath
> routes. Without this modifier, the route will be frozen to one
> selected nexthop, so that load splitting will only occur on per-flow
> base. equalize only works if the kernel is patched."""

I think we both pasted the same quote. If you do use the "equalize" keyword, you get a packet by packet / per-packet effect. Whereas if you do not use the "equalize" keyword, you get a per-flow effect, which is what I was trying to state is the apparent default.

Grant. . . .
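To make the two behaviors concrete, a multipath default route with and without the keyword might look like this (a sketch only; gateways are borrowed from the thread's example, and per the man page quoted above, `equalize` only works on a patched kernel):

```shell
# Per-flow splitting (stock kernel): each new connection is frozen to
# one selected nexthop, so individual flows never see reordering.
ip route replace default \
    nexthop via 111.111.111.1 dev eth0 \
    nexthop via 222.222.222.1 dev eth1

# Per-packet randomization (requires the equalize kernel patch):
# successive packets of the same flow may leave via different links.
ip route replace default equalize \
    nexthop via 111.111.111.1 dev eth0 \
    nexthop via 222.222.222.1 dev eth1
```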
On 6/27/2007 2:53 AM, Mohan Sundaram wrote:
> Pardon my earlier mail.

*nod* Pardon my reply. ;)

> This says if equalize patch/keyword is used, packet randomisation
> happens. Exactly what we want, is it not?

(Referring back to your earlier message...) Yes, I think this is what we want in this scenario.

Grant. . . .
Grant Taylor wrote:
> Well let me take a moment to be sure we are thinking the same thing.
> You want the kernel to be able to realize that one route through a
> given default gateway is no good for a given destination and use a
> different default gateway even though the kernel can reach other
> destinations through the first default gateway? In other words, if
> the kernel can not reach microsoft.com through ISP1 it should use
> ISP2 despite the fact that it can reach google.com through ISP1?

No, nothing like this. Basically my idea is that a no-packet-seen timer is maintained for every gateway, excluding any packets with a source within the ISP's netblock. This will reliably detect that no traffic is seen beyond the ISP, and therefore pronounce the gateway dead. The only configuration required from the administrator would be an address/netmask pair for every gateway, to use as an exclusion for the counters, and a no-packets-seen timeout before a gateway is marked as dead. Any incoming activity on the gateway will immediately change its status back to active.

So to answer your exact question - I want the kernel to be able to realize that a gateway is no good for any destinations other than the specified netblock.

Peter
On 6/27/2007 2:59 AM, Mohan Sundaram wrote:
> I think that default makes sense. If we want pkt based balancing, we
> enable it explicitly.

Agreed. We / people just have to be aware that that is what it does, so that they don't have false expectations. Of course, this is a fairly common problem in unix.

Grant. . . .
On 6/27/2007 3:03 AM, Peter Rabbitson wrote:
> I want the kernel to be able to realize that a gateway is no good for
> any destinations other than the specified netblock.

Would it be fair to say that you are wanting an administratively configurable "ignore addresses that fall within this <network>" while deciding if a gateway is dead? Obviously <network> would need to be a bit more than just an ip / netmask combination to make this realistic.

If this is what you are wanting, it may be possible to augment the kernel code that is used to detect dead gateways and have it check to see if the networks match a list (from somewhere in proc / sysfs / sysctl?) and not increment traffic counters. I am presuming that it is the traffic counters that have to be incremented for the kernel to think that a route is still alive. So, if you purposefully did not increment the counters, you could probably detect that a given gateway is no good.

I think you would have to add an additional route to the given network(s) that did not use such a feature, to provide a way for the routing code to route to those network(s) that it would no longer get to via a default gateway.

What do you think?

Grant. . . .
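Pending any such kernel support, the counter idea can be approximated today from userspace with a plain iptables counting chain. A sketch under stated assumptions: the chain name GW1_SEEN is made up, and 111.111.111.0/24 stands in for ISP1's netblock from the thread's example.

```shell
# Count only inbound packets on eth0 whose source lies OUTSIDE the
# ISP's own netblock; such packets prove the link works beyond the ISP.
iptables -N GW1_SEEN
iptables -A GW1_SEEN -j RETURN
iptables -I INPUT -i eth0 ! -s 111.111.111.0/24 -j GW1_SEEN

# A userspace watchdog would poll the chain's packet counter, e.g.:
#   iptables -L GW1_SEEN -v -x -n
# and declare the gateway dead once the counter stops advancing for the
# configured timeout.
```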
On 6/27/2007 3:22 AM, Mohan Sundaram wrote:
> *A word of caution*. My connections went awry more due to out of
> order delivery of packets, and I had a hell of a time troubleshooting
> it as the problem did not appear consistently. :-( Did not know where
> in the whole chain I had the problem. It is like the MTU problem in
> PPTP.

*nod* This is a warning that you see a LOT of places when you start talking about per packet versus per flow load balancing. Cisco is VERY big on giving this warning. Despite being aware of this problem, I have yet (knock on wood) to run into this problem myself.

Grant. . . .
On 6/27/2007 3:24 AM, Grant Taylor wrote:
> This is a warning that you see a LOT of places when you start talking
> about per packet versus per flow load balancing. Cisco is VERY big on
> giving this warning.

I wonder how much of the packet out-of-order problem would happen with two parallel links, versus two asymmetric routes through the internet core where one packet takes a 27 hop route while the other takes a 37 hop route.

Grant. . . .
Grant Taylor wrote:
> On 6/27/2007 3:03 AM, Peter Rabbitson wrote:
>> I want the kernel to be able to realize that a gateway is no good for
>> any destinations other than the specified netblock.
>
> Would it be fair to say that you are wanting an administratively
> configurable "ignore addresses that fall within this <network>" while
> deciding if a gateway is dead?
>
> Obviously <network> would need to be a bit more than just an ip /
> netmask combination to make this realistic.
>
> If this is what you are wanting, it may be possible to augment the
> kernel code that is used to detect dead gateways and have it check to
> see if the networks match a list (from somewhere in proc / sysfs /
> sysctl?) and not increment traffic counters. I am presuming that it
> is the traffic counters that have to be incremented for the kernel to
> think that a route is still alive. So, if you purposefully did not
> increment the counters, you could probably detect that a given
> gateway is no good.

Something along these lines, yes. Except that instead of a packet counter there is a resettable timer, which gets reset any time a matching packet comes in. When the timer goes over a specified limit - the gateway is dead.

> I think you would have to add an additional route that was to the
> given network(s) that did not use such a feature to provide a way for
> the routing code to route to those network(s) that it no longer would
> get to via a default gateway.

This would be a manual task for the administrator; there is no place for this in-kernel.
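The resettable-timer logic being described is simple enough to sketch in a few lines of shell. This is a pure simulation, not a real tool: the loop feeds made-up cumulative packet counters (one sample per INTERVAL seconds) to the timer, marking the gateway dead after TIMEOUT seconds of silence and reviving it on the next counted packet.

```shell
#!/bin/sh
# Simulation of a resettable "no-packet-seen" timer. Each value in the
# list stands for a cumulative packet counter sampled once per INTERVAL;
# an unchanged counter means no qualifying packets arrived that interval.
TIMEOUT=15
INTERVAL=5
state=alive
idle=0
last=0
for count in 10 25 25 25 25 40; do
    if [ "$count" -eq "$last" ]; then
        # No new packets seen: advance the idle timer.
        idle=$((idle + INTERVAL))
        if [ "$idle" -ge "$TIMEOUT" ] && [ "$state" = alive ]; then
            state=dead
            echo "gateway DEAD after ${idle}s of silence"
        fi
    else
        # Any counted packet resets the timer and revives the gateway.
        idle=0
        if [ "$state" = dead ]; then
            state=alive
            echo "gateway ALIVE again"
        fi
    fi
    last=$count
done
```

Run as-is it prints the DEAD transition after the third silent interval and the ALIVE transition as soon as the counter advances again.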
On 6/27/2007 4:09 AM, Peter Rabbitson wrote:
> Something along these lines, yes. Except that instead of a
> packet-counter there is a resettable timer, that gets reset anytime a
> matching packet comes in. When the timer goes over a specified limit
> - gateway is dead.

I think this is usually called / treated as a "dead (counter) timer", as in the timer counts down and as soon as it hits zero, the item is considered dead. Any time something passes through and refreshes it, the full time to live is placed back in the dead timer.

> This would be a manual task for the administrator, there is no place
> for this in-kernel.

Agreed. I will state that I think you are asking for a bit much, but you are free to ask for whatever you want, or whatever you are willing to code yourself. ;)

Grant. . . .