The LARTC howto correctly describes load balancing and split access for traffic from a machine with multiple ISP connections (http://www.lartc.org/lartc.html#LARTC.RPDB.MULTIPLE-LINKS) -- *provided* the traffic originates from the machine itself (i.e. traffic regularly handled by the INPUT and OUTPUT chains of iptables).

When forwarding traffic from an attached local network, the following problems occur with traffic from the local network to internet hosts:

1. The ip rule add from x.x.x.x refers to local IP addresses before NAT, such as 192.168.0.44, rather than the public IP address after NAT (and certainly not both). This is the fundamental problem that causes load balancing and split access to be unreliable.

2. Cached routes are dropped periodically from the route cache, even while in active use: this causes connection reset errors and strange timeouts.

3. To frustrate iptables based work-arounds, routing does not obey marks added with iptables -t mangle -A PREROUTING. It seems that ip fwmark rules are not obeyed if the route is cached, and the cache hash does not include the firewall mark (or maybe it does, but it doesn't work ?!?). (Interestingly, cached routing *does* obey the TOS bits, which makes creative work-arounds marginally possible. There just aren't too many TOS values to play with.)

Is there a solution to these problems which works with the official kernels? If so, which versions? If not, which patches resolve these problems?

&:-)

--
Disclaimer: in the event of this disclaimer being incomplete
On Wednesday 21 February 2007 11:10, Andrew McGill wrote:
> The LARTC howto correctly describes load balancing and split
> access for traffic from a machine with multiple ISP connections
> (http://www.lartc.org/lartc.html#LARTC.RPDB.MULTIPLE-LINKS) --
> *provided* the traffic originates from the machine itself (i.e.
> traffic regularly handled by the INPUT and OUTPUT chains of
> iptables).
>
> When forwarding traffic from an attached local network, the
> following problems occur with traffic from the local network to
> internet hosts:
>
> 1. The ip rule add from x.x.x.x refers to local IP addresses
> before NAT, such as 192.168.0.44, rather than the public IP
> address after NAT (and certainly not both). This is the
> fundamental problem that causes load balancing and split
> access to be unreliable.

The 'ip rule' is evaluated before the routing decision, so it is evaluated before FORWARD and before POSTROUTING (the place where NAT actually happens). Hence, referring to the local IP is correct.

> 2. Cached routes are dropped periodically from the route cache,
> even while in active use: this causes connection reset errors
> and strange timeouts.

This is true, and increasing the route timeout does not help, because in the end you have all internet routes cached, and the kernel's non-swappable RAM pushes every single application off the host. The solution is to use CONNTRACK from iptables; a full example is described in this[1] e-mail from the archive. No patches needed.

> 3. To frustrate iptables based work-arounds, routing does not
> obey marks added with iptables -t mangle -A PREROUTING. It
> seems that ip fwmark rules are not obeyed if the route is
> cached,

This is not true: the rules are evaluated before the routing decision, so fwmark rules work as expected. Do you really think that this bug could have been in the Linux kernel since the early 2.3/2.4 versions and only be discovered today?

> and the cache hash does not include the firewall mark
> (or maybe it does, but it doesn't work ?!?). (Interestingly,
> cached routing *does* obey the TOS bits, which makes creative
> work-arounds marginally possible. There just aren't too many
> TOS values to play with.)
>
> Is there a solution to these problems which works with the official
> kernels? If so, which versions? If not, which patches resolve these
> problems?

Yes, as I pointed out, there is a solution and no patches are needed.

[1] http://mailman.ds9a.nl/pipermail/lartc/2006q2/018964.html

--
Luciano
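The point about pre-NAT addresses can be illustrated with a minimal sketch. Because the rule database is consulted before SNAT is applied, a rule that pins one LAN host to one uplink has to name the host's private address. The table number 101, the gateway 192.0.2.1 and the device eth1 are placeholders chosen for illustration, not values from this thread. The `run` helper only prints the commands unless DRY_RUN=0 is set, since executing them needs root.

```shell
#!/bin/sh
# Sketch only: table number, gateway and device are illustrative placeholders.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode; execute it otherwise (needs root).
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Route one LAN host through a specific uplink.
# $1 = host's private (pre-NAT) address, $2 = table, $3 = gateway, $4 = device
pin_host_to_link() {
    # The uplink's own routing table holds a single default route.
    run ip route add default via "$3" dev "$4" table "$2"
    # The rule matches the address as it arrives from the LAN, before SNAT.
    run ip rule add from "$1" table "$2"
}

pin_host_to_link 192.168.0.44 101 192.0.2.1 eth1
```

In dry-run mode the script prints the two `ip` commands it would run, which makes it easy to review them before applying anything to a live router.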
Ming-Ching Tiew
2007-Feb-22 02:58 UTC
Re: Split access, load balancing AND forwarding: HOW?
From: "Luciano Ruete" <luciano@lugmen.org.ar>
>
> The solution is to use CONNTRACK from iptables, full example described in
> this[1] e-mail from the archive. No patches needed.
>
> [1] http://mailman.ds9a.nl/pipermail/lartc/2006q2/018964.html
>

I think you mean CONNMARK (not CONNTRACK) from iptables?

The ever popular routing command :-

> #route commands
> ip ro add default nexthop via x.x.x.x dev eth1 weight 1 nexthop via y.y.y.y dev eth2

My personal view is: ***NEVER*** use such a routing statement, and never let the system have a chance to use such a routing statement, especially when you are doing NAT. The email example above included this routing statement, but it is not used because the 'ip rule' takes precedence. The multipath weighted, cache-based routing is problematic.

I would say it would be better to re-order the iptables commands :-

#restore mark before ROUTING decision
iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
#by-pass rules if it is already MARKed
iptables -t mangle -A POSTROUTING -m mark --mark ! 0 -j ACCEPT
#1st packets(from a connection) will arrive here
iptables -t mangle -A POSTROUTING -o eth1 -j MARK --set-mark 0x1
iptables -t mangle -A POSTROUTING -o eth2 -j MARK --set-mark 0x2
iptables -t mangle -A POSTROUTING -j CONNMARK --save-mark

i.e. restore-mark is moved to the top.

I strongly recommend that the LARTC documentation be updated, especially as it encourages people to use multipath weighted routing instead of an iptables-based solution.

Cheers.
Ming-Ching Tiew
2007-Feb-22 04:57 UTC
Re: Split access, load balancing AND forwarding: HOW?
From: "Ming-Ching Tiew" <mingching.tiew@redtone.com>
>
> I would say it would be better to re-order the iptables commands :-
>
> #restore mark before ROUTING decision
> iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
> #by-pass rules if it is already MARKed
> iptables -t mangle -A POSTROUTING -m mark --mark ! 0 -j ACCEPT
> #1st packets(from a connection) will arrive here
> iptables -t mangle -A POSTROUTING -o eth1 -j MARK --set-mark 0x1
> iptables -t mangle -A POSTROUTING -o eth2 -j MARK --set-mark 0x2
> iptables -t mangle -A POSTROUTING -j CONNMARK --save-mark
>
> i.e. restore-mark is moved to the top.
>

On more careful reading, I am wondering why it is using POSTROUTING? Shouldn't it all be PREROUTING?

Cheers.
On Wednesday 21 February 2007 23:58, Ming-Ching Tiew wrote:
> From: "Luciano Ruete" <luciano@lugmen.org.ar>
>
> > The solution is to use CONNTRACK from iptables, full example described in
> > this[1] e-mail from the archive. No patches needed.
> >
> > [1] http://mailman.ds9a.nl/pipermail/lartc/2006q2/018964.html
>
> I think you mean CONNMARK (not CONNTRACK) from iptables?

Sorry, a brain-o, but it is well explained in the referenced email.

> The ever popular routing command :-
>
> > #route commands
> > ip ro add default nexthop via x.x.x.x dev eth1 weight 1 nexthop via
> > y.y.y.y dev eth2
>
> My personal view is: ***NEVER*** use such a routing statement, and never
> let the system have a chance to use such a routing statement, especially
> when you are doing NAT.

You are ***WRONG*** here :-) The multipath statement works really well, but it is connection-stateless without help from iptables CONNMARK.

> The email example above included this routing
> statement, but it is not used because the 'ip rule' takes precedence.

WRONG, the first packet of a trackable connection does get routed by the multipath routing statement. Once it has been routed to one of the weighted gateways, it is MARKed and the mark is saved with CONNMARK --save-mark. The second (and every later) packet of that connection will always use the same gateway.

So 'ip ro nexthop' does the weighted gateway selection and balancing, and then I use CONNMARK to ensure that packets from the same flow always keep the same gateway. I have this working on a production server with 3 ISPs and, believe me, it works like a Swiss clock.

> The multipath weighted, cache-based routing is problematic.

If you do not use something that can track the connection, yes. But hey, you have CONNMARK now, and before that you could do the same trick (and still can) with Julian Anastasov's patches.

> I would say it would be better to re-order the iptables commands :-
>
> #restore mark before ROUTING decision
> iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
> #by-pass rules if it is already MARKed
> iptables -t mangle -A POSTROUTING -m mark --mark ! 0 -j ACCEPT
> #1st packets(from a connection) will arrive here
> iptables -t mangle -A POSTROUTING -o eth1 -j MARK --set-mark 0x1
> iptables -t mangle -A POSTROUTING -o eth2 -j MARK --set-mark 0x2
> iptables -t mangle -A POSTROUTING -j CONNMARK --save-mark
>
> i.e. restore-mark is moved to the top.

It produces the same result. I think it is easier to understand if the restore command goes at the end, because first you talk about the mark, and at the end you talk about restoring it. If you put the restore first, the newbie will ask "what the hell am I restoring???". But it is a matter of taste.

> I strongly recommend that the LARTC documentation be updated, especially as it
> encourages people to use multipath weighted routing instead of an iptables-
> based solution.

The docs are outdated but technically OK; they were written by people who really know the subject. It is more dangerous for you to say things like the ones you wrote in this email (which are enormously wrong) and have Google index them.

--
Luciano
On Thursday 22 February 2007 01:57, Ming-Ching Tiew wrote:
> From: "Ming-Ching Tiew" <mingching.tiew@redtone.com>
>
> > I would say it would be better to re-order the iptables commands :-
> >
> > #restore mark before ROUTING decision
> > iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
> > #by-pass rules if it is already MARKed
> > iptables -t mangle -A POSTROUTING -m mark --mark ! 0 -j ACCEPT
> > #1st packets(from a connection) will arrive here
> > iptables -t mangle -A POSTROUTING -o eth1 -j MARK --set-mark 0x1
> > iptables -t mangle -A POSTROUTING -o eth2 -j MARK --set-mark 0x2
> > iptables -t mangle -A POSTROUTING -j CONNMARK --save-mark
> >
> > i.e. restore-mark is moved to the top.
>
> On more careful reading, I am wondering why it is using POSTROUTING?
>
> Shouldn't it all be PREROUTING?

_NO_, because I need 'multipath routing' to make the 'weighted routing decision' for the first packet of each new connection. Once it is routed, all the other packets from the same flow are hacked in PREROUTING: their mark is restored, and the ip rule guarantees that they will go out through the same gateway as the first packet.

This solution works in theory and in practice, so plz, get your hands dirty before you post your next great idea.

--
Luciano
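The routing side that this explanation relies on could look roughly like the sketch below. It is not the configuration from the referenced archive email: the gateway addresses and the table numbers 101 and 102 are placeholders, and only the marks 0x1 and 0x2 and the devices eth1 and eth2 come from the quoted iptables rules. Commands are printed rather than executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the routing side that the CONNMARK rules rely on.
# Gateways and table numbers are placeholders, not values from the thread.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }

GW1=192.0.2.1      # gateway reachable on eth1 (placeholder)
GW2=198.51.100.1   # gateway reachable on eth2 (placeholder)

setup_marked_routing() {
    # One table per uplink, each holding only that uplink's default route.
    run ip route add default via "$GW1" dev eth1 table 101
    run ip route add default via "$GW2" dev eth2 table 102
    # Packets whose mark was restored in PREROUTING follow the same uplink
    # as the first packet of their connection.
    run ip rule add fwmark 0x1 table 101
    run ip rule add fwmark 0x2 table 102
    # Unmarked packets (the first of each connection) fall through to the
    # main table, where the multipath route picks a gateway by weight.
    run ip route add default nexthop via "$GW1" dev eth1 weight 1 nexthop via "$GW2" dev eth2 weight 1
}

setup_marked_routing
```

Keeping the per-link tables separate from the multipath default is what lets the first packet be balanced while later packets are pinned by their mark.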
Hello,

Can anyone tell me what the effect is, in terms of latency, if I have:

1. One thousand defined HTB classes representing one thousand users, with traffic filtered by source IP address.
2. A multi-level class hierarchy inside a class. Is there a limit on the depth of the hierarchy in HTB?

Thank you very much in advance.

_______________________________________________
LARTC mailing list
LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/cgi-bin/mailman/listinfo/lartc
Ming-Ching Tiew
2007-Feb-23 07:23 UTC
Re: Split access, load balancing AND forwarding: HOW?
From: "Luciano Ruete" <luciano@lugmen.org.ar>
>
> This solution works in theory and in practice, so plz, get your hands dirty
> before you post your next great idea.
>

I understand your explanation fully, but believe me, I also have hands-on experience with the alternative, i.e.

1. I don't use multipath weighted routing.
2. I use PREROUTING all the way, i.e. I don't use POSTROUTING.

Instead, I use the iptables 'recent' and 'statistic'/'random' matches to achieve load sharing.

I have used this for many years already; believe me, I am not theoretical. It's just a matter of different ways of doing things. If you search the web you will come upon many others using the same method I use.

Cheers
I've put together scripts to save the routing tables in text or xml form.

This message explains

1. how they are used
2. the approach I have taken
3. what is needed next.

This is related to other work which converts iptables to xml and back (now in iptables 1.3.7), htb files into xml and xml to tc commands (complete but not yet released) and ebtables to xml and back (to be released any day now). I find that having iptables, ebtables, tc and ip route as xml permits some very useful manipulations.

This first draft is thorough, but maybe not correct, so I welcome feedback; please see STRATEGY further down.

I've made use of bash's <( ) to package some external resource files (large sed script and xslt) in the same bash script so there is only one file to distribute and try out. You will need xsltproc to restore xml routes.

Basic Usage
===========

iproute save [-x][-t][-s] [filename]

will save the rules and routes in text [-t] format (default) or xml [-x] format, with [-s] or without the /proc/sys/net/ipv4/route routing parameters. The save is atomic.

iproute restore [-x][-t][-s] [filename]

behaves similarly. If iproute restore fails to restore the routes, it attempts to roll back to the previous set of routes.

Conversion to xml is managed with sed; conversion from xml requires xsltproc, a command-line, C-based xslt 1.0 processor.

iproute flush

will pretty much flush the rules and routes; this is also done before restore or rollback. See Strategy.

Formats - TEXT
--------------

The text format is roughly the same as:

( ip -o rule show ; ip -o route show table all )

but has a notation like:

> error_burst 1250

for the sysctl parameters, which is converted to

echo 1250 > /proc/sys/net/ipv4/route/error_burst

Formats - XML
-------------

The xml format is along these lines:

<ip-route version="1.0" error_burst="1250" gc_elasticity="8" ... >
  <rules>
    ...
    <rule priority="298" from="11.22.111.222" table="222"/>
    ...
  </rules>
  <tables>
    <table id="222">
      <route prefix="1.1.0.0/8" mtu="100" metric="40">
        <next-hop via="192.168.0.23" dev="eth3" weight="2"/>
        <next-hop via="10.1.1.1" dev="eth4" weight="3" type="onlink"/>
      </route>
      ...
    </table>
    ...
  </tables>
</ip-route>

Strategy
========

Flushing
--------

Before applying routes and rules (or rolling back previous routes and rules on failure) some kind of routing table flush is needed. The difficulty I am under is knowing what should be flushed and what should not.

I guess if someone is using a routing daemon, they won't be using this - although I could be wrong; thus I only need to be careful where proto = kernel, or I can be liberal where proto = boot (or not specified).

As the kernel only fiddles with routes in the main and default tables, I don't bother to check proto generally. Also I don't want to fiddle with the local table. I've decided not to fiddle with the default table (is this right?)

I empty every other table totally, except the main table, where I preserve routes with scope=link, as to my dull mind messing with these would be silly.

I delete all rules except 0, 32766 and 32767.

So, yes, I also delete the default gateway.

If someone also wanted to use this on a system with a routing daemon running, we should probably pay more attention to the proto parameter; I would need to talk with someone who knows more about mixing routing daemons with non-daemon routes.

I flush tables using the "ip route flush" command.

Parsing / Restoring
-------------------

I flush or restore rules with some sed which converts the initial priority and colon to the keyword priority; combined with the rest of the output, this seems parsable as input.

Similarly for tables, prefixing route add or route delete to a line from:

ip -o route show table all

seems a good way to delete or restore a route individually.
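The two text conversions described above can be sketched with sed as follows. This is a guess at the idea, not the author's actual script: the function names are hypothetical, and the sample inputs are built from the priority, address and sysctl value shown in this message, assuming the usual "priority: selector lookup table" layout of `ip -o rule show`.

```shell
#!/bin/sh
# Sketch of the two text-format conversions; not the author's script.

# Turn an "ip -o rule show" line such as "298:<TAB>from 11.22.111.222 lookup 222"
# into arguments that can follow "ip rule add" or "ip rule del".
rule_line_to_args() {
    sed -E 's/^([0-9]+):[[:space:]]*/priority \1 /'
}

# Turn the "> name value" sysctl notation into the equivalent echo command.
# Lines that do not use the notation produce no output.
sysctl_line_to_cmd() {
    sed -E -n 's|^> ([a-z_0-9]+) (.*)$|echo \2 > /proc/sys/net/ipv4/route/\1|p'
}

printf '298:\tfrom 11.22.111.222 lookup 222\n' | rule_line_to_args
# prints: priority 298 from 11.22.111.222 lookup 222

printf '> error_burst 1250\n' | sysctl_line_to_cmd
# prints: echo 1250 > /proc/sys/net/ipv4/route/error_burst
```

Because the converted rule line is only a list of arguments, the same text can be prefixed with either `ip rule add` or `ip rule del`, which matches the flush-or-restore use described in the message.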
dev entry
---------

When restoring routes it seems prudent to remove the dev entry for any via unless it is an onlink route. The reason is that the dev entry was generally not specified initially but divined from the device tables, netmasks and such, and it would be a good idea to let ip route do this again in case the interface addressing has changed.

Currently the dev entry is removed during xml restore and not text restore. It should probably be removed during save, so that it may be specified strictly as part of a fresh xml routing profile without being ignored. This would have the disadvantage that when save is used for diagnostic or informational purposes the dev information would be missing.

Local Main Default
------------------

Likewise, for informational reasons, the default, local and main tables (and their rules) are saved, but skipped when a restore takes place. It is presumed that whatever set these tables up in the first place (and their rules) has done the same again.

The only exception is that routes for default are processed if they have no scope (text restore) or they are default routes (xml restore). This difference is historical and should be removed.

Converting to XML
-----------------

I've written a few scripts to convert from various text formats to xml. This one was the most fun yet. I do rely on the fact that ip route spits out simple text that does not need any xml escaping (my other conversions do xml escaping properly).

Because the output of ip route is structured fluidly (the via information appears in the middle of the per-route information unless there is more than one hop) and because I'm not entirely sure of all the outputs of ip route, I've structured the xml conversion to detect when unrecognized parameters exist, and to output xml comments to warn of this.

Those who have used sed heavily will know it only has 2 variables: the "current" line, and a "spare".
I've made cunning use of these in ways that will be hard to grasp and easy to break if you don't know sed well. It's not as spaghetti as it looks: I have had to develop certain idioms for converting to xml, and once you spot these a new macro-level understanding will coalesce, making the whole thing easier to understand.

Why did I use sed, seeing how complicated it is? I've learned that using bash to convert xml is a SLOW mistake, and I really don't want to introduce dependencies on anything even as big as awk - which probably would have done a better job (what with having more than two variables, and hashes and all!). So what about xsltproc? libxslt and xsltproc are smaller than gawk. Also, I was learning sed for fun and wanted to see how far I could push it.

However, generating XML is generally simple; if what I have done is worthwhile, it will be simple enough to get the ip command to output xml as an alternative format.

Converting from XML
-------------------

I've been generating bash-able output from XML for quite some time and developed a full xslt library of escape functions to prevent injection errors from crafted xml. For this project, and to keep the xslt dependencies simple, I've tried a new strategy, which is not to generate bash script lines and execute them, but instead to validate the output in bash and use it as parameters rather than commands, reducing the scope for naughtiness.

Because the xml saving was done before the text saving, the xslt generates ip route commands directly instead of generating the text format and then using the text format conversion. The asymmetry is interesting but I don't plan to do anything about it.

Next Steps
==========

This work suits me and my employer, but we don't see why we should be the only ones to benefit. Our benefit is not in having xml routes, but in how we process the xml.

Those who have an interest, please examine this and see how useful it is.
I have a source-rpm as well, which I will distribute once we've reviewed this once.
On Fri, Feb 23, 2007 at 03:23:42PM +0800, Ming-Ching Tiew wrote:
> From: "Luciano Ruete" <luciano@lugmen.org.ar>
> >
> > This solution works in theory and in practice, so plz, get your hands dirty
> > before you post your next great idea.
> >
>
> I understand your explanation fully, but believe me, I also have
> hands-on experience with the alternative, i.e.
>
> 1. I don't use multipath weighted routing.
> 2. I use PREROUTING all the way, i.e. I don't use POSTROUTING.
>
> Instead, I use the iptables 'recent' and 'statistic'/'random' matches to achieve
> load sharing.

Hi,

Sorry, I missed the previous bits of the thread. Could you post the relevant info? I am interested to see how this works and why you would pick it over the multipath method.

> I have used this for many years already; believe me, I am not theoretical.
> It's just a matter of different ways of doing things. If you search the web
> you will come upon many others using the same method I use.
>
> Cheers
Ming-Ching Tiew
2007-Feb-23 23:59 UTC
Re: Split access, load balancing AND forwarding: HOW?
From: "Alex Samad" <alex@samad.com.au>
> hi
>
> sorry missed the previous bits of the thread, could you post the relevant info,
> interested to see how this works and why you would pick it over the multipath
> method

Please note that my check for marked traffic is not (as in the earlier posts)

> iptables -t mangle .... -m mark --mark ! 0 -j ACCEPT

Instead, it is :-

> iptables -t mangle .... -m mark ! --mark 0 -j ACCEPT

I leave it to you guys to decide which is the correct syntax.

The code below is taken from part of my bigger code :-

Cheers.

---------------------code-------------------------------------------
LINK1_MARK=5
LINK2_MARK=7
# probability that a new, non-recent source is sent out via link2
OUTSIDE_DEV2_WEIGHT=0.5

INSIDE_DEVICE=eth0
OUTSIDE_DEVICE=eth1
OUTSIDE_DEVICE2=eth2

SAVEMARK="-m mark ! --mark 0 -j CONNMARK --save-mark"
ACCEPTMARK="-m mark ! --mark 0 -j ACCEPT"
SETMARK1="-j MARK --set-mark ${LINK1_MARK}"
SETMARK2="-j MARK --set-mark ${LINK2_MARK}"

#
#first, restore and accept the mark if there is any
iptables -t mangle -A OUTPUT -j CONNMARK --restore-mark
iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
iptables -t mangle -A PREROUTING ${ACCEPTMARK}

#handle inbound for link1
iptables -t mangle -A PREROUTING -i ${OUTSIDE_DEVICE} ${SETMARK1}
iptables -t mangle -A PREROUTING -i ${OUTSIDE_DEVICE} ${SAVEMARK}
iptables -t mangle -A PREROUTING ${ACCEPTMARK}

#handle inbound for link2
iptables -t mangle -A PREROUTING -i ${OUTSIDE_DEVICE2} ${SETMARK2}
iptables -t mangle -A PREROUTING -i ${OUTSIDE_DEVICE2} ${SAVEMARK}
iptables -t mangle -A PREROUTING ${ACCEPTMARK}

# (other features implementation snipped )

#handle recent outbound: a source seen on a link within the last
#300 seconds stays on that link
iptables -t mangle -A PREROUTING -i ${INSIDE_DEVICE} -m recent --name link1 \
    --update --seconds 300 ${SETMARK1}
iptables -t mangle -A PREROUTING -i ${INSIDE_DEVICE} -m recent --name link2 \
    --update --seconds 300 ${SETMARK2}
iptables -t mangle -A PREROUTING -i ${INSIDE_DEVICE} ${SAVEMARK}
iptables -t mangle -A PREROUTING ${ACCEPTMARK}

#
#non-recent outbound randomly allocated
#
iptables -t mangle -A PREROUTING -i ${INSIDE_DEVICE} \
    -m statistic --mode random --probability ${OUTSIDE_DEV2_WEIGHT} \
    -m recent --name link2 --set ${SETMARK2}
iptables -t mangle -A PREROUTING -i ${INSIDE_DEVICE} ${SAVEMARK}
iptables -t mangle -A PREROUTING ${ACCEPTMARK}

#whatever was not picked for link2 goes to link1
iptables -t mangle -A PREROUTING -i ${INSIDE_DEVICE} \
    -m recent --name link1 --set ${SETMARK1}
iptables -t mangle -A PREROUTING -i ${INSIDE_DEVICE} ${SAVEMARK}
iptables -t mangle -A PREROUTING ${ACCEPTMARK}
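The script above handles two links, where one random rule takes a fixed share and the last rule catches the rest. If the same cascade were extended to more links, each random rule would need a conditional probability (its weight divided by the weight not yet assigned), because later rules only see packets that earlier rules did not pick. The helper below is a hypothetical sketch of that arithmetic and is not part of the posted code; it uses awk for the floating-point division.

```shell
#!/bin/sh
# Sketch: per-rule probabilities for a chain of
# "-m statistic --mode random" rules, so that N links receive new
# connections in proportion to their weights. Hypothetical helper,
# not taken from the script in this thread.

# Arguments: one positive weight per link, in rule order.
# Prints one probability per link; the last is always 1 because the
# final rule catches whatever the earlier rules did not pick.
cascade_probabilities() {
    echo "$@" | awk '{
        total = 0
        for (i = 1; i <= NF; i++) total += $i
        for (i = 1; i <= NF; i++) {
            printf "%.4f\n", $i / total   # share of the still-unassigned traffic
            total -= $i
        }
    }'
}

cascade_probabilities 1 1 2
# prints:
# 0.2500
# 0.3333
# 1.0000
```

With two equal weights the first value is 0.5000, which corresponds to the 0.5 weight variable in the script; the final value of 1 corresponds to the catch-all rule that needs no statistic match.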