I've been reading about multipath routes and found something that no howto I have seen mentions so far: multipath algorithms.

The kernel has the following:

# zgrep MULTIPATH_ /proc/config.gz
CONFIG_IP_ROUTE_MULTIPATH_CACHED=y
CONFIG_IP_ROUTE_MULTIPATH_RR=m
CONFIG_IP_ROUTE_MULTIPATH_RANDOM=m
CONFIG_IP_ROUTE_MULTIPATH_WRANDOM=m
CONFIG_IP_ROUTE_MULTIPATH_DRR=m
CONFIG_DM_MULTIPATH_EMC

iproute2 also has support for these (at least, it passes them on to the kernel):

static char *mp_alg_names[IP_MP_ALG_MAX+1] = {
        [IP_MP_ALG_NONE] = "none",
        [IP_MP_ALG_RR] = "rr",
        [IP_MP_ALG_DRR] = "drr",
        [IP_MP_ALG_RANDOM] = "random",
        [IP_MP_ALG_WRANDOM] = "wrandom"
};

The "ip route add" option is "mpath". I quickly tried it with an ADSL modem on ppp0 and a dialup one on ppp1, and using drr seems to have worked: tcpdump showed locally originated traffic going out both interfaces.

Has anybody else tried these and care to comment?
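For readers wondering what such a route might look like, here is a minimal dry-run sketch. It is not the command used in the test described above, which the message does not give. It only assembles and prints an "ip route add" line, assuming that "mpath ALGORITHM" is placed before the nexthop list and that the usual "nexthop via ... dev ... weight ..." syntax applies; check "ip route help" on a kernel built with the options listed above before relying on it. The function name build_mpath_route, the gateway addresses and the weights are all hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: build and print (never execute) an "ip route add" command
# for a multipath default route. Gateways, devices and weights used here are
# hypothetical placeholders, not values from this thread.

# Usage: build_mpath_route ALGORITHM "GATEWAY DEVICE WEIGHT" ...
build_mpath_route() {
    alg=$1
    shift
    case "$alg" in
        rr|drr|random|wrandom) ;;   # names listed in iproute2's mp_alg_names
        *) echo "unknown multipath algorithm: $alg" >&2; return 1 ;;
    esac
    cmd="ip route add default mpath $alg"
    for hop in "$@"; do
        gw=${hop%% *}        # first field: gateway address
        rest=${hop#* }
        dev=${rest%% *}      # second field: outgoing device
        weight=${rest#* }    # third field: nexthop weight
        cmd="$cmd nexthop via $gw dev $dev weight $weight"
    done
    printf '%s\n' "$cmd"
}

# Print the command for two PPP links; the weights are placeholders.
build_mpath_route drr "192.0.2.1 ppp0 4" "198.51.100.1 ppp1 1"
```

Printing rather than executing keeps the sketch safe to run as an ordinary user; whether a given algorithm actually honors the weights is not something this thread confirms.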
Seems really interesting. I also noticed this in the kernel sources, but I didn't try it since I didn't see it in any howtos out there. I'm using load balancing with Julian's patch for 5 DSL lines, but I will be adding 20 more shortly, so I may try this out. BTW, Julian's patch didn't work for me with a 2.6 kernel, but it did with 2.4.29.

Edu

PS: what command lines did you use to make load balancing work with that kernel module?
On Wed, Mar 15, 2006 at 04:22:06PM +0100, Eduardo Fernández wrote:
> Seems really interesting, I also noticed this in kernel sources but I
> didn't try it since I didn't see it in any howtos out there. I'm using
> load balancing with Julian's patch for 5 dsl lines but I will be
> adding 20 more in short, so I may try out this. BTW Julian's patch
> didn't work for me with 2.6 kernel, but it did with 2.4.29.

I never used those patches. For me, not being included in the mainstream kernel for all those years has to mean something is broken somewhere.
On Wed, 2006-03-15 at 13:33 -0300, Andreas Hasenack wrote:
> I never used those patches. For me, not being included in the mainstream
> kernel for all those years has to mean something is broken somewhere.

Well, at the time, back in my 2.2/2.4 years, support in the kernel was lacking or broken. Sure, one could use multipath gateways, but for many reasons it just flat out would not work. Julian's patches plus NAT resolved those problems. Since they address something most people using Linux will never attempt, along with many other reasons, they have never been included.

Take static routes. Ever made an entry in a routing table, only to have it disappear when the interface goes down? It appears again when the interface comes back up only if that routing table is re-created. The static patch resolves that.

I do not believe the patches would cause harm if introduced into the kernel. But there have been reasons for keeping them out, not always good or technical ones. I recall discussing it with Julian. I believe Julian has attempted many times to get them into the mainstream kernel.

In fact, after looking into them, I think I might need them to resolve some issues I am having now, even though I am not using multipath gateways or dead gateway detection, just multiple ISPs and multiple networks routed to different ISPs. Like my ARP issues, yet to be mentioned.

-- 
Sincerely,
William L. Thomson Jr.
Obsidian-Studios, Inc.
http://www.obsidian-studios.com
Personally, I applied the patches at the time to a 2.6 kernel, and the end result was that they would not work correctly for multipath routing and would cause kernel panics after enough route paths got cached. I made sure I was applying the correct patch to the correct kernel sources and that I was using a correct route configuration, but no one could offer me any help. In the end the only "solutions" I got were to either change back to a 2.4 kernel or change to Red Hat.
Neither of those was an option I could use. In the end this left me with the impression that the patches were rather lacking, especially given the kernel panics. Though who knows, maybe if they had been included in the kernel there'd be more interest in testing and maintaining them with current kernel sources.

- Jody
On Wed, 2006-03-15 at 15:28 -0500, Jody Shumaker wrote:
> Personally, I applied the the patches at the time to a 2.6 kernel

Which kernel? A vanilla one or one provided by a distro? Beyond that, I would be curious to know which kernel version.

> and the end result was they would not work correctly for multipath
> routing, and would result in kernel panics after enough route paths
> got cached.

Well, it's been quite a while. When I was doing it, I was working with a 2.2 kernel. However, even in my attempts with 2.4, most things did not work correctly without Julian's patches. Same now with my 2.6 kernel (Gentoo's 2.6.14-hardened-r5) and some ARP issues.

> I made sure I was applying the correct patch to correct
> kernel sources and that I was using a correct route configuration, but
> noone could offer me any help.

Did you try to get in touch with Julian?

> In the end the only "solutions" I got
> was to either change back to a 2.4 kernel, or change to redhat.

Change to RH? The RH kernel, or the full RH distro? Seems like something else is going on there, like other patches or kernel modifications conflicting with the patches.

> Neither of which were options I could use. In the end this left my
> impression of the patches as rather lacking, especially with the
> kernel panics.

Not sure about that. In my experience, which was load balancing 2 SDSL lines for close to 2 years without any problems or headaches, Julian's patches and work are very high quality. Now, there have been lots of changes in the 2.6 kernel. It also seems the patches are very version-specific with 2.6 kernels. You really have to look at the full kernel version, not just the major and minor levels.

> Though who knows, maybe if they had been included in the kernel
> there'd be more interest in testing and maintaining them with current
> kernel sources.

From memory, the reasoning for not including them had nothing to do with issues like that. I believe it was more of a demand vs. benefit thing.
If everything everyone wanted or used went into the kernel, it would be huge, slow, etc.

So unless there is very large demand for something, a lot will never be included. It is very possible Julian's patches and work fall into that category, since in my experience I have come across few people, if any, who have done multipath stuff with the Linux kernel, or multiple ISPs on one box, etc. However, it's quite popular globally, and I would think anyone running a small to medium-size business or network would be interested.

I'm still shocked it's not more popular. However, a lot of people tend to look for off-the-shelf solutions they can write a check for :) The ones that work are $. The others are limited solutions. Granted, the Linux kernel route is not an elegant one, since it's crude load balancing, failover, etc.

-- 
Sincerely,
William L. Thomson Jr.
Obsidian-Studios, Inc.
http://www.obsidian-studios.com
On Wed, 2006-03-15 at 16:33 -0500, Jody Shumaker wrote:
> First attempts were with gentoo 2.6.14 kernel sources, using the
> routes-2.6.14-12.diff patch. I had the instability problems, which I
> eventually confirmed to have to do with routing. Specifically, if I
> wrote a script to run a series of "ip route get" commands for some
> range of ip's, it was guarenteed to cause a kernel panic. Same kernel
> panic would occur if I just left it running. I also could never get
> multipath routing to use anything but the the last nexthop route
> specified.
>
> Figuring maybe the gentoo patchset was maybe conflicting indirectly(as
> there was no errors applying the patch itself) with julian's patch, i
> instead tried the vanilla source package in gentoo. That gave me
> identical results.

Interesting, and good to know since I will be there soon. Kernel panics are a rather severe issue.

> I haven't tried anything without his patches as in the end I just
> dropped my older connection as the verizon FiOS connection proved
> reliable.

Sweet, can't wait for FiOS to be offered in my market. I am sure that won't happen for years to come, if ever, with Verizon.

> Everything besides the multiple nexthops was working just
> fine, my website could be accessed via either connection and
> responses went out over the right link etc. Just the default gateway
> selection was broken, and the eventual kernel panic.

Hmm

> I didn't try to specifically, but he did eventually respond and only
> asked a simple question which had no relevance.

Wow, maybe he is on to other things these days or short of time. Julian was very, very helpful when I was trying to get things working back in the day. Some of it got quite crazy, with route cache timing and all kinds of things I was messing with before I got things working. Almost all of it is in this list's archives. Some was off list.

> Still i responded
> with an answer, and pointed out the symptoms that I had previously
> stated which ruled that out as a possible problem.
> http://mailman.ds9a.nl/pipermail/lartc/2006q1/017946.html

Hey, the question he asked in that is relevant, because ARP stuff is very much related to multipath, or multiple gateways. In my case I am having some ARP issues due to what I believe are replies going out a device other than the one the request came in on. To resolve that I have made some static ARP entries for now. Maybe forever, not sure there.

Granted, the question is not too relevant to kernel panics :)

> My response was not to the lartc list as it only repeated information
> already in the thread, namely that if i reversed the order of nexthops
> i could have the eth1 favored over ppp0 or the reverse. Shortly after
> this the thread died.

I assume you were flushing not only the route cache but the ARP cache as well, in between switching them? However, with your weights, it should have been using one much more often than the other, regardless of position or order.

If you swapped them and did not fully flush everything out, that could explain some of the behavior you were seeing. Granted, it does not fully explain why it would always use the second gateway and not the first. I would assume it had to do with some cache, etc.

FYI, in hindsight, given my past experience and my current trial-and-error woes, I am starting to think that when you are messing with this stuff it's best to shut down all interfaces, flush out everything, bring everything back up from a clean, empty state, and then do comparisons.

Stuff gets put into cache so fast that even when you flush, by the time the next command has run, more than likely something has made it into the cache again.

> I checked back and the exact comment was about fedora core 4:
> http://mailman.ds9a.nl/pipermail/lartc/2006q1/018121.html
> I also thought it could be conflicts with other patches, and hence why
> I tried with a vanilla source kernel.
> Though I do admit i never tried
> directly downloading a source tarball myself and using that, I was
> always starting from a gentoo emerge.

Yeah, Gentoo vanilla sources should be untainted. So you definitely covered all bases there.

> The patches stated 2.6.14-15. Unfortunately I don't remember any
> subrelease version #'s I was using under that, but i had even reviewed
> all patches used for those and all of them were minor security updates
> that I believe did not touch the networking code.

Yes, I was just mentioning that because a range is stated on Julian's main patch page: http://www.ssi.bg/~ja/#routes-2.6 But I'm pretty sure you were using the correct patches and the correct version.

> Yeah, I can see the reasons why its not included. It's not quite
> something that could really be done as a toggleable module as it seems
> to require modifications all over the place from what I recall looking
> at the patches.

Yeah, and most people have no clue about multipath routing etc. They are totally happy with 1 ISP. Most broadband providers seem to have good uptime these days.

> In the end, I'm still not sure why the patches would not work for me.
> At this point I'm guessing it is entirely possible some of my kernel
> config options conflicted with the changes.

I was starting to be curious about that myself. Maybe try building the kernel with no experimental stuff, which might be impossible depending on what support you need in the kernel ;)

> It's also possible my
> config for routes was invalid, but the Kernel panics lead me to think
> otherwise, especially when noone had anything to say about my config
> on the list.

Regardless of configs etc., you have two issues. One, the multipath route did not use both gateways/links. Two, you have kernel panics.
I would be way more concerned with the kernel panics than with the routing issues ;)

> Maybe someday I'll have 2 connections again and i'll actually feel up
> to trying to follow the kernel code and debugging the problem myself.

Well, I might be there sooner rather than later. If all goes well, sometime this year I will get another T1 from a different provider and have redundant lines here. I would have done that already, as I did in CA using SDSL, but this area only has 1 SDSL provider, Covad, and everyone else resells their stuff or provides really low-bandwidth SDSL lines.

In the meantime I might have to apply the patches to resolve some of my ARP issues. If I get kernel panics I will be upset, because I have not seen those in years, not since the last time I was messing with all this. But those were boot-time panics.

For the record, it is possible; I have done it. And once done, it's so great. I literally had no issues, worries, downtime, etc. for over a year with it. It was so great that I kept the machine around as an internal router and only recently decommissioned it. Good old LRP install. Way outdated and totally insecure. Now that my Linux router is connected to a WAN again, I had to update. I'm not using a ramdisk atm, and am booting from and using a HD. Totally uncool. For now ;)

-- 
Sincerely,
William L. Thomson Jr.
Obsidian-Studios, Inc.
http://www.obsidian-studios.com
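The probe quoted above (a script that runs a series of "ip route get" commands over a range of IPs) could look roughly like the sketch below. This is not the script that was actually used. It only counts which outgoing device the kernel reports for each destination, assuming the output contains a "dev NAME" pair. The names egress_dev, tally_devices and the ROUTE_GET override are hypothetical, and the prefix in the usage comment is a documentation address. Note that on the patched kernel described above, this kind of loop was reported to trigger a panic, so try it on a test box first.

```shell
#!/bin/sh
# Sketch: count which outgoing device the kernel picks for a range of
# destinations. ROUTE_GET can be overridden so the loop can be exercised
# against canned output; by default it calls the real "ip route get".

ROUTE_GET=${ROUTE_GET:-ip route get}

# Print the word that follows the first "dev" keyword on standard input.
egress_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Usage: tally_devices PREFIX FIRST LAST
# Probes PREFIX.FIRST through PREFIX.LAST and prints a count per device.
tally_devices() {
    prefix=$1
    i=$2
    last=$3
    while [ "$i" -le "$last" ]; do
        $ROUTE_GET "$prefix.$i" | egress_dev
        i=$((i + 1))
    done | sort | uniq -c
}

# Example (not run automatically): tally_devices 203.0.113 1 50
```

If multipath selection works, more than one device should show up in the tally; a single device for every destination would match the "only the last nexthop is used" symptom described above.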
> > I didn't try to specifically, but he did eventually respond and only
> > asked a simple question which had no relevance.
>
> Wow, maybe he is onto other things these days or short of time. Julians
> was very very helpful when I was trying to get things working back in
> the day. Some of it got quite crazy, with route cache timing, and all
> kinds of things I was messing with before I got things working. Most all
> are in this lists archives. Some was off list.
>
> > Still i responded
> > with an answer, and pointed out the symptoms that I had previously
> > stated which ruled that out as a possible problem.
> > http://mailman.ds9a.nl/pipermail/lartc/2006q1/017946.html
>
> Hey the question he asked in that is relevant. Because arp stuff is very
> much related to multipath, or multiple gateways. Since in my case I am
> having some arp issues due to what I believe are replies going out the
> wrong device the request came in on. To resolve have made some static
> arp entries for now. Maybe for ever not sure there.
>
> Granted the question is not to relevant to kernel panics :)

I just figured no one who was knowledgeable enough had time to help, which is understandable. I figured he likely knew better than me, so I did answer the question: yes, I was running a script that pinged the gateways each minute. However, the symptoms he mentioned definitely did not match the problem I was having, since simply swapping the order of the nexthops gave me the exact opposite of what he stated.

> I assume you were flushing not only the route cache, but the arp cache
> as well? In between switching them. However with your weights, it should
> have been using the one much more often then than the other. Regardless
> of position or order.
>
> If you swapped them and did not fully flush everything out, it could
> explain some of the behavior you were seeing. Granted it does not fully
> explain why it would always use the second gateway and not the first.
> I would assume it had to do with some cache or etc.
>
> FYI, really in hind site with my past experience, and current trial and
> error wows at times. I am starting to think when you are messing with
> this stuff. It's best to shut down all interfaces, flush out everything.
> Bringing everything back up from a clean, empty state. Then doing
> comparisons.
>
> Stuff get's put into cache so fast. That even when you flush, by the
> time the next command has run. More than likely something has made it
> into cache.

Hmm, that is something I was likely not doing as well as I could. I'm fairly sure my script was only flushing the route cache and not the ARP cache. I was, however, bringing the interface down and then back up in some of my tests. I tended to try progressively more drastic steps, occasionally going as far as a full reboot.

> Yeah and most have no clue about multipath routing etc. Are totally
> happy with 1 ISP. Most broadband providers seem to have good uptime
> these days.

Or, as you stated, they use some prefabbed hardware solution. There were a few people with such hardware solutions who weren't entirely happy with them, or who wanted other things that Linux has to offer, like traffic shaping.

> > In the end, I'm still not sure why the patches would not work for me.
> > At this point I'm guessing it is entirely possible some of my kernel
> > config options conflicted with the changes.
>
> I was starting to be curious about that myself. Maybe try to make the
> kernel with no experimental stuff. Which might be impossible depending
> on what support you need in the kernel ;)

I did review my kernel options, and I believe I disabled as many experimental options as I could. Then I tried enabling more options that I thought might possibly be needed despite not being listed anywhere as necessary.

> Regardless of configs or etc. You have two issues. One that the
> multipath gateways did not use both gateways/links. Two that you have
> kernel panics.
> Which I would be way more concerned with kernel panics
> then the routing issues ;)

Well, the 2 issues were directly interconnected. If I only specified a single nexthop, the kernel panics would not occur.

> Well I might be there sooner than later. If all goes well, sometime this
> year I will get another T1 from a different provider, and have redundant
> lines here. I would have done that already as I did in CA using SDSL.
> But this area only has 1 SDSL provider, Covad, and everyone else
> re-sells their stuff. Or provides really low bandwidth SDSL lines.
>
> In the mean time I might have to apply the patches to resolve some of my
> arp issues. If I get kernel panics I will be upset, because I have not
> seen those in years. Since the last time I was messing with all this.
> But those were boot time panics.

Should be perfectly safe, as I only saw the panics when I had multiple nexthops. Also, I believe I saw a common thread: others who experienced the problem were also using Gentoo. However, since I think it's safe to rule out the kernel patches used, that leads me to believe it is the kernel config options, as tweaking those is something more likely to be done by a Gentoo user. I wonder if I would have fared better with a genkernel setup.

> For the record it is possible, I have done it. And once done, it's so
> great. I literally had no issues, worries, downtime, etc for over a year
> with it. It was so great, kept the machine around as an internal router.
> Only recently decommissioned it. Good old LRP install and etc. Way
> outdated and totally insecure. Now that my linux router is back
> connected to a wan. I had to update. Not using a ramdisk atm, and
> booting from and using a HD. Totally un cool. For now ;)

Numerous people did respond that they had it working when I asked for their configs to see if I could maybe spot, through comparison, what I was doing wrong.
If I do this again, I think I'm going to try to scrounge up another PC to test it on, and maybe dedicate it to just the routing and traffic shaping. Then completely dropping one distro for another would be an option, along with any other drastic changes to the system.
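The "clean, empty state" procedure discussed above (interfaces down, flush everything, interfaces back up, then compare) might be sketched as follows. This is an assumption about what such a reset could contain, not a script from either poster. It prints the commands by default and only executes them when RUN=1 is set, which needs root. The helper names run and reset_state and the interface names are hypothetical. Taking an interface down also removes the routes that depend on it, so the multipath route has to be re-added afterwards.

```shell
#!/bin/sh
# Sketch: print the commands for a "clean slate" reset between routing tests.
# Dry run by default; set RUN=1 to actually execute them (requires root).

run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Usage: reset_state DEVICE ...
reset_state() {
    for dev in "$@"; do run ip link set dev "$dev" down; done
    for dev in "$@"; do run ip neigh flush dev "$dev"; done   # ARP entries
    run ip route flush cache                                  # route cache
    for dev in "$@"; do run ip link set dev "$dev" up; done
}

# Hypothetical interface names; prints the sequence without touching anything.
reset_state eth0 eth1
```

Flushing both the neighbour (ARP) entries and the route cache in one step addresses the case mentioned above where only the route cache was being flushed between tests.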
William L. Thomson Jr. wrote:
> From memory the reasoning for not including had nothing to do with
> issues like that. I believe it was more of a demand vs benefit thing. If
> everything everyone wanted or used went into the kernel it would be
> huge, slow, and etc.
>
> So unless there is a very large demand for things, allot will never be
> included. Very possible Julian's patches and work falls into that
> category. Since in my experience, I have come across little to nobody
> who has done multipath stuff with the Linux kernel. Or multiple ISP's on
> box etc. However it's quite popular globally, and I would think anyone
> in the small to medium size business or network would be interested.
>
> Still shocked its still not more popular. However allot tend to look for
> off the shelf solutions they can write a check for :) The ones that work
> are $. The others are limited solutions. Granted the Linux kernel route
> is not an elegant one. Since it's crude load balancing and failover and
> etc.

Hi,

Just wanted to mention that your discussion got me really interested, and I'm sure a lot of people would be interested in this feature with the latest Linux kernels. I'm not a developer and don't know enough C to tinker with the kernel, but as a system admin I would gladly help with testing and experimenting.

Please keep the discussion and development, if any, on the lartc list. No reply to this mail required.

Thanks,
Alex