Hello List,

I know 4.9 is ancient history, but unfortunately we have several
thousand sites installed. We are in the process of moving to 6.1 when
it is released.

Right now I have an immediate problem: we are going to install two
systems at a HQ site. Each of the 2 systems will have two gre/vpn/ospf
tunnels to 100 remote sites in the field. The broadband will be a T3
with failover to dialup Actiontec DualPC modems. We want to use FreeBSD
systems rather than put in Cisco equipment, which is what we have done
for other large customers.

The problem:

I have been testing between an Athlon 64 3000+ (client) and an Athlon
64 X2 4800+ (server) across a dedicated 100 Mb LAN. When I use nttcp,
which is a round-trip TCP test, across the gre/vpn, the network stack
on the client system (which goes to 0 percent idle) will eventually
stop responding. In trying to track this down I find that the IP input
queue has reached net.inet.ip.intr_queue_maxlen, which is normally 50
(I added a sysctl to be able to look at the current queue length), but
it never drains back down. If I increase the maximum, things start
working again. If I continue to hammer the client, I see the queue
length continue to grow until it again reaches the new maximum. Another
data point: if I don't send the data through the gre tunnel, but only
through the vpn, I don't see this problem.

I've looked at the gre code till I am blue in the face and can't see
where mbufs are not being freed when the queue is full.

If anybody could give some direction as to where to look, or how to
better troubleshoot this problem, it would be greatly appreciated.

Thanks for being such a great list,
Steve

-- 
"They that give up essential liberty to obtain temporary safety,
deserve neither liberty nor safety." (Ben Franklin)
"The course of history shows that as a government grows, liberty
decreases." (Thomas Jefferson)
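The message doesn't include the actual patch, but as a minimal sketch of
the kind of read-only sysctl described above: assuming a 4.x kernel, and
assuming it is added to sys/netinet/ip_input.c next to the existing
intr_queue_maxlen and intr_queue_drops entries where the global ipintrq
(a struct ifqueue) is in scope, something like the following would let
the current queue depth be watched with "sysctl net.inet.ip.intr_queue_len"
while the test runs. The sysctl name is made up here for illustration.

    /*
     * Sketch only -- not the poster's actual change.  Exposes the current
     * IP input queue depth alongside the existing maxlen/drops sysctls.
     */
    SYSCTL_INT(_net_inet_ip, OID_AUTO, intr_queue_len, CTLFLAG_RD,
        &ipintrq.ifq_len, 0,
        "Current number of packets on the IP input queue");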
On Tuesday 18 April 2006 16:50, Stephen Clark wrote:
> Hello List,
>
> I know 4.9 is ancient history, but unfortunately we have several
> thousand sites installed. We are in the process of moving to 6.1 when
> it is released.
>
> Right now I have an immediate problem: we are going to install two
> systems

So these are new systems, yet you are going to use 4.9. Why?

> at a HQ site. Each of the 2 systems will have two gre/vpn/ospf tunnels
> to 100 remote sites in the field. The broadband will be a T3 with
> failover to dialup Actiontec DualPC modems. We want to use FreeBSD
> systems rather than put in Cisco equipment, which is what we have done
> for other large customers.
>
> [...]
Kurt Jaeger wrote:
> Hi!
>
>> I've looked at the gre code till I am blue in the face and can't see
>> where mbufs are not being freed when the queue is full.
>>
>> If anybody could give some direction as to where to look, or how to
>> better troubleshoot this problem, it would be greatly appreciated.
>
> Is it possible to upgrade to 4.11p16?

The problem with that is I would have to upgrade the "world". I had
tried putting a 4.11 kernel over the existing system and netstat -m
wouldn't work, plus who knows what else. The recommendation I read was
to do a make buildworld to go with the matching kernel, and I am
concerned that the plethora of userland stuff (postgres, php, apache,
snort, etc., etc. - about 50 ports) would have to be recompiled and
reinstalled - this would mean we would have to go through major
regression testing, and of course we need it now.

The 4.11 kernel that I tried, which I had cvsupped on 3/24 to
4.11-STABLE, still showed the problem. Is this problem fixed in 4.11p16?

Steve

-- 
"They that give up essential liberty to obtain temporary safety,
deserve neither liberty nor safety." (Ben Franklin)
"The course of history shows that as a government grows, liberty
decreases." (Thomas Jefferson)
Stephen Clark wrote:
> In trying to track this down I find that the IP input queue has reached
> net.inet.ip.intr_queue_maxlen, which is normally 50 (I added a sysctl
> to be able to look at the current queue length), but it never drains
> back down.
>
> I've looked at the gre code till I am blue in the face and can't see
> where mbufs are not being freed when the queue is full.
>
> [...]

I have discovered that if I disable quagga/ospfd then I don't lose
mbufs! This makes it appear that the mbuf leak is in the multicast
routing logic. In fact I lose mbufs even with both systems basically
idle, but with 100 vpn/gre tunnels carrying multicast through the gre
and then the vpn.

Any ideas on where to focus my continued investigation?

Thanks to everybody who has responded.

Steve

-- 
"They that give up essential liberty to obtain temporary safety,
deserve neither liberty nor safety." (Ben Franklin)
"The course of history shows that as a government grows, liberty
decreases." (Thomas Jefferson)
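Since the leak only shows up when ospfd multicast traffic is pushed
through the gre tunnel into the vpn, one classic pattern worth auditing
along that path (gre output, the multicast forwarding code, and anywhere
packets get re-queued onto a netisr queue) is the enqueue-when-full case:
on a 4.x kernel the caller has to free the mbuf itself when IF_QFULL() is
true, and a missing m_freem() there leaks exactly one mbuf per drop,
which matches a queue that fills up and never drains. The function below
is purely illustrative - it is not copied from the actual gre(4) or
ip_mroute code - but shows the shape of the correct handling.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/mbuf.h>
    #include <sys/socket.h>
    #include <sys/errno.h>

    #include <net/if.h>
    #include <net/if_var.h>
    #include <net/netisr.h>

    extern struct ifqueue ipintrq;      /* IP input queue (4.x kernel global) */

    /*
     * Illustrative only: a hypothetical helper showing correct
     * enqueue-when-full handling.  Leaving out the m_freem() in the
     * IF_QFULL() branch is exactly the kind of one-mbuf-per-drop leak
     * being hunted in the gre/multicast path.
     */
    static int
    example_hand_to_ipintrq(struct mbuf *m)
    {
            int s;

            s = splimp();
            if (IF_QFULL(&ipintrq)) {
                    IF_DROP(&ipintrq);  /* counts the drop ...                */
                    splx(s);
                    m_freem(m);         /* ... but the mbuf must be freed too */
                    return (ENOBUFS);
            }
            IF_ENQUEUE(&ipintrq, m);
            schednetisr(NETISR_IP);
            splx(s);
            return (0);
    }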