Hi,

I realize this is a very old release, but we have over 1,000 systems deployed, so it is hard to upgrade. We are using gre/vpn/gif. If I stress-test the system with netperf or nttcp, I eventually reach a state where netperf sits in sbwait and stays there forever. I can't log into the machine with ssh, but I can get in through the console. If I try to ping localhost I get:

ping: sendto: No buffer space available

The only recourse to get things going again is a reboot. If I reboot and run netstat -m, it shows I have mbufs available. The system is a Duron 1.6 GHz with 256 MB of memory. What I am looking for is some direction on how to diagnose the problem further.

This is before the problem:

$ netstat -m
2/736/131072 mbufs in use (current/peak/max):
        2 mbufs allocated to data
0/672/32768 mbuf clusters in use (current/peak/max)
1528 Kbytes allocated to network (1% of mb_map in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

This is after the problem:

965/1376/131072 mbufs in use (current/peak/max):
        965 mbufs allocated to data
872/876/32768 mbuf clusters in use (current/peak/max)
2096 Kbytes allocated to network (2% of mb_map in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

Any ideas would be greatly appreciated.

Steve
damir bikmuhametov wrote:
>On Wed, Mar 29, 2006 at 10:55:30PM -0500, Stephen Clark wrote:
>>>Try to increase net.inet.ip.intr_queue_maxlen and monitor
>>>net.inet.ip.intr_queue_drops.
>>
>>Increasing net.inet.ip.intr_queue_maxlen to 400 seems to have fixed the
>>problem.
>
>Could you please report this to the list after some testing?
>
>Thanks.

Do you know whether the mbufs get lost when queue_maxlen is exceeded? That is what seems to happen. This is after a test where I kept increasing queue_maxlen until queue_drops stopped increasing:

$ sysctl net.inet.ip.intr_queue_drops
net.inet.ip.intr_queue_drops: 27444

At this point, after my test, the network is pretty much quiet, but there are still mbufs allocated to data:

$ netstat -m
689/5504/131072 mbufs in use (current/peak/max):
        689 mbufs allocated to data
265/4958/32768 mbuf clusters in use (current/peak/max)
11292 Kbytes allocated to network (11% of mb_map in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

After a reboot:

X10001:~ $ sysctl net.inet.ip.intr_queue_drops
net.inet.ip.intr_queue_drops: 0
X10001:~ $ netstat -m
2/416/131072 mbufs in use (current/peak/max):
        2 mbufs allocated to data
0/44/32768 mbuf clusters in use (current/peak/max)
192 Kbytes allocated to network (0% of mb_map in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

This is after the reboot and another test with queue_maxlen=400. Note there are only 3 mbufs allocated to data:

$ sysctl net.inet.ip.intr_queue_drops
net.inet.ip.intr_queue_drops: 0
X10001:~ $ netstat -m
3/416/131072 mbufs in use (current/peak/max):
        3 mbufs allocated to data
0/44/32768 mbuf clusters in use (current/peak/max)
192 Kbytes allocated to network (0% of mb_map in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

Is there some way to see the current queue length, the way intr_queue_drops shows the drops? What do you think?

Regards,
Steve
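As background for the question above, here is a minimal sketch, in the style of the 4.4BSD/FreeBSD 4.x IP input path, of what conventionally happens when the interrupt queue fills. The exact code in this release may differ; the ipintrq name and the IF_QFULL/IF_DROP/m_freem pattern are the usual ones, not quoted from Steve's kernel. It suggests that an overflowed queue drops the packet but frees its mbufs, so drops by themselves should not leave mbufs permanently allocated.

    /*
     * Sketch only: conventional hand-off of an inbound packet (the mbuf
     * chain m) to the IP interrupt queue.  ipintrq.ifq_maxlen is what
     * net.inet.ip.intr_queue_maxlen tunes, and ipintrq.ifq_drops is what
     * net.inet.ip.intr_queue_drops reports.
     */
    int s;

    s = splimp();                    /* keep device interrupts away from the queue */
    if (IF_QFULL(&ipintrq)) {
            IF_DROP(&ipintrq);       /* counted by intr_queue_drops */
            m_freem(m);              /* packet is lost, but the mbuf chain is freed */
    } else {
            IF_ENQUEUE(&ipintrq, m);
            schednetisr(NETISR_IP);  /* let the soft interrupt run ipintr() later */
    }
    splx(s);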
damir bikmuhametov wrote:
>On Thu, Mar 30, 2006 at 10:13:57AM -0500, Stephen Clark wrote:
>>Do you know whether the mbufs get lost when queue_maxlen is exceeded?
>>That is what seems to happen.
>
>I'm sorry, I'm unfamiliar with FreeBSD internals. I just noticed that
>when I'm using gre tunnels (say, for WCCP or with DVB receivers), at some
>point (it varies) net.inet.ip.intr_queue_drops starts increasing very fast
>and I lose network connectivity. I've played with queue_maxlen and found
>that an increased value helps against those issues.
>
>It seems that only gre suffers from the default (relatively small) queue
>length.

FreeBSD's splnet() equates directly to NetBSD's splsoftnet(); FreeBSD uses splimp() where (for networking) NetBSD would use splnet(). When you mentioned that this only seems to happen with gre, I looked at ip_gre.c, which comes from NetBSD: it uses splnet() where all the other FreeBSD networking code uses splimp(). So I changed the gre code to use splimp(), and now I am not getting any drops, even with queue_maxlen=50.
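A minimal sketch of the kind of change described above, assuming the GRE input code's enqueue path looks like the block sketched in the previous message (the surrounding lines are illustrative, not copied from ip_gre.c). On FreeBSD 4.x, splnet() blocks only the software netisr level, so a network card interrupt can still fire and manipulate ipintrq while the GRE code holds splnet(); raising to splimp(), as the other FreeBSD input paths do, closes that window.

    /*
     * Sketch of the change (context assumed, not quoted from ip_gre.c):
     * protect the ipintrq manipulation at splimp() rather than splnet(),
     * because FreeBSD's splnet() does not block network device interrupts
     * the way NetBSD's splnet() does.
     */
    int s;

    s = splimp();   /* was: s = splnet(); in the NetBSD-derived code */
    /* ... IF_QFULL / IF_DROP / IF_ENQUEUE handling as sketched earlier ... */
    splx(s);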