I'm running 8.2-RELEASE-p4 i386 on some web servers that are generally lightly-moderately loaded, but occasionally see some heavy spikes where load average goes way up. When that is happening, but sometimes even when it's not, I get hundreds of this message spewing into the logs:

kernel: negative sbsize for uid = 0

I haven't found anything particularly useful by searching for that message, the one reference was to mbufs, but that seems not to be the problem. Here is the output of 'netstat -m' during one of the load spikes:

598/1712/2310 mbufs in use (current/cache/total)
559/1533/2092/32768 mbuf clusters in use (current/cache/total/max)
559/1105 mbuf+clusters out of packet secondary zone in use (current/cache)
0/528/528/16384 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/8192 9k jumbo clusters in use (current/cache/total/max)
0/0/0/4096 16k jumbo clusters in use (current/cache/total/max)
1267K/5606K/6873K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/2239/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
809790 requests for I/O initiated by sendfile
0 calls to protocol drain routines

So is this message something to worry about? If so, how can I diagnose what's happening, and how do I fix it?

Doug

In the last episode (Dec 13), Doug Barton said:
> I'm running 8.2-RELEASE-p4 i386 on some web servers that are generally
> lightly-moderately loaded, but occasionally see some heavy spikes where
> load average goes way up. When that is happening, but sometimes even when
> it's not, I get hundreds of this message spewing into the logs:
>
> kernel: negative sbsize for uid = 0
>
> I haven't found anything particularly useful by searching for that
> message, the one reference was to mbufs, but that seems not to be the
> problem. Here is the output of 'netstat -m' during one of the load
> spikes:
[...]
> So is this message something to worry about? If so, how can I diagnose
> what's happening, and how do I fix it?

I've seen it occasionally too. The error message is printed in /sys/kern/kern_resource.c when the ui_sbsize resource counter goes negative. There's probably insufficient locking somewhere in the functions that call chgsbsize. The increment/decrement is done atomically, but the data pointed to by the "hiwat" argument is read and then updated later without an explicit lock, so if that value changes while the function is executing, it could cause problems.

ui_sbsize is only used by the resource limiting code, though, so unless you're enforcing an sbsize rlimit, it should be harmless.

-- 
Dan Nelson
dnelson@allantgroup.com
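
To make the read-then-update window described above concrete, here is a small user-space model in C. It is only a sketch under stated assumptions, not the kernel source: struct uid_account, read_delta(), apply_delta() and both simulate_*() functions are hypothetical names invented for illustration, and the interleaving is replayed by hand in a single thread so the result is deterministic. It assumes the accounting works by computing a delta from the old "hiwat" value, adding it to a per-uid counter, and only then storing the new "hiwat".

```c
/* Hypothetical stand-in for the per-uid accounting record. */
struct uid_account {
    long sbsize;                /* models the ui_sbsize counter */
};

/* Step 1: read the current high-water mark and compute the change. */
static long read_delta(const unsigned int *hiwat, unsigned int to)
{
    return (long)to - (long)*hiwat;
}

/*
 * Step 2: apply the change to the counter, then store the new mark.
 * The add is assumed to be atomic in the real code; a plain add is
 * enough here because this model is single-threaded.
 */
static void apply_delta(struct uid_account *acct, unsigned int *hiwat,
                        unsigned int to, long delta)
{
    acct->sbsize += delta;
    *hiwat = to;
}

/* Two releases of the same buffer that run strictly one after the other. */
long simulate_serialized(void)
{
    struct uid_account acct = { 0 };
    unsigned int hiwat = 0;

    /* Reserve 8192 bytes for one socket buffer. */
    apply_delta(&acct, &hiwat, 8192, read_delta(&hiwat, 8192));

    /* Caller A releases; caller B then sees hiwat == 0 and changes nothing. */
    apply_delta(&acct, &hiwat, 0, read_delta(&hiwat, 0));
    apply_delta(&acct, &hiwat, 0, read_delta(&hiwat, 0));

    return acct.sbsize;
}

/* The same two releases, but both read hiwat before either one writes it. */
long simulate_interleaved(void)
{
    struct uid_account acct = { 0 };
    unsigned int hiwat = 0;
    long delta_a, delta_b;

    apply_delta(&acct, &hiwat, 8192, read_delta(&hiwat, 8192));

    delta_a = read_delta(&hiwat, 0);    /* A computes -8192 */
    delta_b = read_delta(&hiwat, 0);    /* B also computes -8192 (stale) */
    apply_delta(&acct, &hiwat, 0, delta_a);
    apply_delta(&acct, &hiwat, 0, delta_b);

    return acct.sbsize;                 /* the release was counted twice */
}
```

In this model simulate_serialized() ends with the counter back at 0, while simulate_interleaved() ends at -8192. Each individual add is fine; the stale read of "hiwat" is what double-counts the release, which would be consistent with a counter that drifts negative without any real resource being leaked.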