I have a machine with FreeBSD 8.0-RELEASE-p2 which has a big ZFS file
system and serves as a file server (NFS (newnfs)).

From time to time, however, it seems to lose all network connectivity.
The machine isn't down; from the console (an IPMI console) it works fine.

I have tried things like bringing nfe0 down and up again and turning off
things like checksum offload, but none of them really seem to work
(although sometimes, apparently by accident, a thing I try seems to help,
only for connectivity to be lost again a short time later).

Carrier status and the like all look normal:

nfe0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
        ether 00:30:48:xx:xx:xx
        inet 131.174.xx.xx netmask 0xffffff00 broadcast 131.174.xx.xxx
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active

One time when I was doing an "ifconfig nfe0 up" I got the message
"initialization failed: no memory for rx buffers", so I am currently
thinking in the direction of mbuf starvation (with something requiring
too many mbufs to make any progress; I've seen such a thing with inodes
once).

Here is the output of netstat -m while the problem was going on:

25751/1774/27525 mbufs in use (current/cache/total)
24985/615/25600/25600 mbuf clusters in use (current/cache/total/max)
23254/532 mbuf+clusters out of packet secondary zone in use (current/cache)
0/95/95/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
56407K/2053K/58461K bytes allocated to network (current/cache/total)
0/2084/1031 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
10 requests for I/O initiated by sendfile
0 calls to protocol drain routines

while here are the figures a short time after a reboot (a reboot always
"fixes" the problem):

2133/2352/4485 mbufs in use (current/cache/total)
1353/2205/3558/25600 mbuf clusters in use (current/cache/total/max)
409/871 mbuf+clusters out of packet secondary zone in use (current/cache)
0/35/35/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
3239K/5138K/8377K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

Is there a way to increase the maximum number of mbufs, or better yet,
limit the use by whatever is using them too much?

Thanks in advance,
-Olaf.
On Thu, May 27, 2010 at 03:13:10PM +0200, Olaf Seibert wrote:
> Is there a way to increase the maximum number of mbufs, or better yet,
> limit the use by whatever is using them too much?

Regarding your first question: I believe kern.ipc.nmbclusters controls
what you want. This is a loader.conf tunable, so you'll need to reboot.
Network buffer tuning is documented in the Handbook, Section 11.13.2;
please read it before adjusting the tunable.

http://www.freebsd.org/doc/en/books/handbook/configtuning-kernel-limits.html

It would probably be more effective in the long run to find out why your
mbuf count is so high, and to determine whether it is caused by a problem
with the NIC driver or by something else going on on your machine.

Regarding your second question: not to my knowledge.

--
| Jeremy Chadwick                                  jdc@parodius.com |
| Parodius Networking                      http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, USA |
| Making life hard for others since 1977.            PGP: 4BD6C0CB  |
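
For reference, a minimal /boot/loader.conf sketch for the tunable Jeremy
mentions (the value 32768 here is only an example; pick a limit suited to
your RAM, per the Handbook section above):

    # /boot/loader.conf -- raise the mbuf cluster limit at the next boot
    kern.ipc.nmbclusters="32768"

(As it turns out later in the thread, on this 8.0 box the value can also
be changed at runtime with "sysctl kern.ipc.nmbclusters=32768", so a
reboot is not strictly required to experiment.)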
On Thu, May 27, 2010 at 03:13:10PM +0200, Olaf Seibert wrote:
> Here is the output of netstat -m while the problem was going on:
>
> 25751/1774/27525 mbufs in use (current/cache/total)
> 24985/615/25600/25600 mbuf clusters in use (current/cache/total/max)
  ^^^^^^^^^^^^^^^^^^^^^
As Jeremy said, it seems you're hitting an mbuf shortage. I think nfe(4)
drops received frames in that case. See how many packets were dropped
due to mbuf shortage in the output of "netstat -ndI nfe0". You can also
use "sysctl dev.nfe.0.stats" to see the MAC statistics maintained by
nfe(4), if your MCP controller supports hardware MAC counters.

> Is there a way to increase the maximum number of mbufs, or better yet,
> limit the use by whatever is using them too much?

You already hit the mbuf limit, so nfe(4) might have started to drop
incoming frames.
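
For anyone wanting to run the same checks, these are the diagnostic
commands being suggested; a minimal sketch with no output shown (the
interface and sysctl names assume the nfe0 NIC from this thread):

    # per-interface packet, error and drop counters for nfe0
    netstat -ndI nfe0

    # MAC statistics maintained by nfe(4), if the MCP controller exposes them
    sysctl dev.nfe.0.stats

    # system-wide mbuf and cluster usage, to correlate with any drops
    netstat -m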
I have the same problem with my NIC, but in my case it seems I get no IP
address from time to time after a reboot. "ifconfig nfe0 down" followed
by "dhclient nfe0" a couple of times usually does the trick. I will look
at the mbuf counters next time it loses connection in the middle of an
afpd session, though.
On Thu 27 May 2010 at 10:42:11 -0700, Pyun YongHyeon wrote:
> As Jeremy said, it seems you're hitting an mbuf shortage. I think nfe(4)
> drops received frames in that case. See how many packets were dropped
> due to mbuf shortage in the output of "netstat -ndI nfe0". You can also
> use "sysctl dev.nfe.0.stats" to see the MAC statistics maintained by
> nfe(4), if your MCP controller supports hardware MAC counters.

The sysctl command gives me (among other figures):

dev.nfe.0.stats.rx.drops: 338180

so indeed frames seem to be dropped.

Jeremy Chadwick mentioned that one can tune kern.ipc.nmbclusters in
loader.conf, but apparently it is also changeable at runtime with sysctl.
Since the problem recurred today, I increased the value from 25600 to
32768, the maximum recommended value in the Handbook. (I can probably go
higher if needed; the box has 8 GB of RAM, although up to half of it is
eaten by ZFS.)

I do get the impression there is an mbuf leak somewhere. On a much older
file server (FreeBSD 6.1, which serves a bit of NFS but has no ZFS) the
mbuf cluster usage is much lower, despite a longer uptime:

256/634/890/25600 mbuf clusters in use (current/cache/total/max)

It also shows signs that measures are taken in case of mbuf shortage:

2259806/466391/598621 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
1016 calls to protocol drain routines

whereas the FreeBSD 8.0 machine has zero or very low numbers:

0/3956/1959 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0 calls to protocol drain routines

and usage keeps growing:

26122/1782/27904/32768 mbuf clusters in use (current/cache/total/max)

-Olaf.
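
One way to tell a slow mbuf leak from ordinary steady-state load is to log
the counters periodically and compare snapshots over a day or two. A
minimal sh sketch (the script name, log path and 10-minute interval are
only illustrative assumptions):

    #!/bin/sh
    # mbufwatch.sh -- hypothetical helper: log mbuf usage and nfe(4) rx drops
    LOG=/var/log/mbufwatch.log
    while :; do
        date >> "$LOG"
        # the first lines of netstat -m carry the mbuf/cluster usage figures
        netstat -m | head -8 >> "$LOG"
        # rx drop counter from nfe(4), if the sysctl node exists
        sysctl dev.nfe.0.stats.rx.drops >> "$LOG" 2>/dev/null
        sleep 600
    done

If the current "mbuf clusters in use" figure keeps climbing while traffic
stays roughly constant, that would support the leak theory.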