I have a machine with FreeBSD 8.0-RELEASE-p2 which has a big ZFS file
system and serves as a file server (NFS (newnfs)).

From time to time, however, it seems to lose all network connectivity.
The machine isn't down; from the console (an IPMI console) it works fine.

I have tried things like bringing nfe0 down and up again and turning off
things like checksum offload, but none of them really seem to work
(although sometimes, apparently by accident, a thing I try seems to help,
only for connectivity to be lost again a short time later).

Carrier status and the like all look normal:

nfe0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
        ether 00:30:48:xx:xx:xx
        inet 131.174.xx.xx netmask 0xffffff00 broadcast 131.174.xx.xxx
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active

One time when I was doing an "ifconfig nfe0 up" I got the message
"initialization failed: no memory for rx buffers", so I am currently
thinking in the direction of mbuf starvation (with something requiring
too many mbufs to make any progress; I've seen such a thing with inodes
once).

Here is the output of netstat -m while the problem was going on:

25751/1774/27525 mbufs in use (current/cache/total)
24985/615/25600/25600 mbuf clusters in use (current/cache/total/max)
23254/532 mbuf+clusters out of packet secondary zone in use (current/cache)
0/95/95/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
56407K/2053K/58461K bytes allocated to network (current/cache/total)
0/2084/1031 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
10 requests for I/O initiated by sendfile
0 calls to protocol drain routines

while here are the figures a short time after a reboot (a reboot always
"fixes" the problem):

2133/2352/4485 mbufs in use (current/cache/total)
1353/2205/3558/25600 mbuf clusters in use (current/cache/total/max)
409/871 mbuf+clusters out of packet secondary zone in use (current/cache)
0/35/35/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
3239K/5138K/8377K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

Is there a way to increase the maximum number of mbufs, or better yet,
limit the use by whatever is using them too much?

Thanks in advance,
-Olaf.
On Thu, May 27, 2010 at 03:13:10PM +0200, Olaf Seibert wrote:
> Is there a way to increase the maximum number of mbufs, or better yet,
> limit the use by whatever is using them too much?

Regarding your first question: I believe kern.ipc.nmbclusters controls
what you want. This is a loader.conf tunable, so you'll need to reboot.
Network buffer tuning is documented in the Handbook, Section 11.13.2;
please read it before adjusting the tunable.

http://www.freebsd.org/doc/en/books/handbook/configtuning-kernel-limits.html

It would probably be more effective in the long run to find out why your
mbuf count is so high, and to determine whether it is caused by a problem
with the NIC driver or by something else going on on your machine.

Regarding your second question: not to my knowledge.

--
| Jeremy Chadwick                                  jdc@parodius.com |
| Parodius Networking                      http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, USA |
| Making life hard for others since 1977.            PGP: 4BD6C0CB  |
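
For reference, a minimal /boot/loader.conf sketch for the tunable Jeremy
mentions (the value 32768 here is only an example; pick a limit suited to
your RAM, per the Handbook section above):

    # /boot/loader.conf -- raise the mbuf cluster limit at the next boot
    kern.ipc.nmbclusters="32768"

(As it turns out later in the thread, on this 8.0 box the value can also
be changed at runtime with "sysctl kern.ipc.nmbclusters=32768", so a
reboot is not strictly required to experiment.)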
On Thu, May 27, 2010 at 03:13:10PM +0200, Olaf Seibert wrote:
> Here is the output of netstat -m while the problem was going on:
>
> 25751/1774/27525 mbufs in use (current/cache/total)
> 24985/615/25600/25600 mbuf clusters in use (current/cache/total/max)
  ^^^^^^^^^^^^^^^^^^^^^
As Jeremy said, it seems you're hitting an mbuf shortage. I think nfe(4)
drops received frames in that case. See how many packets were dropped
due to mbuf shortage in the output of "netstat -ndI nfe0". You can also
use "sysctl dev.nfe.0.stats" to see the MAC statistics maintained by
nfe(4), if your MCP controller supports hardware MAC counters.

> Is there a way to increase the maximum number of mbufs, or better yet,
> limit the use by whatever is using them too much?

You already hit the mbuf limit, so nfe(4) might have started to drop
incoming frames.
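
For anyone wanting to run the same checks, these are the diagnostic
commands being suggested; a minimal sketch with no output shown (the
interface and sysctl names assume the nfe0 NIC from this thread):

    # per-interface packet, error and drop counters for nfe0
    netstat -ndI nfe0

    # MAC statistics maintained by nfe(4), if the MCP controller exposes them
    sysctl dev.nfe.0.stats

    # system-wide mbuf and cluster usage, to correlate with any drops
    netstat -m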
I have the same problem with my NIC, but in my case it seems I get no IP
address from time to time after a reboot. "ifconfig nfe0 down" followed
by "dhclient nfe0" a couple of times usually does the trick. I will look
at the mbuf counters next time it loses connection in the middle of an
afpd session, though.
On Thu 27 May 2010 at 10:42:11 -0700, Pyun YongHyeon wrote:
> As Jeremy said, it seems you're hitting an mbuf shortage. I think nfe(4)
> drops received frames in that case. See how many packets were dropped
> due to mbuf shortage in the output of "netstat -ndI nfe0". You can also
> use "sysctl dev.nfe.0.stats" to see the MAC statistics maintained by
> nfe(4), if your MCP controller supports hardware MAC counters.

The sysctl command gives me (among other figures):

dev.nfe.0.stats.rx.drops: 338180

so indeed frames seem to be dropped.

Jeremy Chadwick mentioned that one can tune kern.ipc.nmbclusters in
loader.conf, but apparently it is also changeable at runtime with sysctl.
Since the problem recurred today, I increased the value from 25600 to
32768, the maximum recommended value in the Handbook. (I can probably go
higher if needed; the box has 8 GB of RAM, although up to half of it is
eaten by ZFS.)

I do get the impression there is an mbuf leak somewhere. On a much older
file server (FreeBSD 6.1, which serves a bit of NFS but has no ZFS) the
mbuf cluster usage is much lower, despite a longer uptime:

256/634/890/25600 mbuf clusters in use (current/cache/total/max)

It also shows signs that measures are taken in case of mbuf shortage:

2259806/466391/598621 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
1016 calls to protocol drain routines

whereas the FreeBSD 8.0 machine has zero or very low numbers:

0/3956/1959 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0 calls to protocol drain routines

and usage keeps growing:

26122/1782/27904/32768 mbuf clusters in use (current/cache/total/max)

-Olaf.
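
One way to tell a slow mbuf leak from ordinary steady-state load is to log
the counters periodically and compare snapshots over a day or two. A
minimal sh sketch (the script name, log path and 10-minute interval are
only illustrative assumptions):

    #!/bin/sh
    # mbufwatch.sh -- hypothetical helper: log mbuf usage and nfe(4) rx drops
    LOG=/var/log/mbufwatch.log
    while :; do
        date >> "$LOG"
        # the first lines of netstat -m carry the mbuf/cluster usage figures
        netstat -m | head -8 >> "$LOG"
        # rx drop counter from nfe(4), if the sysctl node exists
        sysctl dev.nfe.0.stats.rx.drops >> "$LOG" 2>/dev/null
        sleep 600
    done

If the current "mbuf clusters in use" figure keeps climbing while traffic
stays roughly constant, that would support the leak theory.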