Hello,

I've updated the PR on this via the bug-tracker email (hopefully -- it bounced my first email), but I thought I should bring it to the attention of the list, as it's still happening and the original PR is from March 2012.

The PR is here: http://www.freebsd.org/cgi/query-pr.cgi?pr=165903&cat

I am experiencing the same mbuf leak on fresh 9.1-RELEASE machines (amd64). Most of my machines are ESXi 5.1 VMs using the e1000 (em0) NIC. The VM is stock -- just one freebsd-update run, nothing custom.

I have also seen this condition on an older 9.0-STABLE from July 1st, 2012. I did not notice it much before that date, but I can't tell for sure. I still use a few machines on that build, so confirmation was easy.

I do not see the error if I load the VMware tools and use the vmx3f0 adapter; it happens only with em0.

I have set the mbuf limit to a very high number (322144) to buy more time between lockups/crashes. Most often the systems stay functional; they just need a reboot, or more mbufs when I add them. Sometimes they lock up or crash as I ifconfig the adapter down/up or attempt to add more mbufs via sysctl.

1) Is anyone else able to reproduce this problem? The PR is still open, which suggests to me that not everyone is hitting it, or there would be more drive to fix it.
2) What can I do to help move this problem forward? It's not just my systems, as evidenced by the original poster of the PR.
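In case it helps anyone reproduce the monitoring side: here is roughly what I use to read and raise the cluster limit from C (a minimal sketch around sysctlbyname(3); sysctl(8) and "netstat -m" do the same job from the shell, and raising the limit needs root):

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char *argv[])
{
        int clusters;
        size_t len = sizeof(clusters);

        /* Read the current limit, same as "sysctl kern.ipc.nmbclusters". */
        if (sysctlbyname("kern.ipc.nmbclusters", &clusters, &len,
            NULL, 0) == -1) {
                perror("sysctlbyname");
                return (1);
        }
        printf("kern.ipc.nmbclusters: %d\n", clusters);

        /* Optionally raise it, same as "sysctl kern.ipc.nmbclusters=N". */
        if (argc > 1) {
                int newlimit = atoi(argv[1]);

                if (sysctlbyname("kern.ipc.nmbclusters", NULL, NULL,
                    &newlimit, sizeof(newlimit)) == -1) {
                        perror("sysctlbyname");
                        return (1);
                }
        }
        return (0);
}

Run it with no arguments to print the limit, or with a number (e.g. 322144) to raise it.

Thanks.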
On Wed, Apr 10, 2013 at 07:39:31PM +0000, Chris Forgeron wrote:
> I've updated the PR on this via the bug-tracker email (hopefully -- it bounced my first email), but I thought I should bring it to the attention of the list, as it's still happening and the original PR is from March 2012.
>
> The PR is here: http://www.freebsd.org/cgi/query-pr.cgi?pr=165903&cat
>
> I am experiencing the same mbuf leak on fresh 9.1-RELEASE machines (amd64). Most of my machines are ESXi 5.1 VMs using the e1000 (em0) NIC. The VM is stock -- just one freebsd-update run, nothing custom.
>
> I have also seen this condition on an older 9.0-STABLE from July 1st, 2012. I did not notice it much before that date, but I can't tell for sure. I still use a few machines on that build, so confirmation was easy.
>
> I do not see the error if I load the VMware tools and use the vmx3f0 adapter; it happens only with em0.
>
> I have set the mbuf limit to a very high number (322144) to buy more time between lockups/crashes. Most often the systems stay functional; they just need a reboot, or more mbufs when I add them. Sometimes they lock up or crash as I ifconfig the adapter down/up or attempt to add more mbufs via sysctl.
>
> 1) Is anyone else able to reproduce this problem? The PR is still open, which suggests to me that not everyone is hitting it, or there would be more drive to fix it.
> 2) What can I do to help move this problem forward? It's not just my systems, as evidenced by the original poster of the PR.

1. This PR does not contain output from "dmesg" or "pciconf -lvbc", and neither does your email. That output matters.

2. Please try 9.1-STABLE and see if there is an improvement; there have been a huge number of changes/fixes to em(4) between 9.1-RELEASE and now. You can try this:

https://pub.allbsd.org/FreeBSD-snapshots/amd64-amd64/9.1-RELENG_9-r249290-JPSNAP/

-- 
| Jeremy Chadwick                                   jdc at koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/    |
| Mountain View, CA, US                                               |
| Making life hard for others since 1977.             PGP 4BD6C0CB    |
Hi.

On 11.04.2013 01:39, Chris Forgeron wrote:
> I do not see the error if I load the VMware tools and use the vmx3f0 adapter; it happens only with em0.
>
> I have set the mbuf limit to a very high number (322144) to buy more time between lockups/crashes. Most often the systems stay functional; they just need a reboot, or more mbufs when I add them. Sometimes they lock up or crash as I ifconfig the adapter down/up or attempt to add more mbufs via sysctl.
>
> 1) Is anyone else able to reproduce this problem? The PR is still open, which suggests to me that not everyone is hitting it, or there would be more drive to fix it.
> 2) What can I do to help move this problem forward? It's not just my systems, as evidenced by the original poster of the PR.

(I'm the author of the PR.)

I was experiencing this on 9.0 up until some -STABLE build, after which the leak was gone on the exact same configuration. This server is equipped with bce(4) interfaces only, so I don't see any connection to the interface driver. I think it's more configuration related.

I created this PR in order to investigate why one of my 9.x servers hangs periodically. Since then I have tried lots of 9-STABLE snapshots; none of them fixed my problem. Last month I decided to switch to 10.x. The uptime is 37 days so far; none of my 9.x snapshots was able to stand that long. Even if this machine crashes while I write this, it still means that 10.x right now is at least as stable as 9.x, and can run as smoothly as 9.x does.

My advice: use 10.0-CURRENT. 9.0 and all of its descendant versions are broken beyond repair, imo. Switching to 10.x was a hard decision for me too; I was scared off by the '-CURRENT' karma. Seems like it's not that creepy.

Eugene.
On Tue, Apr 16, 2013 at 04:43:49PM +0000, Chris Forgeron wrote:
> Thanks, I've applied it, and am rebuilding now. I should know tonight/tomorrow.
>
> I take it this proves that I don't have the latest source with cvsup, or is this a work in progress?
>
> Thanks again.

The patch Gleb provided is not committed anywhere (not even HEAD/current) -- it's a patch for you to test. :-)

The sources you have via csup/cvsup seem to be recent enough (all I can go off of is your legacy em(4) driver version being 1.0.5).

-- 
| Jeremy Chadwick                                   jdc at koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/    |
| Mountain View, CA, US                                               |
| Making life hard for others since 1977.             PGP 4BD6C0CB    |
On Wed, Apr 17, 2013 at 05:38:12PM +0000, Chris Forgeron wrote:
> Hello,
>
> I'm happy to report that the patch from Gleb has fixed the problem.
>
> My system had 256 mbuf clusters in use at boot, and after a day it still has only 256 mbuf clusters in use.
>
> From the patch, I see we are now dropping these packets(?). Was the issue that the packets were being queued up for further work, but nothing was being done with them?

Not exactly. Please open up the source file and follow along.

At line 538, a call to mtod() is made, which returns a pointer to the ARP header inside the mbuf (the mbuf itself was allocated earlier on the receive path). Now go to lines 543 and 549. These are error checks for certain kinds of ARP headers that are either malformed (line 543) or should not be honoured (line 549).

When either of these error checks proved true, the code simply did "return" to get out of the function it was in (in_arpinput()), but never issued m_freem() to free the mbuf, hence the leak.

The patch changes each "return" into "goto drop". The drop label is at line 873, which is where you'll find the m_freem(), followed immediately by the function returning.
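In condensed form, the before/after looks roughly like this (a hypothetical sketch, not the actual if_ether.c code -- the check functions and their bodies are stand-ins I've named for illustration; only the "return" to "goto drop" change mirrors the real patch):

/*
 * Hypothetical condensation of the bug and the fix.  NOT the actual
 * in_arpinput() code; the check functions below are made-up stand-ins.
 */
#include <sys/param.h>
#include <sys/mbuf.h>
#include <net/if_arp.h>

static int
arphdr_malformed(const struct arphdr *ah)
{
        return (ah->ar_hln == 0);       /* placeholder for the check at 543 */
}

static int
arphdr_untrusted(const struct arphdr *ah)
{
        (void)ah;
        return (0);                     /* placeholder for the check at 549 */
}

static void
arp_input_sketch(struct mbuf *m)
{
        struct arphdr *ah = mtod(m, struct arphdr *);

        if (arphdr_malformed(ah))
                goto drop;              /* was "return;" -- leaked m */
        if (arphdr_untrusted(ah))
                goto drop;              /* was "return;" -- leaked m */

        /* ... normal ARP processing ... */
        return;
drop:
        m_freem(m);                     /* every early exit now frees the mbuf */
}

The point of the "goto drop" idiom is that a single exit path owns freeing the mbuf, so future error checks can't quietly reintroduce the leak.

-- 
| Jeremy Chadwick                                   jdc at koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/    |
| Mountain View, CA, US                                               |
| Making life hard for others since 1977.             PGP 4BD6C0CB    |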