Alan Jay
2005-Mar-07 15:03 UTC
UPDATE 5.3-STABLE was Re: Possible problems with Broadcom BCM5704C 10/100/1000 on TyanThunder K8S pro S2882 twin Operteron
Hi,

Well, after upgrading to the latest -STABLE via cvsup and makeworld, makekernel, etc., we have been doing some more tests over the weekend.

One of our databases ran fine all weekend, so we took the plunge on Sunday to try our big, heavily accessed database.

It ran fine until 7.45 Monday morning. When I checked at 7.30am it was using around 6 of the 8 GB of RAM. The server then logged:

Mar 7 07:42:47 flappy kernel: bge1: discard frame w/o leading ethernet header (len 4294967292 pkt len 4294967292)

Followed by:

Mar 7 07:42:47 flappy kernel: Fatal trap 12: pag
Mar 7 07:42:47 flappy kernel: e f
Mar 7 07:42:47 flappy kernel: ault
Mar 7 07:42:47 flappy kernel: wh
Mar 7 07:42:47 flappy kernel: ile in
Mar 7 07:42:47 flappy kernel: k
Mar 7 07:42:47 flappy kernel: er
Mar 7 07:42:47 flappy kernel: ne
Mar 7 07:42:47 flappy kernel: l mode
Mar 7 07:42:47 flappy kernel:
Mar 7 07:42:47 flappy kernel: cp
Mar 7 07:42:47 flappy kernel: ui
Mar 7 07:42:47 flappy kernel: d
Mar 7 07:42:47 flappy kernel:
Mar 7 07:42:47 flappy kernel: 1;
Mar 7 07:42:47 flappy kernel: a
Mar 7 07:42:47 flappy kernel: pi
Mar 7 07:42:47 flappy kernel: c
Mar 7 07:42:47 flappy kernel: i
Mar 7 07:42:47 flappy kernel: d
Mar 7 07:42:47 flappy kernel:
Mar 7 07:42:47 flappy kernel: 01
Mar 7 07:42:47 flappy kernel:
Mar 7 07:42:47 flappy kernel: fa
Mar 7 07:42:47 flappy kernel: ul
Mar 7 07:42:47 flappy kernel: t
Mar 7 07:42:47 flappy kernel: vi

Subsequent to that it has crashed a number of times, and on a couple of occasions has reported:

kernel: fxp0: can't map mbuf (error 12)

To my uninitiated eye it looks like this might have something to do with the Network Performance Project, which seems to be tinkering in this area, but I would appreciate any thoughts anyone might have regarding this.

By the way, over the weekend the latest -STABLE, which is marked 5.4-PRERELEASE 2, seemed much better than 5.3 did, and the initial problems took much longer to appear. Though once the problems started to appear, they repeated themselves, rebooting every 1-2hrs until we removed the test data.

Thanks for the guidance,
Alan
Doug White
2005-Mar-10 03:22 UTC
UPDATE 5.3-STABLE was Re: Possible problems with BroadcomBCM5704C 10/100/1000 on TyanThunder K8S pro S2882 twin Operteron
On Mon, 7 Mar 2005, Alan Jay wrote:

> Well after upgrading to the latest -STABLE via cvsup and makeworld makekernel
> etc we have been doing some more tests over the weekend.

When did you run this cvsup?

> One of our databases ran fine all weekend so we took the plunge on Sunday to
> try our big heavily accessed database.
>
> It ran fine until 7.45 Monday morning - when I checked at 7.30am it was using
> around 6 of the 8Gb of RAM the server then logged:
>
> Mar 7 07:42:47 flappy kernel: bge1: discard frame w/o leading ethernet header
> (len 4294967292 pkt len 4294967292)

Hm, unsigned -4. That message is printed by ether_input() if it gets handed a bum mbuf.

> Followed by:
>
> Mar 7 07:42:47 flappy kernel: Fatal trap 12: pag

Unfortunately this is not useful. We need the entire panic message and ideally a backtrace and crashdump. Can you connect a serial console to this system and log the output?

> Subsequently to that it has crashed a number of times and on a couple of
> occasions has reported:
>
> kernel: fxp0: can't map mbuf (error 12)

Error 12 is ENOMEM, and that's coming from bus_dmamap_load_mbuf(). That can be returned if you're running out of space for bounce buffers, or kmem in general. scottl has been working on busdma issues in HEAD and recently committed a fix for i386 for bounce page allocation issues. kmem depletion would be more insidious. Have you been getting other messages that indicate failure to allocate memory or error 12?

> By the way over the weekend the latest -STABLE which is marked 5.4-PRERELEASE
> 2 seemed much better than 5.3 had and the initial problems took much longer to
> appear. Though once the problems started to appear, they repeated themselves
> rebooting every 1-2hrs until we removed the test data.

That behavior sounds a lot like thermal issues. It takes a while to warm up to the critical point, and once it hits that point it really starts to malfunction. Unless the test run starts out slow or something.
--
Doug White                    |  FreeBSD: The Power to Serve
dwhite@gumbysoft.com          |  www.FreeBSD.org
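
For anyone decoding the two numbers in the log messages quoted in this thread, here is a minimal Python sketch. It assumes the frame length is held in a 32-bit field, which the thread does not state explicitly, and `as_signed32` is a hypothetical helper name, not a kernel function. It reinterprets the logged length as a signed value and looks up error 12 in the standard errno table.

```python
import errno
import os


def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed two's-complement int.

    Hypothetical helper for illustration; assumes a 32-bit length field.
    """
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        return value - (1 << 32)
    return value


# "len" and "pkt len" from the bge1 message quoted in the thread.
logged_len = 4294967292
print("signed view of logged length:", as_signed32(logged_len))

# "error 12" from the fxp0 message quoted in the thread.
error_code = 12
print("errno name:", errno.errorcode[error_code])
print("description:", os.strerror(error_code))
```

The signed view comes out as -4, and error 12 maps to ENOMEM, which matches the reading given in the reply. Why the driver ended up with a negative length is not something this sketch can show; that still needs the full panic message and a backtrace.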