rwsrv05> dmesg | grep bge
bge0: <Broadcom BCM5705 A3, ASIC rev. 0x3003> mem 0xe8200000-0xe820ffff irq 17 at device 4.0 on pci4
miibus1: <MII bus> on bge0
bge0: Ethernet address: 00:0b:cd:e7:70:19
bge0: link state changed to UP
bge0: watchdog timeout -- resetting
bge0: link state changed to DOWN
bge0: link state changed to UP
bge0: watchdog timeout -- resetting
bge0: link state changed to DOWN
bge0: link state changed to UP
bge0: watchdog timeout -- resetting
bge0: link state changed to DOWN
bge0: link state changed to UP

This is happening, on average, once per day. It happens when the bge0 interface is under load. I cannot reproduce it at will.

I posted here about a month ago when I was seeing this problem under SCHED_ULE:
http://lists.freebsd.org/pipermail/freebsd-stable/2006-October/029079.html
Having been duly castigated for using SCHED_ULE, I reverted to SCHED_4BSD and kept quiet.

The symptoms are back under SCHED_4BSD (less frequently), but the kernel now has lots of extras.

In order to help with testing 6.2-PRERELEASE, I've been loading up drivers for bits of the hardware which I don't even use. That has brought to light a shared interrupt which may or may not have some relevance. I'm also now running SMP. I've also compiled in INVARIANTS on the understanding that it's supposed to provide helpful debugging information for this issue (but I don't know how to use it, and I haven't seen any extra clues).

Hardware: HP ProLiant ML110

rwsrv05> vmstat -i
interrupt                  total       rate
irq1: atkbd0                 546          0
irq6: fdc0                     9          0
irq14: ata0               156756          2
irq15: ata1                   47          0
irq17: bge0+            18518341        309
irq24: fxp0                78098          1
irq26: mpt0               851102         14
cpu0: timer            119569853       2000
cpu1: timer            119555276       1999
Total                  258730028       4327

rwsrv05> dmesg | grep 'irq 17'
bge0: <Broadcom BCM5705 A3, ASIC rev. 0x3003> mem 0xe8200000-0xe820ffff irq 17 at device 4.0 on pci4
ichsmb0: <Intel 6300ESB (ICH) SMBus controller> port 0x1440-0x145f irq 17 at device 31.3 on pci0

rwsrv05> sysctl kern.version kern.sched kern.smp hw.machine hw.model dev.bge
kern.version: FreeBSD 6.2-PRERELEASE #0: Tue Oct 31 21:30:38 AEDT 2006
    root@rwsrv05.mby.riverwillow.net.au:/spare/obj/usr/src/sys/RWSRV05

kern.sched.name: 4BSD
kern.sched.quantum: 100000
kern.sched.ipiwakeup.enabled: 1
kern.sched.ipiwakeup.requested: 2
kern.sched.ipiwakeup.delivered: 2
kern.sched.ipiwakeup.usemask: 1
kern.sched.ipiwakeup.useloop: 0
kern.sched.ipiwakeup.onecpu: 0
kern.sched.ipiwakeup.htt2: 0
kern.sched.followon: 0
kern.sched.pfollowons: 0
kern.sched.kgfollowons: 0
kern.sched.preemption: 1
kern.sched.runq_fuzz: 1
kern.smp.maxcpus: 16
kern.smp.active: 1
kern.smp.disabled: 0
kern.smp.cpus: 2
kern.smp.forward_signal_enabled: 1
kern.smp.forward_roundrobin_enabled: 1
hw.machine: i386
hw.model: Intel(R) Pentium(R) 4 CPU 2.80GHz
dev.bge.0.%desc: Broadcom BCM5705 A3, ASIC rev. 0x3003
dev.bge.0.%driver: bge
dev.bge.0.%location: slot=4 function=0
dev.bge.0.%pnpinfo: vendor=0x14e4 device=0x1654 subvendor=0x103c subdevice=0x1654 class=0x020000
dev.bge.0.%parent: pci4
rwsrv05>

Here's what I've added to the kernel config since 4th October...
rwsrv05> rcsdiff -u -r1.9 -r1.18 RWSRV05 | grep ^+
==================================================================
RCS file: RCS/RWSRV05,v
retrieving revision 1.9
retrieving revision 1.18
diff -u -r1.9 -r1.18
+++ RWSRV05	2006/10/31 10:24:01	1.18
+# $Id: RWSRV05,v 1.18 2006/10/31 10:24:01 john Exp $
+options         INVARIANT_SUPPORT
+options         INVARIANTS
+options         SMP                     # Symmetric MultiProcessor Kernel
+#options        SCHED_ULE               # ULE scheduler
+options         SCHED_4BSD              # 4BSD scheduler
+
+options         NFSSERVER               # Network File System server
+options         NFSCLIENT               # Network File System client
+
+# USB support
+device          usb                     # General USB code (mandatory for USB)
+device          uhci                    # UHCI controller
+device          ehci                    # EHCI controller
+
+# SMB bus
+device          smbus                   # Bus support, required for smb below.
+# ichsmb Intel ICH SMBus controller chips (82801AA, 82801AB, 82801BA)
+device          ichsmb
+device          smb
+
+# AGP GART support
+device          agp
+
+# Direct Rendering modules for 3D acceleration
+device          drm                     # DRM core module required by DRM drivers
+device          mach64drm               # ATI Rage Pro, Rage Mobility P/M, Rage XL
+
+# ichwd: Intel ICH watchdog timer
+device          ichwd
rwsrv05>

I'm not actually using this extra stuff. I just thought it might be helpful (to FreeBSD) to find drivers for all my hardware to see if anything was broken.

John Marshall.
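The shared line shows up in `vmstat -i` as the trailing `+` on `irq17: bge0+`, and `dmesg | grep 'irq 17'` confirms that ichsmb0 is the other device on it. For checking other boxes the same way, here is a minimal Python sketch. It assumes the whitespace-separated column layout shown above and reads a trailing `+` (or more than one device name in the label) as a shared line; `parse_vmstat_i`, `shared_irqs` and `IrqRow` are hypothetical names for this sketch, not part of any FreeBSD tool.

```python
import re
from dataclasses import dataclass

# One "irqN: <label> <total> <rate>" row of `vmstat -i`.
# Assumption: columns are whitespace-separated as in the output above.
ROW = re.compile(r"^(irq\d+):\s+(.+?)\s+(\d+)\s+(\d+)$")


@dataclass
class IrqRow:  # hypothetical record type for this sketch
    irq: str
    devices: list
    total: int
    rate: int
    shared: bool


def parse_vmstat_i(text):
    """Return an IrqRow for each irqN line, skipping timers, headers and Total."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m is None:
            continue
        irq, label, total, rate = m.groups()
        names = label.split()
        # Assumed reading of the label: a trailing '+' marks handler names that
        # did not fit, and several names mean several handlers; either way the
        # interrupt line is shared.
        shared = label.endswith("+") or len(names) > 1
        rows.append(IrqRow(irq, [n.rstrip("+") for n in names],
                           int(total), int(rate), shared))
    return rows


def shared_irqs(text):
    """List the IRQ labels (e.g. 'irq17') that appear to be shared."""
    return [row.irq for row in parse_vmstat_i(text) if row.shared]
```

Fed the `vmstat -i` output above, this flags only irq17; the timer and Total rows are skipped because they don't start with `irqN:`.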
Is it causing stuck connections or other messy problems? Also, is it
any worse than 6.1?

Scott

John Marshall wrote:
> bge0: watchdog timeout -- resetting
> bge0: link state changed to DOWN
> bge0: link state changed to UP
> Is it causing stuck connections or other messy problems? Also, is it
> any worse than 6.1?

I am also seeing the same thing. It's hard to tell whether it is better or worse under 6.2 than under 6.1, as I don't have 6.2 out on any production web servers, so the only machine running it is a lot more lightly loaded (but it does show fewer watchdog timeouts).

-pcf.
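Comparing 6.1 and 6.2 boxes starts with a per-interface count of resets from each machine's logs. A minimal Python sketch, assuming the message text shown in the dmesg output quoted in this thread; `count_watchdog_timeouts` is a hypothetical helper, and the raw counts still need normalising for load and observation period before two machines can be compared fairly.

```python
import re
from collections import Counter

# Message text as it appears in the dmesg output quoted in this thread.
# Matching is unanchored, so syslog prefixes (timestamp, host, "kernel:")
# in front of the message are ignored.
WATCHDOG = re.compile(r"\b(bge\d+): watchdog timeout -- resetting")


def count_watchdog_timeouts(lines):
    """Map each bge interface to the number of watchdog resets seen in `lines`."""
    counts = Counter()
    for line in lines:
        m = WATCHDOG.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

For example, `count_watchdog_timeouts(open("/var/log/messages"))` scans the current syslog file; rotated logs are usually compressed and would need decompressing first.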
"John Marshall" <John.Marshall@riverwillow.com.au> writes:
> rwsrv05> dmesg | grep bge
> bge0: <Broadcom BCM5705 A3, ASIC rev. 0x3003> mem 0xe8200000-0xe820ffff
> irq 17 at device 4.0 on pci4
> miibus1: <MII bus> on bge0
> bge0: Ethernet address: 00:0b:cd:e7:70:19
> bge0: link state changed to UP
> bge0: watchdog timeout -- resetting

I have a Tyan S2850 with the same (dual) LAN chip. I increased BGE_TIMEOUT to 500000 because of reboot problems with a good old 3Com 100Mbps hub, which occasionally gave me:

bge1: firmware handshake timed out
bge1: RX CPU self-diagnostics failed!

This box occasionally freezes under heavy load. With the above change AND DEVICE_POLLING compiled in (but not enabled), I have not had any problems so far (though the freeze is very hard to reproduce).

Arno
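As far as I understand the driver (an assumption worth checking against if_bge.c and if_bgereg.h in your own tree), BGE_TIMEOUT bounds a polling loop in which the driver repeatedly reads a firmware mailbox and waits about 10 µs (DELAY(10)) between reads, giving up with "firmware handshake timed out" when the count runs out. If the stock value is 100000 (also an assumption), raising it to 500000 stretches the worst-case wait from about 1 s to about 5 s, which would give slow firmware more time to come up. The Python sketch below models that loop with simulated time only; `wait_for_handshake`, `worst_case_seconds` and `fake_firmware` are hypothetical names.

```python
from typing import Callable, Optional


def wait_for_handshake(ready: Callable[[], bool], max_iters: int,
                       delay_us: int = 10) -> Optional[int]:
    """Poll `ready` up to max_iters times.

    Returns the simulated microseconds spent waiting on success, or None on
    timeout. Models an assumed driver loop of the form
        for (i = 0; i < BGE_TIMEOUT; i++) { if (ready) break; DELAY(10); }
    No real sleeping is done; the delay is only accounted for.
    """
    waited = 0
    for _ in range(max_iters):
        if ready():
            return waited
        waited += delay_us  # the kernel would busy-wait here
    return None


def worst_case_seconds(max_iters: int, delay_us: int = 10) -> float:
    """Upper bound on the wait before the loop gives up."""
    return max_iters * delay_us / 1_000_000


def fake_firmware(polls_until_ready: int) -> Callable[[], bool]:
    """Hypothetical device that reports ready after a fixed number of polls."""
    state = {"polls": 0}

    def ready() -> bool:
        state["polls"] += 1
        return state["polls"] > polls_until_ready

    return ready
```

With a simulated device that needs 150000 polls, a bound of 100000 times out while 500000 succeeds after 1.5 simulated seconds. The trade-off is that a genuinely dead chip now holds up the reset for the full, longer bound.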
On 2 Nov 2006, at 23:50, John Marshall wrote:
> bge0: watchdog timeout -- resetting
> bge0: link state changed to DOWN
> bge0: link state changed to UP

I'm seeing similar behaviour on an HP DL360g4 running 6.1-RELEASE and a GENERIC kernel. It had similar problems with 5.4. I have two other similar machines running 6.1 which don't record these errors; however, they never see any sustained throughput, whereas this machine does.

bge0: <Broadcom BCM5704C Dual Gigabit Ethernet, ASIC rev. 0x2100> mem 0xfde70000-0xfde7ffff irq 25 at device 2.0 on pci2

The IRQ is not shared:

vmstat -i
interrupt                  total       rate
irq1: atkbd0                1668          0
irq6: fdc0                    87          0
irq14: ata0                   46          0
irq16: uhci0            49865767          5
irq24: ciss0             6645080          0
irq25: bge0            336582162         34
irq26: bge1            313542372         32
irq48: mpt0             49865839          5
cpu0: timer           2055753320        210
Total                 2812256341        287

Scott Long said:
> Is it causing stuck connections or other messy problems? Also, is it
> any worse than 6.1?

Fortunately, all it seems to be doing is bothering my log files. I'm taking an interest because I have two important production machines using the bge driver, both about to be upgraded from 5.3R to 6.1R. I've just grepped the logs on those machines and they both have a sprinkling of timeouts: 5 over 18 months on one quite heavily trafficked web server, while the other, which continuously serves HTTP downloads, is clean apart from one 24-hour period when it was having 200GB of files uploaded to it. My concern is that upgrading to 6.1 or 6.2 and then enabling device polling is going to make the issue worse.

I'm happy to supply more info.

best.
greg.
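The rate column in `vmstat -i` is averaged over the whole uptime, so it helps to know how long each box had been up before comparing, say, greg's 34/s on bge0 with John's 309/s on bge0+. Assuming vmstat computes the rate as the total divided by uptime in whole seconds, truncated (my reading of vmstat(8), worth checking), each row bounds the uptime, and intersecting the rows tightens the estimate. A minimal Python sketch; `uptime_bounds` and `combined_uptime_bounds` are hypothetical helpers.

```python
import math


def uptime_bounds(total, rate):
    """Bounds (lo, hi] in seconds on the uptime implied by one vmstat -i row.

    Assumes rate == total // uptime (truncated integer division), which gives
    total / (rate + 1) < uptime <= total / rate.
    A row with rate 0 only says total < uptime, so hi is infinite.
    """
    if rate == 0:
        return float(total), math.inf
    return total / (rate + 1), total / rate


def combined_uptime_bounds(rows):
    """Intersect the bounds from several (total, rate) rows."""
    lo, hi = 0.0, math.inf
    for total, rate in rows:
        r_lo, r_hi = uptime_bounds(total, rate)
        lo, hi = max(lo, r_lo), min(hi, r_hi)
    return lo, hi


# greg's DL360g4 rows from the vmstat -i output above, as (total, rate).
greg = [(49865767, 5), (6645080, 0), (336582162, 34),
        (313542372, 32), (49865839, 5), (2055753320, 210)]
lo, hi = combined_uptime_bounds(greg)
print(f"uptime between {lo / 86400:.1f} and {hi / 86400:.1f} days")
```

On these figures greg's box had been up roughly 113 days, while John's rows (timer and Total) put rwsrv05 at about 16.6 hours, so the two rate columns cover very different periods.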