$ dmesg | grep bge
bge0: <Broadcom BCM5705K Gigabit Ethernet, ASIC rev. 0x3003> mem 0xe8200000-0xe820ffff irq 17 at device 4.0 on pci4
miibus1: <MII bus> on bge0
bge0: Ethernet address: 00:0b:cd:e7:51:ba
bge0: watchdog timeout -- resetting
bge0: link state changed to DOWN
bge0: link state changed to UP

I initially pronounced the network cable dead and replaced it. Then I
suspected the FastEthernet switch port and relocated to a different
port. Watchdog timeouts persisted. I concluded that the bge hardware
must be flaky until I read a recent thread on em device watchdog
timeouts which led me to wonder about CPU scheduling.

The server experiencing the bge timeouts was using SCHED_ULE. I built
6.2-PRERELEASE on a spare disk and booted the problem server from that
disk - bge problem persisted.

We have a second (identical) problem-free server configured with
SCHED_4BSD. I reconfigured both machines so that the first machine (now
6.2-PRERELEASE) used SCHED_4BSD and the second machine (6.1-RELEASE)
uses SCHED_ULE. Both machines are configured with PREEMPTION.

+-----------------------------------------------+
| THE PROBLEM FOLLOWS SCHED_ULE ACROSS MACHINES |
+-----------------------------------------------+

The machines are hp ProLiant ML110 servers.

There is nothing sharing the interrupt with the bge device. No USB
drivers are loaded.

$ vmstat -i
interrupt                          total       rate
irq1: atkbd0                          70          0
irq6: fdc0                             9          0
irq14: ata0                      1234430          6
irq15: ata1                           47          0
irq17: bge0                     17543591         93
irq26: fxp0                        70832          0
cpu0: timer                    376381765       1999
Total                          395230744       2099

$ sysctl kern.version kern.sched kern.smp hw.machine hw.model dev.bge
kern.version: FreeBSD 6.1-RELEASE-p10 #1: Mon Oct 2 08:36:56 AEST 2006

kern.sched.name: ule
kern.sched.slice_min: 10
kern.sched.slice_max: 142
kern.sched.preemption: 1
kern.smp.maxcpus: 1
kern.smp.active: 0
kern.smp.disabled: 0
kern.smp.cpus: 1
hw.machine: i386
hw.model: Intel(R) Pentium(R) 4 CPU 2.80GHz
dev.bge.0.%desc: Broadcom BCM5705K Gigabit Ethernet, ASIC rev. 0x3003
dev.bge.0.%driver: bge
dev.bge.0.%location: slot=4 function=0
dev.bge.0.%pnpinfo: vendor=0x14e4 device=0x1654 subvendor=0x103c subdevice=0x1654 class=0x020000
dev.bge.0.%parent: pci4

Is there any other information I ought to post to help with diagnosis -
or is this a known problem? (I've only subscribed recently)

John Marshall.

John Marshall wrote:
> [...]
> We have a second (identical) problem-free server configured with
> SCHED_4BSD. I reconfigured both machines so that the first machine (now
> 6.2-PRERELEASE) used SCHED_4BSD and the second machine (6.1-RELEASE)
> uses SCHED_ULE. Both machines are configured with PREEMPTION.
>
> +-----------------------------------------------+
> | THE PROBLEM FOLLOWS SCHED_ULE ACROSS MACHINES |
> +-----------------------------------------------+
> [...]

Very interesting data point. I wonder if this accounts for some of the
inconsistency in the reporting from others. In any case, SCHED_ULE is
still considered to be highly experimental. Hopefully it will get some
more attention in the near future to bring it closer to production
quality.

Scott

On Wed, Oct 04, 2006 at 02:34:16PM +1000, John Marshall wrote:
> $ dmesg | grep bge
> bge0: <Broadcom BCM5705K Gigabit Ethernet, ASIC rev. 0x3003> mem
> 0xe8200000-0xe820ffff irq 17 at device 4.0 on pci4
> miibus1: <MII bus> on bge0
> bge0: Ethernet address: 00:0b:cd:e7:51:ba
> bge0: watchdog timeout -- resetting
> bge0: link state changed to DOWN
> bge0: link state changed to UP

As far as SCHED_ULE goes, if you have issues with it, use SCHED_4BSD.
4BSD is still the default, and definitely works. I've run into too
many issues (in the past; maybe some have since been dealt with) with
ULE, so I stick purely with 4BSD.

Now, about watchdog timeouts in general -- there's a pending issue
which is still under investigation. Please see this thread:

http://lists.freebsd.org/pipermail/freebsd-stable/2006-September/028792.html

Yes, it's long, but it does pertain to bge (despite the subject
stating em). After you read it all, or most of it, you should
probably partake in the convo there.

-- 
| Jeremy Chadwick                                 jdc at parodius.com |
| Parodius Networking                        http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP: 4BD6C0CB |

Hi all,

I have a few servers with Intel and Broadcom (em and bge) gigabit NICs
running FreeBSD RELENG_6 (from 6.1-R to 6.2-PRERELEASE), and (luckily)
they show no problems like watchdog timeouts.

So maybe something is different in our configurations. Do you want my
kernel configs or something else?

I have USB enabled on 3 servers (serial hubs and USB DVD burners
connected), but the load on the servers rarely goes above 2.x.

Do you want me to check something else? :)

John Marshall wrote:
> [...]
> bge0: watchdog timeout -- resetting
> bge0: link state changed to DOWN
> bge0: link state changed to UP
> [...]

-- 
Best Wishes,
Stefan Lambrev
ICQ# 24134177
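
For anyone wanting to compare an affected and an unaffected machine, as offered above, one rough way is to count the watchdog resets the driver has logged and note which scheduler each box runs. The following is only a sketch: the function name watchdog_count is made up for illustration, and it assumes the kernel messages have the same form as the "bge0: watchdog timeout -- resetting" line quoted in this thread.

```sh
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): count "watchdog timeout"
# messages for one interface in dmesg-style text read from stdin.
# Assumes lines of the form "bge0: watchdog timeout -- resetting".
watchdog_count() {
    # $1 is the interface name, e.g. bge0 or em0.
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so "|| true" keeps the function's exit status clean.
    grep -c "^$1: watchdog timeout" || true
}
```

On each machine this might be run as "dmesg | watchdog_count bge0", alongside "sysctl -n kern.sched.name" to record the scheduler in use. Note that the kernel message buffer is finite, so old messages may have scrolled out and the count is a lower bound at best.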