I've been running 7.X on a Tyan S4881 (4 dual-core Opteron CPUs) since nearly the beginning of the 7.X cycle, and have just started to see watchdog timeouts on the Broadcom bge0 GigE port. This occurs with a kernel and world compiled on 11/22, and also with a kernel compiled on 11/11 with the 11/22 world.

The errors occur when copying a large number of files (200-300GB) over a GigE network to a WD USB drive attached to a PC running XP. There are no Ethernet errors other than those caused by the timeouts.

The system uses an nVidia PCI-Express video board and an LSA 300-8X SATA card, neither of which has given any problems. I have not, however, been able to add a SCSI card, and I've tried every Adaptec and LSI PCI, PCI-X, and PCI-Express card available to me. The problem appears to be a mismatch between the interrupt expected by the card and the interrupt provided by the board. I haven't found a solution for that problem (yet).

The error is the usual watchdog timeout:

Nov 27 15:34:11 superxeon kernel: bge0: watchdog timeout -- resetting
Nov 27 15:34:11 superxeon kernel: bge0: link DOWN
Nov 27 15:34:11 superxeon kernel: bge0: link state changed to DOWN
Nov 27 15:34:15 superxeon kernel: bge0: link state changed to UP

I intend to switch over to an Intel Pro1000 card, since I saw the same problem some time ago on another box and switching GigE hardware solved it.

uname -a:

FreeBSD superxeon.familysquires.net 7.4-PRERELEASE FreeBSD 7.4-PRERELEASE #12: Mon Nov 22 15:45:36 EST 2010 root@superxeon.familysquires.net:/usr/obj/usr/src/sys/OPTERON8 amd64

dmesg output for bge0/1:

bge0: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x002003> mem 0xd0110000-0xd011ffff,0xd0100000-0xd010ffff irq 26 at device 2.0 on pci17
bge0: Reserved 0x10000 bytes for rid 0x10 type 3 at 0xd0110000
bge0: CHIP ID 0x00002003; ASIC REV 0x02; CHIP REV 0x20; PCI-X
miibus0: <MII bus> on bge0
bge0: bpf attached
bge0: Ethernet address: 00:e0:81:58:2d:e3
bge0: [MPSAFE]
bge0: [ITHREAD]
bge1: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x002003> mem 0xd0130000-0xd013ffff,0xd0120000-0xd012ffff irq 27 at device 2.1 on pci17
bge1: Reserved 0x10000 bytes for rid 0x10 type 3 at 0xd0130000
bge1: CHIP ID 0x00002003; ASIC REV 0x02; CHIP REV 0x20; PCI-X
miibus1: <MII bus> on bge1
bge1: bpf attached
bge1: Ethernet address: 00:e0:81:58:2d:e4
bge1: [MPSAFE]
bge1: [ITHREAD]
bge0: link state changed to UP

Mike Squires
mikes@siralan.org
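For anyone wanting to quantify how often the resets happen, here is a minimal sketch (not part of the original report) that counts watchdog timeouts per interface. It assumes the log lines look like the syslog excerpt quoted in the message above; the function name is made up for illustration, and a real run would read the system log (commonly /var/log/messages, which is an assumption about this machine) instead of the embedded sample.

```python
import re
from collections import Counter

# Sample lines copied from the report above. Reading a real log file instead
# (e.g. /var/log/messages) is an assumption about the local syslog setup.
SAMPLE_LOG = """\
Nov 27 15:34:11 superxeon kernel: bge0: watchdog timeout -- resetting
Nov 27 15:34:11 superxeon kernel: bge0: link DOWN
Nov 27 15:34:11 superxeon kernel: bge0: link state changed to DOWN
Nov 27 15:34:15 superxeon kernel: bge0: link state changed to UP
"""

# Matches "<ifname>: watchdog timeout" in both syslog and raw dmesg lines.
WATCHDOG_RE = re.compile(r"(\w+): watchdog timeout")


def count_watchdog_timeouts(log_text):
    """Return a Counter mapping interface name to number of watchdog timeouts."""
    counts = Counter()
    for line in log_text.splitlines():
        match = WATCHDOG_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    for iface, n in sorted(count_watchdog_timeouts(SAMPLE_LOG).items()):
        print(f"{iface}: {n} watchdog timeout(s)")
```

Comparing the counts before and after a hardware or kernel change gives a rough measure of whether the change helped.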
At the moment I am renting a FreeBSD server running FreeBSD 7.2. After running cvsup, updating sources and ports, compiling the complete system, and installing the kernel and the "new world", rebooting brings up the old system and not the freshly compiled FreeBSD 7.4-PRERELEASE.

What is wrong? Where should I look?

Thanks
Jack Raats
----- Original Message -----
From: "Jack Raats" <jack@jarasoft.net>
Subject: Old system keeps coming back

> At the moment I am renting a FreeBSD server running FreeBSD 7.2.
> After running cvsup, updating sources and ports, compiling the complete
> system, and installing the kernel and the "new world", rebooting brings up
> the old system and not the freshly
> compiled FreeBSD 7.4-PRERELEASE.

It seems that dmesg gives all system info. At the end it gives the FreeBSD 7.4-PRERELEASE message. What should I do so that dmesg gives the latest info? /etc/motd is also not updated. How can I solve this?

Thanks
Jack Raats
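If, as the follow-up suggests, the dmesg buffer holds output from more than one boot, the banner nearest the end is the one that describes the running kernel. The sketch below is only an illustration of picking out that last banner. It assumes the banner line has the "FreeBSD <release> #<build>:" form seen in the uname output earlier in this thread; the sample text and the function name are hypothetical.

```python
import re

# Hypothetical dmesg text containing banners from two boots. The layout is an
# assumption and the dates are deliberately elided rather than invented.
SAMPLE_DMESG = """\
FreeBSD 7.2-RELEASE #0: (date elided)
bge0: link state changed to UP
FreeBSD 7.4-PRERELEASE #12: (date elided)
bge0: link state changed to UP
"""

# One banner per boot, at the start of a line.
BANNER_RE = re.compile(r"^FreeBSD (\S+) #\d+:", re.MULTILINE)


def last_booted_release(dmesg_text):
    """Return the release string from the last boot banner, or None if absent."""
    releases = BANNER_RE.findall(dmesg_text)
    return releases[-1] if releases else None


if __name__ == "__main__":
    print(last_booted_release(SAMPLE_DMESG))
```

If the last banner names the new release, the new kernel is the one that booted, and the older text is left over from an earlier boot.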
The problem is watchdog timeouts with a Broadcom GigE interface on a Tyan S4881 using 7.4-PRERELEASE as of 11/22 and 7.3-STABLE as of 11/11.

I've done the following, with no success:

(1) Tried the second port, bge1, in case the first had gone bad, and
(2) Recompiled samba34 (the failure occurs when copying large files using samba).

I'm currently cvsup'ing Release 7.3 and will see if compiling and installing that version eliminates the problem.

The ultimate solution may involve replacing the Tyan S4881 with an S4882 so I can install other PCI-X cards (I'm unable to get the S4881 to assign interrupts to PCI-X or PCI-E cards, a problem I've never seen before).

Mike Squires
mikes@siralan.org
The Tyan S4881 works perfectly with 7.3-RELEASE-p3; I'll be reinstalling 7.4 and providing the requested diagnostics once I have a backup made.

(Broadcom bge watchdog timeouts under moderate load, at most 25% of GigE)

Mike Squires
mikes@siralan.org
Michael L. Squires
2010-Dec-30 23:34 UTC
bge driver regression in 7.4-PRERELEASE, Tyan S4881
I'm having watchdog timeout problems with the bge driver in 7.4-PRERELEASE on a Tyan S4881 motherboard (the S4881 has 4 Opteron sockets, 2 PCI-E slots, and 3 PCI-X slots, plus VGA and dual Broadcom "bge" GigE ports).

I don't know if this problem also affects the similar Tyan S4882 motherboard, which appears to be much more common. I have an S4882 and can swap it for the S4881 and test this, if anyone is interested (there are a lot of things to unplug and unscrew, so I'd rather not if there's no reason for it).

I've been able to isolate the problem to the following patches:

Edit src/sys/dev/bge/if_bge.c
  Add delta 1.198.2.48 2010.10.08.18.46.02 yongari
  Add delta 1.198.2.49 2010.10.08.18.51.28 yongari
Edit src/sys/dev/bge/if_bgereg.h
  Add delta 1.73.2.22 2010.10.08.18.46.02 yongari
Edit src/sys/dev/et/if_et.c
  Add delta 1.1.2.9 2010.10.08.19.25.46 yongari
Edit src/sys/dev/mii/brgphy.c
  Add delta 1.70.2.14 2010.10.08.19.00.36 yongari
Edit src/sys/dev/mii/brgphyreg.h
  Add delta 1.10.2.3 2010.10.08.19.00.36 yongari
  Add delta 1.7.2.6 2010.10.08.20.13.42 yongari
  Add delta 1.7.2.7 2010.10.08.20.27.51 yongari
  Add delta 1.7.2.8 2010.10.08.20.31.35 yongari
  Add delta 1.7.2.9 2010.10.08.20.37.13 yongari
  Add delta 1.7.2.10 2010.10.08.20.41.15 yongari
  Add delta 1.7.2.11 2010.10.08.20.44.35 yongari
  Add delta 1.7.2.12 2010.10.08.20.49.44 yongari
  Add delta 1.7.2.13 2010.10.08.20.52.47 yongari
  Add delta 1.7.2.14 2010.10.08.23.14.21 yongari
  Add delta 1.7.2.15 2010.10.08.23.29.45 yongari
  Add delta 1.7.2.16 2010.10.08.23.34.45 yongari

Reverting if_bge.c, if_bgereg.h, brgphy.c, and brgphyreg.h to the versions that existed on 10/7/2010 allows me to do a 7.4-PRERELEASE buildworld/buildkernel/installkernel cycle, and the resulting kernel does not exhibit the watchdog timeout problem.

The problem appears only on samba shares exported to a Windows XP Pro client (I've been copying a directory with 17GB of "Ghost" images as a test).
The dmesg for a kernel that failed (containing the patches above) is as follows:

bge0: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x002003> mem 0xd0110000-0xd011ffff,0xd0100000-0xd010ffff irq 26 at device 2.0 on pci17
miibus0: <MII bus> on bge0
bge0: Ethernet address: 00:e0:81:58:2d:e3
bge0: [ITHREAD]
bge1: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x002003> mem 0xd0130000-0xd013ffff,0xd0120000-0xd012ffff irq 27 at device 2.1 on pci17
miibus1: <MII bus> on bge1
bge1: Ethernet address: 00:e0:81:58:2d:e4
bge1: [ITHREAD]
bge1: link state changed to UP
bge1: watchdog timeout -- resetting
bge1: link state changed to DOWN
bge1: link state changed to UP
bge1: watchdog timeout -- resetting
bge1: link state changed to DOWN
bge1: link state changed to UP
bge1: watchdog timeout -- resetting
bge1: link state changed to DOWN
bge1: link state changed to UP
bge1: watchdog timeout -- resetting
bge1: link state changed to DOWN
bge1: link state changed to UP

Mike Squires
mikes@siralan.org
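The bisection described in this message amounts to finding which files received deltas after a known-good date. As a rough sketch (not the poster's actual procedure), the "Edit" / "Add delta" lines of the update log quoted above can be parsed and filtered by timestamp. The function names are made up for illustration, the sample is a short excerpt of the quoted log, and the cutoff of end-of-day 10/7/2010 is taken from the message.

```python
import re
from datetime import datetime

# Excerpt of the update log quoted in the message above.
SAMPLE_UPDATE_LOG = """\
Edit src/sys/dev/bge/if_bge.c
Add delta 1.198.2.48 2010.10.08.18.46.02 yongari
Add delta 1.198.2.49 2010.10.08.18.51.28 yongari
Edit src/sys/dev/bge/if_bgereg.h
Add delta 1.73.2.22 2010.10.08.18.46.02 yongari
"""

# "Add delta <revision> <YYYY.MM.DD.hh.mm.ss> <committer>"
DELTA_RE = re.compile(r"Add delta (\S+) (\d{4}(?:\.\d{2}){5}) (\S+)")


def parse_deltas(log_text):
    """Return {path: [(revision, timestamp, committer), ...]} from the log text."""
    deltas = {}
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("Edit "):
            current = line[len("Edit "):]
            deltas.setdefault(current, [])
            continue
        match = DELTA_RE.match(line)
        if match and current is not None:
            when = datetime.strptime(match.group(2), "%Y.%m.%d.%H.%M.%S")
            deltas[current].append((match.group(1), when, match.group(3)))
    return deltas


def files_changed_after(log_text, cutoff):
    """Return sorted paths that have at least one delta newer than cutoff."""
    return sorted(
        path
        for path, entries in parse_deltas(log_text).items()
        if any(when > cutoff for _, when, _ in entries)
    )


if __name__ == "__main__":
    cutoff = datetime(2010, 10, 7, 23, 59, 59)  # last known-good day per the message
    for path in files_changed_after(SAMPLE_UPDATE_LOG, cutoff):
        print(path)
```

The resulting list is only a set of candidates; which files actually need reverting still has to be confirmed by rebuilding and testing, as described above.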