Hi,

I've recently put into production 2 web servers with 6.0-STABLE from
mid-January and was bitten by the bge watchdog timeout problem.

I cvsupped the 2 boxes to the latest -stable (latest if_bge.c,
rev 1.91.2.17) but the problem still persists :(

Server hardware is Dell PowerEdge 2550 with SMP kernel.

Relevant portion of dmesg:

bge0: <Broadcom BCM5700 B2, ASIC rev. 0x7102> mem 0xfeb00000-0xfeb0ffff irq 17 at device 8.0 on pci1
miibus0: <MII bus> on bge0
brgphy0: <BCM5401 10/100/1000baseTX PHY> on miibus0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge0: Ethernet address: 00:06:5b:1a:7f:4a

On the first box, the load is quite light, so the problem has not yet
reappeared since the upgrade. On the second box, which usually outputs
10-15 Mbit/s, the timeouts came back very shortly after the upgrade.

Extract of logs from the second box (uptime < 1h):

Sep 11 01:19:50 www1 kernel: bge0: watchdog timeout -- resetting
Sep 11 01:19:50 www1 kernel: bge0: link state changed to DOWN
Sep 11 01:19:54 www1 kernel: bge0: link state changed to UP
Sep 11 01:26:10 www1 kernel: bge0: watchdog timeout -- resetting
Sep 11 01:26:10 www1 kernel: bge0: link state changed to DOWN
Sep 11 01:26:13 www1 kernel: bge0: link state changed to UP
Sep 11 01:27:32 www1 kernel: bge0: watchdog timeout -- resetting
Sep 11 01:27:32 www1 kernel: bge0: link state changed to DOWN
Sep 11 01:27:35 www1 kernel: bge0: link state changed to UP
Sep 11 01:28:52 www1 kernel: bge0: watchdog timeout -- resetting
Sep 11 01:28:52 www1 kernel: bge0: link state changed to DOWN
Sep 11 01:28:55 www1 kernel: bge0: link state changed to UP
Sep 11 01:31:12 www1 kernel: bge0: watchdog timeout -- resetting
Sep 11 01:31:12 www1 kernel: bge0: link state changed to DOWN
Sep 11 01:31:15 www1 kernel: bge0: link state changed to UP
Sep 11 01:33:57 www1 kernel: bge0: watchdog timeout -- resetting
Sep 11 01:33:57 www1 kernel: bge0: link state changed to DOWN
Sep 11 01:34:00 www1 kernel: bge0: link state changed to UP
Sep 11 01:34:16 www1 kernel: bge0: watchdog timeout -- resetting
Sep 11 01:34:16 www1 kernel: bge0: link state changed to DOWN
Sep 11 01:34:19 www1 kernel: bge0: link state changed to UP
Sep 11 01:34:41 www1 kernel: bge0: watchdog timeout -- resetting
Sep 11 01:34:41 www1 kernel: bge0: link state changed to DOWN
Sep 11 01:34:44 www1 kernel: bge0: link state changed to UP
Sep 11 01:35:06 www1 kernel: bge0: watchdog timeout -- resetting
Sep 11 01:35:06 www1 kernel: bge0: link state changed to DOWN
Sep 11 01:35:09 www1 kernel: bge0: link state changed to UP
Sep 11 01:36:17 www1 kernel: bge0: watchdog timeout -- resetting
Sep 11 01:36:17 www1 kernel: bge0: link state changed to DOWN
Sep 11 01:36:20 www1 kernel: bge0: link state changed to UP
Sep 11 01:37:47 www1 kernel: bge0: watchdog timeout -- resetting
Sep 11 01:37:47 www1 kernel: bge0: link state changed to DOWN
Sep 11 01:37:50 www1 kernel: bge0: link state changed to UP
Sep 11 01:38:53 www1 kernel: bge0: watchdog timeout -- resetting
Sep 11 01:38:53 www1 kernel: bge0: link state changed to DOWN
Sep 11 01:38:56 www1 kernel: bge0: link state changed to UP
Sep 11 01:39:56 www1 kernel: bge0: watchdog timeout -- resetting
Sep 11 01:39:56 www1 kernel: bge0: link state changed to DOWN
Sep 11 01:39:59 www1 kernel: bge0: link state changed to UP

I've removed 'options SMP' from the kernel config of the loaded box but
the timeouts continue to happen.

What can I do to help resolve this bug?

--
Herve Boulouis

On Mon, Sep 11, 2006 at 02:17:22AM +0200, Herve Boulouis wrote:
H> Hi,
H>
H> I've recently put into production 2 web servers with 6.0-STABLE from
H> mid january and was bitten by the bge watchdog timeouts problems.
H>
H> I cvsupped the 2 boxes with the latest -stable (latest if_bge.c,
H> rev 1.91.2.17) but the problem still persists :(
H>
H> Server hardware is Dell poweredge 2550 with SMP kernel.
H>
H> Relevant portion of dmesg :
H>
H> bge0: <Broadcom BCM5700 B2, ASIC rev. 0x7102> mem 0xfeb00000-0xfeb0ffff irq 17 at device 8.0 on pci1
H> miibus0: <MII bus> on bge0
H> brgphy0: <BCM5401 10/100/1000baseTX PHY> on miibus0
H> brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
H> bge0: Ethernet address: 00:06:5b:1a:7f:4a

Is it integrated or not? I've got exactly the same NIC and I can try
to reproduce the problem if you describe the workload.

--
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE