Mike Andrews
2007-Jan-16 18:04 UTC
6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)
I have a strange issue with em0 watchdog timeouts that I think is not the same as the ones everyone was having during the 6.2 beta cycle...

I have six systems, each with two Intel GigE ports onboard:

  Systems A and B: Supermicro PDSMi+
  Systems C and D: Supermicro PDSMi (without the plus)
  System E: Tyan S2730U3GN
  System F: Supermicro X5DPA-GG

On each system:

  em0 is connected to a Cisco Catalyst 2960G layer 2 gigabit ethernet switch.
  em1 is connected to a Foundry ServerIron XL layer 4-7 fast ethernet switch.

All six run FreeBSD 6.2-RELEASE i386, even though the first four are capable of running amd64. They all have 2 GB of memory, except E, which has 4 GB. The kernel configs are all identical, and are not far from GENERIC + SMP.

Several times a day, em0 will go down, print a watchdog timeout error on the console, then come right back up on its own a few seconds later. But here's the weird twist: it ONLY happens on systems A and B, and ONLY when running at gigabit speed. If I knock the two switch ports down to 100 meg, the problem goes away.

The other four systems, C through F, never have watchdog timeout issues; they always work perfectly even at gigabit speed.

So I'm trying to figure out whether there are any other obvious hardware differences between the plus and non-plus versions of the PDSMi that could be causing issues on the plus version. Fortunately, at the moment we are not (yet) pushing anywhere near even 100 meg worth of traffic through these ports, so it's a tolerable workaround... just kinda annoying. :)

The chipset is a bit different: the PDSMi uses the Intel E7230 chipset for Pentium D servers, whereas the PDSMi+ uses the E3000, which adds Core 2 Duo support. But apparently the NIC chips are identical: 82573V for em0 and 82573L for em1. The BIOS is identical too, so the chipsets must be pretty similar. Nothing shares an IRQ with the NICs. (USB is disabled in the BIOS.) They do have different disk systems: A and B are SATA gmirror setups, while C and D use LSI MegaRAID SCSI cards for their mirrors.

I have tried the obvious: switching the cables out. No difference at all.

I have NOT yet tried a different gigabit switch.

Hopefully that's enough detail to start; I can get into more specifics as needed (kernel configs, dmesg output, IRQ details, disk details, IPMI, running apps, serial console access if needed...).
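A minimal sketch for putting numbers on "several times a day", so that the gigabit and 100 meg configurations (or any other change) can be compared by counting resets rather than by impression. The `count_watchdog` function is hypothetical, not something from this thread, and it assumes the driver's console message (a line containing `em0: watchdog timeout`) is also being written to `/var/log/messages` by syslogd; pass a different log path if it is not.

```shell
#!/bin/sh
# Hypothetical helper: count logged watchdog resets for one interface.
# Assumption: the em driver's message contains "<ifname>: watchdog timeout"
# and syslogd writes it to the given log (default /var/log/messages).
count_watchdog() {
    ifname="$1"
    logfile="${2:-/var/log/messages}"
    # grep -c prints the number of matching lines; it exits 1 when that
    # number is 0, so "|| true" keeps the function usable under "set -e".
    grep -c "${ifname}: watchdog timeout" "$logfile" || true
}

# Example (run once per day at each speed and compare the totals):
#   count_watchdog em0
# The negotiated speed itself shows up on the "media:" line of "ifconfig em0".
```

Rotating or clearing the log between test periods keeps the counts for each configuration separate.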
Jack Vogel
2007-Jan-16 22:22 UTC
6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)
On 1/16/07, Mike Andrews <mandrews@bit0.com> wrote:
> I have a strange issue with em0 watchdog timeouts that I think is not the
> same as the ones everyone was having during the 6.2 beta cycle...
> [...]

There are some management-related issues with this NIC. First, if you have not done so, make a DOS bootable device and run this app I am enclosing; it fixes the PROM setting that is wrong on some devices. It will do no harm, and it may solve things.

Let me know if it does fix it, please.

Jack
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dcgdis.ThisIsZip
Type: application/octet-stream
Size: 158727 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20070116/50a088dc/dcgdis-0001.obj
Jeremy Chadwick
2007-Jan-17 02:59 UTC
6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)
On Tue, Jan 16, 2007 at 10:53:04AM -0800, Jack Vogel wrote:
> There are some management-related issues with this NIC. First, if you
> have not done so, make a DOS bootable device and run this app I
> am enclosing; it fixes the PROM setting that is wrong on some devices.
> It will do no harm, and it may solve things.

Jack,

Can you expand on what this application changes in the PROM? I have an Intel motherboard which suffers from something similar to what the OP has reported (em0 watchdog timeouts), and was curious what the utility does before firing up the board and trying it. Others may be curious to know, too.

Thanks, as always.

-- 
| Jeremy Chadwick                           jdc at parodius.com |
| Parodius Networking                  http://www.parodius.com/ |
| UNIX Systems Administrator             Mountain View, CA, USA |
| Making life hard for others since 1977.       PGP: 4BD6C0CB |
NISHIMURA Yutaka
2007-Feb-02 15:12 UTC
6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)
Hello, this is Yutaka. My English is not very good, so if anything below is unclear, please let me know.

I was directed here from FreeBSD-users-jp (a Japanese mailing list):
http://home.jp.freebsd.org/cgi-bin/showmail/FreeBSD-users-jp/90318

This thread:

> 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)  Mike Andrews
>   * 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)  Jack Vogel
>     o 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)  Mike Andrews
>     o 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)  Jeremy Chadwick
>       + 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)  Jack Vogel
>         # 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)  John Baldwin
>     o 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)  Mike Andrews

I know this phenomenon; it happens in my environment too. Changing one setting has improved things: I disabled USB in the BIOS. Since the reboot the problem has occurred 3 times, but there have been no problems for the last 3 days.

  USB enabled:  the problem occurs 4-10 times per hour.
  USB disabled: the problem has occurred 3 times since the reboot.

# pciconf -l -v
em0@pci0:8:0:   class=0x020000 card=0x002e8086 chip=0x100e8086 rev=0x02 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82540EM Gigabit Ethernet Controller'
    class    = network
    subclass = ethernet

dmesg.boot with USB disabled:

Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.2-RELEASE #0: Sat Jan 20 12:12:56 JST 2007
    root@example.net:/usr/src/sys/i386/compile/NATBOX
ACPI APIC Table: <AMIINT VIA_K7 >
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Sempron(tm) (1403.19-MHz 686-class CPU)
  Origin = "AuthenticAMD"  Id = 0x681  Stepping = 1
  Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
  AMD Features=0xc0480800<SYSCALL,MP,MMX+,3DNow+,3DNow>
real memory  = 805240832 (767 MB)
avail memory = 774430720 (738 MB)
ioapic0 <Version 0.3> irqs 0-23 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.17.2 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0: <AMIINT VIA_K7> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
cpu0: <ACPI CPU> on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
agp0: <VIA 8377 (Apollo KT400/KT400A/KT600) host to PCI bridge> mem 0xe0000000-0xe7ffffff at device 0.0 on pci0
pcib1: <PCI-PCI bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
pci0: <display, VGA> at device 5.0 (no driver attached)
atapci0: <Promise PDC40518 SATA150 controller> port 0xec00-0xec7f,0xe800-0xe8ff mem 0xdfffb000-0xdfffbfff,0xdffc0000-0xdffdffff irq 17 at device 6.0 on pci0
ata2: <ATA channel 0> on atapci0
ata3: <ATA channel 1> on atapci0
ata4: <ATA channel 2> on atapci0
ata5: <ATA channel 3> on atapci0
em0: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port 0xe400-0xe43f mem 0xdff80000-0xdff9ffff,0xdff60000-0xdff7ffff irq 18 at device 8.0 on pci0
em0: Ethernet address: 00:07:e9:xx:x:xx
xl0: <3Com 3c905B-TX Fast Etherlink XL> port 0xe000-0xe07f mem 0xdfffaf80-0xdfffafff irq 17 at device 10.0 on pci0
miibus0: <MII bus> on xl0
xlphy0: <3Com internal media interface> on miibus0
xlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl0: Ethernet address: 00:10:5a:xx:xx:xx
atapci1: <VIA 6420 SATA150 controller> port 0xdc00-0xdc07,0xd800-0xd803,0xd400-0xd407,0xd000-0xd003,0xcc00-0xcc0f,0xc800-0xc8ff irq 20 at device 15.0 on pci0
ata6: <ATA channel 0> on atapci1
ata7: <ATA channel 1> on atapci1
atapci2: <VIA 8237 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 15.1 on pci0
ata0: <ATA channel 0> on atapci2
ata1: <ATA channel 1> on atapci2
isab0: <PCI-ISA bridge> at device 17.0 on pci0
isa0: <ISA bus> on isab0
pci0: <multimedia, audio> at device 17.5 (no driver attached)
acpi_button1: <Sleep Button> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse Explorer, device ID 4
fdc0: <floppy drive controller> port 0x3f2-0x3f3,0x3f4-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
fdc0: <floppy drive controller> port 0x3f2-0x3f3,0x3f4-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
pmtimer0 on isa0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xccfff,0xcd000-0xce7ff,0xe0000-0xe0fff on isa0
ppc0: parallel port not found.
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0

Thank you.

-- 
NISHIMURA,Yutaka. <forml@aypio.net>