All,

I'm seeing some patterns here with all of the network driver problem
reports, but I need more information to help narrow it down further.
I ask all of you who are having problems to take a minute to fill
out this survey and return it to Kris Kennaway (on cc:) and myself.
Thanks.

1. Are you experiencing network hangs and/or "timeout" messages on the
   console? If yes, please provide a _brief_ description of the problem.

2. What version of FreeBSD is experiencing this problem? When did the
   problem start?

3. What network card is experiencing the problem? If there is more than
   one network card/port in the system, please state which ones are
   having problems. Optionally, describe the model of the network card,
   or attach a dmesg output that shows it.

4. Is this an SMP or UP machine? Is your kernel configured for SMP?

5. Is this machine capable of Hyperthreading (note that this is not the
   same as multi-core, like the newer AMD and Intel chips)? If so, is
   Hyperthreading enabled or disabled in the BIOS? If it is enabled, is
   hyperthreading enabled or disabled in the kernel?

6. Is the 'apic' device configured in your kernel? If so, does it appear
   to be active in the system? An easy way to tell this is if there are
   interrupt numbers greater than 15. Another way to tell is if there
   are 'ioapic' device messages early in the boot.

7. Are you using any code patches or non-default configuration options?

8. Are you using the 4BSD or the ULE scheduler?

You are also welcome to attach a copy of your 'dmesg' output as well as
the output from 'vmstat -i'. Please note that a verbose dmesg is not
needed.

I appreciate everyone's help here. Kris and I may not be able to respond
to every survey entry on this, but your input is still very valuable and
appreciated. Thanks!

Scott
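Question 6 suggests a quick check: the APIC is probably active if 'vmstat -i' lists interrupt numbers greater than 15. A minimal sketch of that heuristic follows, assuming the 'irqN: device ...' line format shown in the 'vmstat -i' output posted in this thread. The function name apic_seems_active is hypothetical, and the check is only the rule of thumb from the survey, not a definitive test.

```python
import re


def apic_seems_active(vmstat_output):
    """Heuristic from survey question 6: any IRQ number above 15 suggests
    that interrupts are being routed through an I/O APIC."""
    # Collect the N from every line that starts with "irqN:".
    irq_numbers = [
        int(n)
        for n in re.findall(r"^\s*irq(\d+):", vmstat_output, flags=re.MULTILINE)
    ]
    return any(n > 15 for n in irq_numbers)


if __name__ == "__main__":
    import sys

    # Usage: vmstat -i | python this_script.py
    print(apic_seems_active(sys.stdin.read()))
```

A system that only reports IRQs 0 through 15 returns False here, which matches the survey's wording but does not prove that the 'apic' device is absent from the kernel.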
On Wed, Oct 04, 2006 at 05:14:27PM -0600, Scott Long wrote:
> All,
>
> I'm seeing some patterns here with all of the network driver problem
> reports, but I need more information to help narrow it down further.
> I ask all of you who are having problems to take a minute to fill
> out this survey and return it to Kris Kennaway (on cc:) and myself.
> Thanks.
>
> 1. Are you experiencing network hangs and/or "timeout" messages on the
> console? If yes, please provide a _brief_ description of the problem.

After Bill Moran set me up with access to his machine that is
experiencing bce timeouts, I determined that his instance of the
problem is due to a bce driver bug (which triggered an INVARIANTS
sanity check when configured, and panicked).

I suspect there are several separate problems here though (this one did
not involve em at all). Nevertheless, a similar driver bug may be to
blame in other cases too.

So, can everyone who is seeing some kind of driver watchdog timeout
problem please recompile their kernels with the following options:

    option INVARIANTS
    option INVARIANT_SUPPORT

and confirm (in addition to the information previously requested)
whether or not their kernel panics in conjunction with the timeout.

If yes, then you can follow the information in the developers' handbook
chapter on kernel debugging to gather the information we'll need to
proceed.

Kris
On Wed, Oct 04, 2006 at 05:14:27PM -0600, Scott Long wrote:
> All,
>
> I'm seeing some patterns here with all of the network driver problem
> reports, but I need more information to help narrow it down further.
> I ask all of you who are having problems to take a minute to fill
> out this survey and return it to Kris Kennaway (on cc:) and myself.
> Thanks.
>
> 1. Are you experiencing network hangs and/or "timeout" messages on the
> console? If yes, please provide a _brief_ description of the problem.

OK, next question, to all em users:

If your em device is using a shared interrupt, and you are NOT
experiencing timeout problems when using this device, please let me
know:

dalki# vmstat -i
interrupt                          total       rate
irq4: sio0                          2071          0
irq6: fdc0                            10          0
irq14: ata0                           47          0
irq20: ahd0                        21755          4
irq23: em0                        124751         23   <-- not a shared interrupt
irq24: ahd1                           15          0
cpu0: timer                     10453509       1999
Total                           10602158       2027

tyan# vmstat -i
interrupt                          total       rate
irq14: ata0                           58          0
irq16: em0 fxp1                   332832        851   <-- shared interrupt
irq18: fxp0                          973          2
irq19: atapci1                    132883        339
cpu0: timer                       774308       1980
cpu1: timer                       777136       1987
Total                            2018190       5161

So far all of the em problems I have seen involve shared interrupts,
and conversely all em systems I have seen that do not have timeout
problems are not shared.

Kris
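For anyone who wants to check several machines quickly, the shared-interrupt test above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the 'vmstat -i' line format shown in the two examples (an 'irqN:' label, one or more device names, then the total and rate columns), and the function name shared_interrupts is hypothetical.

```python
import re

# Matches lines such as "irq16: em0 fxp1   332832   851".
# Group 1 is the IRQ label, group 2 the device names on that line.
IRQ_LINE = re.compile(r"^(irq\d+):\s+(.*?)\s+(\d+)\s+(\d+)\b")


def shared_interrupts(vmstat_output):
    """Return {irq: [devices]} for every IRQ line that lists more than one device."""
    shared = {}
    for line in vmstat_output.splitlines():
        match = IRQ_LINE.match(line.strip())
        if not match:
            continue  # header, "cpuN: timer" and "Total" lines are skipped
        devices = match.group(2).split()
        if len(devices) > 1:
            shared[match.group(1)] = devices
    return shared


if __name__ == "__main__":
    sample = (
        "interrupt total rate\n"
        "irq14: ata0 58 0\n"
        "irq16: em0 fxp1 332832 851\n"
        "irq18: fxp0 973 2\n"
        "cpu0: timer 774308 1980\n"
        "Total 2018190 5161\n"
    )
    print(shared_interrupts(sample))  # {'irq16': ['em0', 'fxp1']}
```

An em device counts as "shared" in the sense used above when its name appears in one of the returned device lists.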
Hi,

We are having some trouble with em.

1. em0: watchdog timeout resetting
   The network hangs, the server becomes unreachable, and after a few
   minutes the console stops responding. When we tried to type something
   at the console prompt we got some "chinese" characters :)
   We had to force a reboot, twice. Then we switched to an xl network
   card. The problem occurred under load (heavy for our use): a nightly
   NFS backup that did not complete, Samba home directories for 300
   users, and Squid, just 5 minutes after users began to work at the
   office. Really bad.

2. We are using 6.1-RELEASE. We were using an old 5.2.1 before that, and
   we already had some network hangs followed by server hangs, which is
   why we upgraded to 6.1. Really bad idea. The problem occurs once a
   week now.

3. The em cards are experiencing the problem. We bought a brand new one,
   and it still occurs. So we are now using the integrated xl card. It
   works fine, but at 100Mbps.

4. It is an SMP machine. We are using the default kernel.

5. The machine is not hyperthreading capable.

6. ioapic0 <Version 1.1> irqs 0-23 on motherboard

7. No code patches, default configuration, default kernel.

8. 4BSD scheduler

I can't do a lot of tests with this server, as it is a production one.
I hope this will help you to fix the problem.

See below for vmstat, dmesg and pciconf.

Regards
Stephane

# vmstat -i
interrupt                          total       rate
irq1: atkbd0                         606          0
irq6: fdc0                             3          0
irq14: ata0                           47          0
irq19: xl0                      19094529        540
irq20: em0                             0          0
irq21: em1 twe0                   654923         18
cpu0: timer                     70651494       1999
cpu1: timer                     70651376       1999
Total                          161052978       4558

# dmesg
Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 6.1-RELEASE #0: Sun May 7 04:42:56 UTC 2006
    root@opus.cse.buffalo.edu:/usr/obj/usr/src/sys/SMP
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) MP 2600+ (2133.42-MHz 686-class CPU)
  Origin = "AuthenticAMD" Id = 0x681 Stepping = 1
  Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
  AMD Features=0xc0480800<SYSCALL,MP,MMX+,3DNow+,3DNow>
real memory = 536346624 (511 MB)
avail memory = 515244032 (491 MB)
ACPI APIC Table: <PTLTD APIC >
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID: 1
 cpu1 (AP): APIC ID: 0
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 <Version 1.1> irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: <PTLTD RSDT> on motherboard
acpi0: Power Button (fixed)
acpi0: Sleep Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x8008-0x800b on acpi0
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff,0x8000-0x807f,0x8080-0x80ff iomem 0xd8000-0xdbfff on acpi0
pci0: <ACPI PCI bus> on pcib0
agp0: <AMD 762 host to AGP bridge> port 0x10a0-0x10a3 mem 0xd2000000-0xd3ffffff,0xd0d00000-0xd0d00fff at device 0.0 on pci0
pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pci1: <display, VGA> at device 5.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 7.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <AMD 768 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 7.1 on pci0
ata0: <ATA channel 0> on atapci0
ata1: <ATA channel 1> on atapci0
pci0: <bridge> at device 7.3 (no driver attached)
em0: <Intel(R) PRO/1000 Network Connection Version - 3.2.18> port 0x1000-0x103f mem 0xd0880000-0xd089ffff,0xd0800000-0xd083ffff irq 20 at device 8.0 on pci0
em0: Ethernet address: 00:04:23:d5:40:80
em1: <Intel(R) PRO/1000 Network Connection Version - 3.2.18> port 0x1040-0x107f mem 0xd08a0000-0xd08bffff,0xd0840000-0xd087ffff irq 21 at device 8.1 on pci0
em1: Ethernet address: 00:04:23:d5:40:81
twe0: <3ware Storage Controller. Driver version 1.50.01.002> port 0x1090-0x109f mem 0xd0000000-0xd07fffff irq 21 at device 9.0 on pci0
twe0: [GIANT-LOCKED]
twe0: 8 ports, Firmware FE7X 1.05.00.056, BIOS BE7X 1.08.00.046
pcib2: <ACPI PCI-PCI bridge> at device 16.0 on pci0
pci2: <ACPI PCI bus> on pcib2
xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0x3000-0x307f mem 0xd0a01000-0xd0a0107f irq 19 at device 8.0 on pci2
miibus0: <MII bus> on xl0
ukphy0: <Generic IEEE 802.3u media interface> on miibus0
ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl0: Ethernet address: 00:e0:81:21:81:33
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse, device ID 3
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
ppc0: <ECP parallel printer port> port 0x378-0x37f,0x778-0x77f irq 7 drq 3 on acpi0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/9 bytes threshold
ppbus0: <Parallel port bus> on ppc0
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FAST]
pmtimer0 on isa0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xcffff,0xd0000-0xd07ff,0xd0800-0xd17ff,0xe0000-0xe3fff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
acd0: CDROM <SR244W/T01A> at ata0-master UDMA33
twed0: <Unit 0, TwinStor, Normal> on twe0
twed0: 39265MB (80416192 sectors)
twed1: <Unit 2, RAID5, Normal> on twe0
twed1: 471192MB (965002240 sectors)
SMP: AP CPU #1 Launched!

# pciconf -lv
agp0@pci0:0:0: class=0x060000 card=0x00000000 chip=0x700c1022 rev=0x11 hdr=0x00
    vendor   = 'Advanced Micro Devices (AMD)'
    device   = 'AMD-762 CPU to PCI Bridge (SMP chipset)'
    class    = bridge
    subclass = HOST-PCI
pcib1@pci0:1:0: class=0x060400 card=0x00000000 chip=0x700d1022 rev=0x00 hdr=0x01
    vendor   = 'Advanced Micro Devices (AMD)'
    device   = 'AMD-762 CPU to PCI Bridge (AGP 4x)'
    class    = bridge
    subclass = PCI-PCI
isab0@pci0:7:0: class=0x060100 card=0x00000000 chip=0x74401022 rev=0x05 hdr=0x00
    vendor   = 'Advanced Micro Devices (AMD)'
    device   = 'AMD-768 PCI to ISA/LPC Bridge'
    class    = bridge
    subclass = PCI-ISA
atapci0@pci0:7:1: class=0x01018a card=0x74411022 chip=0x74411022 rev=0x04 hdr=0x00
    vendor   = 'Advanced Micro Devices (AMD)'
    device   = 'AMD-768 EIDE Controller'
    class    = mass storage
    subclass = ATA
none0@pci0:7:3: class=0x068000 card=0x74431022 chip=0x74431022 rev=0x03 hdr=0x00
    vendor   = 'Advanced Micro Devices (AMD)'
    device   = 'AMD-768 System Management'
    class    = bridge
em0@pci0:8:0: class=0x020000 card=0x11798086 chip=0x10798086 rev=0x03 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82546EB Dual Port Gigabit Ethernet Controller'
    class    = network
    subclass = ethernet
em1@pci0:8:1: class=0x020000 card=0x11798086 chip=0x10798086 rev=0x03 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82546EB Dual Port Gigabit Ethernet Controller'
    class    = network
    subclass = ethernet
twe0@pci0:9:0: class=0x010400 card=0x100113c1 chip=0x100113c1 rev=0x01 hdr=0x00
    vendor   = '3ware Inc.'
    device   = '7000/8000 series ATA-133 Storage Controller'
    class    = mass storage
    subclass = RAID
pcib2@pci0:16:0: class=0x060400 card=0x00000000 chip=0x74481022 rev=0x05 hdr=0x01
    vendor   = 'Advanced Micro Devices (AMD)'
    device   = 'AMD-768 PCI Bridge'
    class    = bridge
    subclass = PCI-PCI
none1@pci1:5:0: class=0x030000 card=0x03251039 chip=0x03251039 rev=0x00 hdr=0x00
    vendor   = 'Silicon Integrated Systems (SiS)'
    device   = 'SiS325 2D/3D Accelerator'
    class    = display
    subclass = VGA
xl0@pci2:8:0: class=0x020000 card=0x246610f1 chip=0x920010b7 rev=0x78 hdr=0x00
    vendor   = '3COM Corp, Networking Division'
    device   = '3C905C-TX Fast EtherLink for PC Management NIC'
    class    = network
    subclass = ethernet
mats.lindberg@se.transport.bombardier.com
2006-Oct-24 06:12 UTC
em, bge, network problems survey.
Hello,

These are our findings regarding the survey.

Mats Lindberg
Bombardier

1. Are you experiencing network hangs and/or "timeout" messages on the
   console? If yes, please provide a _brief_ description of the problem.

> The system runs on an IBM i386 xSeries server and polls another
> machine over Ethernet every 100 milliseconds.
> We had a watchdog timeout in the network card and our application went
> down:
> Oct 22 04:14:26 fspa2 kernel: bge0: watchdog timeout -- resetting

2. What version of FreeBSD is experiencing this problem? When did the
   problem start?

> FreeBSD 5.4, the problem appears randomly.

3. What network card is experiencing the problem? If there is more than
   one network card/port in the system, please state which ones are
   having problems. Optionally, describe the model of the network card,
   or attach a dmesg output that shows it.

> There are two Ethernet cards in the system, only bge0 is used, see the
> dmesg output below.

4. Is this an SMP or UP machine? Is your kernel configured for SMP?

> UniProcessor
> See the attached sysctl -a output.

5. Is this machine capable of Hyperthreading (note that this is not the
   same as multi-core, like the newer AMD and Intel chips)? If so, is
   Hyperthreading enabled or disabled in the BIOS? If it is enabled, is
   hyperthreading enabled or disabled in the kernel?

> This is the kernel config file.
> include GENERIC
> ident PREEMPTION-GENERIC
> options PREEMPTION # kernel preemption
> options SEMMNS=150
> options SEMMSL=150

6. Is the 'apic' device configured in your kernel? If so, does it appear
   to be active in the system? An easy way to tell this is if there are
   interrupt numbers greater than 15. Another way to tell is if there
   are 'ioapic' device messages early in the boot.

> There are IRQs greater than 15 and there are 'ioapic' device messages
> early in the boot:
> Oct 19 07:08:04 fspa2 kernel: ioapic2 <Version 2.0> irqs 48-71 on motherboard
> Oct 19 07:08:04 fspa2 kernel: ioapic1 <Version 2.0> irqs 24-47 on motherboard
> Oct 19 07:08:04 fspa2 kernel: ioapic0 <Version 2.0> irqs 0-23 on motherboard

7. Are you using any code patches or non-default configuration options?

> No

8. Are you using the 4BSD or the ULE scheduler?

> 4BSD scheduler

You are also welcome to attach a copy of your 'dmesg' output as well as
the output from 'vmstat -i'. Please note that a verbose dmesg is not
needed.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: vmstat.output.gz
Type: application/octet-stream
Size: 190 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20061024/e59fc29d/vmstat.output.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dmesg.output.gz
Type: application/octet-stream
Size: 2216 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20061024/e59fc29d/dmesg.output.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sysctl-a.gz
Type: application/octet-stream
Size: 11646 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20061024/e59fc29d/sysctl-a.obj
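
When answering question 1, it may help to count how often each interface logs a watchdog timeout. The sketch below assumes log lines shaped like the two quoted in this thread ("bge0: watchdog timeout -- resetting" and "em0: watchdog timeout resetting"); the function name count_watchdog_timeouts is hypothetical, and other drivers may word the message differently.

```python
import re
from collections import Counter

# Captures the interface name (driver name plus unit number) that
# immediately precedes ": watchdog timeout".
WATCHDOG = re.compile(r"\b([a-z]+\d+): watchdog timeout")


def count_watchdog_timeouts(log_lines):
    """Return a Counter mapping interface name to number of watchdog timeouts."""
    counts = Counter()
    for line in log_lines:
        match = WATCHDOG.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py < /var/log/messages
    for interface, count in sorted(count_watchdog_timeouts(sys.stdin).items()):
        print(interface, count)
```

The counts only show which interfaces are affected and how often; they say nothing about the cause.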