This morning I had an idea about what the source of the watchdog
problem is. Also, we have reproduced at least one type of watchdog
in-house.

One question: is this problem only happening for those running
STABLE with the 6.6.6 merged driver?

We found the problem does not seem to happen on 7.0.

Right now my suspicion is that the FAST irq handling is
again causing a problem. I am experimenting with variations
to the code today to be sure what's going on, and hopefully
fixing it.

Cheers,

Jack
Jack Vogel wrote:
> This morning I had an idea about what the source of the watchdog
> problem is. Also, we have repro'd at least one type of watchdog
> inhouse.
>
> One question, is this problem only happening for those running
> STABLE with the 6.6.6 merged driver?
>
> We found the problem does not seem to happen on 7.0.
>
> Right now my suspicion is that the FAST irq handling is
> again causing a problem. I am experimenting with variations
> to the code today to be sure whats going on, and hopefully
> fixing it.
>

I see it on HEAD and releng7.

Sam
On 31/10/2007, at 6:16 AM, Jack Vogel wrote:
> This morning I had an idea about what the source of the watchdog
> problem is. Also, we have repro'd at least one type of watchdog
> inhouse.
>
> One question, is this problem only happening for those running
> STABLE with the 6.6.6 merged driver?
>
> We found the problem does not seem to happen on 7.0.
>

Sorry to burst your bubble, but it just happened to me on a 7.0-BETA1
machine running 6.5.3. This machine (Supermicro P4SC8 board) had been
running without problems for almost a year previously. It's still
happening often on a 6.2-STABLE machine (as previously reported).

Was doing around 40Mbit/sec of Rsync traffic at the time.

Again, em0 shares an interrupt (this time with atapci, not uhci)

dmesg:

Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-BETA1 #0: Mon Oct 29 23:34:49 NZDT 2007
    root@zzxyz.open2view.net:/usr/obj/usr/src/sys/GENERIC
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 3.20GHz (3194.56-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf41  Stepping = 1
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x441d<SSE3,RSVD2,MON,DS_CPL,CNXT-ID,xTPR>
real memory  = 1072562176 (1022 MB)
avail memory = 1035980800 (987 MB)
ACPI APIC Table: <IntelR AWRDACPI>
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-47 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0: <IntelR AWRDACPI> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, 3fde0000 (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
cpu0: <ACPI CPU> on acpi0
p4tcc0: <CPU Frequency Thermal Control> on cpu0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 3.0 on pci0
pci1: <ACPI PCI bus> on pcib1
em0: <Intel(R) PRO/1000 Network Connection Version - 6.5.3> port 0xc000-0xc01f mem 0xf2100000-0xf211ffff irq 18 at device 1.0 on pci1
em0: Ethernet address: 00:30:48:81:e1:9e
em0: [FILTER]
pcib2: <ACPI PCI-PCI bridge> at device 28.0 on pci0
pci2: <ACPI PCI bus> on pcib2
pcib3: <PCI-PCI bridge> at device 1.0 on pci2
pci3: <PCI bus> on pcib3
arcmsr0: <Areca SATA Host Adapter RAID Controller > mem 0xf2010000-0xf2010fff irq 26 at device 14.0 on pci3
ARECA RAID ADAPTER0: Driver Version 1.20.00.14 2007-2-05
ARECA RAID ADAPTER0: FIRMWARE VERSION V1.36 2005-3-31
arcmsr0: [ITHREAD]
uhci0: <UHCI (generic) USB controller> port 0xe100-0xe11f irq 16 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
uhci0: [ITHREAD]
usb0: <UHCI (generic) USB controller> on uhci0
usb0: USB revision 1.0
uhub0: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
uhub0: 2 ports with 2 removable, self powered
uhci1: <UHCI (generic) USB controller> port 0xe000-0xe01f irq 19 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
uhci1: [ITHREAD]
usb1: <UHCI (generic) USB controller> on uhci1
usb1: USB revision 1.0
uhub1: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb1
uhub1: 2 ports with 2 removable, self powered
pci0: <base peripheral> at device 29.4 (no driver attached)
ehci0: <Intel 6300ESB USB 2.0 controller> mem 0xf2200000-0xf22003ff irq 23 at device 29.7 on pci0
ehci0: [GIANT-LOCKED]
ehci0: [ITHREAD]
usb2: EHCI version 1.0
usb2: companion controllers, 2 ports each: usb0 usb1
usb2: <Intel 6300ESB USB 2.0 controller> on ehci0
usb2: USB revision 2.0
uhub2: <Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1> on usb2
uhub2: 4 ports with 4 removable, self powered
pcib4: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci4: <ACPI PCI bus> on pcib4
vgapci0: <VGA-compatible display> port 0xd000-0xd0ff mem 0xf0000000-0xf0ffffff,0xf1040000-0xf1040fff irq 16 at device 9.0 on pci4
em1: <Intel(R) PRO/1000 Network Connection Version - 6.5.3> port 0xd100-0xd13f mem 0xf1000000-0xf101ffff irq 19 at device 10.0 on pci4
em1: Ethernet address: 00:30:48:81:e1:9f
em1: [FILTER]
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel 6300ESB UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
atapci1: <Intel 6300ESB SATA150 controller> port 0xe200-0xe207,0xe300-0xe303,0xe400-0xe407,0xe500-0xe503,0xe600-0xe60f irq 18 at device 31.2 on pci0
atapci1: [ITHREAD]
ata2: <ATA channel 0> on atapci1
ata2: [ITHREAD]
ata3: <ATA channel 1> on atapci1
ata3: [ITHREAD]
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
acpi_tz0: <Thermal Zone> on acpi0
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FILTER]
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio0: [FILTER]
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
sio1: [FILTER]
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
pmtimer0 on isa0
orm0: <ISA Option ROM> at iomem 0xc0000-0xc7fff pnpid ORM0000 on isa0
ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppbus0: <Parallel port bus> on ppc0
ppi0: <Parallel I/O> on ppbus0
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppc0: [GIANT-LOCKED]
ppc0: [ITHREAD]
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 3194562256 Hz quality 800
Timecounters tick every 1.000 msec
Waiting 5 seconds for SCSI devices to settle
acd0: CDROM <CD-224E/1.9A> at ata0-master UDMA33
da0 at arcmsr0 bus 0 target 0 lun 0
da0: <Areca ARC-1110-VOL#00 R001> Fixed Direct Access SCSI-3 device
da0: 166.666MB/s transfers (83.333MHz, offset 32, 16bit)
da0: 76293MB (156249600 512 byte sectors: 255H 63S/T 9726C)
da1 at arcmsr0 bus 0 target 0 lun 1
da1: <Areca ARC-1110-VOL#01 R001> Fixed Direct Access SCSI-3 device
da1: 166.666MB/s transfers (83.333MHz, offset 32, 16bit)
da1: 2784728MB (5703123456 512 byte sectors: 255H 63S/T 355003C)
GEOM_JOURNAL: Journal 490099366: da1p1 contains data.
GEOM_JOURNAL: Journal 490099366: da1p1 contains journal.
Trying to mount root from ufs:/dev/da0s1a
WARNING: / was not properly dismounted
GEOM_JOURNAL: Journal da1p1 consistent.
WARNING: /tmp was not properly dismounted
WARNING: /usr was not properly dismounted
WARNING: /var was not properly dismounted
arplookup 210.55.230.210 failed: host is not on local network
ichsmb0: <Intel 6300ESB (ICH) SMBus controller> port 0x500-0x51f irq 17 at device 31.3 on pci0
ichsmb0: [GIANT-LOCKED]
ichsmb0: [ITHREAD]
smbus0: <System Management Bus> on ichsmb0
smb0: <SMBus generic I/O> on smbus0
em0: watchdog timeout -- resetting
em0: link state changed to DOWN
em0: link state changed to UP
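The report above points out that em0 shares an interrupt with atapci. As a rough illustration only (it is not part of the original message), the Python sketch below groups a handful of device lines copied from that dmesg by IRQ number so that shared lines stand out. The function name `shared_irqs` and the regular expression are assumptions made for this example; a real dmesg may contain line shapes this pattern does not handle.

```python
import re
from collections import defaultdict

# Device lines copied from the dmesg in the message above.
DMESG = """\
em0: <Intel(R) PRO/1000 Network Connection Version - 6.5.3> port 0xc000-0xc01f mem 0xf2100000-0xf211ffff irq 18 at device 1.0 on pci1
arcmsr0: <Areca SATA Host Adapter RAID Controller > mem 0xf2010000-0xf2010fff irq 26 at device 14.0 on pci3
uhci0: <UHCI (generic) USB controller> port 0xe100-0xe11f irq 16 at device 29.0 on pci0
uhci1: <UHCI (generic) USB controller> port 0xe000-0xe01f irq 19 at device 29.1 on pci0
vgapci0: <VGA-compatible display> port 0xd000-0xd0ff mem 0xf0000000-0xf0ffffff,0xf1040000-0xf1040fff irq 16 at device 9.0 on pci4
em1: <Intel(R) PRO/1000 Network Connection Version - 6.5.3> port 0xd100-0xd13f mem 0xf1000000-0xf101ffff irq 19 at device 10.0 on pci4
atapci1: <Intel 6300ESB SATA150 controller> port 0xe200-0xe207,0xe300-0xe303,0xe400-0xe407,0xe500-0xe503,0xe600-0xe60f irq 18 at device 31.2 on pci0
"""

# Hypothetical pattern: "<device>: ... irq <number> ..." on a single line.
LINE_RE = re.compile(r"^(\w+):.*\birq (\d+)\b")


def shared_irqs(dmesg_text):
    """Return {irq: [devices]} for every IRQ claimed by more than one device."""
    by_irq = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            by_irq[int(match.group(2))].append(match.group(1))
    return {irq: devs for irq, devs in by_irq.items() if len(devs) > 1}


if __name__ == "__main__":
    for irq, devices in sorted(shared_irqs(DMESG).items()):
        print(f"irq {irq}: {', '.join(devices)}")
```

On this sample, irq 18 comes back with both em0 and atapci1, which matches the sharing described in the message. Whether sharing is actually related to the watchdog timeouts is exactly what the thread is still trying to establish.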
Hi Jack,

In my case, yes, but only under higher loads, specifically when Amanda
started receiving the backup data. At lower loads, like the check or
sizing, no problems.

Best regards,
Göran L

--On Tuesday, October 30, 2007 10:16 -0700 Jack Vogel <jfvogel@gmail.com> wrote:

> This morning I had an idea about what the source of the watchdog
> problem is. Also, we have repro'd at least one type of watchdog
> inhouse.
>
> One question, is this problem only happening for those running
> STABLE with the 6.6.6 merged driver?
>
> We found the problem does not seem to happen on 7.0.
>
> Right now my suspicion is that the FAST irq handling is
> again causing a problem. I am experimenting with variations
> to the code today to be sure whats going on, and hopefully
> fixing it.
>
> Cheers,
>
> Jack

...................................................
the future isMobile

Goran Lowkrantz <goran.lowkrantz@ismobile.com>
System Architect, isMobile AB
Sandviksgatan 81, PO Box 58, S-971 03 Luleå, Sweden
Mobile: +46(0)70-587 87 82
http://www.ismobile.com
...............................................
On 31/10/2007, Jack Vogel <jfvogel@gmail.com> wrote:
> This morning I had an idea about what the source of the watchdog
> problem is. Also, we have repro'd at least one type of watchdog
> inhouse.
>
> One question, is this problem only happening for those running
> STABLE with the 6.6.6 merged driver?

This doesn't happen for me on my local testing network with RELENG_6
built from October 14 sources. These boxes generally throw about 350mbit
in/out of TCP between each other. I've recently upgraded one to RELENG_7
to test so I can't provide a dmesg+interrupt count for it.

Note that under load the interrupts are around 8000/sec for em0 but I
haven't run the tests for a few days and thus the "rate" figure for em0
is a bit misleading.

wendy# vmstat -i
interrupt                          total       rate
irq1: atkbd0                          76          0
irq6: fdc0                             1          0
irq15: ata1                           46          0
irq17: em0 fwohci0            1503419876       1866
irq19: atapci1                    766603          0
irq20: ohci0 ohci+                     0          0
cpu0: timer                   1611280255       2000
Total                         3115466857       3867

pciconf for the ethernet card:

em0@pci1:6:0:   class=0x020000 card=0x13768086 chip=0x107c8086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'PRO/1000 GT'
    class      = network
    subclass   = ethernet

Dmesg:

Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.2-STABLE #6: Sun Oct 14 23:51:09 WST 2007
    adrian@wendy.home.cacheboy.net:/usr/obj/usr/src/sys/WENDY
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Opteron(tm) Processor 140 (1400.01-MHz 686-class CPU)
  Origin = "AuthenticAMD"  Id = 0xf58  Stepping = 8
  Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
  AMD Features=0xe0500800<SYSCALL,NX,MMX+,LM,3DNow!+,3DNow!>
real memory  = 2147221504 (2047 MB)
avail memory = 2096099328 (1998 MB)
ACPI APIC Table: <A M I OEMAPIC >
ioapic0 <Version 1.1> irqs 0-23 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0: <A M I OEMRSDT> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x4008-0x400b on acpi0
cpu0: <ACPI CPU> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
agp0: <NVIDIA nForce3 AGP Controller> mem 0xf0000000-0xf7ffffff at device 0.0 on pci0
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
pci0: <serial bus, SMBus> at device 1.1 (no driver attached)
ohci0: <nVidia nForce3 USB Controller> mem 0xfebfd000-0xfebfdfff irq 20 at device 2.0 on pci0
ohci0: [GIANT-LOCKED]
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: <nVidia nForce3 USB Controller> on ohci0
usb0: USB revision 1.0
uhub0: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 3 ports with 3 removable, self powered
ohci1: <nVidia nForce3 USB Controller> mem 0xfebfe000-0xfebfefff irq 20 at device 2.1 on pci0
ohci1: [GIANT-LOCKED]
usb1: OHCI version 1.0, legacy support
usb1: SMM does not respond, resetting
usb1: <nVidia nForce3 USB Controller> on ohci1
usb1: USB revision 1.0
uhub1: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 3 ports with 3 removable, self powered
ehci0: <NVIDIA nForce3 USB 2.0 controller> mem 0xfebffc00-0xfebffcff irq 20 at device 2.2 on pci0
ehci0: [GIANT-LOCKED]
usb2: EHCI version 1.0
usb2: companion controllers, 4 ports each: usb0 usb1
usb2: <NVIDIA nForce3 USB 2.0 controller> on ehci0
usb2: USB revision 2.0
uhub2: nVidia EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub2: 6 ports with 6 removable, self powered
nve0: <NVIDIA nForce MCP3 Networking Adapter> port 0xec00-0xec07 mem 0xfebfc000-0xfebfcfff irq 21 at device 5.0 on pci0
nve0: Ethernet address 00:0c:6e:98:55:5c
miibus0: <MII bus> on nve0
rlphy0: <RTL8201L 10/100 media interface> on miibus0
rlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
nve0: Ethernet address: 00:0c:6e:98:55:5c
pci0: <multimedia, audio> at device 6.0 (no driver attached)
atapci0: <nVidia nForce3 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 8.0 on pci0
ata0: <ATA channel 0> on atapci0
ata1: <ATA channel 1> on atapci0
pcib1: <ACPI PCI-PCI bridge> at device 10.0 on pci0
pci1: <ACPI PCI bus> on pcib1
em0: <Intel(R) PRO/1000 Network Connection Version - 6.6.6> port 0xdc00-0xdc3f mem 0xfc9e0000-0xfc9fffff,0xfc9c0000-0xfc9dffff irq 17 at device 6.0 on pci1
em0: Ethernet address: 00:0e:0c:b9:42:d8
atapci1: <Promise PDC20378 SATA150 controller> port 0xd800-0xd83f,0xd400-0xd40f,0xd000-0xd07f mem 0xfc99f000-0xfc99ffff,0xfc960000-0xfc97ffff irq 19 at device 8.0 on pci1
ata2: <ATA channel 0> on atapci1
ata3: <ATA channel 1> on atapci1
ata4: <ATA channel 2> on atapci1
fwohci0: <Texas Instruments TSB43AB22/A> mem 0xfc99e800-0xfc99efff,0xfc998000-0xfc99bfff irq 17 at device 9.0 on pci1
fwohci0: OHCI version 1.10 (ROM=1)
fwohci0: No. of Isochronous channels is 4.
fwohci0: EUI64 00:e0:18:00:00:2e:05:96
fwohci0: Phy 1394a available S400, 2 ports.
fwohci0: Link S400, max_rec 2048 bytes.
firewire0: <IEEE1394(FireWire) bus> on fwohci0
fwe0: <Ethernet over FireWire> on firewire0
if_fwe0: Fake Ethernet address: 02:e0:18:2e:05:96
fwe0: Ethernet address: 02:e0:18:2e:05:96
fwe0: if_start running deferred for Giant
sbp0: <SBP-2/SCSI over FireWire> on firewire0
fwohci0: Initiate bus reset
fwohci0: BUS reset
fwohci0: node_id=0xc000ffc0, gen=1, CYCLEMASTER mode
firewire0: 1 nodes, maxhop <= 0, cable IRM = 0 (me)
firewire0: bus manager 0 (me)
pcib2: <ACPI PCI-PCI bridge> at device 11.0 on pci0
pci2: <ACPI PCI bus> on pcib2
pci2: <display, VGA> at device 0.0 (no driver attached)
acpi_button0: <Power Button> on acpi0
fdc0: <floppy drive controller (FDE)> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FAST]
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
pmtimer0 on isa0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xcffff,0xd0000-0xd0fff on isa0
ppc0: parallel port not found.
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 1400008080 Hz quality 800
Timecounters tick every 1.000 msec
ipfw2 (+ipv6) initialized, divert loadable, rule-based forwarding enabled, default to accept, logging disabled
acd0: CDRW <MATSHITA CD-RW CW-7586/1A17> at ata1-master PIO4
ad4: 76319MB <WDC WD800JD-00MSA1 10.01E01> at ata2-master SATA150
hwpmc: TSC/1/0x20<REA> K8/4/0x1ff<INT,USR,SYS,EDG,THR,REA,WRI,INV,QUA>
Trying to mount root from ufs:/dev/ad4s1a

HTH,

--
Adrian Chadd - adrian@freebsd.org
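The remark that the "rate" figure is misleading is consistent with the `vmstat -i` rate column being an average since boot (total divided by uptime), so several idle days dilute it. A minimal sketch of the arithmetic, assuming that interpretation: it parses the snapshot quoted in this message and shows how a rate could instead be taken from the difference between two snapshots. The helper names `parse_vmstat_i` and `delta_rates` are hypothetical, and the second snapshot is invented for illustration (80000 extra em0 interrupts over 10 seconds, chosen to match the roughly 8000/sec mentioned for load); it is not measured data.

```python
# The `vmstat -i` output quoted in the message above.
SNAPSHOT = """\
interrupt                          total       rate
irq1: atkbd0                          76          0
irq6: fdc0                             1          0
irq15: ata1                           46          0
irq17: em0 fwohci0            1503419876       1866
irq19: atapci1                    766603          0
irq20: ohci0 ohci+                     0          0
cpu0: timer                   1611280255       2000
Total                         3115466857       3867
"""


def parse_vmstat_i(text):
    """Return {source: total} for each interrupt row, skipping header and Total."""
    totals = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a label such as "irq17:" or "cpu0:" and end
        # with two numeric columns (total, rate).
        if len(fields) < 3 or not fields[0].endswith(":"):
            continue
        totals[" ".join(fields[:-2])] = int(fields[-2])
    return totals


def delta_rates(before, after, interval_s):
    """Interrupts per second between two snapshots taken interval_s apart."""
    return {
        name: (after[name] - before[name]) / interval_s
        for name in after
        if name in before
    }


if __name__ == "__main__":
    before = parse_vmstat_i(SNAPSHOT)
    em0 = "irq17: em0 fwohci0"

    # The timer row ticks at 2000/sec in the snapshot, so total / 2000
    # approximates the uptime the averages were computed over.
    uptime_days = before["cpu0: timer"] / 2000 / 86400
    print(f"implied uptime: about {uptime_days:.1f} days")

    # INVENTED second snapshot: 80000 more em0 interrupts, 10 seconds later.
    after = dict(before)
    after[em0] += 80_000
    print(f"em0 rate over the interval: {delta_rates(before, after, 10.0)[em0]:.0f}/sec")
```

Taking two real snapshots a known interval apart while the traffic test is running would give a per-second figure comparable to the under-load number, rather than the since-boot average shown in the table.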