Guido Falsi
2010-Feb-08 14:33 UTC
ATA_CAM + ZFS gives short 1-2 seconds system freeze on disk load
Hi! I'm seeing this problem on my machine at work. It's an HP DC 7800, mounts an ich9 chipset(not ahci capable). I'm attaching the dmesg. I noticed this in the past, but it got evident(and very annoying) while recompiling many ports today after the jpeg-8 update. It looks like it freezes the system for the second or two it takes to flush buffers to disk when there are big outputs. This happens when decompressiong big distfiles, mainly. The openoffice port triggers this almost continuosly every few seconds during compilation. I've also seen this when working with big files(for example graphic images in uncompressed formats). It gets very annoying and I don't remember this happening before activating the ATA_CAM flag. There was some slowdown with big disk access, but not a total freeze. The input I give during the freeze is not lost though, but replayed in a bunch as soon as the system becomes responsive again. BTW there's another thing that shows up on this machine. Lately, this too after putting the option ATA_CAM in the kernel, during boot there is a long pause(exactly one minute, as the message below states) in this point of the dmesg: uhub6: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus6 uhub0: 2 ports with 2 removable, self powered uhub1: 2 ports with 2 removable, self powered uhub2: 2 ports with 2 removable, self powered uhub4: 2 ports with 2 removable, self powered uhub5: 2 ports with 2 removable, self powered uhub6: 4 ports with 4 removable, self powered uhub3: 6 ports with 6 removable, self powered ugen2.2: <Logitech> at usbus2 ums0: <Logitech USB Gaming Mouse, class 0/0, rev 2.00/52.00, addr 2> on usbus2 ums0: 16 buttons and [XYZ] coordinates ID=0 uhid0: <Logitech USB Gaming Mouse, class 0/0, rev 2.00/52.00, addr 2> on usbus2 ugen1.2: <vendor 0x0df6> at usbus1 ubt0: <vendor 0x0df6 Sitecom USB bluetooth2.0 class 2 dongle CN-512, class 224/1 , rev 2.00/19.58, addr 2> on usbus1 ---------- 60 seconds pause happens here ------------ run_interrupt_driven_hooks: still waiting after 60 seconds for xpt_config ada0 at ata0 bus 0 scbus2 target 0 lun 0 ada0: <ST3160815AS 3.CHF> ATA-7 SATA 2.x device ada0: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO size 8192bytes)cd0 at ata1 bus 0 scbus3 target 0 lun 0 This is not a big problem but quite annoying too. Maybe they're related, maybe not. I don't relly know. Hope these indications help. If needed I could provide more information and test patches. Thanks to the developers for they're great work on FreeBSD. Hope this email helps make it even better! Thanks in advance for any attention to this problem! -- Guido Falsi <mad@madpilot.net> -------------- next part -------------- Copyright (c) 1992-2010 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 8.0-STABLE #38: Mon Feb 8 10:27:55 CET 2010 root@vwg82.xxx.it:/usr/obj/usr/src/sys/VWG82 amd64 Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz (2327.51-MHz K8-class CPU) Origin = "GenuineIntel" Id = 0x6fb Stepping = 11 Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE> Features2=0xe3fd<SSE3,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM> AMD Features=0x20100800<SYSCALL,NX,LM> AMD Features2=0x1<LAHF> TSC: P-state invariant real memory = 4299161600 (4100 MB) avail memory = 4037332992 (3850 MB) ACPI APIC Table: <COMPAQ BEARLAKE> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs FreeBSD/SMP: 1 package(s) x 2 core(s) cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 ioapic0: Changing APIC ID to 1 ioapic0 <Version 2.0> irqs 0-23 on motherboard netsmb_dev: loaded iscsi: version 2.1.0 kbd1 at kbdmux0 acpi0: <HPQOEM SLIC-BPC> on motherboard acpi0: [ITHREAD] acpi0: Power Button (fixed) acpi0: reservation of 0, a0000 (3) failed acpi0: reservation of 100000, dff00000 (3) failed Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000 acpi_timer0: <24-bit timer at 3.579545MHz> port 0xf808-0xf80b on acpi0 acpi_hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0 Timecounter "HPET" frequency 14318180 Hz quality 900 pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0 pci0: <ACPI PCI bus> on pcib0 vgapci0: <VGA-compatible display> port 0x1230-0x1237 mem 0xf0100000-0xf017ffff,0xe0000000-0xefffffff,0xf0000000-0xf00fffff irq 16 at device 2.0 on pci0 agp0: <Intel Q35 SVGA controller> on vgapci0 agp0: detected 6140k stolen memory agp0: aperture size is 256M drm0: <Intel Q35> on vgapci0 info: [drm] MSI enabled 1 message(s) vgapci0: child drm0 requested pci_enable_busmaster info: [drm] AGP at 0xe0000000 256MB info: [drm] Initialized i915 1.6.0 20080730 pci0: <simple comms> at device 3.0 (no driver attached) atapci0: <Intel ATA controller> port 0x1238-0x123f,0x1270-0x1273,0x1240-0x1247,0x1274-0x1277,0x11e0-0x11ef irq 18 at device 3.2 on pci0 atapci0: [ITHREAD] ata2: <ATA channel 0> on atapci0 ata2: [ITHREAD] ata3: <ATA channel 1> on atapci0 ata3: [ITHREAD] pci0: <simple comms, UART> at device 3.3 (no driver attached) em0: <Intel(R) PRO/1000 Network Connection 6.9.14> port 0x1100-0x111f mem 0xf0180000-0xf019ffff,0xf01a5000-0xf01a5fff irq 19 at device 25.0 on pci0 em0: Using MSI interrupt em0: [FILTER] em0: Ethernet address: 00:1e:0b:ab:e6:a7 uhci0: <Intel 82801I (ICH9) USB controller> port 0x1120-0x113f irq 20 at device 26.0 on pci0 uhci0: [ITHREAD] usbus0: <Intel 82801I (ICH9) USB controller> on uhci0 uhci1: <Intel 82801I (ICH9) USB controller> port 0x1140-0x115f irq 21 at device 26.1 on pci0 uhci1: [ITHREAD] usbus1: <Intel 82801I (ICH9) USB controller> on uhci1 uhci2: <Intel 82801I (ICH9) USB controller> port 0x1160-0x117f irq 22 at device 26.2 on pci0 uhci2: [ITHREAD] usbus2: <Intel 82801I (ICH9) USB controller> on uhci2 ehci0: <Intel 82801I (ICH9) USB 2.0 controller> mem 0xf01a6000-0xf01a63ff irq 22 at device 26.7 on pci0 ehci0: [ITHREAD] usbus3: EHCI version 1.0 usbus3: <Intel 82801I (ICH9) USB 2.0 controller> on ehci0 hdac0: <Intel 82801I High Definition Audio Controller> mem 0xf01a0000-0xf01a3fff irq 21 at device 27.0 on pci0 hdac0: HDA Driver Revision: 20100122_0141 hdac0: [ITHREAD] pcib1: <ACPI PCI-PCI bridge> at device 28.0 on pci0 pci32: <ACPI PCI bus> on pcib1 pcib2: <ACPI PCI-PCI bridge> irq 21 at device 28.1 on pci0 pci48: <ACPI PCI bus> on pcib2 uhci3: <Intel 82801I (ICH9) USB controller> port 0x1180-0x119f irq 20 at device 29.0 on pci0 uhci3: [ITHREAD] usbus4: <Intel 82801I (ICH9) USB controller> on uhci3 uhci4: <Intel 82801I (ICH9) USB controller> port 0x11a0-0x11bf irq 21 at device 29.1 on pci0 uhci4: [ITHREAD] usbus5: <Intel 82801I (ICH9) USB controller> on uhci4 ehci1: <Intel 82801I (ICH9) USB 2.0 controller> mem 0xf01a6400-0xf01a67ff irq 20 at device 29.7 on pci0 ehci1: [ITHREAD] usbus6: EHCI version 1.0 usbus6: <Intel 82801I (ICH9) USB 2.0 controller> on ehci1 pcib3: <ACPI PCI-PCI bridge> at device 30.0 on pci0 pci7: <ACPI PCI bus> on pcib3 isab0: <PCI-ISA bridge> at device 31.0 on pci0 isa0: <ISA bus> on isab0 atapci1: <Intel ICH9 SATA300 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x11f0-0x11ff,0x1200-0x120f irq 18 at device 31.2 on pci0 ata0: <ATA channel 0> on atapci1 ata0: [ITHREAD] ata1: <ATA channel 1> on atapci1 ata1: [ITHREAD] atapci2: <Intel ICH9 SATA300 controller> port 0x1260-0x1267,0x1280-0x1283,0x1268-0x126f,0x1284-0x1287,0x1210-0x121f,0x1220-0x122f irq 18 at device 31.5 on pci0 atapci2: [ITHREAD] ata4: <ATA channel 0> on atapci2 ata4: [ITHREAD] ata5: <ATA channel 1> on atapci2 ata5: [ITHREAD] acpi_button0: <Power Button> on acpi0 atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0 atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0 atkbd0: <AT Keyboard> irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] atkbd0: [ITHREAD] ppc0: <Parallel port> port 0x378-0x37f,0x778-0x77d irq 7 drq 3 on acpi0 ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode ppc0: FIFO with 16/16/13 bytes threshold ppc0: [ITHREAD] ppbus0: <Parallel port bus> on ppc0 plip0: <PLIP network interface> on ppbus0 plip0: [ITHREAD] lpt0: <Printer> on ppbus0 lpt0: [ITHREAD] lpt0: Interrupt-driven port ppi0: <Parallel I/O> on ppbus0 uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 uart0: [FILTER] cpu0: <ACPI CPU> on acpi0 est0: <Enhanced SpeedStep Frequency Control> on cpu0 p4tcc0: <CPU Frequency Thermal Control> on cpu0 cpu1: <ACPI CPU> on acpi0 est1: <Enhanced SpeedStep Frequency Control> on cpu1 p4tcc1: <CPU Frequency Thermal Control> on cpu1 sc0: <System console> at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present; to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf. ZFS filesystem version 3 ZFS storage pool version 14 Timecounters tick every 1.000 msec vboxdrv: fAsync=0 offMin=0xfc offMax=0x25a Waiting 5 seconds for SCSI devices to settle hdac0: HDA Codec #0: Analog Devices AD1884 pcm0: <HDA Analog Devices AD1884 PCM #0 Analog> at cad 0 nid 1 on hdac0 pcm1: <HDA Analog Devices AD1884 PCM #1 Analog> at cad 0 nid 1 on hdac0 usbus0: 12Mbps Full Speed USB v1.0 usbus1: 12Mbps Full Speed USB v1.0 usbus2: 12Mbps Full Speed USB v1.0 usbus3: 480Mbps High Speed USB v2.0 usbus4: 12Mbps Full Speed USB v1.0 usbus5: 12Mbps Full Speed USB v1.0 usbus6: 480Mbps High Speed USB v2.0 ugen0.1: <Intel> at usbus0 uhub0: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0 ugen1.1: <Intel> at usbus1 uhub1: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1 ugen2.1: <Intel> at usbus2 uhub2: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2 ugen3.1: <Intel> at usbus3 uhub3: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3 ugen4.1: <Intel> at usbus4 uhub4: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus4 ugen5.1: <Intel> at usbus5 uhub5: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus5 ugen6.1: <Intel> at usbus6 uhub6: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus6 uhub0: 2 ports with 2 removable, self powered uhub1: 2 ports with 2 removable, self powered uhub2: 2 ports with 2 removable, self powered uhub4: 2 ports with 2 removable, self powered uhub5: 2 ports with 2 removable, self powered uhub6: 4 ports with 4 removable, self powered uhub3: 6 ports with 6 removable, self powered ugen2.2: <Logitech> at usbus2 ums0: <Logitech USB Gaming Mouse, class 0/0, rev 2.00/52.00, addr 2> on usbus2 ums0: 16 buttons and [XYZ] coordinates ID=0 uhid0: <Logitech USB Gaming Mouse, class 0/0, rev 2.00/52.00, addr 2> on usbus2 ugen1.2: <vendor 0x0df6> at usbus1 ubt0: <vendor 0x0df6 Sitecom USB bluetooth2.0 class 2 dongle CN-512, class 224/1, rev 2.00/19.58, addr 2> on usbus1 run_interrupt_driven_hooks: still waiting after 60 seconds for xpt_config ada0 at ata0 bus 0 scbus2 target 0 lun 0 ada0: <ST3160815AS 3.CHF> ATA-7 SATA 2.x device ada0: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO size 8192bytes)cd0 at ata1 bus 0 scbus3 target 0 lun 0 cd0: <TSSTcorp CDDVDW TS-H653N HB01> Removable CD-ROM SCSI-0 device cd0: 150.000MB/s transfers (SATA 1.x, UDMA5, PIO size 8192bytes) cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed ada0: 152627MB (312581808 512 byte sectors: 16H 63S/T 16383C) ada1 at ata4 bus 0 scbus4 target 0 lun 0 ada1: <ST3160815AS 3.CHF> ATA-7 SATA 2.x device ada1: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO size 8192bytes) ada1: 152627MB (312581808 512 byte sectors: 16H 63S/T 16383C) SMP: AP CPU #1 Launched! Trying to mount root from zfs:tank em0: link state changed to UP
Ivan Voras
2010-Feb-08 14:41 UTC
ATA_CAM + ZFS gives short 1-2 seconds system freeze on disk load
On 02/08/10 15:33, Guido Falsi wrote:> It looks like it freezes the system for the second or two it takes > to flush buffers to disk when there are big outputs. This happens > when decompressiong big distfiles, mainly. The openoffice port > triggers this almost continuosly every few seconds during compilation. > I've also seen this when working with big files(for example graphic > images in uncompressed formats). > > It gets very annoying and I don't remember this happening before > activating the ATA_CAM flag. There was some slowdown with big disk > access, but not a total freeze.I think ZFS does this all the time, i.e. regardless of underlying device drivers. Can you test your theory by going to an older kernel and keeping *everything* else the same?
Jeremy Chadwick
2010-Feb-08 14:51 UTC
ATA_CAM + ZFS gives short 1-2 seconds system freeze on disk load
On Mon, Feb 08, 2010 at 03:33:29PM +0100, Guido Falsi wrote:> Hi! > > I'm seeing this problem on my machine at work. It's an HP DC 7800, > mounts an ich9 chipset(not ahci capable). I'm attaching the dmesg. > > I noticed this in the past, but it got evident(and very annoying) > while recompiling many ports today after the jpeg-8 update. > > It looks like it freezes the system for the second or two it takes > to flush buffers to disk when there are big outputs. This happens > when decompressiong big distfiles, mainly. The openoffice port > triggers this almost continuosly every few seconds during compilation. > I've also seen this when working with big files(for example graphic > images in uncompressed formats). > > It gets very annoying and I don't remember this happening before > activating the ATA_CAM flag. There was some slowdown with big disk > access, but not a total freeze.This happens without ATA_CAM (e.g. using ataahci(4) or any other controller driver). The behaviour you're describing (bursty heavy disk I/O that stalls the subsystem) is pretty much the norm on all FreeBSD systems I've seen with ZFS. When it starts happening, it's easy to notice/follow using "zpool iostat 1" or "gstat -I500ms". Lots of I/O will happen (read or write) and the ARC is essentially being thrashed -- said utilities won't show any I/O counters incrementing until some threshold is reached, where you'll see a massive amount of I/O reported, during which time the system is sluggish (beyond acceptable levels, IMHO). A few seconds later, the I/O counters start reporting 0 as the ARC gets used, then a few seconds massive I/O, rinse lather repeat. I've seen Solaris 10 systems which behave the same way, and others which don't. I don't know what causes things to start behaving this way.> BTW there's another thing that shows up on this machine. Lately, this > too after putting the option ATA_CAM in the kernel, during boot there is > a long pause(exactly one minute, as the message below states) in this > point of the dmesg:This should probably be discussed in a different thread. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB |