Pete French
2012-Jan-10 10:56 UTC
Odd zpool problem - always one disc offline, maybe controller related ?
I upgraded my system to -stable on January 6th, and since then I have noticed a very odd problem. I have a zpool with 4 drives in it, and one of them is always 'OFFLINE' - if I put it online and it starts resilvering then another one immediately goes offline. It's the same two drives alternating as well - very perplexing.

I have checked all the cabling (they are eSATA drives), and it is all pushed home solid. It looks from dmesg like the drive is disconnecting and reconnecting briefly, but that is enough to get it dropped out of the zpool.

I must admit that though I noticed this on the 6th, I can't tell you whether it was working on the version I was running previously, as I don't check the zpool on that machine as often as I should. I am recompiling an earlier version now to see.

Details of what happens are below:

-pete.

------

[pete@skerry ~]$ zpool status
  pool: cube
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scan: resilvered 6.41G in 2h27m with 0 errors on Mon Jan 9 23:23:27 2012
config:

        NAME                      STATE     READ WRITE CKSUM
        cube                      DEGRADED     0     0     0
          mirror-0                ONLINE       0     0     0
            ada2                  ONLINE       0     0     0
            ada3                  ONLINE       0     0     0
          mirror-1                DEGRADED     0     0     0
            ada1                  ONLINE       0     0     0
            8890308235385361660   REMOVED      0     0     0  was /dev/ada0

errors: No known data errors
[pete@skerry ~]$ su
Password:
skerry# zpool online ada0
missing device name
usage:
        online [-e] <pool> <device> ...
skerry# zpool online cube ada0
skerry# zpool status
  pool: cube
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scan: resilver in progress since Tue Jan 10 09:03:58 2012
    1.02G scanned out of 1.42T at 80.6M/s, 5h8m to go
    492M resilvered, 0.07% done
config:

        NAME                      STATE     READ WRITE CKSUM
        cube                      DEGRADED     0     0     0
          mirror-0                DEGRADED     0     0     0
            ada2                  ONLINE       0     0     0
            6739201713000599902   REMOVED      0     0     0  was /dev/ada3
          mirror-1                ONLINE       0     0     0
            ada1                  ONLINE       0     0     0
            ada0                  ONLINE       0     0     0  (resilvering)

errors: No known data errors
skerry#

...and from dmesg at the point I did that:

(ada3:siisch3:0:0:0): lost device
(ada3:siisch3:0:0:0): removing device entry
ada3 at siisch3 bus 0 scbus3 target 0 lun 0
ada3: <WDC WD1002FBYS-02A6B0 03.00C06> ATA-8 SATA 2.x device
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)

Here is the boot dmesg:

Copyright (c) 1992-2012 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.2-STABLE #0: Fri Jan 6 12:41:32 GMT 2012
    pete@skerry.drayhouse:/usr/obj/usr/src/sys/GENERIC amd64
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz (2992.52-MHz K8-class CPU)
  Origin = "GenuineIntel" Id = 0x10676 Family = 6 Model = 17 Stepping = 6
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x8e3fd<SSE3,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant
real memory = 4299161600 (4100 MB)
avail memory = 4024582144 (3838 MB)
ACPI APIC Table: <COMPAQ BEARLAKE>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
ioapic0: Changing APIC ID to 1
ioapic0 <Version 2.0> irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: <HPQOEM SLIC-BPC> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, dff00000 (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0xf808-0xf80b on acpi0
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
acpi_hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 900
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 0.0 on pci1
pci2: <ACPI PCI bus> on pcib2
siis0: <SiI3124 SATA controller> port 0x3100-0x310f mem 0xf0308000-0xf030807f,0xf0300000-0xf0307fff irq 16 at device 0.0 on pci2
siis0: [ITHREAD]
siisch0: <SIIS channel> at channel 0 on siis0
siisch0: [ITHREAD]
siisch1: <SIIS channel> at channel 1 on siis0
siisch1: [ITHREAD]
siisch2: <SIIS channel> at channel 2 on siis0
siisch2: [ITHREAD]
siisch3: <SIIS channel> at channel 3 on siis0
siisch3: [ITHREAD]
vgapci0: <VGA-compatible display> port 0x4240-0x4247 mem 0xf0100000-0xf017ffff,0xe0000000-0xefffffff,0xf0000000-0xf00fffff irq 16 at device 2.0 on pci0
agp0: <Intel Q33 SVGA controller> on vgapci0
agp0: aperture size is 256M, detected 6140k stolen memory
pci0: <simple comms> at device 3.0 (no driver attached)
em0: <Intel(R) PRO/1000 Network Connection 7.2.3> port 0x4100-0x411f mem 0xf0180000-0xf019ffff,0xf01a4000-0xf01a4fff irq 19 at device 25.0 on pci0
em0: Using an MSI interrupt
em0: [FILTER]
em0: Ethernet address: 00:1f:29:d3:51:be
uhci0: <Intel 82801I (ICH9) USB controller> port 0x4120-0x413f irq 20 at device 26.0 on pci0
uhci0: [ITHREAD]
usbus0: <Intel 82801I (ICH9) USB controller> on uhci0
uhci1: <Intel 82801I (ICH9) USB controller> port 0x4140-0x415f irq 21 at device 26.1 on pci0
uhci1: [ITHREAD]
usbus1: <Intel 82801I (ICH9) USB controller> on uhci1
uhci2: <Intel 82801I (ICH9) USB controller> port 0x4160-0x417f irq 22 at device 26.2 on pci0
uhci2: [ITHREAD]
usbus2: <Intel 82801I (ICH9) USB controller> on uhci2
ehci0: <Intel 82801I (ICH9) USB 2.0 controller> mem 0xf01a5000-0xf01a53ff irq 22 at device 26.7 on pci0
ehci0: [ITHREAD]
usbus3: EHCI version 1.0
usbus3: <Intel 82801I (ICH9) USB 2.0 controller> on ehci0
pci0: <multimedia, HDA> at device 27.0 (no driver attached)
pcib3: <ACPI PCI-PCI bridge> irq 20 at device 28.0 on pci0
pci32: <ACPI PCI bus> on pcib3
siis1: <SiI3132 SATA controller> port 0x1100-0x117f mem 0xf0404000-0xf040407f,0xf0400000-0xf0403fff irq 16 at device 0.0 on pci32
siis1: [ITHREAD]
siisch4: <SIIS channel> at channel 0 on siis1
siisch4: [ITHREAD]
siisch5: <SIIS channel> at channel 1 on siis1
siisch5: [ITHREAD]
pcib4: <ACPI PCI-PCI bridge> irq 21 at device 28.1 on pci0
pci48: <ACPI PCI bus> on pcib4
uhci3: <Intel 82801I (ICH9) USB controller> port 0x4180-0x419f irq 20 at device 29.0 on pci0
uhci3: [ITHREAD]
usbus4: <Intel 82801I (ICH9) USB controller> on uhci3
uhci4: <Intel 82801I (ICH9) USB controller> port 0x41a0-0x41bf irq 21 at device 29.1 on pci0
uhci4: [ITHREAD]
usbus5: <Intel 82801I (ICH9) USB controller> on uhci4
uhci5: <Intel 82801I (ICH9) USB controller> port 0x41c0-0x41df irq 22 at device 29.2 on pci0
uhci5: [ITHREAD]
usbus6: <Intel 82801I (ICH9) USB controller> on uhci5
ehci1: <Intel 82801I (ICH9) USB 2.0 controller> mem 0xf01a5400-0xf01a57ff irq 20 at device 29.7 on pci0
ehci1: [ITHREAD]
usbus7: EHCI version 1.0
usbus7: <Intel 82801I (ICH9) USB 2.0 controller> on ehci1
pcib5: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci7: <ACPI PCI bus> on pcib5
em1: <Intel(R) PRO/1000 Legacy Network Connection 1.0.3> port 0x2100-0x213f mem 0xf0200000-0xf021ffff irq 20 at device 4.0 on pci7
em1: [FILTER]
em1: Ethernet address: 00:07:e9:10:d8:86
em2: <Intel(R) PRO/1000 Legacy Network Connection 1.0.3> port 0x2140-0x217f mem 0xf0220000-0xf023ffff irq 21 at device 4.1 on pci7
em2: [FILTER]
em2: Ethernet address: 00:07:e9:10:d8:87
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel ICH9 SATA300 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x4200-0x420f,0x4210-0x421f irq 18 at device 31.2 on pci0
ata0: <ATA channel> at channel 0 on atapci0
ata0: [ITHREAD]
ata1: <ATA channel> at channel 1 on atapci0
ata1: [ITHREAD]
atapci1: <Intel ICH9 SATA300 controller> port 0x4258-0x425f,0x4270-0x4273,0x4260-0x4267,0x4274-0x4277,0x4220-0x422f,0x4230-0x423f irq 18 at device 31.5 on pci0
atapci1: [ITHREAD]
ata2: <ATA channel> at channel 0 on atapci1
ata2: [ITHREAD]
ata3: <ATA channel> at channel 1 on atapci1
ata3: [ITHREAD]
acpi_button0: <Power Button> on acpi0
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: [FILTER]
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FILTER]
acpi_hpet1: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
device_attach: acpi_hpet1 attach returned 12
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: cannot reserve I/O port range
est0: <Enhanced SpeedStep Frequency Control> on cpu0
p4tcc0: <CPU Frequency Thermal Control> on cpu0
est1: <Enhanced SpeedStep Frequency Control> on cpu1
p4tcc1: <CPU Frequency Thermal Control> on cpu1
(noperiph:siisch0:0:-1:-1): rescan already queued
(noperiph:siisch1:0:-1:-1): rescan already queued
(noperiph:siisch2:0:-1:-1): rescan already queued
(noperiph:siisch3:0:-1:-1): rescan already queued
(noperiph:siisch4:0:-1:-1): rescan already queued
ZFS filesystem version 5
ZFS storage pool version 28
Timecounters tick every 1.000 msec
vboxdrv: fAsync=0 offMin=0x168 offMax=0x40b
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 12Mbps Full Speed USB v1.0
usbus2: 12Mbps Full Speed USB v1.0
usbus3: 480Mbps High Speed USB v2.0
usbus4: 12Mbps Full Speed USB v1.0
usbus5: 12Mbps Full Speed USB v1.0
usbus6: 12Mbps Full Speed USB v1.0
usbus7: 480Mbps High Speed USB v2.0
ugen0.1: <Intel> at usbus0
uhub0: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ugen1.1: <Intel> at usbus1
uhub1: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1
ugen2.1: <Intel> at usbus2
uhub2: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
ugen3.1: <Intel> at usbus3
uhub3: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
ugen4.1: <Intel> at usbus4
uhub4: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus4
ugen5.1: <Intel> at usbus5
uhub5: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus5
ugen6.1: <Intel> at usbus6
uhub6: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus6
ugen7.1: <Intel> at usbus7
uhub7: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus7
uhub0: 2 ports with 2 removable, self powered
uhub1: 2 ports with 2 removable, self powered
uhub2: 2 ports with 2 removable, self powered
uhub4: 2 ports with 2 removable, self powered
uhub5: 2 ports with 2 removable, self powered
uhub6: 2 ports with 2 removable, self powered
acd0: DVDR <HL-DT-ST DVD-RAM GSA-H60L/R90C> at ata1-master UDMA100 SATA 1.5Gb/s
uhub3: 6 ports with 6 removable, self powered
uhub7: 6 ports with 6 removable, self powered
ugen7.2: <Generic> at usbus7
umass0: <Bulk-In, Bulk-Out, Interface> on usbus7
umass0: SCSI over Bulk-Only; quirks = 0x4000
umass0:7:0:-1: Attached to scbus7
acd0: FAILURE - INQUIRY ILLEGAL REQUEST asc=0x24 ascq=0x00 sks=0x40 0x00 0x01
(probe1:umass-sim0:0:0:0): TEST UNIT READY. CDB: 0 0 0 0 0 0
(probe1:umass-sim0:0:0:0): CAM status: SCSI Status Error
(probe1:umass-sim0:0:0:0): SCSI status: Check Condition
(probe1:umass-sim0:0:0:0): SCSI sense: NOT READY asc:3a,0 (Medium not present)
ugen1.2: <vendor 0x1241> at usbus1
ukbd0: <vendor 0x1241 USB Keyboard, class 0/0, rev 1.10/2.90, addr 2> on usbus1
kbd2 at ukbd0
uhid0: <vendor 0x1241 USB Keyboard, class 0/0, rev 1.10/2.90, addr 2> on usbus1
(probe0:umass-sim0:0:0:1): TEST UNIT READY. CDB: 0 20 0 0 0 0
(probe0:umass-sim0:0:0:1): CAM status: SCSI Status Error
(probe0:umass-sim0:0:0:1): SCSI status: Check Condition
(probe0:umass-sim0:0:0:1): SCSI sense: NOT READY asc:3a,0 (Medium not present)
(probe0:umass-sim0:0:0:2): TEST UNIT READY. CDB: 0 40 0 0 0 0
(probe0:umass-sim0:0:0:2): CAM status: SCSI Status Error
(probe0:umass-sim0:0:0:2): SCSI status: Check Condition
(probe0:umass-sim0:0:0:2): SCSI sense: NOT READY asc:3a,0 (Medium not present)
(probe0:umass-sim0:0:0:3): TEST UNIT READY. CDB: 0 60 0 0 0 0
(probe0:umass-sim0:0:0:3): CAM status: SCSI Status Error
(probe0:umass-sim0:0:0:3): SCSI status: Check Condition
(probe0:umass-sim0:0:0:3): SCSI sense: NOT READY asc:3a,0 (Medium not present)
ada0 at siisch0 bus 0 scbus0 target 0 lun 0
ada0: <WDC WD1002FBYS-02A6B0 03.00C06> ATA-8 SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada1 at siisch1 bus 0 scbus1 target 0 lun 0
ada1: <WDC WD1002FBYS-02A6B0 03.00C06> ATA-8 SATA 2.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada2 at siisch2 bus 0 scbus2 target 0 lun 0
ada2: <WDC WD1002FBYS-02A6B0 03.00C06> ATA-8 SATA 2.x device
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada3 at siisch3 bus 0 scbus3 target 0 lun 0
ada3: <WDC WD1002FBYS-02A6B0 03.00C06> ATA-8 SATA 2.x device
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada4 at siisch4 bus 0 scbus4 target 0 lun 0
ada4: <OCZ-ONYX 1.6> ATA-8 SATA 2.x device
ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 512bytes)
ada4: Command Queueing enabled
ada4: 30533MB (62533296 512 byte sectors: 16H 63S/T 16383C)
da0 at umass-sim0 bus 0 scbus7 target 0 lun 0
da0: <Generic- Compact Flash 1.00> Removable Direct Access SCSI-0 device
da0: 40.000MB/s transfers
da0: Attempt to query device size failed: NOT READY, Medium not present
cd0 at ata1 bus 0 scbus6 target 0 lun 0
cd0: <HL-DT-ST DVD-RAM GSA-H60L R90C> Removable CD-ROM SCSI-0 device
cd0: 100.000MB/s transfers
cd0: cd present [3008 x 2048 byte records]
da1 at umass-sim0 bus 0 scbus7 target 0 lun 1
da1: <Generic- SM/xD-Picture 1.00> Removable Direct Access SCSI-0 device
da1: 40.000MB/s transfers
da1: Attempt to query device size failed: NOT READY, Medium not presentSMP: AP CPU #1 Launched!
da2 at umass-sim0 bus 0 scbus7 target 0 lun 2
da2: <Generic- SD/MMC 1.00> Removable Direct Access SCSI-0 device
da2: 40.000MB/s transfers
da2: Attempt to query device size failed: NOT READY, Medium not present
da3 at umass-sim0 bus 0 scbus7 target 0 lun 3
da3: <Generic- MS/MS-Pro 1.00> Removable Direct Access SCSI-0 device
da3: 40.000MB/s transfers
da3: Attempt to query device size failed: NOT READY, Medium not present
Trying to mount root from ufs:/dev/gpt/skerry-root
Setting hostuuid: 0071dfa5-eaab-11df-88e2-02dc1053ff3a.
Setting hostid: 0xe54799ad.
Entropy harvesting: interrupts ethernet point_to_point kickstart.
Starting file system checks:
/dev/gpt/skerry-root: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/gpt/skerry-root: clean, 7792879 free (65439 frags, 965930 blocks, 0.6% fragmentation)
Mounting local file systems:
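
To check whether it really is the same two drives (and the same controller channels) dropping each time, the "lost device" lines from dmesg could be tallied per device. This is only a rough sketch in Python: it assumes the CAM message format seen in the dmesg excerpt above, "(ada3:siisch3:0:0:0): lost device", and the function name count_lost_devices and the script name are made up for illustration.

```python
import re
import sys
from collections import Counter

# Assumed message format, taken from the dmesg excerpt in this post:
#   (ada3:siisch3:0:0:0): lost device
LOST_RE = re.compile(r"\((\w+):(\w+):\d+:\d+:\d+\): lost device")


def count_lost_devices(dmesg_text):
    """Count 'lost device' events, keyed by (device, channel)."""
    return Counter(m.groups() for m in LOST_RE.finditer(dmesg_text))


if __name__ == "__main__":
    # Read dmesg output from stdin and print the most frequently lost devices first.
    counts = count_lost_devices(sys.stdin.read())
    for (device, channel), n in counts.most_common():
        print(f"{device} on {channel}: lost {n} time(s)")
```

Saved under a hypothetical name such as lost_devices.py, it could be run as "dmesg | python3 lost_devices.py". If only two channels on one controller ever show up, that would point more towards the controller or its driver than towards the drives themselves.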