Lutz Schumann
2010-Jan-08 14:30 UTC
[zfs-discuss] zpool iostat -v hangs on L2ARC failure (SATA, 160 GB Postville)
Hello, today I wanted to test that a failure of the L2ARC device is not
critical to the pool. I added an Intel X25-M Postville (160 GB) as cache
device to a 4-disk mirror pool, then started a synchronous (SYNC) iozone
run on the pool:

  iozone -ec -r 32k -s 2048m -l 2 -i 0 -i 2 -o

Pool layout:

  pool
    mirror-0
      disk1
      disk2
    mirror-1
      disk3
      disk4
  cache
    intel-postville-ssd

Then I pulled the power cable of the SSD device (not the SATA connector),
and from that moment on all pool-related commands hang (e.g. zpool iostat
-v). I've waited 20 minutes now - still hangs :( I can log in to the system
itself (after some time - the whole system is sluggish), so the syspool
(which is on a separate device) is OK. Release is snv_104.

dmesg shows:

  Jan 8 15:21:42 nexenta gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@14,1/ide@1/cmdk@1,0 (Disk6):
  Jan 8 15:21:42 nexenta    Error for command 'write sector'    Error Level: Informational
  Jan 8 15:21:42 nexenta gda: [ID 107833 kern.notice]   Sense Key: aborted command
  Jan 8 15:21:42 nexenta gda: [ID 107833 kern.notice]   Vendor 'Gen-ATA ' error code: 0x3
  Jan 8 15:21:47 nexenta genunix: [ID 698548 kern.notice] ata_disk_start: select failed
  Jan 8 15:21:47 nexenta gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@14,1/ide@1/cmdk@0,0 (Disk5):
  Jan 8 15:21:47 nexenta    Error for command 'write sector'    Error Level: Informational
  Jan 8 15:21:47 nexenta gda: [ID 107833 kern.notice]   Sense Key: aborted command
  Jan 8 15:21:47 nexenta gda: [ID 107833 kern.notice]   Vendor 'Gen-ATA ' error code: 0x3
  Jan 8 15:21:52 nexenta genunix: [ID 698548 kern.notice] ata_disk_start: select failed
  Jan 8 15:21:57 nexenta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@14,1/ide@1 (ata9):

lspci:

  00:00.0 Host bridge: ATI Technologies Inc RX780/RX790 Chipset Host Bridge
  00:02.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (external gfx0 port A)
  00:05.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (PCI express gpp port B)
  00:06.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (PCI express gpp port C)
  00:0a.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (PCI express gpp port F)
  00:11.0 IDE interface: ATI Technologies Inc SB700/SB800 SATA Controller [IDE mode]
  00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
  00:12.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
  00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
  00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
  00:13.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
  00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
  00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 3c)
  00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller
  00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA)
  00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller
  00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
  00:14.5 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI2 Controller
  00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
  00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
  00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
  00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
  01:00.0 VGA compatible controller: nVidia Corporation G72 [GeForce 7300 SE/7200 GS] (rev a1)
  02:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06)
  03:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06)
  04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
  05:07.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)
  05:0e.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)

Anyone seen something like this? Hardware is a standard Gigabyte mainboard
with on-board SATA.

Regards,
--
This message posted from opensolaris.org
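P.S. For anyone who wants to reproduce the test, the setup boils down to
roughly the following sketch (the cXdY device names are the ones that show
up in the zpool status output later in this thread; yours will differ):

  # build a pool of two mirror vdevs and attach the SSD as L2ARC (cache)
  zpool create data mirror c3d0 c6d0 mirror c3d1 c4d0
  zpool add data cache c6d1

  # generate synchronous write plus random read/write load on the pool
  cd /data
  iozone -ec -r 32k -s 2048m -l 2 -i 0 -i 2 -o

  # then pull the power cable of the cache SSD and try e.g. zpool iostat -v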
Lutz Schumann
2010-Jan-08 14:48 UTC
[zfs-discuss] zpool iostat -v hangs on L2ARC failure (SATA, 160 GB Postville)
OK, I have now waited 30 minutes - still hung. After that I also pulled the
SATA cable to the L2ARC device - still no success (I waited 10 minutes).
After those 10 minutes I put the L2ARC device back (SATA + power); 20
seconds later the system continued to run.

dmesg shows:

  Jan 8 15:41:57 nexenta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@14,1/ide@1 (ata9):
  Jan 8 15:41:57 nexenta    timeout: early timeout, target=1 lun=0
  Jan 8 15:41:57 nexenta gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@14,1/ide@1/cmdk@1,0 (Disk6):
  Jan 8 15:41:57 nexenta    Error for command 'write sector'    Error Level: Informational
  Jan 8 15:41:57 nexenta gda: [ID 107833 kern.notice]   Sense Key: aborted command
  Jan 8 15:41:57 nexenta gda: [ID 107833 kern.notice]   Vendor 'Gen-ATA ' error code: 0x3
  Jan 8 15:42:01 nexenta fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
  Jan 8 15:42:01 nexenta EVENT-TIME: Fri Jan 8 15:41:59 CET 2010
  Jan 8 15:42:01 nexenta PLATFORM: GA-MA770-UD3, CSN: , HOSTNAME: nexenta
  Jan 8 15:42:01 nexenta SOURCE: zfs-diagnosis, REV: 1.0
  Jan 8 15:42:01 nexenta EVENT-ID: aca93a91-e013-c1b8-a5b7-fff547b2a61e
  Jan 8 15:42:01 nexenta DESC: The number of I/O errors associated with a ZFS device exceeded
  Jan 8 15:42:01 nexenta    acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD for more information.
  Jan 8 15:42:01 nexenta AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt
  Jan 8 15:42:01 nexenta    will be made to activate a hot spare if available.
  Jan 8 15:42:01 nexenta IMPACT: Fault tolerance of the pool may be compromised.
  Jan 8 15:42:01 nexenta REC-ACTION: Run 'zpool status -x' and replace the bad device.
  Jan 8 15:42:13 nexenta fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
  Jan 8 15:42:13 nexenta EVENT-TIME: Fri Jan 8 15:42:12 CET 2010
  Jan 8 15:42:13 nexenta PLATFORM: GA-MA770-UD3, CSN: , HOSTNAME: nexenta
  Jan 8 15:42:13 nexenta SOURCE: zfs-diagnosis, REV: 1.0
  Jan 8 15:42:13 nexenta EVENT-ID: 781fa01d-394f-c24d-b900-c114d1cd9d06
  Jan 8 15:42:13 nexenta DESC: The number of I/O errors associated with a ZFS device exceeded
  Jan 8 15:42:13 nexenta    acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD for more information.
  Jan 8 15:42:13 nexenta AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt
  Jan 8 15:42:13 nexenta    will be made to activate a hot spare if available.
  Jan 8 15:42:13 nexenta IMPACT: Fault tolerance of the pool may be compromised.
  Jan 8 15:42:13 nexenta REC-ACTION: Run 'zpool status -x' and replace the bad device.

... the device is seen as faulted:

    pool: data
   state: ONLINE
   scrub: resilver completed after 0h0m with 0 errors on Fri Jan 8 15:42:03 2010
  config:

          NAME        STATE     READ WRITE CKSUM
          data        ONLINE       0     0     0
            mirror    ONLINE       0     0     0
              c3d0    ONLINE       0     0     0
              c6d0    ONLINE       0     0     0  512 resilvered
            mirror    ONLINE       0     0     0
              c3d1    ONLINE       0     0     0
              c4d0    ONLINE       0     0     0
          cache
            c6d1      FAULTED      0   499     0  too many errors

... however zpool iostat -v still shows the device ...
  root@nexenta:/export/home/admin# zpool iostat -v 1
                  capacity     operations    bandwidth
  pool          used  avail   read  write   read  write
  ----------  -----  -----  -----  -----  -----  -----
  data         209G  1.61T      0    129      0  4.64M
    mirror     104G   824G      0     64      0  2.34M
      c3d0        -      -      0     64      0  2.34M
      c6d0        -      -      0     64      0  2.34M
    mirror     104G   824G      0     64      0  2.31M
      c3d1        -      -      0     64      0  2.31M
      c4d0        -      -      0     64      0  2.31M
  cache           -      -      -      -      -      -
    c6d1       137M   149G      0      0      0      0
  ----------  -----  -----  -----  -----  -----  -----
  syspool     2.18G   462G      0      0      0      0
    c4d1s0    2.18G   462G      0      0      0      0
  ----------  -----  -----  -----  -----  -----  -----

So this seems to be a hardware issue. I would expect there to be some
"general in-kernel timeout" for I/Os, so that strangely failing,
unresponsive devices (and real failures look exactly like this) get killed.
Did I miss something? Is there a tunable (/etc/system)?

Thanks for your responses :)
--
This message posted from opensolaris.org
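P.S. On the tunable question: as the later posts confirm, ZFS itself has no
I/O timeout; the timeouts live in the disk target driver. For the sd stack
there is a documented /etc/system tunable; a minimal sketch (the value
shown is the usual default of 60 seconds, and note that it only applies to
devices attached via sd, e.g. in AHCI mode - not to the cmdk/ata stack that
IDE mode uses):

  * /etc/system fragment (sketch): per-command timeout for the sd target
  * driver, in seconds. 0x3c (60s) is the usual default; lower it to fail
  * unresponsive disks faster. Devices behind cmdk/ata (IDE mode) do not
  * honor this tunable.
  set sd:sd_io_time = 0x3c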
Lutz Schumann
2010-Jan-08 15:32 UTC
[zfs-discuss] zpool iostat -v hangs on L2ARC failure (SATA, 160 GB Postville)
OK, after browsing I found that the SATA disks are not shown via cfgadm. I
found http://opensolaris.org/jive/message.jspa?messageID=287791&tstart=0
which states that you have to set the controller mode to "AHCI" to enable
hot-plug etc. However, I still think the plain IDE driver also needs a
timeout to handle disk failures, because cables etc. can fail. I looked in
the BIOS and the disks are indeed in "IDE mode". There is an AHCI mode, but
I do not know if I can switch without reinstalling. Is it possible to set
AHCI without reinstalling OSol?

Regards
--
This message posted from opensolaris.org
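P.S. The cfgadm check mentioned above looks roughly like this once the
controller runs in AHCI mode (output sketched from memory; the sataX/Y
attachment points and cXtYdZ device names are examples and will differ per
system). In IDE mode these attachment points are simply absent:

  # cfgadm -al
  Ap_Id                          Type         Receptacle   Occupant     Condition
  sata0/0::dsk/c5t0d0            disk         connected    configured   ok
  sata0/1::dsk/c5t1d0            disk         connected    configured   ok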
Lutz Schumann
2010-Jan-09 16:12 UTC
[zfs-discuss] zpool iostat -v hangs on L2ARC failure (SATA, 160 GB Postville)
I finally managed to resolve this. I received some useful info from Richard
Elling (without list CC):

>> (ME) However, I still think the plain IDE driver also needs a timeout
>> to handle disk failures, because cables etc. can fail.
>
> (Richard) Yes, this is a little bit odd. The sd driver should be in the
> stack above the IDE driver, and the sd driver tends to manage timeouts
> as well. Could you send the "prtconf -D" output?

>> (ME) prtconf -D:
>>
>> System Configuration:  Sun Microsystems  i86pc
>> Memory size: 8191 Megabytes
>> System Peripherals (Software Nodes):
>>
>> i86pc (driver name: rootnex)
>>     scsi_vhci, instance #0 (driver name: scsi_vhci)
>>     isa, instance #0 (driver name: isa)
>>         asy, instance #0 (driver name: asy)
>>         lp, instance #0 (driver name: ecpp)
>>         i8042, instance #0 (driver name: i8042)
>>             keyboard, instance #0 (driver name: kb8042)
>>         motherboard
>>         pit_beep, instance #0 (driver name: pit_beep)
>>     pci, instance #0 (driver name: npe)
>>         pci1002,5957
>>         pci1002,5978, instance #0 (driver name: pcie_pci)
>>             display, instance #1 (driver name: vgatext)
>>         pci1002,597b, instance #1 (driver name: pcie_pci)
>>             pci8086,1083, instance #1 (driver name: e1000g)
>>         pci1002,597c, instance #2 (driver name: pcie_pci)
>>             pci8086,1083, instance #2 (driver name: e1000g)
>>         pci1002,597f, instance #3 (driver name: pcie_pci)
>>             pci1458,e000 (driver name: gani)
>>         pci-ide, instance #3 (driver name: pci-ide)
>>             ide, instance #6 (driver name: ata)
>>                 cmdk, instance #1 (driver name: cmdk)
>
> (Richard) Here is where you see the driver stack. Inverted it looks like:
>
>     cmdk
>       ata
>         pci-ide
>           npe
>
> I/O from the file system will go directly to the cmdk driver.
> I'm not familiar with that driver, mostly because if you change
> to AHCI, then you will see something more like:
>
>     sd
>       ahci
>         pci-ide
>           npe
>
> The important detail to remember is that ZFS does not have any
> timeouts. It will patiently wait for a response from cmdk or sd.
> The cmdk and sd drivers manage timeouts and retries farther
> down the stack.
>
> For sd, I know that disk selection errors are propagated quickly.

>> (ME) Is it possible to set AHCI without reinstalling OSol?
>
> (Richard) Yes. But you might need to re-import the non-syspool pools
> manually.

----

OK, so I wanted to switch from IDE to AHCI while keeping my installation,
and then test again. When setting the mode for my IDE devices to AHCI in
the BIOS, the machine panicked with "Error could not import root volume:
error 19" right after GRUB, so the machine could not boot. After some
googling I found:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6795637
(Implement full root disk portability, boot must not be dependent on
install hardware config)

and

http://defect.opensolaris.org/bz/show_bug.cgi?id=5785
(a guide on how to change the boot device for P2V, which is actually
similar)

So I did as described in the guide. Maybe this is of use for someone else
finding this.

---

Overview: My storage server's disk mode was set to IDE. (Whether your
server is set to SATA or IDE mode can be tested with cfgadm: if the devices
are not shown, you are in IDE mode.) To enable AHCI/SATA mode for your
drives, you have to go into the BIOS and set the mode to AHCI. However,
after you have done this, your machine will (may?) not boot anymore. You
will get a panic after GRUB saying "cannot mount rootfs" (actually this
screen is shown only VERY briefly.
To actually see it, add "-k -v" to the GRUB boot options and you will fall
into the debugger, where you can read the message.)

IDE mode:
 * NO hot plug
 * The system hangs 100% if a cache or other pool device is removed (see
   thread above)
 * NO NCQ available

AHCI mode:
 * Full support for NCQ (?)
 * Full support for hot plug (devices shown via cfgadm as sata/X:disk)

To switch from IDE mode to AHCI for a running installation of NexentaStor
I did the following:

 * Create a checkpoint, just to be sure
 * Note (write down) which checkpoint is the safety checkpoint you just
   created
 * Note (write down) which checkpoint is currently booted
 * Export your data volumes
 * Reboot
 * Enter the BIOS and set the mode to AHCI
 * Boot the rescue CD (a USB CD-ROM does not work, it must be IDE; PXE
   maybe added later)
 * In the rescue CD do (login root / empty password):
   o mkdir /mnt
   o zpool import -f syspool
   o mount -F zfs syspool/rootfs-nms-XXX /mnt (this is the active
     snapshot/clone you normally boot, not the rescue checkpoint you
     created)
   o mv /mnt/etc/path_to_inst /mnt/etc/path_to_inst.ORG
   o touch /mnt/etc/path_to_inst
   o devfsadm -C -r /mnt
   o devfsadm -c disk -r /mnt
   o devfsadm -i e1000g -r /mnt
   o cp -a /mnt/etc/zfs/zpool.cache /mnt/etc/zfs/zpool.cache.ORG
   o cp -a /etc/zfs/zpool.cache /mnt/etc/zfs/zpool.cache
   o touch /mnt/reconfigure
   o bootadm update-archive -v -R /mnt
   o umount /mnt
   o sync
   o reboot
 * Now your system should come up.
 * Verify that your SATA drives can be seen with cfgadm (cfgadm should
   list sata/X:disk)

After doing this I tested pulling the power cable of the L2ARC SSD again.
No hang this time, and the device was detected as failed "immediately". I
could also re-enable the device by removing it, re-attaching the power,
running "cfgadm -c configure <devicename>" and bringing the cache device
back into the pool (sketched below).

Conclusion (this is build 104):

 - do not use IDE :)
 - ZFS does not have timeouts for commands but relies on the layers below
   (as with the cache flush command)
 - switching from IDE to AHCI requires some manual steps

Regards,
Robert

p.s. Thanks Richard for the tips.
--
This message posted from opensolaris.org
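P.P.S. The re-attach sequence at the end, spelled out as a sketch (sata0/1
and c6d1 are examples taken from this thread; check cfgadm -al for your own
Ap_Id and device name):

  cfgadm -al                    # find the SSD's attachment point again
  cfgadm -c configure sata0/1   # re-attach the disk at the SATA level
  zpool remove data c6d1        # drop the faulted cache vdev ...
  zpool add data cache c6d1     # ... and add it back as L2ARC
  zpool status data             # verify the cache device shows ONLINE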