Lutz Schumann
2010-Jan-08 14:30 UTC
[zfs-discuss] zpool iostat -v hangs on L2ARC failure (SATA, 160 GB Postville)
Hello,
today I wanted to test that a failure of the L2ARC device is not crucial to the pool. I added an Intel X25-M Postville (160 GB) as cache device to a 4-disk
mirror pool. Then I started a SYNC iozone run on the pool:
iozone -ec -r 32k -s 2048m -l 2 -i 0 -i 2 -o
Pool:
pool
mirror-0
disk1
disk2
mirror-1
disk3
disk4
cache
intel-postville-ssd
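(For reference, a pool of this shape would have been created roughly as follows - a minimal sketch; disk1..disk4 and the SSD name are the placeholders from the layout above, not real device paths:)

  zpool create pool mirror disk1 disk2 mirror disk3 disk4   # two 2-way mirrors
  zpool add pool cache intel-postville-ssd                  # SSD as L2ARC cache vdev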
Then I pulled the power cable of the SSD device (not the SATA connector) and
from that moment on, all pool-related commands hang (e.g. zpool iostat -v).
I've waited 20 minutes now - still hangs :(
I can log in to the system itself (after some time - the whole system is
sluggish), so the syspool (which is a separate device) is OK. Release is
snv_104.
dmesg shows:
Jan 8 15:21:42 nexenta gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@14,1/ide@1/cmdk@1,0 (Disk6):
Jan 8 15:21:42 nexenta Error for command 'write sector' Error Level: Informational
Jan 8 15:21:42 nexenta gda: [ID 107833 kern.notice] Sense Key: aborted command
Jan 8 15:21:42 nexenta gda: [ID 107833 kern.notice] Vendor 'Gen-ATA ' error code: 0x3
Jan 8 15:21:47 nexenta genunix: [ID 698548 kern.notice] ata_disk_start: select failed
Jan 8 15:21:47 nexenta gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@14,1/ide@1/cmdk@0,0 (Disk5):
Jan 8 15:21:47 nexenta Error for command 'write sector' Error Level: Informational
Jan 8 15:21:47 nexenta gda: [ID 107833 kern.notice] Sense Key: aborted command
Jan 8 15:21:47 nexenta gda: [ID 107833 kern.notice] Vendor 'Gen-ATA ' error code: 0x3
Jan 8 15:21:52 nexenta genunix: [ID 698548 kern.notice] ata_disk_start: select failed
Jan 8 15:21:57 nexenta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@14,1/ide@1 (ata9):
lspci:
00:00.0 Host bridge: ATI Technologies Inc RX780/RX790 Chipset Host Bridge
00:02.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (external gfx0 port A)
00:05.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (PCI express gpp port B)
00:06.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (PCI express gpp port C)
00:0a.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (PCI express gpp port F)
00:11.0 IDE interface: ATI Technologies Inc SB700/SB800 SATA Controller [IDE mode]
00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:12.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:13.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 3c)
00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller
00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA)
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
00:14.5 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI2 Controller
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:00.0 VGA compatible controller: nVidia Corporation G72 [GeForce 7300 SE/7200 GS] (rev a1)
02:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06)
03:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06)
04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
05:07.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)
05:0e.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)
Has anyone seen something like this?
Hardware is a standard Gigabyte mainboard with on-board SATA.
Regards,
--
This message posted from opensolaris.org
Lutz Schumann
2010-Jan-08 14:48 UTC
[zfs-discuss] zpool iostat -v hangs on L2ARC failure (SATA, 160 GB Postville)
Ok,
I now waited 30 minutes - still hung. After that I also pulled the SATA cable
to the L2ARC device - still no success (I waited 10 minutes).
After 10 minutes I put the L2ARC device back (SATA + power).
About 20 seconds after that, the system resumed operation.
dmesg shows:
Jan 8 15:41:57 nexenta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@14,1/ide@1 (ata9):
Jan 8 15:41:57 nexenta timeout: early timeout, target=1 lun=0
Jan 8 15:41:57 nexenta gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@14,1/ide@1/cmdk@1,0 (Disk6):
Jan 8 15:41:57 nexenta Error for command 'write sector' Error Level: Informational
Jan 8 15:41:57 nexenta gda: [ID 107833 kern.notice] Sense Key: aborted command
Jan 8 15:41:57 nexenta gda: [ID 107833 kern.notice] Vendor 'Gen-ATA ' error code: 0x3
Jan 8 15:42:01 nexenta fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Jan 8 15:42:01 nexenta EVENT-TIME: Fri Jan 8 15:41:59 CET 2010
Jan 8 15:42:01 nexenta PLATFORM: GA-MA770-UD3, CSN: , HOSTNAME: nexenta
Jan 8 15:42:01 nexenta SOURCE: zfs-diagnosis, REV: 1.0
Jan 8 15:42:01 nexenta EVENT-ID: aca93a91-e013-c1b8-a5b7-fff547b2a61e
Jan 8 15:42:01 nexenta DESC: The number of I/O errors associated with a ZFS device exceeded acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD for more information.
Jan 8 15:42:01 nexenta AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt will be made to activate a hot spare if available.
Jan 8 15:42:01 nexenta IMPACT: Fault tolerance of the pool may be compromised.
Jan 8 15:42:01 nexenta REC-ACTION: Run 'zpool status -x' and replace the bad device.
Jan 8 15:42:13 nexenta fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Jan 8 15:42:13 nexenta EVENT-TIME: Fri Jan 8 15:42:12 CET 2010
Jan 8 15:42:13 nexenta PLATFORM: GA-MA770-UD3, CSN: , HOSTNAME: nexenta
Jan 8 15:42:13 nexenta SOURCE: zfs-diagnosis, REV: 1.0
Jan 8 15:42:13 nexenta EVENT-ID: 781fa01d-394f-c24d-b900-c114d1cd9d06
Jan 8 15:42:13 nexenta DESC: The number of I/O errors associated with a ZFS device exceeded acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD for more information.
Jan 8 15:42:13 nexenta AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt will be made to activate a hot spare if available.
Jan 8 15:42:13 nexenta IMPACT: Fault tolerance of the pool may be compromised.
Jan 8 15:42:13 nexenta REC-ACTION: Run 'zpool status -x' and replace the bad device.
.. the device is seen as faulted:
  pool: data
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Fri Jan 8 15:42:03 2010
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c3d0    ONLINE       0     0     0
            c6d0    ONLINE       0     0     0  512 resilvered
          mirror    ONLINE       0     0     0
            c3d1    ONLINE       0     0     0
            c4d0    ONLINE       0     0     0
        cache
          c6d1      FAULTED      0   499     0  too many errors
.. however, zpool iostat -v still shows the device:
root@nexenta:/export/home/admin# zpool iostat -v 1
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data         209G  1.61T      0    129      0  4.64M
  mirror     104G   824G      0     64      0  2.34M
    c3d0        -      -      0     64      0  2.34M
    c6d0        -      -      0     64      0  2.34M
  mirror     104G   824G      0     64      0  2.31M
    c3d1        -      -      0     64      0  2.31M
    c4d0        -      -      0     64      0  2.31M
cache           -      -      -      -      -      -
  c6d1       137M   149G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
syspool     2.18G   462G      0      0      0      0
  c4d1s0    2.18G   462G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
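(Side note: cache vdevs can be removed from a live pool, so once the box was responsive again the stale entry could have been dropped - a minimal sketch, using the device name from the output above:

  zpool remove data c6d1   # detaches the faulted L2ARC device from the pool

This would not have helped against the hang itself, of course, since during the hang no zpool command completed.)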
So this seems to be a hardware issue.
I would expect there to be some general in-kernel timeout for I/Os, so that
strangely failing, unresponsive devices (and real failures look exactly like
this) get killed.
Did I miss something? Is there a tunable (/etc/system)?
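The only tunable I am aware of in this area belongs to the sd driver, not to ZFS itself; a hedged sketch of the /etc/system entry, which presumably has no effect here because in IDE mode the disks sit under cmdk rather than sd:

  * /etc/system: shorten sd's per-command timeout (default is 60 seconds)
  set sd:sd_io_time = 10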
Thanks for your responses :)
--
This message posted from opensolaris.org
Lutz Schumann
2010-Jan-08 15:32 UTC
[zfs-discuss] zpool iostat -v hangs on L2ARC failure (SATA, 160 GB Postville)
Ok, after browsing I found that the SATA disks are not shown via cfgadm. I found http://opensolaris.org/jive/message.jspa?messageID=287791&tstart=0 which states that you have to set the mode to "AHCI" to enable hot-plug etc.

However, I still think the plain IDE driver also needs a timeout to handle disk failures, because cables etc. can fail.

I looked in the BIOS and it seems the disks are in "IDE mode". There is an AHCI mode, but I do not know if I can switch without reinstalling. Is it possible to set AHCI without reinstalling OSol?

Regards

--
This message posted from opensolaris.org
Lutz Schumann
2010-Jan-09 16:12 UTC
[zfs-discuss] zpool iostat -v hangs on L2ARC failure (SATA, 160 GB Postville)
I finally managed to resolve this. I received some useful info from Richard Elling (without list CC):

>> (Me) However, I still think the plain IDE driver also needs a timeout to handle disk failures, because cables etc. can fail.
>
> (Richard) Yes, this is a little bit odd. The sd driver should be in the stack above the IDE driver, and the sd driver tends to manage timeouts as well. Could you send the "prtconf -D" output?

>> (Me) prtconf -D:
>>
>> System Configuration: Sun Microsystems i86pc
>> Memory size: 8191 Megabytes
>> System Peripherals (Software Nodes):
>>
>> i86pc (driver name: rootnex)
>>   scsi_vhci, instance #0 (driver name: scsi_vhci)
>>   isa, instance #0 (driver name: isa)
>>     asy, instance #0 (driver name: asy)
>>     lp, instance #0 (driver name: ecpp)
>>     i8042, instance #0 (driver name: i8042)
>>       keyboard, instance #0 (driver name: kb8042)
>>     motherboard
>>     pit_beep, instance #0 (driver name: pit_beep)
>>   pci, instance #0 (driver name: npe)
>>     pci1002,5957
>>     pci1002,5978, instance #0 (driver name: pcie_pci)
>>       display, instance #1 (driver name: vgatext)
>>     pci1002,597b, instance #1 (driver name: pcie_pci)
>>       pci8086,1083, instance #1 (driver name: e1000g)
>>     pci1002,597c, instance #2 (driver name: pcie_pci)
>>       pci8086,1083, instance #2 (driver name: e1000g)
>>     pci1002,597f, instance #3 (driver name: pcie_pci)
>>       pci1458,e000 (driver name: gani)
>>     pci-ide, instance #3 (driver name: pci-ide)
>>       ide, instance #6 (driver name: ata)
>>         cmdk, instance #1 (driver name: cmdk)

> (Richard) Here is where you see the driver stack. Inverted it looks like:
>
>   cmdk
>     ata
>       pci-ide
>         npe
>
> I/O from the file system will go directly to the cmdk driver. I'm not familiar with that driver, mostly because if you change to AHCI, then you will see something more like:
>
>   sd
>     ahci
>       pci-ide
>         npe
>
> The important detail to remember is that ZFS does not have any timeouts. It will patiently wait for a response from cmdk or sd. The cmdk and sd drivers manage timeouts and retries farther down the stack. For sd, I know that disk selection errors are propagated quickly.

>> (Me) Is it possible to set AHCI without reinstalling OSol?
>
> (Richard) Yes. But you might need to re-import the non-syspool pools manually.

----

OK, so I wanted to switch from IDE to AHCI while keeping my installation, and then test again. When I set the mode for my IDE devices to AHCI in the BIOS, the machine panicked with "Error could not import root volume: error 19" in GRUB, so the machine could not boot. After some googling I found:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6795637 (Implement full root disk portability, boot must not be dependent on install hardware config)

and

http://defect.opensolaris.org/bz/show_bug.cgi?id=5785 (a guide on how to change the boot device for P2V, which is actually similar)

So I did as described in the guide. Maybe this is of use for someone else finding this.

---

Overview: My storage server's disk mode was set to IDE. (Whether your server is in SATA or IDE mode can be tested with cfgadm: if the devices are not shown, you are in IDE mode.) To enable AHCI / SATA mode for your drives, you have to go into the BIOS and set the mode to AHCI. However, after you have done this, your machine will (may?) not boot anymore. You will get a panic right after GRUB saying "cannot mount rootfs" (this screen is shown only very briefly; to actually see it, add "-k -v" to the GRUB boot options and you will fall into the debugger, where you can read the message).
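For reference, the "-k -v" flags go on the kernel$ line of the GRUB entry (press "e" in the GRUB menu to edit it for one boot). A sketch of what the edited entry looks like; the findroot signature and paths are taken from a stock OpenSolaris-style menu.lst and may differ on your release:

  findroot (pool_syspool,0,a)
  kernel$ /platform/i86pc/kernel/$ISADIR/unix -k -v -B $ZFS-BOOTFS
  module$ /platform/i86pc/kernel/$ISADIR/boot_archive

"-k" loads kmdb so a panic drops into the debugger instead of rebooting, and "-v" makes the boot verbose, so the "cannot mount rootfs" message stays visible.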
IDE mode:
* NO hot-plug
* The system hangs 100% if a cache or normal device is removed (see thread above)
* NO NCQ available

AHCI mode:
* Full support for NCQ (?)
* Full support for hot-plug (devices shown via cfgadm as sata/X:disk)

To switch from IDE mode to AHCI for a running installation of NexentaStor I did the following:
* Create a checkpoint, just to be sure
* Note (write down) which checkpoint is the safety checkpoint you just created
* Note (write down) which checkpoint is currently booted
* Export your data volumes
* Reboot
* Enter the BIOS and set the mode to AHCI
* Boot the rescue CD (a USB CD-ROM does not work, it must be IDE; PXE support may be added later)
* In the rescue CD do (login root / empty password):
  o mkdir /mnt
  o zpool import -f syspool
  o mount -F zfs syspool/rootfs-nms-XXX /mnt (this is the active snapshot/clone you normally boot, not the rescue checkpoint you created)
  o mv /mnt/etc/path_to_inst /mnt/etc/path_to_inst.ORG
  o touch /mnt/etc/path_to_inst
  o devfsadm -C -r /mnt
  o devfsadm -c disk -r /mnt
  o devfsadm -i e1000g -r /mnt
  o cp -a /mnt/etc/zfs/zpool.cache /mnt/etc/zfs/zpool.cache.ORG
  o cp -a /etc/zfs/zpool.cache /mnt/etc/zfs/zpool.cache
  o touch /mnt/reconfigure
  o bootadm update-archive -v -R /mnt
  o umount /mnt
  o sync
  o reboot
* Now your system should come up.
* Verify that your SATA drives can be seen with cfgadm (cfgadm should list sata/X:disk)

After doing this I tested pulling the power cable of the L2ARC SSD again. No hang this time, and the device was detected as failed immediately. I could also re-enable the device by removing it, adding the power again, and running "cfgadm -c configure devicename" and "zpool online".

Conclusion (this is build 104):
- do not use IDE :)
- ZFS does not have timeouts for commands but relies on the hardware layer (as with the cache flush command)
- switching from IDE to AHCI requires some manual steps

Regards,
Robert

p.s. Thanks Richard for the tips.

--
This message posted from opensolaris.org