Chris Baker
2009-Aug-04 16:54 UTC
[zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?
Hi

I'm running an application which uses hot-plug SATA drives as giant removable USB keys - bigger and with SATA performance. I'm using "cfgadm connect", then "configure", then "zpool import" to bring a drive online, and export / unconfigure / disconnect before unplugging (the exact sequence is sketched after this message). All works well.

What I can't guarantee is that one of my users won't one day just yank the drive without running the offline sequence. In testing for that case I am finding that the system runs fine until a command or subsystem tries to write to the drive, and then that command and that subsystem lock up hard. The big problem then becomes that if I try a zfs or zpool command to attempt recovery, I lose zfs/zpool access to all pools in the system, not just the damaged one.

Specifically - in testing, with just one single drive with s0 mounted and then yanked:

- zpool status - I have seen either the pool reported online with no errors, or a lock-up of zpool.
- I can cd into and ls the missing directory, but if I try to write anything my shell locks up hard.
- I try a zfs unmount -f and that locks hard, plus I can now no longer run zfs anything.
- I try a zpool export -f and that locks, plus I can now no longer run zpool anything.
- Even a simple zfs list can lock up zfs commands.

The rest of the system continues ticking over, but I have now lost access to basic admin commands and I can't find a recovery plan short of a reboot. I've tried "zpool set failmode=continue" with no luck. I tried adding a ZIL, no luck. I can't kill the locked processes. I'm guessing zfs is waiting for the drive to come back online to safely store the write-in-flight - reconnecting the drive makes some of the locked processes killable, but not all, and running zpool/zfs anything locks up again.

To be clear - the rest of the system, working with different data pools, keeps running fine. I don't mind data loss on the yanked disk - that would be the user's own stupid fault - but I can't accept the risk of losing zpool/zfs control of the rest of the system.

Trying the same tests with a UFS removable disk, the processes are interruptible, so I could live with zfs internal / ufs removable, but it seems to be significantly slower, plus I was hoping for the integrity benefits of zfs.

Any thoughts on how to stabilise the OS without a reboot?

Thanks

Chris
--
This message posted from opensolaris.org
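For reference, a minimal sketch of that online/offline sequence, assuming a pool named "removable" whose disk sits on SATA attachment point sata1/0 (both names here are placeholders, not taken from the original post):

# bring the hot-plugged drive online
cfgadm -c connect sata1/0        # establish the link to the drive
cfgadm -c configure sata1/0      # attach the disk to the OS
zpool import removable           # import the pool that lives on the drive

# ... use the pool ...

# take the drive offline before unplugging
zpool export removable           # flush outstanding writes and close the pool
cfgadm -c unconfigure sata1/0    # detach the disk from the OS
cfgadm -c disconnect sata1/0     # drop the link; the drive can now be pulled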
Ross
2009-Aug-04 17:36 UTC
[zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?
What version of Solaris / OpenSolaris are you running there? I remember zfs commands locking up being a big problem a while ago, but I thought they'd managed to solve issues like this.
--
This message posted from opensolaris.org
Chris Baker
2009-Aug-04 18:35 UTC
[zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?
Apologies - I'm daft for not saying originally: OpenSolaris 2009.06 on x86.

Cheers

Chris
--
This message posted from opensolaris.org
roland
2009-Aug-04 18:54 UTC
[zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?
what exact type of sata controller do you use?
--
This message posted from opensolaris.org
Chris Baker
2009-Aug-04 19:44 UTC
[zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?
It's a generic Sil3132-based PCIe x1 card using the si3124 driver. Prior to this I had been using Intel ICH10R with AHCI, but I have found the Sil3132 actually hot plugs a little smoother than the Intel chipset. I have not gone back to recheck this specific problem on the ICH10R (though I can), as I had been quite happy with the Sil up to this point.

Kind regards

Chris
--
This message posted from opensolaris.org
Chris Baker
2009-Aug-05 03:56 UTC
[zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?
Ok - in an attempt to weasel my way past the issue I mirrored my problematic si3124 drive to a second drive on the ICH10R, started writing to the file system and then killed the power to the si3124 removable drive.

To my (unfortunate) surprise, the IO stream that was writing to the mirrored filesystem just hung. I can still zpool status, zfs list, but the process that was writing has hung and the zpool iostat that was running in another window has also hung.

dmesg shows the kernel noticed the sata disconnect ok and cfgadm shows the sata port as empty. zpool status shows both drives online and no errors.

Now I'm worried my mirror protection isn't quite as solid as I thought too.

Anyone any ideas?

Cheers

Chris
--
This message posted from opensolaris.org
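A sketch of how the cross-controller test mirror described above might be set up, borrowing the pool name ("remove") and device names (c7t2d0s0 on the ICH10R, c8t0d0s0 on the Si3132) that appear later in the thread:

# create the pool as a two-way mirror across the two controllers...
zpool create remove mirror c7t2d0s0 c8t0d0s0

# ...or convert an existing single-disk pool into a mirror by attaching
# the second disk, which triggers a resilver
zpool attach remove c8t0d0s0 c7t2d0s0

# generate steady write traffic and watch per-device activity
# (assumes the pool's default mountpoint of /remove)
zpool iostat -v remove 5 &
dd if=/dev/urandom of=/remove/testfile bs=1024k count=10000 &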
Ross
2009-Aug-05 05:00 UTC
[zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?
Whether ZFS properly detects device removal depends to a large extent on the device drivers for the controller. I personally have stuck to using controllers with chipsets I know Sun use on their own servers, but even then I've been bitten by similar problems to yours on the AOC-SAT2-MV8 cards. The LSI 1068 based cards seem to be the most stable, but I haven't been fortunate enough to test them myself yet.

I've been saying for ages that ZFS needs its own timeouts to detect when a drive has gone in a redundant pool, but Sun don't seem to agree that it's needed. They seem happy to have ZFS working on their own kit, and hanging for others.
--
This message posted from opensolaris.org
Sanjeev
2009-Aug-05 05:45 UTC
[zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?
Chris,

Can you please check the failmode property of the pool?

zpool get failmode <poolname>

If it is set to "wait", you could try setting it to "continue".

Regards,
Sanjeev

On Tue, Aug 04, 2009 at 08:56:03PM -0700, Chris Baker wrote:
> Ok - in an attempt to weasel my way past the issue I mirrored my problematic si3124 drive to a second drive on the ICH10R, started writing to the file system and then killed the power to the si3124 removable drive.
>
> To my (unfortunate) surprise, the IO stream that was writing to the mirrored filesystem just hung. I can still zpool status, zfs list, but the process that was writing has hung and the zpool iostat that was running in another window has also hung.
>
> dmesg shows the kernel noticed the sata disconnect ok and cfgadm shows the sata port as empty. zpool status shows both drives online and no errors.
>
> Now I'm worried my mirror protection isn't quite as solid as I thought too.
>
> Anyone any ideas?
>
> Cheers
>
> Chris
> --
> This message posted from opensolaris.org
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

--
----------------
Sanjeev Bagewadi
Solaris RPE
Bangalore, India
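For reference, checking and changing the property looks roughly like this, substituting the real pool name:

zpool get failmode <poolname>
zpool set failmode=continue <poolname>

# failmode governs how the pool behaves once ZFS decides it has suffered
# catastrophic failure (all paths to a device lost): "wait" (the default)
# blocks I/O until the device returns, "continue" returns EIO to new write
# requests while still allowing reads from healthy devices, and "panic"
# panics the host.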
Chris Baker
2009-Aug-05 12:33 UTC
[zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?
Sanjeev

Thanks for taking an interest. Unfortunately I did have failmode=continue, but I have just destroyed/recreated and double confirmed and got exactly the same results.

zpool status shows both drives mirror, ONLINE, no errors

dmesg shows:

SATA device detached at port 0

cfgadm shows:

sata-port empty unconfigured

The IO process has just hung.

It seems to me that zfs thinks it has a drive with a really long response time rather than a dead drive so no failmode processing, no mirror resilience etc. Clearly something has been reported back to the kernel re the port going dead but whether that came from the driver or not I wouldn't know.

Kind regards

Chris
--
This message posted from opensolaris.org
Ross
2009-Aug-05 13:45 UTC
[zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?
Just a thought, but how long have you left it? I had problems with a failing drive a while back which did eventually get taken offline, but it took about 20 minutes to do so.
--
This message posted from opensolaris.org
Chris Baker
2009-Aug-05 15:24 UTC
[zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?
I've left it hanging about 2 hours. I've also just learned that whatever the issue is, it is also blocking an "init 5" shutdown. I was thinking about setting a watchdog with a forced reboot, but that will get me nowhere if I need a reset-button restart.

Thanks for the advice re the LSI 1068 - not exactly what I was hoping to hear, but very good info all the same.

Kind regards

Chris
--
This message posted from opensolaris.org
Ross
2009-Aug-05 17:17 UTC
[zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?
Yeah, sounds just like the issues I've seen before. I don't think you're likely to see a fix anytime soon, but the good news is that so far I've not seen anybody reporting problems with LSI 1068 based cards (and I've been watching for a while). With the 1068 being used in the x4540 Thumper 2, I'd expect it to have pretty solid drivers :)
--
This message posted from opensolaris.org
roland
2009-Aug-05 18:43 UTC
[zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?
doesn't solaris have the great builtin dtrace for issues like these? if we knew in which syscall or kernel-thread the system is stuck, we may get a clue... unfortunately, i don't have any real knowledge of solaris kernel internals or dtrace...
--
This message posted from opensolaris.org
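Short of writing a dtrace script, a couple of stock commands might show where the hung processes are sitting (the pid 1234 below is a placeholder for a hung zpool/zfs process):

pstack 1234                      # user-level stack of the hung command

# kernel stacks for every thread of that process, via the kernel debugger
echo "0t1234::pid2proc | ::walk thread | ::findstack -v" | mdb -k

# or dump every kernel thread's stack and look for threads stuck in
# zfs/zio functions
echo "::threadlist -v" | mdb -k > /var/tmp/threads.out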
Sanjeev
2009-Aug-06 05:29 UTC
[zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?
Chris,

On Wed, Aug 05, 2009 at 05:33:24AM -0700, Chris Baker wrote:
> Sanjeev
>
> Thanks for taking an interest. Unfortunately I did have failmode=continue, but I have just destroyed/recreated and double confirmed and got exactly the same results.
>
> zpool status shows both drives mirror, ONLINE, no errors
>
> dmesg shows:
>
> SATA device detached at port 0
>
> cfgadm shows:
>
> sata-port empty unconfigured
>
> The IO process has just hung.
>
> It seems to me that zfs thinks it has a drive with a really long response time rather than a dead drive so no failmode processing, no mirror resilience etc. Clearly something has been reported back to the kernel re the port going dead but whether that came from the driver or not I wouldn't know.

Would it be possible for you to take a crashdump of the machine and point me to it? We could try looking at where things are stuck.

Thanks and regards,
Sanjeev

--
----------------
Sanjeev Bagewadi
Solaris RPE
Bangalore, India
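One way such a dump might be captured, assuming a dump device is already configured (running dumpadm with no arguments shows the current setup):

dumpadm            # confirm the dump device and savecore directory

savecore -L        # write a live crash dump of the running kernel without
                   # rebooting; the resulting files land in the savecore
                   # directory, typically /var/crash/<hostname>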
Chris Baker
2009-Aug-10 00:53 UTC
[zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?
Hi Sanjeev

OK - had a chance to do more testing over the weekend. Firstly some extra data:

Moving the mirror to both drives on ICH10R ports and on sudden disk power-off the mirror faulted cleanly to the remaining drive no problem.

Having a one drive pool on the ICH10R under heavy write traffic and then powered off causes the zpool/zfs hangs described above.

ZPool being tested is called "Remove" and consists of:
c7t2d0s0 - attached to the ICH10R
c8t0d0s0 - second disk attached to the Si3132 card with the Si3124 driver

This leads me to the following suspicions:
(1) We have an Si3124 issue in not detecting the drive removal always, or of failing to pass that info back to ZFS, even though we know the kernel noticed
(2) In the event that the only disk in a pool goes faulted, the zpool/zfs subsystem will block indefinitely waiting to get rid of the pending writes.

I've just recabled back to one disk on ICH10R and one on Si3132 and tried the sudden off with the Si drive:

*) First try - mirror faulted and IO continued - good news but confusing
*) Second try - zfs/zpool hung, couldn't even get a zpool status, tried a savecore but savecore hung moving the data to a separate zpool
*) Third try - zfs/zpool hung, ran savecore -L to a UFS filesystem I created for that purpose

After the first try, dmesg shows:

Aug 10 00:34:41 TS1 SATA device detected at port 0
Aug 10 00:34:41 TS1 sata: [ID 663010 kern.info] /pci at 0,0/pci8086,3a46 at 1c,3/pci1095,7132 at 0 :
Aug 10 00:34:41 TS1 sata: [ID 761595 kern.info] SATA disk device at port 0
Aug 10 00:34:41 TS1 sata: [ID 846691 kern.info] model WDC WD5000AACS-00ZUB0
Aug 10 00:34:41 TS1 sata: [ID 693010 kern.info] firmware 01.01B01
Aug 10 00:34:41 TS1 sata: [ID 163988 kern.info] serial number WD-xxxxxxxxxxxxxx
Aug 10 00:34:41 TS1 sata: [ID 594940 kern.info] supported features:
Aug 10 00:34:41 TS1 sata: [ID 981177 kern.info] 48-bit LBA, DMA, Native Command Queueing, SMART, SMART self-test
Aug 10 00:34:41 TS1 sata: [ID 643337 kern.info] SATA Gen2 signaling speed (3.0Gbps)
Aug 10 00:34:41 TS1 sata: [ID 349649 kern.info] Supported queue depth 32, limited to 31
Aug 10 00:34:41 TS1 sata: [ID 349649 kern.info] capacity = 976773168 sectors
Aug 10 00:34:41 TS1 fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Aug 10 00:34:41 TS1 EVENT-TIME: Mon Aug 10 00:34:41 BST 2009
Aug 10 00:34:41 TS1 PLATFORM: , CSN: , HOSTNAME: TS1
Aug 10 00:34:41 TS1 SOURCE: zfs-diagnosis, REV: 1.0
Aug 10 00:34:41 TS1 EVENT-ID: ab7df266-3380-4a35-e0bc-9056878fd182
Aug 10 00:34:41 TS1 DESC: The number of I/O errors associated with a ZFS device exceeded
Aug 10 00:34:41 TS1 acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD for more information.
Aug 10 00:34:41 TS1 AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt
Aug 10 00:34:41 TS1 will be made to activate a hot spare if available.
Aug 10 00:34:41 TS1 IMPACT: Fault tolerance of the pool may be compromised.
Aug 10 00:34:41 TS1 REC-ACTION: Run 'zpool status -x' and replace the bad device.

and after the second and third test, just:

SATA device detached at port 0

Core files were tar-ed together and bzip2-ed and can be found at:

http://dl.getdropbox.com/u/1709454/dump.bakerci.200908100106.tar.bz2

Please let me know if you need any further core/debug. Apologies to readers having all this inflicted by email digest.

Many thanks

Chris
--
This message posted from opensolaris.org
Sanjeev
2009-Aug-10 05:46 UTC
[zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?
Chris,

Thanks for providing the details and the dump. I shall look into this and update with my findings.

Thanks and regards,
Sanjeev

On Sun, Aug 09, 2009 at 05:53:12PM -0700, Chris Baker wrote:
> Hi Sanjeev
>
> OK - had a chance to do more testing over the weekend. Firstly some extra data:
>
> Moving the mirror to both drives on ICH10R ports and on sudden disk power-off the mirror faulted cleanly to the remaining drive no problem.
>
> Having a one drive pool on the ICH10R under heavy write traffic and then powered off causes the zpool/zfs hangs described above.
>
> ZPool being tested is called "Remove" and consists of:
> c7t2d0s0 - attached to the ICH10R
> c8t0d0s0 - second disk attached to the Si3132 card with the Si3124 driver
>
> This leads me to the following suspicions:
> (1) We have an Si3124 issue in not detecting the drive removal always, or of failing to pass that info back to ZFS, even though we know the kernel noticed
> (2) In the event that the only disk in a pool goes faulted, the zpool/zfs subsystem will block indefinitely waiting to get rid of the pending writes.
>
> I've just recabled back to one disk on ICH10R and one on Si3132 and tried the sudden off with the Si drive:
>
> *) First try - mirror faulted and IO continued - good news but confusing
> *) Second try - zfs/zpool hung, couldn't even get a zpool status, tried a savecore but savecore hung moving the data to a separate zpool
> *) Third try - zfs/zpool hung, ran savecore -L to a UFS filesystem I created for that purpose
>
> After the first try, dmesg shows:
> Aug 10 00:34:41 TS1 SATA device detected at port 0
> Aug 10 00:34:41 TS1 sata: [ID 663010 kern.info] /pci at 0,0/pci8086,3a46 at 1c,3/pci1095,7132 at 0 :
> Aug 10 00:34:41 TS1 sata: [ID 761595 kern.info] SATA disk device at port 0
> Aug 10 00:34:41 TS1 sata: [ID 846691 kern.info] model WDC WD5000AACS-00ZUB0
> Aug 10 00:34:41 TS1 sata: [ID 693010 kern.info] firmware 01.01B01
> Aug 10 00:34:41 TS1 sata: [ID 163988 kern.info] serial number WD-xxxxxxxxxxxxxx
> Aug 10 00:34:41 TS1 sata: [ID 594940 kern.info] supported features:
> Aug 10 00:34:41 TS1 sata: [ID 981177 kern.info] 48-bit LBA, DMA, Native Command Queueing, SMART, SMART self-test
> Aug 10 00:34:41 TS1 sata: [ID 643337 kern.info] SATA Gen2 signaling speed (3.0Gbps)
> Aug 10 00:34:41 TS1 sata: [ID 349649 kern.info] Supported queue depth 32, limited to 31
> Aug 10 00:34:41 TS1 sata: [ID 349649 kern.info] capacity = 976773168 sectors
> Aug 10 00:34:41 TS1 fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
> Aug 10 00:34:41 TS1 EVENT-TIME: Mon Aug 10 00:34:41 BST 2009
> Aug 10 00:34:41 TS1 PLATFORM: , CSN: , HOSTNAME: TS1
> Aug 10 00:34:41 TS1 SOURCE: zfs-diagnosis, REV: 1.0
> Aug 10 00:34:41 TS1 EVENT-ID: ab7df266-3380-4a35-e0bc-9056878fd182
> Aug 10 00:34:41 TS1 DESC: The number of I/O errors associated with a ZFS device exceeded
> Aug 10 00:34:41 TS1 acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD for more information.
> Aug 10 00:34:41 TS1 AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt
> Aug 10 00:34:41 TS1 will be made to activate a hot spare if available.
> Aug 10 00:34:41 TS1 IMPACT: Fault tolerance of the pool may be compromised.
> Aug 10 00:34:41 TS1 REC-ACTION: Run 'zpool status -x' and replace the bad device.
>
> and after the second and third test, just:
> SATA device detached at port 0
>
> Core files were tar-ed together and bzip2-ed and can be found at:
>
> http://dl.getdropbox.com/u/1709454/dump.bakerci.200908100106.tar.bz2
>
> Please let me know if you need any further core/debug. Apologies to readers having all this inflicted by email digest.
>
> Many thanks
>
> Chris
> --
> This message posted from opensolaris.org
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

--
----------------
Sanjeev Bagewadi
Solaris RPE
Bangalore, India
Sanjeev
2009-Aug-11 09:16 UTC
[zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?
Hi Chris,

On Sun, Aug 09, 2009 at 05:53:12PM -0700, Chris Baker wrote:
> OK - had a chance to do more testing over the weekend. Firstly some extra data:
>
> Moving the mirror to both drives on ICH10R ports and on sudden disk power-off the mirror faulted cleanly to the remaining drive no problem.
>
> Having a one drive pool on the ICH10R under heavy write traffic and then powered off causes the zpool/zfs hangs described above.
>
> ZPool being tested is called "Remove" and consists of:
> c7t2d0s0 - attached to the ICH10R
> c8t0d0s0 - second disk attached to the Si3132 card with the Si3124 driver
>
> This leads me to the following suspicions:
> (1) We have an Si3124 issue in not detecting the drive removal always, or of failing to pass that info back to ZFS, even though we know the kernel noticed
> (2) In the event that the only disk in a pool goes faulted, the zpool/zfs subsystem will block indefinitely waiting to get rid of the pending writes.
>
> I've just recabled back to one disk on ICH10R and one on Si3132 and tried the sudden off with the Si drive:
>
> *) First try - mirror faulted and IO continued - good news but confusing
> *) Second try - zfs/zpool hung, couldn't even get a zpool status, tried a savecore but savecore hung moving the data to a separate zpool
> *) Third try - zfs/zpool hung, ran savecore -L to a UFS filesystem I created for that purpose
>
> After the first try, dmesg shows:
> Aug 10 00:34:41 TS1 SATA device detected at port 0
> Aug 10 00:34:41 TS1 sata: [ID 663010 kern.info] /pci at 0,0/pci8086,3a46 at 1c,3/pci1095,7132 at 0 :
> Aug 10 00:34:41 TS1 sata: [ID 761595 kern.info] SATA disk device at port 0
> Aug 10 00:34:41 TS1 sata: [ID 846691 kern.info] model WDC WD5000AACS-00ZUB0
> Aug 10 00:34:41 TS1 sata: [ID 693010 kern.info] firmware 01.01B01
> Aug 10 00:34:41 TS1 sata: [ID 163988 kern.info] serial number WD-xxxxxxxxxxxxxx
> Aug 10 00:34:41 TS1 sata: [ID 594940 kern.info] supported features:
> Aug 10 00:34:41 TS1 sata: [ID 981177 kern.info] 48-bit LBA, DMA, Native Command Queueing, SMART, SMART self-test
> Aug 10 00:34:41 TS1 sata: [ID 643337 kern.info] SATA Gen2 signaling speed (3.0Gbps)
> Aug 10 00:34:41 TS1 sata: [ID 349649 kern.info] Supported queue depth 32, limited to 31
> Aug 10 00:34:41 TS1 sata: [ID 349649 kern.info] capacity = 976773168 sectors
> Aug 10 00:34:41 TS1 fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
> Aug 10 00:34:41 TS1 EVENT-TIME: Mon Aug 10 00:34:41 BST 2009
> Aug 10 00:34:41 TS1 PLATFORM: , CSN: , HOSTNAME: TS1
> Aug 10 00:34:41 TS1 SOURCE: zfs-diagnosis, REV: 1.0
> Aug 10 00:34:41 TS1 EVENT-ID: ab7df266-3380-4a35-e0bc-9056878fd182
> Aug 10 00:34:41 TS1 DESC: The number of I/O errors associated with a ZFS device exceeded
> Aug 10 00:34:41 TS1 acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD for more information.
> Aug 10 00:34:41 TS1 AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt
> Aug 10 00:34:41 TS1 will be made to activate a hot spare if available.
> Aug 10 00:34:41 TS1 IMPACT: Fault tolerance of the pool may be compromised.
> Aug 10 00:34:41 TS1 REC-ACTION: Run 'zpool status -x' and replace the bad device.
>
> and after the second and third test, just:
> SATA device detached at port 0
>
> Core files were tar-ed together and bzip2-ed and can be found at:
>
> http://dl.getdropbox.com/u/1709454/dump.bakerci.200908100106.tar.bz2
>
> Please let me know if you need any further core/debug. Apologies to readers having all this inflicted by email digest.

Spent some time analysing the dump and I find that ZFS does not know that the disk is dead. There are about 1900 WRITE requests pending on that disk (c8t0d0s0). Attached are the details.

Let me know what you find from fmdump. I suspect that this has got to do with support for the card.

Hope that helps.

Regards,
Sanjeev

--
----------------
Sanjeev Bagewadi
Solaris RPE
Bangalore, India

-------------- next part --------------
The pool in question "remove":

-- snip --
ZFS spa @ 0xffffff01c77b9800
    Pool name: remove
    State: ACTIVE

    VDEV Address            State     Aux   Description
    0xffffff01c9faec80      HEALTHY   -     root

        VDEV Address            State     Aux   Description
        0xffffff01c9faf2c0      HEALTHY   -     mirror

            VDEV Address            State     Aux   Description
            0xffffff01d4099940      HEALTHY   -     /dev/dsk/c7t2d0s0

            VDEV Address            State     Aux   Description
            0xffffff01d4099300      HEALTHY   -     /dev/dsk/c8t0d0s0
-- snip --

Obviously, the status for c8t0d0s0 is wrong because it should have marked it dead.

Looking at the threads, we have spa_sync() waiting for an IO to complete:

-- snip --
> ffffff0008678c60::findstack -v
stack pointer for thread ffffff0008678c60: ffffff0008678a00
[ ffffff0008678a00 _resume_from_idle+0xf1() ]
  ffffff0008678a30 swtch+0x147()
  ffffff0008678a60 cv_wait+0x61(ffffff01db0e9bc0, ffffff01db0e9bb8)
  ffffff0008678aa0 zio_wait+0x5d(ffffff01db0e9900)
  ffffff0008678b10 dsl_pool_sync+0xe1(ffffff01cedac300, d9)
  ffffff0008678ba0 spa_sync+0x32a(ffffff01c77b9800, d9)
  ffffff0008678c40 txg_sync_thread+0x265(ffffff01cedac300)
  ffffff0008678c50 thread_start+8()
-- snip --

There are 1921 write IOs on the write-queue for the vdev (c8t0d0s0):

-- snip --
> 0xffffff01d4099300::print -at struct vdev vdev_queue
{
    ffffff01d4099790 avl_tree_t vdev_queue.vq_deadline_tree = {
        ffffff01d4099790 struct avl_node *avl_root = 0xffffff01db8ae600
        ffffff01d4099798 int (*)() avl_compar = vdev_queue_deadline_compare
        ffffff01d40997a0 size_t avl_offset = 0x220
        ffffff01d40997a8 ulong_t avl_numnodes = 0x781
        ffffff01d40997b0 size_t avl_size = 0x2d0
    }
    ffffff01d40997b8 avl_tree_t vdev_queue.vq_read_tree = {
        ffffff01d40997b8 struct avl_node *avl_root = 0
        ffffff01d40997c0 int (*)() avl_compar = vdev_queue_offset_compare
        ffffff01d40997c8 size_t avl_offset = 0x208
        ffffff01d40997d0 ulong_t avl_numnodes = 0
        ffffff01d40997d8 size_t avl_size = 0x2d0
    }
    ffffff01d40997e0 avl_tree_t vdev_queue.vq_write_tree = {
        ffffff01d40997e0 struct avl_node *avl_root = 0xffffff01db8ae5e8
        ffffff01d40997e8 int (*)() avl_compar = vdev_queue_offset_compare
        ffffff01d40997f0 size_t avl_offset = 0x208
        ffffff01d40997f8 ulong_t avl_numnodes = 0x781
        ffffff01d4099800 size_t avl_size = 0x2d0
    }
    ffffff01d4099808 avl_tree_t vdev_queue.vq_pending_tree = {
        ffffff01d4099808 struct avl_node *avl_root = 0xffffff01ddaae0f8
        ffffff01d4099810 int (*)() avl_compar = vdev_queue_offset_compare
        ffffff01d4099818 size_t avl_offset = 0x208
        ffffff01d4099820 ulong_t avl_numnodes = 0x23
        ffffff01d4099828 size_t avl_size = 0x2d0
    }
    ffffff01d4099830 kmutex_t vdev_queue.vq_lock = {
        ffffff01d4099830 void *[1] _opaque = [ 0 ]
    }
}
-- snip --

So, ZFS is not aware that the device has been removed and it is still waiting for those IOs to finish.

You could run:

fmdump -v /var/fm/fmd/fltlog

and see if any faults were reported. And if ZFS did detect the failure, that would be reported as well.
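For reference, the check being suggested plus a couple of related ones (the fltlog path above is the default FMA fault log location):

fmdump -v /var/fm/fmd/fltlog     # diagnosed faults, if fmd produced any
fmdump -eV                       # raw error telemetry (ereports), verbose
zpool status -x                  # ZFS's own view of any unhealthy pools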
Ross
2009-Aug-11 10:19 UTC
[zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?
... which sounds very similar to issues I've raised many times. ZFS should have the ability to double check what a drive is doing, and speculatively time out a device that appears to be failing in order to maintain pool performance.

If a single drive in a redundant pool can be seen to be responding 10-50x slower than others, or to have hundreds of outstanding IOs, ZFS should be able to flag it as 'possibly faulty' and return data from the rest of the pool without that one device blocking it. It should not block an entire redundant pool when just one device is behaving badly.

And I don't care what the driver says. If the performance figures indicate there's a problem, that's a driver bug, and it's possible for ZFS to spot that. I've no problems with Sun's position that this should be done at the driver level; I agree that in theory that is where it should be dealt with. I just feel that in the real world bugs occur, and this extra sanity check could be useful in ensuring that ZFS still performs well despite problems in the device drivers.

There have been reports to this forum now of single-disk timeout errors causing whole-pool problems for devices connected via iscsi, usb, sas and sata. I've had personal experience of it on a test whitebox server using an AOC-SAT2-MV8, and similar problems have been reported on a Sun x4540.
--
This message posted from opensolaris.org