Kevin
2007-Dec-12 02:17 UTC
[zfs-discuss] Degraded zpool won't online disk device, instead resilvers spare
I've got a zpool with 4 raidz2 vdevs of 4 disks (750GB) each, plus 4 spares. At one point 2 disks failed (in different vdevs). The message in /var/adm/messages for the disks was 'device busy too long'. Then SMF printed this message:

Nov 23 04:23:51 x.x.com EVENT-TIME: Fri Nov 23 04:23:51 EST 2007
Nov 23 04:23:51 x.x.com PLATFORM: Sun Fire X4200 M2, CSN: 0734BD159F , HOSTNAME: x.x.com
Nov 23 04:23:51 x.x.com SOURCE: zfs-diagnosis, REV: 1.0
Nov 23 04:23:51 x.x.com EVENT-ID: bb0f6d83-0c12-6f0f-d121-99d72f7de981
Nov 23 04:23:51 x.x.com DESC: A ZFS device failed. Refer to http://sun.com/msg/ZFS-8000-D3 for more information.
Nov 23 04:23:51 x.x.com AUTO-RESPONSE: No automated response will occur.
Nov 23 04:23:51 x.x.com IMPACT: Fault tolerance of the pool may be compromised.
Nov 23 04:23:51 x.x.com REC-ACTION: Run 'zpool status -x' and replace the bad device.

Interestingly, ZFS reported the failure but did not bring two of the spare disks online to temporarily replace the failed disks. Here's the zpool history output showing what happened after the failures (from Nov 26 on):

2007-11-21.20:56:47 zpool create tank raidz2 c5t22d0 c5t30d0 c5t23d0 c5t31d0
2007-11-21.20:57:07 zpool add tank raidz2 c5t24d0 c5t32d0 c5t25d0 c5t33d0
2007-11-21.20:57:17 zpool add tank raidz2 c5t26d0 c5t34d0 c5t27d0 c5t35d0
2007-11-21.20:57:35 zpool add tank raidz2 c5t28d0 c5t36d0 c5t29d0 c5t37d0
2007-11-21.20:57:44 zpool scrub tank
2007-11-23.02:15:38 zpool scrub tank
2007-11-26.12:16:41 zpool online tank c5t23d0
2007-11-26.12:17:48 zpool online tank c5t23d0
2007-11-26.12:18:59 zpool add tank spare c5t17d0
2007-11-26.12:29:32 zpool offline tank c5t29d0
2007-11-26.12:32:08 zpool online tank c5t29d0
2007-11-26.12:32:35 zpool scrub tank
2007-11-26.12:34:15 zpool scrub -s tank
2007-11-26.12:34:22 zpool export tank
2007-11-26.12:43:42 zpool import tank tank.2
2007-11-26.12:45:45 zpool export tank.2
2007-11-26.12:46:32 zpool import tank.2
2007-11-26.12:47:02 zpool scrub tank.2
2007-11-26.12:48:11 zpool add tank.2 spare c5t21d0 c4t17d0 c4t21d0
2007-11-26.14:02:08 zpool scrub -s tank.2
2007-11-27.01:56:35 zpool clear tank.2
2007-11-27.01:57:02 zfs set atime=off tank.2
2007-11-27.01:57:07 zfs set checksum=fletcher4 tank.2
2007-11-27.01:57:45 zfs create tank.2/a
2007-11-27.01:57:46 zfs create tank.2/b
2007-11-27.01:57:47 zfs create tank.2/c
2007-11-27.01:59:39 zpool scrub tank.2
2007-12-05.15:31:51 zpool online tank.2 c5t23d0
2007-12-05.15:32:02 zpool online tank.2 c5t29d0
2007-12-05.15:36:58 zpool online tank.2 c5t23d0
2007-12-05.16:24:56 zpool replace tank.2 c5t23d0 c5t17d0
2007-12-05.21:52:43 zpool replace tank.2 c5t29d0 c5t21d0
2007-12-06.16:12:24 zpool online tank.2 c5t29d0
2007-12-11.13:08:13 zpool online tank.2 c5t23d0
2007-12-11.19:52:38 zpool online tank.2 c5t29d0

You can see that I manually attached 2 of the spare devices to the pool. Scrubbing finished fairly quickly (probably within 5 hours).
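(To be explicit, the two history entries that pulled the spares in were the replace commands below. My understanding, which may be wrong, is that naming a registered spare as the new device in zpool replace attaches the spare alongside the failed disk rather than permanently replacing it, which is why both devices show up under a "spare" vdev in the status output that follows.)

ROOT $ zpool replace tank.2 c5t23d0 c5t17d0
ROOT $ zpool replace tank.2 c5t29d0 c5t21d0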
Here is what the pool status looks like right now:

  pool: tank.2
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: resilver completed with 0 errors on Tue Dec 11 19:58:17 2007
config:

        NAME           STATE     READ WRITE CKSUM
        tank.2         DEGRADED     0     0     0
          raidz2       DEGRADED     0     0     0
            c5t22d0    ONLINE       0     0     0
            c4t30d0    ONLINE       0     0     0
            spare      DEGRADED     0     0     0
              c5t23d0  UNAVAIL      0     0     0  cannot open
              c5t17d0  ONLINE       0     0     0
            c4t31d0    ONLINE       0     0     0
          raidz2       ONLINE       0     0     0
            c5t24d0    ONLINE       0     0     0
            c4t32d0    ONLINE       0     0     0
            c5t25d0    ONLINE       0     0     0
            c4t33d0    ONLINE       0     0     0
          raidz2       ONLINE       0     0     0
            c5t26d0    ONLINE       0     0     0
            c4t34d0    ONLINE       0     0     0
            c5t27d0    ONLINE       0     0     0
            c4t35d0    ONLINE       0     0     0
          raidz2       DEGRADED     0     0     0
            c5t28d0    ONLINE       0     0     0
            c4t36d0    ONLINE       0     0     0
            spare      DEGRADED     0     0     0
              c5t29d0  UNAVAIL      0     0     0  cannot open
              c5t21d0  ONLINE       0     0     0
            c4t37d0    ONLINE       0     0     0
        spares
          c5t17d0      INUSE     currently in use
          c5t21d0      INUSE     currently in use
          c4t17d0      AVAIL
          c4t21d0      AVAIL

errors: No known data errors

The disks "failed" because they were temporarily detached; they have since been brought back online. We can verify that the OS can actually read data from them:

ROOT $ dd if=/dev/rdsk/c5t29d0 of=tst bs=1024 count=1000000
^Z
[1]+  Stopped                 dd if=/dev/rdsk/c5t29d0 of=tst bs=1024 count=1000000
ROOT $ bg
[1]+ dd if=/dev/rdsk/c5t29d0 of=tst bs=1024 count=1000000 &
ROOT $ iostat -xn c5t29d0 1
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    3.8    0.1    4.1    0.0  0.0  0.0    0.0    0.2   0   0 c5t29d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 5018.1    2.0 5031.1    0.0  0.0  0.8    0.0    0.2   5  80 c5t29d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 5180.4    0.0 5180.4    0.0  0.0  0.8    0.0    0.2   5  78 c5t29d0
^C
ROOT $ 1000000+0 records in
1000000+0 records out

We performed a similar test to make sure that data can be written to the disk without any problems.
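(For completeness, the write test was along the lines of the command below; I'm reconstructing the exact numbers. The oseek skips a few MB past the ZFS labels at the front of the disk, which is consistent with zpool replace still finding the old pool metadata later on. Scribbling on the raw device like this is only acceptable because the disk's contents will be rebuilt by a resilver anyway.)

ROOT $ dd if=/dev/zero of=/dev/rdsk/c5t29d0 bs=1024 oseek=8192 count=1000000
1000000+0 records in
1000000+0 records out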
So the device is clearly online. We have rebooted the server just to make sure. Now when I try to bring the devices back online, I get a one-line message telling me that it's bringing the device online (no error messages):

ROOT $ zpool online tank.2 c5t29d0
Bringing device c5t29d0 online

zpool status -x then tells me that it's resilvering. But if I run zpool iostat -v 1, you can see that it's actually resilvering the spare (mirror) disks again! Here is the zpool status:

ROOT $ zpool status tank.2
  pool: tank.2
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: resilver in progress, 0.01% done, 11h5m to go
config:

        NAME           STATE     READ WRITE CKSUM
        tank.2         DEGRADED     0     0     0
          raidz2       DEGRADED     0     0     0
            c5t22d0    ONLINE       0     0     0
            c4t30d0    ONLINE       0     0     0
            spare      DEGRADED     0     0     0
              c5t23d0  UNAVAIL      0     0     0  cannot open
              c5t17d0  ONLINE       0     0     0
            c4t31d0    ONLINE       0     0     0
          raidz2       ONLINE       0     0     0
            c5t24d0    ONLINE       0     0     0
            c4t32d0    ONLINE       0     0     0
            c5t25d0    ONLINE       0     0     0
            c4t33d0    ONLINE       0     0     0
          raidz2       ONLINE       0     0     0
            c5t26d0    ONLINE       0     0     0
            c4t34d0    ONLINE       0     0     0
            c5t27d0    ONLINE       0     0     0
            c4t35d0    ONLINE       0     0     0
          raidz2       DEGRADED     0     0     0
            c5t28d0    ONLINE       0     0     0
            c4t36d0    ONLINE       0     0     0
            spare      DEGRADED     0     0     0
              c5t29d0  UNAVAIL      0     0     0  cannot open
              c5t21d0  ONLINE       0     0     0
            c4t37d0    ONLINE       0     0     0
        spares
          c5t17d0      INUSE     currently in use
          c5t21d0      INUSE     currently in use
          c4t17d0      AVAIL
          c4t21d0      AVAIL

errors: No known data errors

And here is some of the output of zpool iostat -v 1:

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
tank.2       5.01T  5.86T    283     99  25.0M  1.11M
  raidz2     1.25T  1.47T    114     26  13.3M   294K
    c5t22d0      -      -    103     20  6.64M   124K
    c4t30d0      -      -     85     21  5.39M   147K
    spare        -      -      0    136      0  6.74M
      c5t23d0    -      -      0      0      0      0
      c5t17d0    -      -      0    135      0  6.74M
    c4t31d0      -      -     72     21  4.40M   148K
  raidz2     1.25T  1.47T     21     16  46.1K   212K
    c5t24d0      -      -      9     16  14.4K   108K
    c4t32d0      -      -     10     15  14.6K   108K
    c5t25d0      -      -     11     15  16.6K   108K
    c4t33d0      -      -     10     15  15.2K   107K
  raidz2     1.25T  1.47T     28     23  57.7K   250K
    c5t26d0      -      -     11     21  16.6K   127K
    c4t34d0      -      -     10     20  15.2K   127K
    c5t27d0      -      -     15     21  23.0K   127K
    c4t35d0      -      -     16     21  24.8K   126K
  raidz2     1.25T  1.47T    119     33  11.6M   377K
    c5t28d0      -      -    109     22  5.79M   151K
    c4t36d0      -      -     93     22  4.76M   151K
    spare        -      -      0    137      0  5.95M
      c5t29d0    -      -      0      0      0      0
      c5t21d0    -      -      0    136      0  5.95M
    c4t37d0      -      -     74     23  3.86M   190K
-----------  -----  -----  -----  -----  -----  -----

Notice that there is zero disk traffic on the disk we are trying to bring online (c5t29d0), but there is write traffic to its spare AND to the other spare. So it looks like it's resilvering both spare disks again. Why would it need to do that?

Next I try the replace command instead of the online command, hoping it will bring the disk online and resilver only what has changed since the disk went away. But it complains that the disk is already part of the same pool, since it's reading the old (yet still valid) on-disk metadata for that disk:

ROOT $ zpool replace tank.2 c5t29d0
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c5t29d0s0 is part of active ZFS pool tank.2. Please see zpool(1M).

I could force it with -f, but I want it to resilver only the parts that have changed. I also tried detaching the spare, in the hope that ZFS would then recognize that c5t29d0 is online again:

ROOT $ zpool detach tank.2 c5t21d0

However, running zpool status again shows that the spare has been removed, but no change other than that.
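(To re-attach the spare I simply run the same replace command from the history again. As far as I can tell, detaching an in-use spare just returns it to the AVAIL spare list, so this is the command I mean by "reattach" below:)

ROOT $ zpool replace tank.2 c5t29d0 c5t21d0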
When I reattach the spare device immediately afterwards, the resilver process begins again, and from zpool iostat and iostat -xn it again looks like it is resilvering both of the attached spares, not just the one I reattached. This resilver also takes quite a long time, as if it has to resilver everything from scratch rather than just the changes. Does the resilver logic work differently when a spare is involved?

Any idea what is going wrong here? It seems that ZFS should be able to online the disks, since the OS can read from and write to those devices perfectly well. And even if the online fails, it shouldn't cause a resilver of both attached spares.

You will notice that the pool was renamed with "zpool export tank; zpool import tank tank.2". Could this be causing ZFS to get confused when the device is brought online?

We are willing to try zpool replace -f on the disks that need to be brought online during the weekend to see what happens.

Here is the system info:

ROOT $ uname -a
SunOS x.x.com 5.10 Generic_120012-14 i86pc i386 i86pc

I will send showrev -p output if desired.

Thanks,
Kevin