Albert Chin
2009-Aug-25 11:05 UTC
[zfs-discuss] Resilver complete, but device not replaced, odd zpool status output
$ cat /etc/release
                  Solaris Express Community Edition snv_105 X86
           Copyright 2008 Sun Microsystems, Inc. All Rights Reserved.
                        Use is subject to license terms.
                           Assembled 15 December 2008
$ zpool status tww
  pool: tww
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed after 7h12m with 30786 errors on Tue Aug 25 00:31:22 2009
config:

        NAME                                         STATE    READ WRITE CKSUM
        tww                                          DEGRADED    0     0 48.8K
          raidz2                                     ONLINE      0     0     0
            c6t600A0B8000299966000005964668CB39d0    ONLINE      0     0     0
            c6t600A0B8000299CCC000006C84744C892d0    ONLINE      0     0     0
            c6t600A0B8000299CCC000005B44668CC6Ad0    ONLINE      0     0     0
            c6t600A0B8000299966000005A44668CC3Fd0    ONLINE      0     0     0
            c6t600A0B8000299CCC000005BA4668CD2Ed0    ONLINE      0     0     0
            c6t600A0B8000299966000005AA4668CDB1d0    ONLINE      0     0     0
            c6t600A0B80002999660000073547C5CED9d0    ONLINE      0     0     0
          raidz2                                     ONLINE      0     0     0
            c6t600A0B8000299966000005B04668F17Dd0    ONLINE      0     0     0
            c6t600A0B8000299CCC0000099E4A400B94d0    ONLINE      0     0     0
            c6t600A0B8000299966000005B64668F26Fd0    ONLINE      0     0     0
            c6t600A0B8000299CCC000005CC4668F30Ed0    ONLINE      0     0     0
            c6t600A0B8000299966000005BC4668F305d0    ONLINE      0     0     0
            c6t600A0B8000299CCC0000099B4A400A9Cd0    ONLINE      0     0     0
            c6t600A0B8000299966000005C24668F39Bd0    ONLINE      0     0     0
          raidz2                                     DEGRADED    0     0 98.7K
            c6t600A0B8000299CCC00000A154A89E426d0    ONLINE      0     0     0  228K resilvered
            c6t600A0B8000299966000009F74A89E1A5d0    ONLINE      0     0 1.33K  11.8M resilvered
            c6t600A0B8000299CCC00000A174A89E520d0    ONLINE      0     0   187  278K resilvered
            c6t600A0B8000299966000009F94A89E24Bd0    ONLINE      0     0 22.9K  217M resilvered
            replacing                                DEGRADED    0     0  118K
              c6t600A0B8000299CCC00000A194A89E634d0  UNAVAIL    20  827K     0  experienced I/O failures
              c6t600A0B8000299966000009EE4A89DA51d0  ONLINE      0     0     0  1.85G resilvered
            c6t600A0B8000299CCC00000A0C4A89DDE8d0    ONLINE      0     0   124  310K resilvered
            c6t600A0B8000299966000009F04A89DB1Bd0    ONLINE      0     0 15.7K  210M resilvered
        spares
          c6t600A0B8000299CCC000005D84668F448d0      AVAIL
          c6t600A0B8000299966000005C84668F461d0      AVAIL

errors: 31539 data errors, use '-v' for a list

Once I noticed the drive failure, I started to replace the device:
# zpool replace tww c6t600A0B8000299CCC00000A194A89E634d0 \
    c6t600A0B8000299966000009EE4A89DA51d0

Then I decided to reseat it instead, so I stopped the resilver, reseated
the device, and online'd it:
# zpool scrub -s tww
# zpool online tww c6t600A0B8000299CCC00000A194A89E634d0

But the resilver continued to completion anyway. Now that the resilver is
complete, the UNAVAIL drive is still listed. Why? ZFS also seems to think
the drive is still in a replacing state:
# zpool replace tww c6t600A0B8000299CCC00000A194A89E634d0 \
    c6t600A0B8000299966000005C84668F461d0
cannot replace c6t600A0B8000299CCC00000A194A89E634d0 with c6t600A0B8000299966000005C84668F461d0: cannot replace a replacing device

Maybe this has something to do with
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6782540?

So, I marked the device offline, and the resilver restarted:
# zpool offline tww c6t600A0B8000299CCC00000A194A89E634d0
# zpool status tww
  pool: tww
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver in progress for 0h0m, 0.02% done, 39h14m to go
config:
...

After the resilver completed:
# zpool status tww
  pool: tww
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed after 6h9m with 27886 errors on Tue Aug 25 08:32:41 2009
config:

        NAME                                         STATE    READ WRITE CKSUM
        tww                                          DEGRADED    0     0 76.0K
          raidz2                                     ONLINE      0     0     0
            c6t600A0B8000299966000005964668CB39d0    ONLINE      0     0     0
            c6t600A0B8000299CCC000006C84744C892d0    ONLINE      0     0     0
            c6t600A0B8000299CCC000005B44668CC6Ad0    ONLINE      0     0     0
            c6t600A0B8000299966000005A44668CC3Fd0    ONLINE      0     0     0
            c6t600A0B8000299CCC000005BA4668CD2Ed0    ONLINE      0     0     0
            c6t600A0B8000299966000005AA4668CDB1d0    ONLINE      0     0     0
            c6t600A0B80002999660000073547C5CED9d0    ONLINE      0     0     0
          raidz2                                     ONLINE      0     0     0
            c6t600A0B8000299966000005B04668F17Dd0    ONLINE      0     0     0
            c6t600A0B8000299CCC0000099E4A400B94d0    ONLINE      0     0     0
            c6t600A0B8000299966000005B64668F26Fd0    ONLINE      0     0     0
            c6t600A0B8000299CCC000005CC4668F30Ed0    ONLINE      0     0     0
            c6t600A0B8000299966000005BC4668F305d0    ONLINE      0     0     0
            c6t600A0B8000299CCC0000099B4A400A9Cd0    ONLINE      0     0     0
            c6t600A0B8000299966000005C24668F39Bd0    ONLINE      0     0     0
          raidz2                                     DEGRADED    0     0  153K
            c6t600A0B8000299CCC00000A154A89E426d0    ONLINE      0     0     1  1K resilvered
            c6t600A0B8000299966000009F74A89E1A5d0    ONLINE      0     0 2.14K  5.67M resilvered
            c6t600A0B8000299CCC00000A174A89E520d0    ONLINE      0     0   299  34K resilvered
            c6t600A0B8000299966000009F94A89E24Bd0    ONLINE      0     0 29.7K  23.5M resilvered
            replacing                                DEGRADED    0     0  118K
              c6t600A0B8000299CCC00000A194A89E634d0  OFFLINE    20 1.28M     0
              c6t600A0B8000299966000009EE4A89DA51d0  ONLINE      0     0     0  1.93G resilvered
            c6t600A0B8000299CCC00000A0C4A89DDE8d0    ONLINE      0     0   247  54K resilvered
            c6t600A0B8000299966000009F04A89DB1Bd0    ONLINE      0     0 24.2K  51.3M resilvered
        spares
          c6t600A0B8000299CCC000005D84668F448d0      AVAIL
          c6t600A0B8000299966000005C84668F461d0      AVAIL

errors: 27886 data errors, use '-v' for a list

# zpool replace c6t600A0B8000299CCC00000A194A89E634d0 \
    c6t600A0B8000299966000009EE4A89DA51d0
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c6t600A0B8000299966000009EE4A89DA51d0s0 is part of active ZFS pool tww. Please see zpool(1M).

So, what is going on?
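For anyone checking for the same symptom, here is a minimal sketch of a hypothetical Python helper (not part of ZFS or zpool(1M); the names replacing_vdevs, lingering_old_devices and SAMPLE are illustrative). It scans saved "zpool status" text and lists any child of a "replacing" vdev that is not ONLINE, i.e. an old device still attached after the resilver has finished. It assumes the indentation-based config layout shown in the output above, and SAMPLE is trimmed from the first status listing in this post.

```python
from __future__ import annotations

# Hypothetical helper, not part of ZFS: assumes the indentation-based
# "config:" layout printed by zpool status, as shown in this thread.

SAMPLE = """\
\t  raidz2                                     DEGRADED    0     0 98.7K
\t    c6t600A0B8000299966000009F94A89E24Bd0    ONLINE      0     0 22.9K
\t    replacing                                DEGRADED    0     0  118K
\t      c6t600A0B8000299CCC00000A194A89E634d0  UNAVAIL    20  827K     0
\t      c6t600A0B8000299966000009EE4A89DA51d0  ONLINE      0     0     0
\t    c6t600A0B8000299CCC00000A0C4A89DDE8d0    ONLINE      0     0   124
\tspares
\t  c6t600A0B8000299CCC000005D84668F448d0      AVAIL
"""


def replacing_vdevs(status_text: str) -> list[list[tuple[str, str]]]:
    """Return one list of (device, state) pairs per "replacing" vdev found."""
    groups: list[list[tuple[str, str]]] = []
    current: list[tuple[str, str]] | None = None
    parent_indent = -1
    for raw in status_text.expandtabs(8).splitlines():
        fields = raw.split()
        if len(fields) < 2:
            # Blank lines and bare headings such as "spares" have no state column.
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        name, state = fields[0], fields[1]
        if current is not None and indent <= parent_indent:
            # Back at or above the replacing vdev's level: its children are done.
            groups.append(current)
            current = None
        if name.startswith("replacing"):
            # Prefix match is an assumption, meant to tolerate suffixed names.
            current = []
            parent_indent = indent
        elif current is not None:
            current.append((name, state))
    if current is not None:
        groups.append(current)
    return groups


def lingering_old_devices(status_text: str) -> list[str]:
    """Children of a replacing vdev that are not ONLINE (e.g. the old disk)."""
    return [
        name
        for group in replacing_vdevs(status_text)
        for name, state in group
        if state != "ONLINE"
    ]


if __name__ == "__main__":
    for device in lingering_old_devices(SAMPLE):
        print("still attached under 'replacing':", device)
```

Fed the full post-offline status listing above, the same check would flag the OFFLINE disk c6t600A0B8000299CCC00000A194A89E634d0, since its state is not ONLINE.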
-- albert chin (china at thewrittenword.com)
Albert Chin
2009-Aug-25 18:34 UTC
[zfs-discuss] Resilver complete, but device not replaced, odd zpool status output
On Tue, Aug 25, 2009 at 06:05:16AM -0500, Albert Chin wrote:
> [[ snip snip ]]
>
> So, what is going on?

I rebooted the server and saw the same problem. So, I ran:
# zpool detach tww c6t600A0B8000299CCC00000A194A89E634d0

and now the zpool status output looks "normal":
# zpool status tww
  pool: tww
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver in progress for 0h16m, 7.88% done, 3h9m to go
config:

        NAME                                         STATE    READ WRITE CKSUM
        tww                                          ONLINE      0     0     5
          raidz2                                     ONLINE      0     0     0
            c6t600A0B8000299966000005964668CB39d0    ONLINE      0     0     0
            c6t600A0B8000299CCC000006C84744C892d0    ONLINE      0     0     0
            c6t600A0B8000299CCC000005B44668CC6Ad0    ONLINE      0     0     0
            c6t600A0B8000299966000005A44668CC3Fd0    ONLINE      0     0     0
            c6t600A0B8000299CCC000005BA4668CD2Ed0    ONLINE      0     0     0
            c6t600A0B8000299966000005AA4668CDB1d0    ONLINE      0     0     0
            c6t600A0B80002999660000073547C5CED9d0    ONLINE      0     0     0
          raidz2                                     ONLINE      0     0     0
            c6t600A0B8000299966000005B04668F17Dd0    ONLINE      0     0     0
            c6t600A0B8000299CCC0000099E4A400B94d0    ONLINE      0     0     0
            c6t600A0B8000299966000005B64668F26Fd0    ONLINE      0     0     0
            c6t600A0B8000299CCC000005CC4668F30Ed0    ONLINE      0     0     0
            c6t600A0B8000299966000005BC4668F305d0    ONLINE      0     0     0
            c6t600A0B8000299CCC0000099B4A400A9Cd0    ONLINE      0     0     0
            c6t600A0B8000299966000005C24668F39Bd0    ONLINE      0     0     0
          raidz2                                     ONLINE      0     0    71
            c6t600A0B8000299CCC00000A154A89E426d0    ONLINE      0     0     0
            c6t600A0B8000299966000009F74A89E1A5d0    ONLINE      0     0     1
            c6t600A0B8000299CCC00000A174A89E520d0    ONLINE      0     0     1  1K resilvered
            c6t600A0B8000299966000009F94A89E24Bd0    ONLINE      0     0    17  3K resilvered
            c6t600A0B8000299966000009EE4A89DA51d0    ONLINE      0     0     0  25.9M resilvered
            c6t600A0B8000299CCC00000A0C4A89DDE8d0    ONLINE      0     0     9  3K resilvered
            c6t600A0B8000299966000009F04A89DB1Bd0    ONLINE      0     0    39  13K resilvered
        spares
          c6t600A0B8000299CCC000005D84668F448d0      AVAIL
          c6t600A0B8000299966000005C84668F461d0      AVAIL

errors: 1 data errors, use '-v' for a list
-- albert chin (china at thewrittenword.com)