Will Murnane
2009-Apr-19 05:52 UTC
[zfs-discuss] Degraded log device in "zpool status" output
I have a pool, "huge", composed of one six-disk raidz2 vdev and a log
device. I failed to plug in one disk when I took the machine down to
plug in the log device, and booted all the way before I realized this,
so the raidz2 vdev was rightly listed as degraded. Then I brought the
machine down, plugged the disk in, and brought it back up. I ran
"zpool scrub huge" to make sure that the missing disk was completely
synced. After a few minutes, "zpool status huge" showed this:

$ zpool status huge
  pool: huge
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress for 0h8m, 1.19% done, 11h15m to go
config:

        NAME        STATE     READ WRITE CKSUM
        huge        DEGRADED     0     0     0
          raidz2    DEGRADED     0     0     0
            c4t4d0  DEGRADED     0     0    15  too many errors
            c4t1d0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
            c4t6d0  ONLINE       0     0     0
        logs        DEGRADED     0     0     0
          c7d1      ONLINE       0     0     0

errors: No known data errors

I understand that not all of the blocks may have been synced onto
c4t4d0 (the missing disk), so some checksum errors are normal there.
But the log disk reports no errors, and its sole component reports
none either, yet the log device is marked as degraded. To see what
would happen, I executed this:

$ pfexec zpool clear huge c4t4d0
$ zpool status huge
  pool: huge
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress for 0h12m, 1.87% done, 10h32m to go
config:

        NAME        STATE     READ WRITE CKSUM
        huge        ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     2
            c4t1d0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
            c4t6d0  ONLINE       0     0     0
        logs        ONLINE       0     0     0
          c7d1      ONLINE       0     0     0

errors: No known data errors

So clearing the errors from one device has an effect on the status of
another device? Is this expected behavior, or is something wrong with
my log device? I'm running snv_111.

Will
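For reference, a minimal post-scrub cleanup sequence, using only the standard zpool
subcommands already shown above (when "zpool clear" is given no device argument it
resets the error counters on every vdev in the pool rather than a single disk):

$ zpool status huge          # wait until the scrub: line reports "scrub completed"
$ pfexec zpool clear huge    # no device argument: clear error counters pool-wide
$ zpool status huge          # all vdevs should now report ONLINE with zeroed counters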
Neil Perrin
2009-Apr-19 05:58 UTC
[zfs-discuss] Degraded log device in "zpool status" output
Will,

This is bug:

  6710376 log device can show incorrect status when other parts of pool are degraded

This is just an error in the reporting. There was nothing actually wrong
with the log device. It is picking up the degraded status from the rest
of the pool. The bug was fixed only yesterday and checked into snv_114.

Neil.

On 04/18/09 23:52, Will Murnane wrote:
> So clearing the errors from one device has an effect on the status of
> another device? Is this expected behavior, or is something wrong with
> my log device? I'm running snv_111.
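For anyone hitting this on builds before snv_114, a minimal sketch for confirming that
the slog itself is healthy despite the misreported state, using only standard zpool
subcommands (nothing here is specific to the fix):

$ zpool status -v huge       # per-device counters; c7d1 should stay at 0 READ/WRITE/CKSUM
$ zpool iostat -v huge 5     # per-vdev I/O every 5 seconds; the logs row shows whether
                             # c7d1 is still servicing synchronous writes normally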