While testing a zpool with a different storage adapter using my "blkdev"
device, I did a test which made a disk unavailable -- all attempts to read
from it report EIO.  I expected my configuration (a 3 disk test, with 2
disks in a RAIDZ and a hot spare) to work such that the hot spare would
automatically be activated.  But I'm finding that ZFS does not behave this
way -- if only some I/Os fail, then the hot spare is activated, but if ZFS
decides that the label is gone, it makes no attempt to recruit a hot spare.

I had added FMA notification to my blkdev driver -- it will post
device.no_response or device.invalid_state ereports (per the
ddi_fm_ereport_post() man page) in certain failure scenarios.  I *suspect*
the problem is in the FMA notification for zfs-retire, where the event is
not being interpreted in a way that lets the ZFS retire agent figure out
that the drive is toasted.  Of course, this is just an educated guess on my
part.  I'm no ZFS nor FMA expert here.

Am I missing something here?  Under what conditions can I expect hot spares
to be recruited?

My zpool status showing the results is below.

	- Garrett

> pfexec zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c1t0d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: testpool
 state: DEGRADED
status: One or more devices could not be used because the label is missing
        or invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        testpool      DEGRADED     0     0     0
          raidz1-0    DEGRADED     0     0     0
            c2t3d0    ONLINE       0     0     0
            c2t3d1    UNAVAIL      9   132     0  experienced I/O failures
        spares
          c2t3d2      AVAIL

errors: No known data errors
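For reference, posting such a device ereport from a driver typically
follows the pattern shown in the ddi_fm_ereport_post(9F) man page --
roughly the sketch below.  This is illustrative only, not the actual
blkdev source; blk_post_device_ereport() is a made-up name, and the
resulting events show up in fmdump under the ereport.io.* class tree.

    #include <sys/types.h>
    #include <sys/systm.h>
    #include <sys/ddi.h>
    #include <sys/sunddi.h>
    #include <sys/ddifm.h>
    #include <sys/fm/protocol.h>
    #include <sys/fm/util.h>
    #include <sys/fm/io/ddi.h>

    /*
     * Post a device-level ereport such as "device.no_response" or
     * "device.inval_state" against this driver instance.
     */
    static void
    blk_post_device_ereport(dev_info_t *dip, const char *detail)
    {
            char class[FM_MAX_CLASS];
            uint64_t ena;

            /* detail is e.g. DDI_FM_DEVICE_NO_RESPONSE */
            (void) snprintf(class, sizeof (class), "%s.%s",
                DDI_FM_DEVICE, detail);

            /* generate an Error Numeric Association for this event */
            ena = fm_ena_generate(0, FM_ENA_FMT1);

            ddi_fm_ereport_post(dip, class, ena, DDI_NOSLEEP,
                FM_VERSION, DATA_TYPE_UINT8, FM_EREPORT_VERS0,
                NULL);
    }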
On Apr 5, 2010, at 3:38 AM, Garrett D'Amore wrote:
>
> Am I missing something here?  Under what conditions can I expect hot
> spares to be recruited?

Hot spares are activated by the zfs-retire agent in response to a
list.suspect event containing one of the following faults:

	fault.fs.zfs.vdev.io
	fault.fs.zfs.vdev.checksum
	fault.fs.zfs.device

The last of these (fault.fs.zfs.device) is what is diagnosed when a label
is corrupted.  What software are you running?  Have you confirmed that you
are getting one of these faults?  What does 'fmdump -V' show?  Does doing a
'zpool replace c2t3d1 c2t3d2' by hand succeed?

- Eric

--
Eric Schrock, Fishworks			http://blogs.sun.com/eschrock
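For reference, the retire agent receives these faults inside a
list.suspect event delivered to its fmdo_recv callback.  A rough sketch of
matching on the three fault classes above follows; names are assumed and
this is not the actual zfs-retire source.

    #include <fm/fmd_api.h>
    #include <sys/fm/protocol.h>

    /*
     * fmdo_recv callback: fmd delivers list.suspect events here; the
     * agent walks the enclosed fault list and only acts on ZFS faults.
     */
    static void
    retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
        const char *class)
    {
            nvlist_t **faults;
            uint_t nfaults, f;

            if (nvlist_lookup_nvlist_array(nvl, FM_SUSPECT_FAULT_LIST,
                &faults, &nfaults) != 0)
                    return;

            for (f = 0; f < nfaults; f++) {
                    if (fmd_nvl_class_match(hdl, faults[f],
                        "fault.fs.zfs.vdev.io") ||
                        fmd_nvl_class_match(hdl, faults[f],
                        "fault.fs.zfs.vdev.checksum") ||
                        fmd_nvl_class_match(hdl, faults[f],
                        "fault.fs.zfs.device")) {
                            /*
                             * Fault the affected vdev and, if a spare
                             * is available, kick off the replacement.
                             */
                    }
            }
    }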
On 04/ 5/10 05:28 AM, Eric Schrock wrote:
> On Apr 5, 2010, at 3:38 AM, Garrett D'Amore wrote:
>
>> Am I missing something here?  Under what conditions can I expect hot
>> spares to be recruited?
>>
> Hot spares are activated by the zfs-retire agent in response to a
> list.suspect event containing one of the following faults:
>
> 	fault.fs.zfs.vdev.io
> 	fault.fs.zfs.vdev.checksum
> 	fault.fs.zfs.device
>
> The last of these (fault.fs.zfs.device) is what is diagnosed when a
> label is corrupted.  What software are you running?  Have you confirmed
> that you are getting one of these faults?  What does 'fmdump -V' show?
> Does doing a 'zpool replace c2t3d1 c2t3d2' by hand succeed?
>

I see ereport.fs.zfs.io_failure, and ereport.fs.zfs.probe_failure.  Also,
ereport.io.service.lost and ereport.io.device.inval_state.  There is
indeed a fault.fs.zfs.device in the list as well.  Clearly ZFS thinks the
device is unavailable (which is accurate).

And "pfexec zpool replace testpool c2t3d1 c2t3d2" works fine, as shown
here:

gdamore@tabasco{33}> pfexec zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c1t0d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: testpool
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas
        exist for the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: resilver completed after 0h0m with 0 errors on Mon Apr  5 08:39:57 2010
config:

        NAME           STATE     READ WRITE CKSUM
        testpool       DEGRADED     0     0     0
          raidz1-0     DEGRADED     0     0     0
            c2t3d0     ONLINE       0     0     0
            spare-1    DEGRADED     0     0     0
              c2t3d1   UNAVAIL      9   132     0  cannot open
              c2t3d2   ONLINE       0     0     0  20.8M resilvered
        spares
          c2t3d2       INUSE     currently in use

errors: No known data errors
gdamore@tabasco{34}>

Everything seems to be correct *except* that ZFS isn't automatically doing
the replace operation with the hot spare.

It feels to me like this is possibly a ZFS bug -- perhaps ZFS is expecting
a specific set of FMA faults that only sd delivers?  (Recall this is with
a different target device.)

	- Garrett

> - Eric
>
> --
> Eric Schrock, Fishworks		http://blogs.sun.com/eschrock
>
On Apr 5, 2010, at 11:43 AM, Garrett D'Amore wrote:
>
> I see ereport.fs.zfs.io_failure, and ereport.fs.zfs.probe_failure.
> Also, ereport.io.service.lost and ereport.io.device.inval_state.  There
> is indeed a fault.fs.zfs.device in the list as well.

The ereports are not interesting, only the fault.  In FMA, ereports
contribute to diagnosis, but faults are the only thing that is presented
to the user and retire agents.

> Everything seems to be correct *except* that ZFS isn't automatically
> doing the replace operation with the hot spare.
>
> It feels to me like this is possibly a ZFS bug -- perhaps ZFS is
> expecting a specific set of FMA faults that only sd delivers?  (Recall
> this is with a different target device.)

Yes, it may be a bug.  You will have to step through the zfs retire agent
to see what goes wrong when it receives the list.suspect event.  This code
path is tested many, many times every day, so it's not as obvious as "this
doesn't work."  The ZFS retire agent subscribes only to ZFS faults.  The
underlying driver or other telemetry has no bearing on the diagnosis or
associated action.

- Eric
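For reference, the subscription side of such an fmd module looks roughly
like the following sketch, which pairs with the retire_recv() sketch shown
earlier in the thread.  This uses the generic fmd module API with
illustrative names; in practice a module's subscriptions may instead be
declared in its .conf file.

    #include <fm/fmd_api.h>

    /* receive callback from the earlier sketch */
    static void retire_recv(fmd_hdl_t *, fmd_event_t *, nvlist_t *,
        const char *);

    static const fmd_hdl_ops_t retire_ops = {
            retire_recv,    /* fmdo_recv */
            NULL,           /* fmdo_timeout */
            NULL,           /* fmdo_close */
            NULL,           /* fmdo_stats */
            NULL,           /* fmdo_gc */
    };

    static const fmd_hdl_info_t retire_info = {
            "Example Retire Agent", "1.0", &retire_ops, NULL
    };

    void
    _fmd_init(fmd_hdl_t *hdl)
    {
            if (fmd_hdl_register(hdl, FMD_API_VERSION, &retire_info) != 0)
                    return;

            /*
             * Only ZFS fault classes are delivered to this module.
             * Driver-level ereports (ereport.io.*) never reach it
             * directly; they only feed the diagnosis engine upstream.
             */
            fmd_hdl_subscribe(hdl, "fault.fs.zfs.*");
    }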