While testing a zpool with a different storage adapter using my "blkdev"
device, I did a test which made a disk unavailable -- all attempts to
read from it report EIO.
I expected my configuration (which is a 3 disk test, with 2 disks in a
RAIDZ and a hot spare) to work such that the hot spare would
automatically be activated. But I'm finding that ZFS does not behave
this way -- if only some I/Os are failed, the hot spare is recruited,
but if ZFS decides that the label is gone, it makes no attempt to
recruit a hot spare.
I have added FMA notification to my blkdev driver -- it will post
device.no_response or device.invalid_state ereports (per the
ddi_fm_ereport_post() man page) in certain failure scenarios.
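The posting path follows the pattern from the ddi_fm_ereport_post() man
page; roughly like this (a sketch only -- the bd_fm_ereport name and the
softc layout here are illustrative, not the exact blkdev code):

#include <sys/ddi.h>
#include <sys/sunddi.h>
#include <sys/systm.h>
#include <sys/ddifm.h>
#include <sys/fm/protocol.h>
#include <sys/fm/util.h>

/* Hypothetical per-instance state; the real driver's softc differs. */
typedef struct bd_softc {
	dev_info_t	*sc_dip;	/* devinfo node */
	int		sc_fm_caps;	/* capabilities from ddi_fm_init() */
} bd_softc_t;

/*
 * Post an ereport.io.device.* ereport for this instance.  "detail" is a
 * DDI_FM_DEVICE_* suffix, e.g. DDI_FM_DEVICE_NO_RESPONSE or
 * DDI_FM_DEVICE_INVAL_STATE.
 */
static void
bd_fm_ereport(bd_softc_t *sc, const char *detail)
{
	char		class[FM_MAX_CLASS];
	uint64_t	ena;

	if (!DDI_FM_EREPORT_CAP(sc->sc_fm_caps))
		return;

	(void) snprintf(class, sizeof (class), "%s.%s", DDI_FM_DEVICE, detail);
	ena = fm_ena_generate(0, FM_ENA_FMT1);

	ddi_fm_ereport_post(sc->sc_dip, class, ena, DDI_NOSLEEP,
	    FM_VERSION, DATA_TYPE_UINT8, FM_EREPORT_VERS0,
	    NULL);
}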
I *suspect* the problem is in the FMA notification path for zfs-retire,
where the event is not being interpreted in a way that lets zfs-retire
figure out that the drive is toast.
Of course, this is just an educated guess on my part. I'm neither a ZFS
nor an FMA expert here.
Am I missing something here? Under what conditions can I expect hot
spares to be recruited?
My zpool status showing the results is below.
- Garrett
> pfexec zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c1t0d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: testpool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        testpool    DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            c2t3d0  ONLINE       0     0     0
            c2t3d1  UNAVAIL      9   132     0  experienced I/O failures
        spares
          c2t3d2    AVAIL

errors: No known data errors
On Apr 5, 2010, at 3:38 AM, Garrett D'Amore wrote:

> Am I missing something here? Under what conditions can I expect hot
> spares to be recruited?

Hot spares are activated by the zfs-retire agent in response to a
list.suspect event containing one of the following faults:

        fault.fs.zfs.vdev.io
        fault.fs.zfs.vdev.checksum
        fault.fs.zfs.device

The last of these (fault.fs.zfs.device) is what is diagnosed when a
label is corrupted. What software are you running? Have you confirmed
that you are getting one of these faults? What does 'fmdump -V' show?
Does doing a 'zpool replace c2t3d1 c2t3d2' by hand succeed?

- Eric

--
Eric Schrock, Fishworks                    http://blogs.sun.com/eschrock
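P.S. In outline, what the retire agent does with a list.suspect is just
a class match over the suspect list before it goes looking for a spare.
A simplified sketch (not the actual zfs-retire source; the replacement
step itself is elided):

#include <fm/fmd_api.h>
#include <libnvpair.h>
#include <sys/fm/protocol.h>

/*ARGSUSED*/
static void
zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
    const char *class)
{
	nvlist_t **faults;
	uint_t nfaults, f;

	/* list.suspect events carry an array of fault nvlists. */
	if (nvlist_lookup_nvlist_array(nvl, FM_SUSPECT_FAULT_LIST,
	    &faults, &nfaults) != 0)
		return;

	for (f = 0; f < nfaults; f++) {
		if (!fmd_nvl_class_match(hdl, faults[f],
		    "fault.fs.zfs.vdev.io") &&
		    !fmd_nvl_class_match(hdl, faults[f],
		    "fault.fs.zfs.vdev.checksum") &&
		    !fmd_nvl_class_match(hdl, faults[f],
		    "fault.fs.zfs.device"))
			continue;

		/*
		 * Pull the pool and vdev GUIDs out of the fault's
		 * resource FMRI, find an AVAIL spare in that pool, and
		 * attach it in place of the faulted vdev -- the moral
		 * equivalent of 'zpool replace'.
		 */
	}
}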
On 04/ 5/10 05:28 AM, Eric Schrock wrote:
> On Apr 5, 2010, at 3:38 AM, Garrett D'Amore wrote:
>
>> Am I missing something here? Under what conditions can I expect hot
>> spares to be recruited?
>
> Hot spares are activated by the zfs-retire agent in response to a
> list.suspect event containing one of the following faults:
>
>         fault.fs.zfs.vdev.io
>         fault.fs.zfs.vdev.checksum
>         fault.fs.zfs.device
>
> The last of these (fault.fs.zfs.device) is what is diagnosed when a
> label is corrupted. What software are you running? Have you confirmed
> that you are getting one of these faults? What does 'fmdump -V' show?
> Does doing a 'zpool replace c2t3d1 c2t3d2' by hand succeed?

I see ereport.fs.zfs.io_failure, and ereport.fs.zfs.probe_failure.
Also, ereport.io.service.lost and ereport.io.device.inval_state. There
is indeed a fault.fs.zfs.device in the list as well. Clearly ZFS thinks
the device is unavailable (which is accurate).

And "pfexec zpool replace testpool c2t3d1 c2t3d2" works fine, as shown
here:

gdamore@tabasco{33}> pfexec zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c1t0d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: testpool
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: resilver completed after 0h0m with 0 errors on Mon Apr 5 08:39:57 2010
config:

        NAME          STATE     READ WRITE CKSUM
        testpool      DEGRADED     0     0     0
          raidz1-0    DEGRADED     0     0     0
            c2t3d0    ONLINE       0     0     0
            spare-1   DEGRADED     0     0     0
              c2t3d1  UNAVAIL      9   132     0  cannot open
              c2t3d2  ONLINE       0     0     0  20.8M resilvered
        spares
          c2t3d2      INUSE     currently in use

errors: No known data errors
gdamore@tabasco{34}>

Everything seems to be correct *except* that ZFS isn't automatically
doing the replace operation with the hot spare.

It feels to me like this is possibly a ZFS bug -- perhaps ZFS is
expecting a specific set of FMA faults that only sd delivers? (Recall
this is with a different target device.)

        - Garrett

> - Eric
>
> --
> Eric Schrock, Fishworks                  http://blogs.sun.com/eschrock
>
On Apr 5, 2010, at 11:43 AM, Garrett D'Amore wrote:

> I see ereport.fs.zfs.io_failure, and ereport.fs.zfs.probe_failure.
> Also, ereport.io.service.lost and ereport.io.device.inval_state. There
> is indeed a fault.fs.zfs.device in the list as well.

The ereports are not interesting, only the fault. In FMA, ereports
contribute to diagnosis, but faults are the only things that are
presented to the user and to retire agents.

> Everything seems to be correct *except* that ZFS isn't automatically
> doing the replace operation with the hot spare.
>
> It feels to me like this is possibly a ZFS bug -- perhaps ZFS is
> expecting a specific set of FMA faults that only sd delivers? (Recall
> this is with a different target device.)

Yes, it may be a bug. You will have to step through the zfs retire agent
to see what goes wrong when it receives the list.suspect event. This
code path is tested many, many times every day, so it's not as obvious
as "this doesn't work." The ZFS retire agent subscribes only to ZFS
faults. The underlying driver or other telemetry has no bearing on the
diagnosis or associated action.

- Eric
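P.S. The registration side is the usual fmd module boilerplate; only ZFS
fault classes are subscribed to, which is why the driver-level ereports
never reach the agent directly. Roughly like this (again a sketch of the
shape of it, not the actual zfs-retire source, and the wildcard class is
an assumption):

#include <fm/fmd_api.h>
#include <libnvpair.h>

static void zfs_retire_recv(fmd_hdl_t *, fmd_event_t *, nvlist_t *,
    const char *);

static const fmd_hdl_ops_t fmd_ops = {
	zfs_retire_recv,	/* fmdo_recv */
	NULL,			/* fmdo_timeout */
	NULL,			/* fmdo_close */
	NULL,			/* fmdo_stats */
	NULL,			/* fmdo_gc */
};

static const fmd_hdl_info_t fmd_info = {
	"ZFS Retire Agent", "1.0", &fmd_ops, NULL
};

void
_fmd_init(fmd_hdl_t *hdl)
{
	if (fmd_hdl_register(hdl, FMD_API_VERSION, &fmd_info) != 0)
		return;

	/*
	 * Subscribe only to ZFS fault classes (delivered inside
	 * list.suspect events).  Ereports such as
	 * ereport.io.device.inval_state feed the diagnosis engine,
	 * not this module.
	 */
	fmd_hdl_subscribe(hdl, "fault.fs.zfs.*");
}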