Eric Sproul
2010-Mar-17 20:46 UTC
[zfs-discuss] checksum errors increasing on "spare" vdev?
Hi,
One of my colleagues was confused by the output of 'zpool status' on a pool
where a hot spare is being resilvered in after a drive failure:
$ zpool status data
  pool: data
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h56m, 23.78% done, 3h1m to go
config:

        NAME          STATE     READ WRITE CKSUM
        data          DEGRADED     0     0     0
          raidz1      ONLINE       0     0     0
            c0t2d0    ONLINE       0     0     0
            c1t2d0    ONLINE       0     0     0
            c0t4d0    ONLINE       0     0     0
            c0t5d0    ONLINE       0     0     0
            c1t4d0    ONLINE       0     0     0
            c0t7d0    ONLINE       0     0     0
          raidz1      DEGRADED     0     0     0
            spare     DEGRADED     0     0 2.89M
              c0t1d0  REMOVED      0     0     0
              c0t6d0  ONLINE       0     0     0  59.3G resilvered
            c1t5d0    ONLINE       0     0     0
            c0t3d0    ONLINE       0     0     0
            c1t1d0    ONLINE       0     0     0
            c1t3d0    ONLINE       0     0     0
            c1t6d0    ONLINE       0     0     0
        spares
          c0t6d0      INUSE     currently in use
The CKSUM error count is increasing, so he thought that the spare was also
failing. I disagreed because the errors are being recorded against the "fake"
vdev "spare" rather than against the spare disk c0t6d0 itself, but I want to
make sure my hunch is correct.
My hunch is that since reads from userland continue to come to the pool, and
since it's raidz, some of those reads will be for zobject addresses on the
failed drive, now represented by the spare. Because the data at those addresses
has not yet been resilvered onto the spare, it is effectively uninitialized,
and we get checksum errors.
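The hunch can be illustrated with a toy model. The Python sketch below is a simplification, not the ZFS read path: it assumes a single raidz1 stripe with XOR parity, uses a Fletcher-4 style checksum modeled on ZFS's fletcher4, and assumes the reader already knows which column lives on the spare (real raidz reconstruction has to work that out). A column that has not been resilvered yet is modeled as zeros; the block checksum then fails, the error is charged to the spare, and the data is rebuilt from parity. `read_block` and the `errors` counter are hypothetical names.

```python
import struct

MASK64 = (1 << 64) - 1


def fletcher4(data):
    """Fletcher-4 style checksum: four 64-bit running sums over 32-bit LE words."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return (a, b, c, d)


def xor_bytes(*cols):
    """Byte-wise XOR of equal-length columns (raidz1-style parity)."""
    out = bytearray(len(cols[0]))
    for col in cols:
        for i, byte in enumerate(col):
            out[i] ^= byte
    return bytes(out)


def read_block(columns, parity, expected, errors, spare_index):
    """Hypothetical read: verify the checksum, else rebuild the spare's column."""
    data = b"".join(columns)
    if fletcher4(data) == expected:
        return data
    # Checksum mismatch: in this model the error is charged to the "spare" vdev.
    errors["spare"] += 1
    others = [col for i, col in enumerate(columns) if i != spare_index]
    rebuilt = xor_bytes(parity, *others)
    fixed = columns[:spare_index] + [rebuilt] + columns[spare_index + 1:]
    data = b"".join(fixed)
    if fletcher4(data) != expected:
        raise OSError("block is unrecoverable")
    return data


# Five 8-byte data columns plus one parity column, written while the pool was healthy.
cols = [bytes([n]) * 8 for n in range(1, 6)]
parity = xor_bytes(*cols)
expected = fletcher4(b"".join(cols))

# The spare holds column 0 but has not been resilvered there yet: it reads back zeros.
on_disk = [bytes(8)] + cols[1:]
errors = {"spare": 0}
assert read_block(on_disk, parity, expected, errors, spare_index=0) == b"".join(cols)
print(errors)  # {'spare': 1}
```

In this model the read still returns correct data, which matches a pool that keeps working while the spare's counter climbs.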
I guess I really have two questions:
1. Am I correct about the source of the checksum errors attributed to the
   "spare" vdev?
2. During a raidz resilver, if a read is issued for an address that has
   already been resilvered, will that read succeed, or will ALL reads to that
   top-level vdev require reconstruction from the other leaf vdevs?
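The two alternatives in question 2 can be stated as a small decision function. This is a toy model of the question, not a description of what ZFS actually does; `SpareState`, `read_plan` and the `trust_partial` switch are hypothetical, and the model tracks resilvered blocks as a set of block IDs rather than assuming the resilver proceeds in disk-offset order.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class SpareState:
    """Toy state of a hot spare that is being resilvered."""
    resilvered: Set[int] = field(default_factory=set)  # block IDs already copied
    done: bool = False  # True once the whole resilver has finished


def read_plan(block_id, state, trust_partial):
    """Say how a read of block_id is served under each alternative in question 2.

    trust_partial=True:  already-resilvered blocks are read straight from the spare.
    trust_partial=False: every read reconstructs from the other leaf vdevs until done.
    """
    if state.done:
        return "direct"
    if trust_partial and block_id in state.resilvered:
        return "direct"
    return "reconstruct"


state = SpareState(resilvered={10, 11, 12})
for policy in (True, False):
    print(policy, [read_plan(b, state, policy) for b in (11, 99)])
```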
If the answer to #2 is that reads will succeed when they ask for data that's
already been resilvered, then I might expect read performance to increase as
the resilver progresses, since less and less data requires reconstruction. I
haven't measured this in a controlled environment, though, so I'm mostly just
curious about the theory.
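Under the "reads succeed" alternative, the expected share of reads that still need reconstruction is just the share of blocks not yet resilvered. The arithmetic below assumes reads are spread uniformly over the blocks and that the "% done" figure from 'zpool status' tracks the fraction of blocks copied; both are simplifying assumptions, and the model says nothing about the extra I/O load the resilver itself adds. `reconstruct_fraction` is a hypothetical name.

```python
def reconstruct_fraction(workload, resilvered):
    """Fraction of reads in workload whose block has not been resilvered yet."""
    if not workload:
        return 0.0
    pending = sum(1 for block in workload if block not in resilvered)
    return pending / len(workload)


TOTAL = 10_000
workload = range(TOTAL)  # every block read once: a uniform workload
for pct_done in (0.0, 23.78, 50.0, 100.0):
    # Hypothetical resilver order: blocks 0..k-1 have been copied.
    copied = set(range(round(TOTAL * pct_done / 100)))
    frac = reconstruct_fraction(workload, copied)
    print(f"{pct_done:6.2f}% done -> {frac:.4f} of reads reconstructed")
```

At the 23.78% shown in the status output, this model predicts that roughly three quarters of reads would still need reconstruction.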
Eric
