David Abrahams wrote:
> I have lots to learn, so this question may be riddled with
> misconceptions, but here goes:
>
> My understanding is that ordinary RAID-1 is only really resilient to
> the extent that you can tell which of the two disks has the bad data
> (e.g. if one whole disk is smoked, it's easy). Is mirroring
> inherently better than that with RAID-Z because of builtin ZFS
> checksumming?

The simple answer is yes.

When you begin to look at failure modes and scenarios, then the real
difference becomes clearer. If you don't have a checksum, then
you must rely on the disk as well as the software and hardware between
main memory and the bits on the media. The most common error message
we see from the disks themselves is a non-recoverable read. A common
source of this error is a failure on the media which is detected by
the ECC built into the disk. For example, a disk may be able to correct
an error in 8 bytes of the 512 byte block, but anything beyond 8 is
likely (not guaranteed) to be detected. For normal LVM operations, this
is handled by the LVM which will see the error and look to the other
side of the mirror. ZFS goes the extra mile and has a higher-level
checksum to verify the data.
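
To make the difference concrete, here is a minimal sketch of the two read paths. Everything in it is an assumption for illustration: the dict-based disk model, the function names, and the use of SHA-256 as the checksum are mine, not ZFS or LVM code.

```python
import hashlib


class ReadError(Exception):
    """Stands in for a non-recoverable read reported by a disk."""


def read_block(disk, block_no):
    # Hypothetical disk model: a dict of blocks, where None marks a block
    # that the drive's own ECC could not recover.
    data = disk[block_no]
    if data is None:
        raise ReadError(f"non-recoverable read of block {block_no}")
    return data


def lvm_mirror_read(sides, block_no):
    """Trust the first side that returns data without a disk error."""
    for disk in sides:
        try:
            return read_block(disk, block_no)
        except ReadError:
            continue  # look to the other side of the mirror
    raise ReadError("every side of the mirror failed")


def checksummed_mirror_read(sides, block_no, expected):
    """Additionally require that the data matches a separately stored checksum."""
    for disk in sides:
        try:
            data = read_block(disk, block_no)
        except ReadError:
            continue
        if hashlib.sha256(data).digest() == expected:
            return data
    raise ReadError("no side returned data matching the checksum")


good = b"payload".ljust(512, b"\0")
expected = hashlib.sha256(good).digest()
failed_side = {0: None}    # the disk reports the error itself
healthy_side = {0: good}
```

When the disk reports the failure itself, both readers fall through to the healthy side and return the same data. The checksummed reader only behaves differently when a side returns wrong data without reporting an error.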

In most cases, the disk's own error detection is sufficient. However, consider
the case where the sys admin runs the format(1m) command on the disk to
repair the bad block. If the data cannot be read (we already failed once)
then the block will be zero-filled by format's analyzer -- the data is
lost, though the block will now be available and will return data without
error. Now, the disk no longer returns an error so the LVM thinks that
the data is good. ZFS will still detect this case and try to read from
the other side of the mirror because the checksum will have a very high
probability (approximately 1-(1/2^256), for a 256-bit checksum) of being wrong. This is a case
where a detected error became an undetected error for most file systems
and LVMs, but ZFS still detects it.
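
The zero-fill scenario can be sketched in a few lines. This is an illustration under stated assumptions: the block contents, the `verified_read` helper, and SHA-256 as the checksum are hypothetical stand-ins, not the actual ZFS implementation.

```python
import hashlib

BLOCK_SIZE = 512


def verified_read(sides, expected):
    """Return the first copy whose checksum matches, or None if none does."""
    for data in sides:
        if hashlib.sha256(data).digest() == expected:
            return data
    return None


original = b"customer records".ljust(BLOCK_SIZE, b"x")
expected = hashlib.sha256(original).digest()  # recorded when the block was written

# After the repair, side A returns a zero-filled block with no error at all.
side_a = bytes(BLOCK_SIZE)
side_b = original

# A reader that trusts the disk takes the first side that answers cleanly.
unverified = side_a

# A reader that checks the stored checksum rejects side A and uses side B.
verified = verified_read([side_a, side_b], expected)
```

Here `unverified` is 512 zero bytes delivered as if they were good data, while `verified` is the original block. If both sides had been zero-filled, `verified_read` would return `None`, so the loss would at least be reported instead of passing silently.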

There are other cases where faults occur along the data path which
may not be detected. For example, IDE, SCSI, and PCI busses only offer
simple parity detection. Ethernet, SAS, SATA, FC, PCI-Express, and the
more modern data transports offer much better error detection and, for
some cases, correction. New media types, such as Flash memory devices,
also create new opportunities for data loss which are detected and
handled by ZFS. Basically, if you can get the end-to-end error detection,
then you have solved a large problem space, once and for all.
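
As a rough illustration of why simple parity is weak, the sketch below compares a single parity bit with a CRC-32 on the same payload. It is a toy under stated assumptions: real busses compute parity per byte or word, and each transport uses its own codes; `parity_bit`, `flip_bits`, and the payload are hypothetical names chosen for the example.

```python
import zlib


def parity_bit(data: bytes) -> int:
    """Return 1 if the number of set bits in data is odd, else 0."""
    return sum(bin(byte).count("1") for byte in data) % 2


def flip_bits(data: bytes, positions) -> bytes:
    """Return a copy of data with the given bit positions inverted."""
    buf = bytearray(data)
    for pos in positions:
        buf[pos // 8] ^= 1 << (pos % 8)
    return bytes(buf)


payload = b"end-to-end"
one_flip = flip_bits(payload, [3])
two_flips = flip_bits(payload, [3, 12])

# Parity changes when an odd number of bits flip...
parity_catches_one = parity_bit(one_flip) != parity_bit(payload)
# ...but two flips cancel out and the corruption passes unnoticed.
parity_catches_two = parity_bit(two_flips) != parity_bit(payload)
# A CRC-32 over the same payload still changes.
crc_catches_two = zlib.crc32(two_flips) != zlib.crc32(payload)
```

With these inputs, parity notices the single flipped bit but not the pair, while the CRC notices the pair. An end-to-end checksum kept with the file system covers whatever any individual link in the path lets through.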

-- richard