On 2011-06-10, at 10:58 AM, David Noriega wrote:
> I was checking out zfsonlinux.org to see how things have been going
> lately and I had a question. What's the difference, or what's better:
> use a hardware RAID-5 (or 6), or use ZFS to create a raidz pool? In terms
> of Lustre, is one preferred over another?
ZFS much prefers to have direct access to the individual disks in a JBOD,
instead of via h/w RAID-5/6. There are several reasons:
- it "knows" where the data and parity are located, and if there is an
error reading data from disk it can retry with different data/parity
combinations until the checksum matches, even trying single-bit error
recovery in extreme cases
- it is easier to place multiple copies of the metadata on different
disks if it has direct access to the individual disks
- it has more IO queues and can schedule IO better for individual disks,
keeping the IO queue relatively shallow so that read latency isn't
hurt
- pooled storage, in theory, allows all space/bandwidth to be used by any
thread doing IO. In practice this doesn't perform as well as in
theory.
- no read-modify-write when writing "partial block" data (there isn't
really such a thing as a "partial block write" for RAID-Z)
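
To illustrate the first point in the list (retrying data/parity combinations
until the checksum matches), here is a minimal Python sketch. It is not the
ZFS implementation: it assumes a single XOR parity column, uses SHA-256 as a
stand-in for the block checksum, and the names parity_of() and
read_with_repair() are made up for this example.

```python
import hashlib
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_of(columns):
    """Single XOR parity over a list of equal-length columns."""
    return reduce(xor_bytes, columns)


def block_digest(columns) -> str:
    """Stand-in block checksum (assumption: SHA-256 over the joined data)."""
    return hashlib.sha256(b"".join(columns)).hexdigest()


def read_with_repair(columns, parity, expected_digest):
    """Return (data, repaired_index); repaired_index is None if data was good.

    If the checksum fails, assume each data column in turn is the bad one,
    rebuild it from the other columns plus parity, and accept the first
    candidate whose checksum matches.
    """
    if block_digest(columns) == expected_digest:
        return b"".join(columns), None
    for i in range(len(columns)):
        others = columns[:i] + columns[i + 1:]
        rebuilt = parity_of(others + [parity])
        candidate = columns[:i] + [rebuilt] + columns[i + 1:]
        if block_digest(candidate) == expected_digest:
            return b"".join(candidate), i
    raise IOError("no data/parity combination matches the checksum")


# Hypothetical three-disk stripe with one silently corrupted column.
good = [b"AAAA", b"BBBB", b"CCCC"]
parity = parity_of(good)
digest = block_digest(good)
damaged = [b"AAAA", b"BxBB", b"CCCC"]
data, repaired = read_with_repair(damaged, parity, digest)
```

A h/w RAID-5/6 controller sitting underneath cannot do this on ZFS's behalf,
because it has no block checksum telling it which column returned bad data
when no disk reports an error.
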
The main drawback is that RAID-Z needs a lot more effort when rebuilding
a failed disk compared to a normal RAID-5/6. ZFS proponents will claim
that "it only needs to rebuild the used parts of the filesystem", but
most HPC filesystems are kept 70-80% full, so the RAID-Z overhead wipes
out any advantage gained by not rebuilding the 20% of unused space.
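
As a rough back-of-the-envelope model of that trade-off (my own
simplification, not a measurement): a conventional rebuild streams the whole
disk at sequential speed, while a RAID-Z rebuild touches only the used space
but at some lower effective rate. The function names and the disk numbers
below are hypothetical.

```python
def full_rebuild_hours(capacity_gb: float, seq_mb_s: float) -> float:
    """Conventional RAID-5/6: rebuild the whole disk at sequential speed."""
    return capacity_gb * 1000 / seq_mb_s / 3600


def resilver_hours(capacity_gb: float, used_fraction: float,
                   effective_mb_s: float) -> float:
    """RAID-Z: rebuild only the used space, at a lower effective rate."""
    return capacity_gb * used_fraction * 1000 / effective_mb_s / 3600


def break_even_rate(used_fraction: float, seq_mb_s: float) -> float:
    """Effective rate at which both rebuilds take the same time."""
    return used_fraction * seq_mb_s


# Hypothetical disk: 2000 GB, 100 MB/s sequential, filesystem 80% full.
capacity_gb, seq_mb_s, used = 2000.0, 100.0, 0.8
needed = break_even_rate(used, seq_mb_s)
print(f"RAID-Z must sustain {needed:.0f} MB/s per disk to break even")
print(f"full rebuild: {full_rebuild_hours(capacity_gb, seq_mb_s):.1f} h")
```

In this model, skipping the unused 20% only pays off if the RAID-Z rebuild
sustains more than 80% of the disk's sequential throughput.
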
See zfsonlinux.org/docs/SC10_BoF_ZFS_on_Linux_for_Lustre.pdf for some
performance comparisons.
Cheers, Andreas
--
Andreas Dilger
Principal Engineer
Whamcloud, Inc.