Hi! I'm attempting to understand the pros/cons between raid5 and raidz after
running into a performance issue with Oracle on zfs
(http://opensolaris.org/jive/thread.jspa?threadID=120703&tstart=0).

I would appreciate some feedback on what I've understood so far:

WRITES

raid5 - A FS block is written on a single disk (or multiple disks depending
        on the size of the data???)
raidz - A FS block is written in a dynamic stripe (depending on the size of
        the data?) across n number of vdevs (minus parity).

READS

raid5 - IO count depends on how many disks the FS block was written to.
        (data crosses two disks = 2 IOs??)
raidz - A single read will span across n number of vdevs (minus parity).
        (1 single IO??)

NEGATIVES

raid5 - Write hole penalty: if the system crashes in the middle of a write
        block update, before or after updating parity, data is corrupt.
      - Overhead (read previous block, read parity, update parity and write
        block)
      - No checksumming of data!
      - Slow sequential read performance.
raidz - Bound by x number of IOPS from the slowest vdev since blocks are
        striped. Bad for small random reads.

POSITIVES

raid5 - Good for random reads (between raid5 and raidz!) since blocks are not
        striped across the sum of disks.
raidz - Good for sequential reads and writes since data is striped across the
        sum of vdevs.
      - No write hole penalty!
-- 
This message posted from opensolaris.org
On Tue, Dec 29, 2009 at 02:37:20PM -0800, Brad wrote:
> I would appreciate some feedback on what I've understood so far:
>
> WRITES
>
> raid5 - A FS block is written on a single disk (or multiple disks
>         depending on the size of the data???)

There is no direct relationship between a filesystem and the RAID
structure. RAID5 maps virtual sectors to columns in some width pattern;
how the FS uses those virtual sectors is up to it. The admin may need to
know how it is to be used if there is a desire to tweak the stripe
width.

This makes some comparisons difficult, because RAID5 is only a
presentation and management of a set of contiguous blocks, while raidz
is always associated with a particular filesystem. Updates to RAID5 are
in-place.

> raidz - A FS block is written in a dynamic stripe (depending on the size
>         of the data?) across n number of vdevs (minus parity).

The stripe may be written to as few as one disk for data and other disks
for parity, or the stripe may cover all the disks.

> READS
>
> raid5 - IO count depends on how many disks the FS block was written to.
>         (data crosses two disks = 2 IOs??)

Well, that's true for anything: you can't read two disks without issuing
two reads. The main issue is that RAID5 has no ability to validate the
data, so it doesn't need to read all columns; it can just read one
sector if necessary and return the data. How many disk sectors must be
retrieved may depend on which filesystem is in use, but in most cases
(common filesystems, common stripe widths) a single FS block will not be
distributed over many disks.

> raidz - A single read will span across n number of vdevs (minus parity).
>         (1 single IO??)

If not in cache, the ZFS block is read (usually only from the non-parity
components), and that block may be on many disks. The entire ZFS block
is read so that it can be validated against the checksum.

> NEGATIVES
>
> raid5 - Write hole penalty: if the system crashes in the middle of a write
>         block update, before or after updating parity, data is corrupt.

Assuming no other structures are used to address it (like a log device).
A log device is not really part of RAID5, but may be found in
implementations of RAID5.

>       - Overhead (read previous block, read parity, update parity and
>         write block)

True for non-full-stripe writes. Full-stripe writes need no read step
(something the raidz implementation leverages).

>       - No checksumming of data!
>       - Slow sequential read performance.

Not sure why sequential read performance would have a penalty under
RAID5.

-- 
Darren
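To make the read-modify-write overhead Darren describes concrete, here is a
rough back-of-the-envelope sketch in Python. It is not any real RAID5
implementation; the helper name and the disk counts are made up for
illustration. It simply tallies the physical I/Os one RAID5 write would need
in the partial-stripe and full-stripe cases:

    # Rough sketch (not a real RAID5 implementation): count the physical
    # disk I/Os for one RAID5 write, showing why a partial-stripe update
    # pays a read-modify-write penalty while a full-stripe write needs no
    # reads at all.

    def raid5_write_ios(chunks_touched, data_disks):
        """chunks_touched: data chunks the write fully covers;
        data_disks: number of non-parity columns in the stripe."""
        if chunks_touched >= data_disks:
            # Full-stripe write: parity is computed from the new data alone,
            # so it is all writes (every data column plus the parity column).
            return {"reads": 0, "writes": data_disks + 1}
        # Partial-stripe write: read old data and old parity, then write new
        # data and new parity (the classic read-modify-write cycle).
        return {"reads": chunks_touched + 1, "writes": chunks_touched + 1}

    # Hypothetical 5-disk RAID5 (4 data + 1 parity), illustrative only.
    print(raid5_write_ios(chunks_touched=1, data_disks=4))  # small update
    print(raid5_write_ios(chunks_touched=4, data_disks=4))  # full stripe

A one-chunk update comes out as two reads plus two writes, while a write
covering the whole stripe is all writes, which is the case Darren notes the
raidz implementation leverages.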
On Dec 29, 2009, at 5:37 PM, Brad <beneri3 at yahoo.com> wrote:

> Hi! I'm attempting to understand the pros/cons between raid5 and raidz
> after running into a performance issue with Oracle on zfs
> (http://opensolaris.org/jive/thread.jspa?threadID=120703&tstart=0).
>
> I would appreciate some feedback on what I've understood so far:
>
> WRITES
>
> raid5 - A FS block is written on a single disk (or multiple disks
>         depending on the size of the data???)

If the write doesn't span the whole stripe width, then there is a read
of the parity chunk, a write of the block and a write of the parity
chunk, which is the write hole penalty/vulnerability, and is 3
operations. (If the data spans more than 1 chunk it is written in
parallel, so you can think of it as one operation; if the data doesn't
fill any given chunk, then a read of the existing data chunk is
necessary to fill in the missing data, making it 4 operations.) No other
operation on the array can execute while this is happening.

> raidz - A FS block is written in a dynamic stripe (depending on the size
>         of the data?) across n number of vdevs (minus parity).

Yes, and since there is no write penalty it is only one operation, so
writes should be faster on raidz than raid5 (and safer). Like raid5, no
other operation to the vdev can execute while this is happening.

> READS
>
> raid5 - IO count depends on how many disks the FS block was written to.
>         (data crosses two disks = 2 IOs??)

These can happen in parallel, so really think of it as one operation,
but the reads only need to touch the disks the data lies on, allowing
multiple read operations to execute simultaneously as long as the data
is on separate spindles. Of course no write can execute while a read is
happening.

> raidz - A single read will span across n number of vdevs (minus parity).
>         (1 single IO??)

Yes, reads are exactly like writes on the raidz vdev: no other
operation, read or write, can execute while this is happening. This is
where the problem lies, and it is felt hardest with random IOs.

> NEGATIVES
>
> raid5 - Write hole penalty: if the system crashes in the middle of a write
>         block update, before or after updating parity, data is corrupt.
>       - Overhead (read previous block, read parity, update parity and
>         write block)
>       - No checksumming of data!
>       - Slow sequential read performance.

Sequential reads and random reads should be as fast as a raid0 of the
number of disks in the array minus one.

> raidz - Bound by x number of IOPS from the slowest vdev since blocks are
>         striped. Bad for small random reads.
>
> POSITIVES
>
> raid5 - Good for random reads (between raid5 and raidz!) since blocks are
>         not striped across the sum of disks.
> raidz - Good for sequential reads and writes since data is striped across
>         the sum of vdevs.
>       - No write hole penalty!

That about sums it up. Avoid raidz for random workloads, unless you have
so many disks you can create multiple raidz vdevs to accommodate the
needed IOPS.

-Ross
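As a rough companion to Ross's description of raidz writing each block as one
dynamic stripe, the Python sketch below is a simplification only: it ignores
the sector-alignment and padding rules of the real raidz allocator, and the
disk counts and block sizes are hypothetical. It shows how a small block may
touch just one data disk plus parity, while a large block spans the whole
vdev:

    # Simplified illustration (not the real raidz allocator, which also
    # deals with sector alignment and padding): split one ZFS block into a
    # dynamic stripe on a raidz1 vdev and report how many disks it touches.

    def raidz1_stripe(block_size, n_disks, sector=512):
        sectors = -(-block_size // sector)       # ceiling division
        data_disks = n_disks - 1                 # one column is parity
        data_columns = min(sectors, data_disks)  # small blocks use few disks
        return {"data_columns": data_columns,
                "parity_columns": 1,
                "disks_touched": data_columns + 1}

    # Hypothetical 5-disk raidz1, illustrative block sizes only.
    print(raidz1_stripe(block_size=512, n_disks=5))     # tiny block: 2 disks
    print(raidz1_stripe(block_size=131072, n_disks=5))  # 128K block: all 5

Either way the whole block is one allocation, which is why a raidz read or
write occupies the full stripe at once, as Ross describes.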
@ross "If the write doesn''t span the whole stripe width then there is a read of the parity chunk, write of the block and a write of the parity chunk which is the write hole penalty/vulnerability, and is 3 operations (if the data spans more then 1 chunk then it is written in parallel so you can think of it as one operation, if the data doesn''t fill any given chunk then a read of the existing data chunk is necessary to fill in the missing data making it 4 operations). No other operation on the array can execute while this is happening." I thought with raid5 for a new FS block write, the previous block is read in, then read parity, write/update parity then write the new block (2 reads 2 writes)?? "Yes, reads are exactly like writes on the raidz vdev, no other operation, read or write, can execute while this is happening. This is where the problem lies, and is felt hardest with random IOs." Ah - so with a random read workload, a read IO can not be executed in multiple streams or simultaneously until the current IO has completed with raidz. Was the thought process behind this to mitigate the write hole issue or for performance (a write is a single IO instead of 3 or 4 IOs with raid5)? -- This message posted from opensolaris.org