In general, you will have to trade off data availability, reliability,
performance, and space. You haven't really given us requirements that
would point us in one direction or another.
<teaser>
I'm developing a tool to help you make these trade-offs. But it isn't
yet ready for public consumption. RSN.
</teaser>
more below...
Marion Hakanson wrote:
> Greetings,
>
> I followed closely the thread "ZFS and Storage", and other discussions
> about using ZFS on hardware RAID arrays, since we are deploying ZFS in
> a similar situation here. I'm sure I'm oversimplifying, but the consensus
> for general filesystem-type storage needs, as I've read it, tends toward
> doing ZFS RAID-Z (or RAID-Z2) on LUNs consisting of hardware RAID-0
> stripes. This gives good performance, and allows ZFS self-healing
> properties, with reasonable space utilization, while taking advantage
> of the NV cache in the array for write acceleration. Well, that's the
> approach that seems to match our needs, anyway.
>
> However, the disk array we have (not a new purchase) is a Hitachi (HDS)
> 9520V, consisting mostly of SATA drives. This array does not support
> RAID-0 for some reason (one can guess that HDS does not want to provide
> the ammunition for self-foot-shooting involving SATA drives). Our pressing
> question is how to configure a shelf of 15 400GB SATA drives, with the idea
> that we may add another such shelf within a year.
Thank goodness. The worst thing vendors have ever done is recognize the
existence of RAID-0. Millions of lives could have been spared if we started
counting at 1 :-)
OK, so one requirement is expandability. Good.
> Prior to ZFS, we likely would've set up two 6D+1P RAID-5 groups on that
> shelf, leaving a single hot-spare, and applied UFS or SAM-QFS filesystems
> onto hardware LUNs sliced out of those groups. The use of two smaller
> RAID-groups seems advisable given the likely large reconstruct time on
> these 400GB 7200RPM drives.
Yes.
Note: ZFS reconstruct time is dependent on the space used, not the
size of the disk(s).
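For example (a sketch, with a hypothetical pool name), you can see how much
allocated space a resilver would have to traverse, and watch its progress,
with the usual commands:

   zpool list tank        # compare SIZE vs. USED; resilver only walks used space
   zpool status -v tank   # shows resilver/scrub progress and any errors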
> Some options we're considering with ZFS are:
>
> (0) One 13D+1P h/w RAID-5 group, one hot-spare, configured as 5-9 LUNs.
>     Setup ZFS pool of one RAID-Z group from all those LUNs. With a
>     6-LUN RAID-Z group should have ~4333GB available space (9-LUN group
>     gives ~4622GB). Some block-level recovery available, but an extra
>     helping of RAID-Z space overhead is lost.
>
> (1) Two 6D+1P h/w RAID-5 groups, configured as 1 LUN each. Run a simple
>     stripe ZFS pool consisting of those two LUNs. The "con" here is that
>     there is no ZFS self-healing capability, though we do gain the other
>     ZFS features. We rely on tape backups for any block-level corruption
>     recovery necessary. The "pro" is there is no RAID-Z space overhead;
>     ~4800GB available space.
>
> (2) Same two h/w RAID-5 groups as (1), but configured as some larger
>     number of LUNs, say 5-9 LUNs each. Setup a ZFS pool of two RAID-Z
>     groups consisting of those 5-9 LUNs each. We gain some ZFS self-healing
>     here for block-level issues, but sacrifice some space (again, double
>     the single-layer RAID-5 space overhead). With two 6-LUN RAID-Z groups,
>     should be ~4000GB available space. With 9-LUN RAID-Z groups, ~4266GB.
>
> (3) Three 4D+1P h/w RAID-5 groups, no hot spare, mapped to one LUN each.
>     Setup a ZFS pool of one RAID-Z group consisting of those three LUNs.
>     Only ~3200GB available space, but what looks like very good resiliency
>     in face of multiple disk failures.
>
> (4) Same three h/w RGs as (3) above, but configured 5-9 LUNs each. ZFS
>     pool of RAID-Z groups made from those LUNs. With 9-LUN RAID-Z groups,
>     looks like the same 4266GB as (2) above.
>
> One of the unknowns I have, which hopefully the more experienced folks
> can help with, is related to (0), (2) and (4) above. I'm unsure of what
> happens should a h/w RAID-5 group suffer a catastrophic problem, e.g. a
> dual-drive failure. Would all 5-9 LUNs on the failed RAID-5 group go away?
> Or would just the affected blocks go away (two drives-worth), allowing
> _some_ ZFS recovery to occur? This makes the robustness of (0), (2) and
> (4) uncertain to me.
First, you need to understand that the RAID array won't expose ZFS to
disk failures. All failures reported up through the stack will appear as
block failures or LUN failures. This is subtle, but it does impact
recovery [1]. ZFS does single-block recovery pretty much on the fly,
but LUN recovery could take some time. Thus your instinct to use smaller
LUNs may make sense (though this is difficult to say for sure without an
availability assessment).
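As a sketch only (the device names are made up, and it assumes each h/w
RAID-5 group is exported as six smaller LUNs), an option (2)-style pool
would be built from two RAID-Z top-level vdevs:

   zpool create tank \
       raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 \
       raidz c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0

If a single LUN in one of those groups goes away, only that vdev degrades,
and ZFS can reconstruct the affected blocks from the remaining LUNs in the
same RAID-Z group.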
[1] you will also need a good offline backup strategy. ZFS snapshots
really help here.
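For instance (hypothetical dataset and file names, assuming a build with
zfs send/receive), an offline copy can be taken from a snapshot and written
to tape or to a file on another system:

   zfs snapshot tank/home@weekly-1
   zfs send tank/home@weekly-1 > /backup/tank-home-weekly-1.zfs   # restore later with zfs receive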
> Note that this particular pool of storage is intended to be served up
> to clients via NFS and/or Samba, from a single T2000 file server. We
> hope to be able to scale this solution up to 100-200TB, by adding arrays
> or JBODs to the current storage.
When you add capacity, you really want to be able to move data around on
the LUNs. I would recommend using ZFS for mirroring (RAID-1) or RAID-Z2.
When you add another array, you add another bunch of LUNs, which can then
be logically added to the pool as another RAID set. However, once the new
LUNs have been created, you can also relocate the old LUNs. This is part
of the "grow into your storage" model.
The benefit is that you can "move" vdevs around as your diversity
opportunities increase over time.
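A rough sketch of that model (made-up device names again): when the second
shelf arrives, its LUNs can be added as another top-level vdev, and data can
be migrated off an old LUN onto a new one in place:

   zpool add tank raidz2 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0
   zpool replace tank c2t0d0 c4t6d0   # resilver one old LUN's data onto a new LUN

ZFS dynamically stripes new writes across all top-level vdevs, so the pool
grows without downtime.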
[Insert crummy ASCII art here]
-- richard