In general, you will have to trade off data availability, reliability,
performance, and space. You haven't really given us requirements that
would point us in one direction or another.
<teaser>
I'm developing a tool to help you make these trade-offs. But it isn't
yet ready for public consumption. RSN.
</teaser>
more below...
Marion Hakanson wrote:
> Greetings,
>
> I followed closely the thread "ZFS and Storage", and other discussions
> about using ZFS on hardware RAID arrays, since we are deploying ZFS in
> a similar situation here. I'm sure I'm oversimplifying, but the consensus
> for general filesystem-type storage needs, as I've read it, tends toward
> doing ZFS RAID-Z (or RAID-Z2) on LUNs consisting of hardware RAID-0
> stripes. This gives good performance, and allows ZFS self-healing
> properties, with reasonable space utilization, while taking advantage
> of the NV cache in the array for write acceleration. Well, that's the
> approach that seems to match our needs, anyway.
>
> However, the disk array we have (not a new purchase) is a Hitachi (HDS)
> 9520V, consisting mostly of SATA drives. This array does not support
> RAID-0 for some reason (one can guess that HDS does not want to provide
> the ammunition for self-foot-shooting involving SATA drives). Our pressing
> question is how to configure a shelf of 15 400GB SATA drives, with the idea
> that we may add another such shelf within a year.
Thank goodness. The worst thing vendors have ever done is recognize the
existence of RAID-0. Millions of lives could have been spared if we started
counting at 1 :-)
OK, so one requirement is expandability. Good.
> Prior to ZFS, we likely would've set up two 6D+1P RAID-5 groups on that
> shelf, leaving a single hot-spare, and applied UFS or SAM-QFS filesystems
> onto hardware LUNs sliced out of those groups. The use of two smaller
> RAID-groups seems advisable given the likely large reconstruct time on
> these 400GB 7200RPM drives.
Yes.
Note: ZFS reconstruct time is dependent on the space used, not the
size of the disk(s).
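For example (a sketch, with a hypothetical pool name), you can see how much
allocated space a resilver would have to traverse, and watch its progress,
with the usual commands:

   zpool list tank        # compare SIZE vs. USED; resilver only walks used space
   zpool status -v tank   # shows resilver/scrub progress and any errors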
> Some options we're considering with ZFS are:
>
> (0) One 13D+1P h/w RAID-5 group, one hot-spare, configured as 5-9 LUNs.
>     Setup ZFS pool of one RAID-Z group from all those LUNs. With a
>     6-LUN RAID-Z group should have ~4333GB available space (9-LUN group
>     gives ~4622GB). Some block-level recovery available, but an extra
>     helping of RAID-Z space overhead is lost.
>
> (1) Two 6D+1P h/w RAID-5 groups, configured as 1 LUN each. Run a simple
>     stripe ZFS pool consisting of those two LUNs. The "con" here is that
>     there is no ZFS self-healing capability, though we do gain the other
>     ZFS features. We rely on tape backups for any block-level corruption
>     recovery necessary. The "pro" is there is no RAID-Z space overhead;
>     ~4800GB available space.
>
> (2) Same two h/w RAID-5 groups as (1), but configured as some larger
>     number of LUNs, say 5-9 LUNs each. Setup a ZFS pool of two RAID-Z
>     groups consisting of those 5-9 LUNs each. We gain some ZFS self-healing
>     here for block-level issues, but sacrifice some space (again, double
>     the single-layer RAID-5 space overhead). With two 6-LUN RAID-Z groups,
>     should be ~4000GB available space. With 9-LUN RAID-Z groups, ~4266GB.
>
> (3) Three 4D+1P h/w RAID-5 groups, no hot spare, mapped to one LUN each.
>     Setup a ZFS pool of one RAID-Z group consisting of those three LUNs.
>     Only ~3200GB available space, but what looks like very good resiliency
>     in face of multiple disk failures.
>
> (4) Same three h/w RGs as (3) above, but configured 5-9 LUNs each. ZFS
>     pool of RAID-Z groups made from those LUNs. With 9-LUN RAID-Z groups,
>     looks like the same 4266GB as (2) above.
>
> One of the unknowns I have, which hopefully the more experienced folks
> can help with, is related to (0), (2) and (4) above. I'm unsure of what
> happens should a h/w RAID-5 group suffer a catastrophic problem, e.g. a
> dual-drive failure. Would all 5-9 LUNs on the failed RAID-5 group go away?
> Or would just the affected blocks go away (two drives-worth), allowing
> _some_ ZFS recovery to occur? This makes the robustness of (0), (2) and
> (4) uncertain to me.
First, you need to understand that the RAID array won't expose ZFS to
disk failures. All failures reported up through the stack will appear as
block failures or LUN failures. This is subtle, but it does impact
recovery [1]. ZFS does single-block recovery pretty much on the fly,
but LUN recovery could take some time. Thus your instinct to use smaller
LUNs may make sense (though this is difficult to say for sure without an
availability assessment).
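As a sketch only (the device names are made up, and it assumes each h/w
RAID-5 group is exported as six smaller LUNs), an option (2)-style pool
would be built from two RAID-Z top-level vdevs:

   zpool create tank \
       raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 \
       raidz c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0

If a single LUN in one of those groups goes away, only that vdev degrades,
and ZFS can reconstruct the affected blocks from the remaining LUNs in the
same RAID-Z group.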
[1] you will also need a good offline backup strategy. ZFS snapshots
really help here.
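For instance (hypothetical dataset and file names, assuming a build with
zfs send/receive), an offline copy can be taken from a snapshot and written
to tape or to a file on another system:

   zfs snapshot tank/home@weekly-1
   zfs send tank/home@weekly-1 > /backup/tank-home-weekly-1.zfs   # restore later with zfs receive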
> Note that this particular pool of storage is intended to be served up
> to clients via NFS and/or Samba, from a single T2000 file server. We
> hope to be able to scale this solution up to 100-200TB, by adding arrays
> or JBODs to the current storage.
When you add capacity, you really want to be able to move data around on
the LUNs. I would recommend using ZFS for mirroring (RAID-1) or RAID-Z2.
When you add another array, you add another bunch of LUNs, which can then
be logically added to the pool as another RAID set. However, once the new
LUNs have been created, you can also relocate the old LUNs. This is part
of the "grow into your storage" model.
The benefit is that you can "move" vdevs around as your diversity
opportunities increase over time.
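A rough sketch of that model (made-up device names again): when the second
shelf arrives, its LUNs can be added as another top-level vdev, and data can
be migrated off an old LUN onto a new one in place:

   zpool add tank raidz2 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0
   zpool replace tank c2t0d0 c4t6d0   # resilver one old LUN's data onto a new LUN

ZFS dynamically stripes new writes across all top-level vdevs, so the pool
grows without downtime.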
[Insert crummy ASCII art here]
-- richard