We have a virtualized environment of T-Series hosts where each host runs either zones or LDoms. All of the virtual systems will have their own dedicated storage on ZFS (and some may also get raw LUNs). All the SAN storage is delivered in fixed-size 33GB LUNs.

The question I have for the community is whether it would be better to have a pool per virtual system, or to create one large pool and carve out a ZFS file system per virtual system. The trade-off: with one large pool you'll be able to take advantage of dedup when it becomes available in Solaris, but you then have all your eggs in one basket. It was originally thought that with a pool per virtual host you could migrate the storage along with the host, but that's not looking feasible in this environment.

What is the probability of corruption with ZFS in Solaris 10 U6 and up in a SAN environment? Have people successfully recovered?

In this environment the redundancy is performed at the hardware level, not at the host, so there's no chance of self-healing here. Yes, it's been discussed, but because of the legacy storage environment that's shared with other non-ZFS systems, they require redundancy at the hardware level; they won't budge on that and won't do additional redundancy at the ZFS level.

So given the environment, would it be better to have lots of small pools, or one large shared pool?

Thanks,

Brian
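For concreteness, the two layouts under discussion would look roughly like this. Pool, dataset, and device names here are made up; each c#t#d# stands in for one of the 33GB SAN LUNs:

    # Option A: one pool per virtual system
    zpool create guest01 c2t0d0 c2t0d1      # ~66GB dedicated to this guest
    zpool create guest02 c2t0d2             # ~33GB dedicated to this guest

    # Option B: one large shared pool, one dataset per virtual system
    zpool create vmpool c2t0d0 c2t0d1 c2t0d2 c2t0d3
    zfs create -o quota=30g vmpool/guest01  # quota keeps one guest from
    zfs create -o quota=30g vmpool/guest02  # filling the shared pool

With Option B, a refreservation per dataset would additionally guarantee each guest its space, at the cost of the thin-provisioning benefit.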
On Tue, 2 Mar 2010, Brian Kolaci wrote:
>
> What is the probability of corruption with ZFS in Solaris 10 U6 and up
> in a SAN environment? Have people successfully recovered?

The probability of corruption in a "SAN environment" depends entirely on your SAN environment. With proper design, the probability should be "zero". If it is non-zero, then there must be a design defect in your SAN hardware which is liable to do harm at any time (aka "snake in the grass").

> In this environment the redundancy is performed at the hardware level,
> not at the host, so there's no chance of self-healing here. Yes, it's
> been discussed, but because of the legacy storage environment that's
> shared with other non-ZFS systems, they require redundancy at the
> hardware level; they won't budge on that and won't do additional
> redundancy at the ZFS level.

That is unfortunate. In this case, the probability of failure increases from "zero".

> So given the environment, would it be better to have lots of small
> pools, or one large shared pool?

I think that a larger shared pool will be more satisfying and less wasteful of resources. However, a large pool written to a single huge SAN LUN suffers from concurrency issues: ZFS loses the ability to intelligently schedule I/O for individual disks and instead must post a lot of (up to 35) simultaneous I/Os and hope for the best.

Bob

P.S. The term "zero" is quoted since it does not account for Murphy's Law.

--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
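Bob's "up to 35 simultaneous I/Os" refers to the per-device queue depth ZFS used in this Solaris 10 timeframe, controlled by the zfs_vdev_max_pending tunable. If that queue proves too aggressive for LUNs that share array spindles, it can be inspected and lowered at runtime with mdb; a sketch, to be checked against the ZFS Evil Tuning Guide for your release before touching a live system:

    # Show the current per-vdev queue depth (35 was the default here)
    echo zfs_vdev_max_pending/D | mdb -k

    # Lower it to 10 on a running system (takes effect immediately)
    echo zfs_vdev_max_pending/W0t10 | mdb -kw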
On Mar 2, 2010, at 7:08 AM, Brian Kolaci wrote:
> We have a virtualized environment of T-Series hosts where each host runs
> either zones or LDoms. All of the virtual systems will have their own
> dedicated storage on ZFS (and some may also get raw LUNs). All the SAN
> storage is delivered in fixed-size 33GB LUNs.

There really isn't a losing proposition here; whatever you do, the data can be reasonably safe with good performance.

> The question I have for the community is whether it would be better to
> have a pool per virtual system, or to create one large pool and carve
> out a ZFS file system per virtual system. The trade-off: with one large
> pool you'll be able to take advantage of dedup when it becomes available
> in Solaris, but you then have all your eggs in one basket. It was
> originally thought that with a pool per virtual host you could migrate
> the storage along with the host, but that's not looking feasible in this
> environment.

Fewer pools are easier to manage and use space more efficiently.

> What is the probability of corruption with ZFS in Solaris 10 U6 and up
> in a SAN environment? Have people successfully recovered?
>
> In this environment the redundancy is performed at the hardware level,
> not at the host, so there's no chance of self-healing here. Yes, it's
> been discussed, but because of the legacy storage environment that's
> shared with other non-ZFS systems, they require redundancy at the
> hardware level; they won't budge on that and won't do additional
> redundancy at the ZFS level.

heh, how would they know you've set copies=2? ;-)

> So given the environment, would it be better to have lots of small
> pools, or one large shared pool?

I'd go for the easiest path to manage.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 16-18, 2010)
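Richard's quip refers to the ZFS copies property, which stores extra copies of each block within the pool itself, restoring a measure of self-healing even when the pool sits on a single hardware-redundant LUN. A sketch with hypothetical dataset names; note that copies applies only to blocks written after the property is set, so it is best set at creation time:

    # Double the block copies for an existing dataset (new writes only)
    zfs set copies=2 vmpool/guest01

    # Or set it at creation so every block is covered from the start
    zfs create -o copies=2 vmpool/guest02

The trade-off is capacity: copies=2 roughly halves the usable space for that dataset, which matters when allocations come in fixed 33GB LUNs.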
On Mar 2, 2010, at 11:09 AM, Bob Friesenhahn wrote:
> On Tue, 2 Mar 2010, Brian Kolaci wrote:
>>
>> What is the probability of corruption with ZFS in Solaris 10 U6 and up
>> in a SAN environment? Have people successfully recovered?
>
> The probability of corruption in a "SAN environment" depends entirely on
> your SAN environment. With proper design, the probability should be
> "zero". If it is non-zero, then there must be a design defect in your
> SAN hardware which is liable to do harm at any time (aka "snake in the
> grass").

The only problem here is that they've already had a few corruptions. They were on U3, however, and I realize a lot of changes went in with U6; I'm not aware of any corruptions since then. Unfortunately I wasn't told about them until after they had rebuilt the pools and reused the devices. All of the corruptions I'm aware of were OS disk images on ZFS in the control domain. One, I believe, happened when they filled the pool because a snapshot was in place, and another when the control domain was patched and rebooted while it still had guests running. But I don't know the extent of the corruption, whether it was pool-wide or just files and/or metadata, and I don't know if anything could have been recovered. In my only experience with a corrupted pool (which I'm still working on), there were two files and one dataset found bad. A more recent one was due to someone accidentally adding the same devices to two different pools; I had him quickly copy off as much data as he could (he got 5 of the 7 disk images) before the system panicked.

So if there is corruption, can it be safely isolated so as not to affect other datasets or LDoms? Or would it be likely to take down the whole pool?

>> In this environment the redundancy is performed at the hardware level,
>> not at the host, so there's no chance of self-healing here. Yes, it's
>> been discussed, but because of the legacy storage environment that's
>> shared with other non-ZFS systems, they require redundancy at the
>> hardware level; they won't budge on that and won't do additional
>> redundancy at the ZFS level.
>
> That is unfortunate. In this case, the probability of failure increases
> from "zero".

Yes, and we need to expect the SAs to do the unexpected.

>> So given the environment, would it be better to have lots of small
>> pools, or one large shared pool?
>
> I think that a larger shared pool will be more satisfying and less
> wasteful of resources. However, a large pool written to a single huge
> SAN LUN suffers from concurrency issues: ZFS loses the ability to
> intelligently schedule I/O for individual disks and instead must post a
> lot of (up to 35) simultaneous I/Os and hope for the best.

This is what I had written in my document too. But all the SAN LUNs are only 33GB (yes, even when a host needs over a TB, it gets lots of 33GB LUNs pooled together). They're trying to get very efficient with storage, which is why I started down this path. But I'm also trying to measure the risk, if any, of moving to one larger pool rather than a bunch of smaller per-virtual-host pools. Regardless of the approach, the LUNs are all 33GB and that's the granularity of all allocations.
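One way to answer the isolation question after the fact: zpool status -v lists permanent (uncorrectable) errors by file path, which shows whether damage is confined to particular files or datasets or has reached pool-wide metadata. Pool name hypothetical:

    # Enumerate permanent errors and the files they affect
    zpool status -v vmpool

    # After repairing or deleting the affected files, clear the error
    # counters and scrub to confirm nothing else is damaged
    zpool clear vmpool
    zpool scrub vmpool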
On Tue, 2 Mar 2010, Brian Kolaci wrote:
>
> So if there is corruption, can it be safely isolated so as not to
> affect other datasets or LDoms? Or would it be likely to take down
> the whole pool?

It seems like you are asking if there could be a software bug, or a firmware/hardware bug in the SAN. It is always possible for there to be a bug. The bug might afflict the host system even if it uses many smaller pools, and the bug might not even be related to zfs.

From an administrative and performance standpoint, it seems like the large pool is better.

>> I think that a larger shared pool will be more satisfying and less
>> wasteful of resources. However, a large pool written to a single
>
> This is what I had written in my document too. But all the SAN LUNs
> are only 33GB (yes, even when a host needs over a TB, it gets lots of
> 33GB LUNs pooled together). They're trying to get very efficient with
> storage, which is why I started down this path. But I'm also trying
> to measure the risk, if any, of moving to one larger pool rather than
> a bunch of smaller per-virtual-host pools. Regardless of the approach,
> the LUNs are all 33GB and that's the granularity of all allocations.

The large pool will surely be more efficient with storage, since it is unlikely that all consumers will want or need to consume all of the space pre-allocated for them. In fact, if the expected per-consumer consumption averages less than 50%, you could use the large pool with mirroring and still have enough space. :-)

Bob

--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
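Bob's mirroring suggestion, sketched against the fixed 33GB LUNs (device names hypothetical): each mirror pair yields about 33GB usable, so capacity is half of raw, but ZFS regains the redundant copy it needs to self-heal checksum errors.

    # Build the shared pool from mirrored pairs of 33GB LUNs,
    # ideally pairing LUNs from different array controllers/paths
    zpool create vmpool \
      mirror c2t0d0 c3t0d0 \
      mirror c2t0d1 c3t0d1

    # Grow later in ~33GB usable increments as guests are added
    zpool add vmpool mirror c2t0d2 c3t0d2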