When using an array's RAID schemes, what higher-level ZFS features are not in play when ZFS stripes/concats are used without using any ZFS RaidZ or mirrors?

I understand from http://www.opensolaris.org/os/community/zfs/faq/#hardwareraid that ZFS can only report errors but not correct them. I think ZFS still does copy-on-write, and rollback on error - these are separate from RAID.

Does ZFS round-robin the writes across the LUNs when there is no ZFS RaidZ or mirrors in play? Or do all the writes go to the first LUN until it is full?

What other ZFS features depend on ZFS RAID?

Thanks,
John
On 03 August, 2007 - John.Karwoski at Sun.COM sent me these 1,6K bytes:

> When using an array's RAID schemes, what higher-level ZFS
> features are not in play when ZFS stripes/concats are used without using
> any ZFS RaidZ or mirrors?
>
> I understand from
> http://www.opensolaris.org/os/community/zfs/faq/#hardwareraid
>
> that ZFS can only report errors but not correct them. I think ZFS still
> does copy-on-write, and rollback on error - these are separate from
> RAID.
>
> Does ZFS round-robin the writes across the LUNs when there is no ZFS
> RaidZ or mirrors in play? Or do all the writes go to the first
> LUN until it is full?

Round-robin across all vdevs (single disk, mirror, raidz, raidz2) in a
pool.

> What other ZFS features depend on ZFS RAID?

Mostly the self-healing stuff. But if it's not zfs-redundant and a
device experiences write errors, the machine will currently panic.

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
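To see the round-robin behaviour for yourself, a minimal sketch (pool and LUN names below are placeholders, not anything from this thread) is to build a plain stripe of array-backed LUNs and watch the per-vdev allocation:

  # Non-redundant stripe of two SAN LUNs (device names are placeholders);
  # ZFS spreads writes across both top-level vdevs rather than filling
  # the first LUN before touching the second.
  zpool create tank c4t<LUN_A>d0 c4t<LUN_B>d0

  # Watch per-vdev capacity and write activity every 5 seconds:
  zpool iostat -v tank 5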
John Martinez
2007-Aug-03 17:37 UTC
[zfs-discuss] ZFS Features when Using Enterprise Arrays
On Aug 3, 2007, at 8:29 AM, Tomas Ögren wrote:

> On 03 August, 2007 - John.Karwoski at Sun.COM sent me these 1,6K bytes:
>
>> When using an array's RAID schemes, what higher-level ZFS
>> features are not in play when ZFS stripes/concats are used without
>> using
>> any ZFS RaidZ or mirrors?
>>
>> I understand from
>> http://www.opensolaris.org/os/community/zfs/faq/#hardwareraid
>>
>> that ZFS can only report errors but not correct them. I think ZFS
>> still
>> does copy-on-write, and rollback on error - these are separate from
>> RAID.
>>
>> Does ZFS round-robin the writes across the LUNs when there is no ZFS
>> RaidZ or mirrors in play? Or do all the writes go to the first
>> LUN until it is full?
>
> Round-robin across all vdevs (single disk, mirror, raidz, raidz2) in a
> pool.

That's good to hear.

>> What other ZFS features depend on ZFS RAID?
>
> Mostly the self-healing stuff. But if it's not zfs-redundant and a
> device experiences write errors, the machine will currently panic.

Wow, this is certainly worse than the current VxVM/VxFS implementation. At least there I get I/O errors and disk groups get failed or disabled.

-john
Alderman, Sean
2007-Aug-03 18:35 UTC
[zfs-discuss] ZFS Features when Using Enterprise Arrays
The OP here is posting the "Z"illion dollar question ... and apologies in advance for the verbal diarrhea.

Most of the Enterprise Level systems people here (my company) look at ZFS and say, "Wow, that's really cool...but..." What comes after the "but..." is a host of questions that ultimately come down to: how much does ZFS cost?

What is the cost of running ZFS RAIDZ on top of an Enterprise Storage System that's already RAID 1+0 or RAID 5?

As a "manager" of systems, how do I justify switching from tried and true SVM/UFS or VxFS/VxVM on high speed redundant storage? The nice features - never needing to go offline to manage the storage the server sees, snapshots, clones, etc. - don't seem to make up for the loss of the ability to repair in the event of a failure. We monitor heavily, we can schedule a maintenance window when necessary, and we can cope with an outage (however painful) that requires an fsck or even a tape restore...for as often as it happens (once in the last 5 years, I believe). Does this mean that my environment is too low on the totem pole for ZFS? I'm pretty sure we subscribe to a five 9's uptime SLA.

A gentleman yesterday posted the zpool status below that used SAN devices. Suppose each device is 100GB of RAID 1+0 storage.

  pool: ms2
 state: ONLINE
 scrub: scrub completed with 0 errors on Sun Jul 22 00:47:51 2007
config:

        NAME                                       STATE     READ WRITE CKSUM
        ms2                                        ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c4t600C0FF0000000000A7E0A0E6F8A1000d0  ONLINE       0     0     0
            c4t600C0FF0000000000A7E8D1EA7178800d0  ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c4t600C0FF0000000000A7E0A7219D78100d0  ONLINE       0     0     0
            c4t600C0FF0000000000A7E8D7B3709D800d0  ONLINE       0     0     0

errors: No known data errors

This configuration shows 400GB of LUNs (800GB actual behind the SAN), and my usable space is 200GB. That's 25% usable storage capacity alone, and I'm sure there are other costs in RAID X over RAID Y that are less tangible. So, is that worth it?

Am I supposed to suggest that we go double the capacity in our RAID 1+0 CLARiiON so that I can implement ZFS 1+0 and not sacrifice any storage capacity? Am I supposed to suggest that the storage crew abandon RAID 1+0 on their devices in order for ZFS to provide fault tolerance? Either way, this makes ZFS a very tough sell. How would I historically show that the investment was worth it when ZFS probably never sees a checksum error because the Storage System hides failures so well?

On the other hand, if I were to configure my pool like this...

  pool: ms2
 state: ONLINE
 scrub: scrub completed with 0 errors on Sun Jul 22 00:47:51 2007
config:

        NAME                                     STATE     READ WRITE CKSUM
        ms2                                      ONLINE       0     0     0
          c4t600C0FF0000000000A7E0A0E6F8A1000d0  ONLINE       0     0     0
          c4t600C0FF0000000000A7E8D1EA7178800d0  ONLINE       0     0     0
          c4t600C0FF0000000000A7E0A7219D78100d0  ONLINE       0     0     0
          c4t600C0FF0000000000A7E8D7B3709D800d0  ONLINE       0     0     0

errors: No known data errors

I'd have a nice 400GB (800GB actual) pool. I'd still have my hard RAID 1+0, but now a single checksum error on any one LUN would render the entire file system unusable. There is NO way to replace or repair without destroying the entire pool. What is the likelihood of that happening? And what would cause such a thing? I have run the Self Healing Demo against both of the above pool configurations; the latter is not pretty.

With Storage Systems providing their own snap/clone facilities (like BCVs with EMC), it only gets more difficult as Storage and Server teams work largely independent of each other.
I'd really like to push ZFS for data storage on all of our new hardware going forward, but unless I can justify overruling the Storage System's RAID 1+0 or dropping my capacity utilization from 50% to 25%, I haven't got much ground to stand on. Is anyone else paddling in my canoe?

--
Sean
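For reference, the two layouts being compared would be created roughly like this (a sketch only, reusing the LUN names from the zpool status output above):

  # ZFS mirror on top of array RAID 1+0: 200GB usable of 800GB raw (25%)
  zpool create ms2 \
      mirror c4t600C0FF0000000000A7E0A0E6F8A1000d0 c4t600C0FF0000000000A7E8D1EA7178800d0 \
      mirror c4t600C0FF0000000000A7E0A7219D78100d0 c4t600C0FF0000000000A7E8D7B3709D800d0

  # Plain stripe on top of array RAID 1+0: 400GB usable of 800GB raw (50%),
  # but no ZFS-level redundancy for self-healing of data blocks
  zpool create ms2 \
      c4t600C0FF0000000000A7E0A0E6F8A1000d0 \
      c4t600C0FF0000000000A7E8D1EA7178800d0 \
      c4t600C0FF0000000000A7E0A7219D78100d0 \
      c4t600C0FF0000000000A7E8D7B3709D800d0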
Mario Goebbels
2007-Aug-03 18:36 UTC
[zfs-discuss] ZFS Features when Using Enterprise Arrays
>>> What other ZFS features depend on ZFS RAID?
>>
>> Mostly the self-healing stuff. But if it's not zfs-redundant and a
>> device experiences write errors, the machine will currently panic.
>
> Wow, this is certainly worse than the current VxVM/VxFS
> implementation. At least there I get I/O errors and disk groups get
> failed or disabled.

Yeah, this is strange behavior. Depending on when/how the error shows up, a server could be sent on a reboot party.

I haven't run into I/O error issues yet, but by the time that happens, I hope ZFS will generate an event that can be easily trapped by an application that sends warning emails or automagically IMs. Same for the desktop scenario, where Gnome would be notified and pops up a system modal error dialog or something.

-mg
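Until something like that exists, a crude cron-driven check can stand in for it - a sketch only, assuming mailx is configured and the alert address is adjusted for the local site:

  #!/bin/sh
  # Cron this every few minutes: "zpool status -x" prints
  # "all pools are healthy" when nothing is wrong, so mail
  # anything else to the admins.
  STATUS=`/usr/sbin/zpool status -x`
  if [ "$STATUS" != "all pools are healthy" ]; then
      echo "$STATUS" | mailx -s "ZFS pool problem on `hostname`" admin@example.com
  fi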
Wade.Stuart at fallon.com
2007-Aug-03 20:46 UTC
[zfs-discuss] ZFS Features when Using Enterprise Arrays
zfs-discuss-bounces at opensolaris.org wrote on 08/03/2007 01:35:04 PM:

> The OP here is posting the "Z"illion dollar question ... and
> apologies in advance for the verbal diarrhea.
>
> Most of the Enterprise Level systems people here (my company) look
> at ZFS and say, "Wow, that's really cool...but..." What comes after
> the "but..." is a host of questions that ultimately come down to:
> how much does ZFS cost?
>
> What is the cost of running ZFS RAIDZ on top of an Enterprise Storage
> System that's already RAID 1+0 or RAID 5?

I really don't get what you are asking. Versus vxfs/vxvm and svm? Then the additional cost is none (or negative, vs licensing vxfs/vxvm/snap).

> As a "manager" of systems, how do I justify switching from tried and
> true SVM/UFS or VxFS/VxVM on high speed redundant storage?
> The nice features - never needing to go offline to manage the
> storage the server sees, snapshots, clones, etc. - don't seem to make
> up for the loss of the ability to repair in the event of a failure.
> We monitor heavily, we can schedule a maintenance window when
> necessary, and we can cope with an outage (however painful) that
> requires an fsck or even a tape restore...for as often as it happens
> (once in the last 5 years, I believe). Does this mean that my
> environment is too low on the totem pole for ZFS? I'm pretty sure
> we subscribe to a five 9's uptime SLA.

Yet you have no way to know if your uptime includes spewing out invalid data.

> A gentleman yesterday posted the zpool status below that used SAN
> devices. Suppose each device is 100GB of RAID 1+0 storage.
>
>   pool: ms2
>  state: ONLINE
>  scrub: scrub completed with 0 errors on Sun Jul 22 00:47:51 2007
> config:
>
>         NAME                                       STATE     READ WRITE CKSUM
>         ms2                                        ONLINE       0     0     0
>           mirror                                   ONLINE       0     0     0
>             c4t600C0FF0000000000A7E0A0E6F8A1000d0  ONLINE       0     0     0
>             c4t600C0FF0000000000A7E8D1EA7178800d0  ONLINE       0     0     0
>           mirror                                   ONLINE       0     0     0
>             c4t600C0FF0000000000A7E0A7219D78100d0  ONLINE       0     0     0
>             c4t600C0FF0000000000A7E8D7B3709D800d0  ONLINE       0     0     0
>
> errors: No known data errors
>
> This configuration shows 400GB of LUNs (800GB actual behind the SAN),
> and my usable space is 200GB. That's 25% usable storage capacity
> alone, and I'm sure there are other costs in RAID X over RAID Y that
> are less tangible. So, is that worth it?

25% vs 25% for vxfs/vxvm and svm in similar configurations. Striped config (which is what you are using in vxvm and svm now, right?) has no additional penalty.

> Am I supposed to suggest that we go double the capacity in our RAID
> 1+0 CLARiiON so that I can implement ZFS 1+0 and not sacrifice any
> storage capacity? Am I supposed to suggest that the storage crew
> abandon RAID 1+0 on their devices in order for ZFS to provide fault
> tolerance? Either way, this makes ZFS a very tough sell.

Well, the easy sell is to use ZFS as you use vxfs/vxvm and svm now (stripe) -- you still gain checksumming of data (but not self-heal of data -- only metadata), snaps (free vs license), compression, pooling, etc...

> How would
> I historically show that the investment was worth it when ZFS
> probably never sees a checksum error because the Storage System
> hides failures so well?

You can't. Maybe show the last time your emc had a failed disk on a RAID lun group -- emc went to scrub before replace and showed yet another disk in the same lun group that was bad, and you had to restore from tape, or emc fibbed and replaced the disk with suspect parity data.

> On the other hand, if I were to configure my pool like this...
>
>   pool: ms2
>  state: ONLINE
>  scrub: scrub completed with 0 errors on Sun Jul 22 00:47:51 2007
> config:
>
>         NAME                                     STATE     READ WRITE CKSUM
>         ms2                                      ONLINE       0     0     0
>           c4t600C0FF0000000000A7E0A0E6F8A1000d0  ONLINE       0     0     0
>           c4t600C0FF0000000000A7E8D1EA7178800d0  ONLINE       0     0     0
>           c4t600C0FF0000000000A7E0A7219D78100d0  ONLINE       0     0     0
>           c4t600C0FF0000000000A7E8D7B3709D800d0  ONLINE       0     0     0
>
> errors: No known data errors
>
> I'd have a nice 400GB (800GB actual) pool. I'd still have my hard
> RAID 1+0, but now a single checksum error on any one LUN would
> render the entire file system unusable.

No. Copied from another thread:

To clarify - ditto blocks are used: 3 copies for pool metadata, each copy on a different lun if possible, and 2 copies for each file system's metadata, with each copy on a different lun. This means that file system metadata corruptions should self-heal in a non-redundant config (symlinks being an exception now, but there's an RFE to fix it).

> There is NO way to replace
> or repair without destroying the entire pool.

Delete the file and restore -- also you may want to call EMC and ask why your host is being fed corrupted data without any failures showing on the EMC. If the checksum error overlaps the zfs metadata ditto blocks and makes metadata self-heal fail, then you restore from tape. If you lose access to a lun in the stripe, you go down -- just like with vxvm and svm. How is this not better than vxvm and svm?

> What is the
> likelihood of that happening? And what would cause such a thing? I
> have run the Self Healing Demo against both of the above pool
> configurations; the latter is not pretty.

Depends what opensolaris/solaris bits you are on. Newer bits handle this better and should keep you up and heal the metadata if metadata dittos are available. Either way, how the heck do svm and vxvm handle this for you currently? =)

> With Storage Systems providing their own snap/clone facilities (like
> BCVs with EMC), it only gets more difficult as Storage and Server
> teams work largely independent of each other.

Hmm, in most environments I have seen, BCVs have been used on the os/app side after the admin quiesces the machine -- what good are random snaps of an unknown state? Sure, the storage guys grant them to you (BCV space), but do you really not own the snap side too?

> I'd really like to
> push ZFS for data storage on all of our new hardware going forward,
> but unless I can justify overruling the Storage System's RAID 1+0
> or dropping my capacity utilization from 50% to 25%, I haven't got
> much ground to stand on. Is anyone else paddling in my canoe?

It may help if you don't sabotage your own arguments for ZFS. Bottom line is ZFS (even in stripe mode) buys you more than vxvm or svm for less cost ($$ and time). Try to compare ZFS stripe to vxvm stripe, ZFS raidz to vxvm raid. ZFS should come out ahead, except for a few places such as user quotas and evacuating luns. Those are coming sometime.
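Worth noting alongside the ditto-block point: the same mechanism can be extended to user data on a non-redundant pool with the copies property (a sketch only; the dataset name is hypothetical, and the pool must be on bits recent enough to support copies):

  # Keep two copies of user data (in addition to the metadata ditto
  # blocks) on a pool with no ZFS-level redundancy; roughly halves the
  # effective capacity for this file system.
  zfs set copies=2 ms2/important

  # A periodic scrub lets ZFS find and repair latent checksum errors
  # from the extra copies before they are actually needed.
  zpool scrub ms2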
Richard Elling
2007-Aug-03 22:46 UTC
[zfs-discuss] ZFS Features when Using Enterprise Arrays
Alderman, Sean wrote:

> I'd have a nice 400GB (800GB actual) pool. I'd still have my hard RAID 1+0, but now a single checksum error on any one LUN would render the entire file system unusable.

No, this is not a correct assumption. The "panic when ZFS sees an error" case is for *writes* where ZFS has no other option to write the data correctly (copies=1 *and* the zpool has no redundancy). Some other file systems will patiently wait, perhaps forever. The real solution to this requires changes beyond ZFS, which is perhaps why it is not already finished (I don't work directly on this code, so I can't say for sure.)

For errors on reads, only the affected file is impacted.
 -- richard
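In practice, that means a read-side corruption on a non-redundant pool surfaces as an I/O error for the file in question, and the damaged paths can be listed and restored individually - a sketch, using the pool name from earlier in the thread:

  # -v lists any files with permanent (unrecoverable) errors
  zpool status -v ms2

  # ...then restore just those files from backup, rather than
  # rebuilding the whole pool.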