Hello everyone,

I'm new to ZFS and OpenSolaris. I've been reading the docs on ZFS (the PDF "The Last Word on Filesystems" and Wikipedia, of course), and I'm trying to understand something.

So ZFS is self-healing, correct? This is accomplished via parity and/or metadata of some sort on the disk, right? So it protects against data corruption, but not against disk failure. Or is it the case that ZFS intelligently puts the parity and/or metadata on alternate disks to protect against disk failure, even without a RAID array?

Anyway, you can add mirrored, striped, raidz, or raidz2 arrays to the pool, right? But you can't "effortlessly" grow/shrink this protected array if you wanted to add a disk or two to increase your protected storage capacity. My understanding is that if you want to add storage to a RAID array, you must copy all your data off the array, destroy the array, recreate it with your extra disk(s), then copy all your data back.

I like the idea of a protected storage pool that can grow and shrink effortlessly, but if protecting your data against drive failure is not as effortless, then honestly, what's the point? In my opinion, the ease of use should be nearly that of the Drobo product. Which brings me to my final question: is there a GUI tool available? I can use the command line just like the next guy, but GUIs sure are convenient...

Thanks for your help!
-Steve
On Sat, May 24, 2008 at 3:12 AM, Steve Hull <p.witty at gmail.com> wrote:
> I like the idea of a protected storage pool that can grow and shrink
> effortlessly, but if protecting your data against drive failure is not as
> effortless, then honestly, what's the point? [...] Which brings me to my
> final question: is there a gui tool available?

You're thinking in terms of a home user. ZFS was designed for an enterprise environment. When they add disks, they don't add one disk at a time; it's a tray at a time at the very least. Because of this, they aren't ever copying data off of the array and back on, and no destruction is needed. You just add a raidz/raidz2 vdev at a time, striped across your 14 disks (or however large the tray of disks is).

The GUI is a web interface. Just point your browser at https://localhost:6789

--Tim
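For a concrete picture of what "adding a raidz at a time" looks like, here is a minimal sketch; the pool name "tank" and the cNtNdN device names are placeholders invented for illustration, not anything from the thread:

  # create a pool from one 5-disk raidz vdev
  zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0

  # later, grow the pool by striping a second raidz vdev alongside the first
  zpool add tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0

  # confirm the new layout
  zpool status tank

No data is copied off or destroyed; new writes simply start spreading across both vdevs.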
> Anyway you can add mirrored, [...], raidz, or raidz2 arrays to the pool, right?

Correct.

> add a disk or two to increase your protected storage capacity.

If it's a protected vdev, like a mirror or raidz, sure... One can force-add a single disk, but then the pool isn't protected until you attach a mirror to that single disk. One can't (currently) remove a vdev (shrink a pool), but one can increase each element of a vdev, increasing the size of the pool while maintaining the number of elements (disk count).

Rob
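A hedged sketch of the two operations Rob describes; the pool and device names here are made up for the example:

  # force-add a lone disk -- the pool is unprotected until that vdev gains a mirror
  zpool add -f tank c3t0d0

  # restore redundancy by attaching a second disk to that new vdev
  zpool attach tank c3t0d0 c3t1d0

  # grow an existing vdev by replacing each member with a larger drive,
  # one at a time, letting each resilver complete before starting the next
  zpool replace tank c1t0d0 c5t0d0
  zpool status tank    # watch the resilver progress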
Hi Steve,

On 24.05.2008 at 10:17, zfs-discuss-request at opensolaris.org wrote:

> So ZFS is self-healing, correct? This is accomplished via parity
> and/or metadata of some sort on the disk, right? So it protects
> against data corruption, but not against disk failure.

This is not entirely true, but possible. You can use the copies attribute to have some sort of redundancy on a single disk. But obviously, if you only use a single disk and it breaks completely, data loss cannot be avoided. Even without redundancy features, ZFS provides very good detection of block failures, and snapshots that can be used to avoid accidental deletion/unwanted changes of data.

> Or is it the case that ZFS intelligently puts the parity and/or
> metadata on alternate disks to protect against disk failure, even
> without a raid array?

You do not need a hardware RAID array to get these features, and you can theoretically even use partitions/slices on a single disk, but to get good protection and acceptable performance you will need multiple drives, since a drive can always fail in a way that makes it completely unusable (i.e. it does not spin up anymore).

> Anyway you can add mirrored, striped, raidz, or raidz2 arrays to the
> pool, right? But you can't "effortlessly" grow/shrink this
> protected array if you wanted to add a disk or two to increase your
> protected storage capacity.

A group of redundant disks is called a vdev - this is probably what you call "array". A vdev can be built from disks, files, iSCSI targets, or partitions. Several vdevs form a storage pool. You can increase the size of a pool by adding extra vdevs or by replacing all members of a vdev with bigger ones.

> My understanding is that if you want to add storage to a raid array,
> you must copy all your data off the array, destroy the array,
> recreate it with your extra disk(s), then copy all your data back.

This is currently true for shrinking a pool and for changing the number of devices in a raidz1/2 vdev. Some efforts have been made to change that - see
http://blogs.sun.com/ahl/entry/expand_o_matic_raid_z

Theoretically it should also be possible to "evacuate" vdevs (and remove them from a pool), but I do not think any code has been written to do so. The main reason is that Sun's paying customers are probably reasonably happy to just add a vdev to increase storage, so other features are much higher on their priority list.

> I like the idea of a protected storage pool that can grow and shrink
> effortlessly, but if protecting your data against drive failure is
> not as effortless, then honestly, what's the point? In my opinion,
> the ease of use should be nearly that of the Drobo product. Which
> brings me to my final question: is there a gui tool available?

I'd say: the point is "first things first". Sun provides a free, reasonably manageable, very robust storage concept that does not have all desirable features (yet). For a nice GUI tool you might have to wait for Mac OS X 10.6 ;-)

Hope this helps,
ralf
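A small sketch of the copies attribute Ralf mentions; the dataset name tank/home is hypothetical. Note that extra copies guard against bad blocks on a disk that is still working, not against losing the whole disk:

  # keep two copies of every block in this filesystem, even on a single disk
  zfs set copies=2 tank/home

  # verify the setting
  zfs get copies tank/home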
OK, so in my (admittedly basic) understanding of raidz and raidz2, these technologies are very similar to RAID-5 and RAID-6. BUT if you set up one disk as a raidz vdev, you (obviously) can't maintain data after a disk failure, but you are protected against data corruption that is NOT a result of disk failure. Right?

So is there a resource somewhere that I could look at that clearly spells out how many disks I could have vs. how much resulting space I would have that would still protect me against disk failure (a la the "Drobolator", http://www.drobo.com/drobolator/index.html)? I mean, if I have a raidz vdev with one disk, then I add a disk, am I protected from disk failure? Is it the case that I need to have disks in groups of 4 to maintain protection against single disk failure with raidz, and in groups of 5 for raidz2? It gets even more confusing if I wanted to add disks of varying sizes...

And you said I could add a disk (or disks) to a mirror -- can I force-add a disk (or disks) to a raidz or raidz2? Without destroying and rebuilding, as I read would be required somewhere else?

And if I create a zpool and add various single disks to it (without creating raidz/mirror/etc), is it the case that the zpool is essentially functioning like spanning RAID? I.e., no protection at all??

Please either point me to an existing resource that spells this out a little clearer or give me a little more explanation around it.

And... do you think that the Drobo (www.drobo.com) product is essentially just a box with OpenSolaris and ZFS on it?
I like the link you sent along... they did a nice job with that (but it does show that mixing and matching vastly different drive sizes is not exactly optimal...):
http://www.drobo.com/drobolator/index.html

Doing something like this for ZFS, letting people create pools by mixing/matching single drives, mirrors, and raidz/raidz2 vdevs in a zpool, would make for a pretty cool page. If one of the statistical gurus could add MTBF, MTTDL (mean time to data loss), etc. as a calculator at the bottom, that would be even better. (Someone did some static graphs for different Thumper configurations in the past... this would just make that more general-purpose/GUI-driven. Sounds like a cool project.)

No mention anywhere of "removing drives" thereby reducing capacity, though... RAID re-striping isn't all that much fun, especially with larger drives (and even ZFS lacks some features in this area for now).

See the answer to your other question below (from their FAQ).

-- MikeE

What file systems does drobo support?
RESOLUTION: Drobo is a usb external disk array that is formatted by the host operating system (Windows or OS X). We currently support NTFS, HFS+, and FAT32 file systems with firmware revision 1.0.2. Drobo is not a ZFS file system.
STATUS: Current specification 1.0.2
Applies to: Drobo DRO4D-U
Sooo... I've been reading a lot in various places. The conclusion I've drawn is this:

I can create raidz vdevs in groups of 3 disks and add them to my zpool to be protected against 1 drive failure. This is the current status of "growing protected space" in raidz. Am I correct here?
Steve Hull wrote:
> I can create raidz vdevs in groups of 3 disks and add them to my zpool to
> be protected against 1 drive failure. This is the current status of
> "growing protected space" in raidz. Am I correct here?

Correct. Here's some quick summary information:

A POOL is made of 1 or more VDEVs. POOLs consisting of more than 1 VDEV will stripe data across all the VDEVs. VDEVs may be freely added to any POOL, but cannot currently be removed from a POOL.

When a vdev is added to a pool, data on the existing vdevs is not automatically re-distributed. That is, say you have 3 vdevs of 1GB each, and add another vdev of 1GB. The system does not immediately attempt to re-distribute the data on the original 3 devices. It will re-balance the data as you WRITE to the pool. Thus, if you expand a pool like this, it is a good idea to copy the data around, i.e.:

  cp -r /zpool/olddir /zpool/newdir
  rm -rf /zpool/olddir

If there are more than 1 vdev in a pool, the pool's capacity is determined by the smallest device. Thus, if you have a 2GB, a 3GB, and a 5GB device in a pool, the pool's capacity is 3 x 2GB = 6GB, as ZFS will only do full-stripes. Thus, there really is no equivalent to Concatenation in other RAID solutions. However, if you replace ALL devices in a pool with larger ones, ZFS will automatically expand the pool size. Thus, if you replaced the 2GB devices in the above case with 4GB devices, then the pool would automatically appear to be 3 x 4GB = 12GB.

A VDEV can consist of:
  - any file
  - any disk slice/partition
  - a whole disk (preferred!)
  - a special sub-device: raidz/raidz1/raidz2/mirror/cache/log/spare

For the special sub-devices, here's a summary:

raidz (synonym raidz1):
  You must provide at LEAST 3 storage devices (where a file, slice, or disk
  is a storage device). 1 device's capacity is consumed by parity. However,
  parity is scattered around the devices, so this is roughly analogous to
  RAID-5. Currently, devices CANNOT be added to or removed from a raidz. It
  is possible to increase the size of a raidz by replacing each drive, ONE
  AT A TIME, with a larger drive. But altering the NUMBER of drives is not
  possible.

raidz2:
  You must have at LEAST 4 storage devices. 2 devices' capacity is consumed
  by parity. Like raidz, parity is scattered around the devices, improving
  I/O performance. Roughly analogous to RAID-6. Altering a raidz2 is
  exactly like doing a raidz.

mirror:
  You must provide at LEAST 2 storage devices. All data is replicated
  across all devices, acting as a "normal" mirror. You can add or detach
  devices from a mirror at will, so long as they are at least as big as the
  original mirror.

spare:
  Indicates a device which can be used as a hot spare.

log:
  Indicates an Intent Log, which is basically a transactional log of
  filesystem operations. Generally speaking, this is used only for certain
  high-performance cases, and tends to be used in association with
  enterprise-level devices, such as solid-state drives.

cache:
  Similar to an Intent Log, this provides a place to cache filesystem
  internals (metadata such as directory/file attributes), usually used in
  situations similar to log devices.

--------

All pools store redundant metadata, so they can automatically detect and repair most faults in metadata.
If your vdev is raidz, raidz2 or mirror, it also stores redundant data (which allows it to recover from losing a disk), so it can automatically detect AND repair block-level faults.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
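To make Erik's summary concrete, here is a sketch of creating each vdev type; all pool and device names below are invented for the example:

  # mirror: at least 2 devices, all data replicated on each
  zpool create mpool mirror c1t0d0 c1t1d0

  # raidz (raidz1): at least 3 devices, one device's worth of parity
  zpool create rzpool raidz c2t0d0 c2t1d0 c2t2d0

  # raidz2: at least 4 devices, two devices' worth of parity
  zpool create rz2pool raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0

  # a hot spare and a separate intent log can be added afterwards
  zpool add rz2pool spare c3t4d0
  zpool add rz2pool log c4t0d0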
On Sun, May 25, 2008 at 1:27 AM, Erik Trimble <Erik.Trimble at sun.com> wrote:
> If there are more than 1 vdev in a pool, the pool's capacity is
> determined by the smallest device. Thus, if you have a 2GB, a 3GB, and a
> 5GB device in a pool, the pool's capacity is 3 x 2GB = 6GB, as ZFS will
> only do full-stripes. Thus, there really is no equivalent to
> Concatenation in other RAID solutions.

Not true. If you have a mirrored or a raidz vdev, then the size of that vdev is determined by the size of the smallest disk, but if you have multiple vdevs, the size of the pool is the sum of the sizes of the vdevs. I have two pools off the top of my head that illustrate this: one with 3*120 and 1*200 that has ~550GB capacity, and one with an 8*320 raidz and an 8*500 raidz that has 4.47TB capacity.

Will
> Thus, if you have a 2GB, a 3GB, and a 5GB device in a pool,
> the pool's capacity is 3 x 2GB = 6GB

If you put the three into one raidz vdev it will be 2+2, until you replace the 2G disk with a 5G, at which point it will be 3+3; then when you replace the 3G with a 5G it will be 5+5G. And if you replace the 5G with a 10G it will still be 5+5G.

If one lists out the three disks so they are all their own vdevs, it will be 3x faster than the raidz and 2+3+5 in size (see the example below of mirror and raidz vdevs of different sizes).

> All pools store redundant metadata, so they can
> automatically detect and repair most faults in metadata.

And one can `zfs set copies=2 pool/home` with the 2+3+5 stripe to automatically detect and repair most faults in data as well, as there is an "attempt" to store the copies on different vdevs (mirrors are best).

7 % zpool iostat -v
                capacity     operations    bandwidth
pool          used  avail   read  write   read  write
----------   -----  -----  -----  -----  -----  -----
root         15.9G   100G      2      0   177K    800
  c2t0d0s7   15.9G   100G      2      0   177K    800
----------   -----  -----  -----  -----  -----  -----
z            3.28T  1.59T    379     19  26.6M   103K
  raidz1     1.83T  1.58T    207     12  14.9M  64.7K
    c0t2d0       -      -     69      6  3.84M  17.1K
    c4t1d0       -      -     69      6  3.84M  17.1K
    c0t6d0       -      -     69      6  3.84M  17.1K
    c0t4d0       -      -     69      6  3.84M  17.1K
    c4t3d0       -      -     69      6  3.84M  17.1K
  raidz1     1.44T  12.0G    172      7  11.7M  37.9K
    c4t4d0       -      -     58      5  3.06M  10.2K
    c4t6d0       -      -     58      5  3.06M  10.2K
    c0t3d0       -      -     58      5  3.06M  10.2K
    c4t2d0       -      -     58      5  3.06M  10.2K
    c0t5d0       -      -     58      5  3.06M  10.2K
----------   -----  -----  -----  -----  -----  -----

1 % zpool iostat -v
                  capacity     operations    bandwidth
pool            used  avail   read  write   read  write
------------   -----  -----  -----  -----  -----  -----
root           5.28G  24.0G      0      0    863  2.13K
  mirror       5.28G  24.0G      0      0    863  2.13K
    c0t1d0s0       -      -      0      0    297  4.76K
    c0t0d0s0       -      -      0      0    597  4.76K
------------   -----  -----  -----  -----  -----  -----
z               230G   500G     17     76   150K   461K
  mirror       83.8G   182G      6     25  52.1K   158K
    c0t0d0s7       -      -      2     15  85.1K   248K
    c0t1d0s7       -      -      2     15  86.7K   248K
  mirror       72.6G   159G      5     26  49.4K   161K
    c0t2d0         -      -      2     19  82.7K   251K
    c0t3d0         -      -      2     19  81.2K   251K
  mirror       74.0G   158G      5     23  48.3K   142K
    c0t4d0         -      -      2     18  72.3K   232K
    c0t5d0         -      -      2     18  71.9K   232K
------------   -----  -----  -----  -----  -----  -----
Will and several other people are correct. I had forgotten that ZFS does a funky form of concatenation when you use different-size vdevs. I tend to ignore this case, because it's kinda useless (I know, I know, there are people who use it, but, really... <wink>)

Basically, it will stripe across vdevs as it can. So, if you have a zpool like this:

  2GB vdev
  3GB vdev
  4GB vdev

ZFS will have a 3-wide stripe across the first 2GB of all devices, then a 2-wide stripe across the next 1GB of the two larger devices, then finally a single stripe (aka no stripe) in the 1GB left in the largest one (in this example). So you do get the full 9GB of space.

Naturally, this produces really weird performance curves for (random) data access. OK, maybe weird isn't the right word, but...

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
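If you want to see this behaviour without risking real disks, a file-backed pool is an easy experiment. Everything below (pool name, file paths, sizes) is made up for illustration, and it only demonstrates capacity, not performance:

  # three file-backed "disks" of different sizes
  mkfile 2g /var/tmp/d2g
  mkfile 3g /var/tmp/d3g
  mkfile 4g /var/tmp/d4g

  # a plain (unreplicated) pool built from all three
  zpool create demo /var/tmp/d2g /var/tmp/d3g /var/tmp/d4g

  # SIZE should come out near the 9GB sum, not 3 x 2GB
  zpool list demo

  # clean up afterwards
  zpool destroy demo
  rm /var/tmp/d2g /var/tmp/d3g /var/tmp/d4g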
THANK YOU VERY MUCH EVERYONE!! You have been very helpful and my questions are (mostly) resolved. While I am not (and probably will not become) a ZFS expert, I now at least feel confident that I can accomplish what I want to do.

My last comment is this: I realize that ZFS is designed and intended for enterprise use, but it also has many useful features that home and SOHO users appreciate. That being said, I feel that it still will leave most casual home and SOHO users a bit confused and wishing for other features (especially ease of use).

If Sun released a software alternative to the Drobo product, I feel certain that they would be able to very successfully market a product like this to home and SOHO users. Heck, I would buy such a piece of software (from Sun) in a hot second. Plus, if they based it off of ZFS and just "hid" most of the configuration options so that your pools were automatically configured with single parity (or a mirror for 2-drive setups), then added the "expand-o-matic raidz" feature, added a "shrink" feature, and added the ability to better utilize space on differently sized drives, it would be awesome, and a good part of the work would already be done (i.e., ZFS). It would be far superior to Drobo, and could probably undercut Drobo significantly on price point. Then it would truly be the holy grail of file systems.

In fact, depending on the license of OpenSolaris/ZFS, I wonder if a group of independent developers could package up VirtualBox, OpenSolaris, a modified ZFS, and a setup/admin utility to create such a product... that would be cool. Again, the "heavy lifting" would be modifying raidz so that it could expand/shrink/better utilize space on differently sized drives.