I want to set up a ZFS server with RAID-Z. Right now I have 3 disks. In 6 months, I want to add a 4th drive and still have everything under RAID-Z without a backup/wipe/restore scenario. Is this possible? I've used NetApps in the past (1996 even!) and they do it. I think they're using RAID4.
> I want to set up a ZFS server with RAID-Z. Right now I have 3 disks. In 6
> months, I want to add a 4th drive and still have everything under RAID-Z
> without a backup/wipe/restore scenario. Is this possible?

You can add additional storage to the same pool effortlessly, such that the pool will be striped across two raidz vdevs. You cannot (AFAIK) expand the raidz itself. End result is 9 disks, with 7 disks' worth of effective storage capacity. The ZFS administration guide contains examples of doing exactly this, except I believe the examples use mirrors.

ZFS administration guide: http://opensolaris.org/os/community/zfs/docs/zfsadmin.pdf

-- 
/ Peter Schuller, InfiDyne Technologies HB

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller at infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey at scode.org
E-Mail: peter.schuller at infidyne.com Web: http://www.scode.org
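A minimal sketch of that approach, assuming small file-backed vdevs as stand-ins for real disks (the pool name and paths are made up; whole-disk names work the same way):

# made-up file-backed devices; substitute real disks
mkfile 100m /var/tmp/d1 /var/tmp/d2 /var/tmp/d3 /var/tmp/d4 /var/tmp/d5 /var/tmp/d6

# start with a 3-disk raidz
zpool create tank raidz /var/tmp/d1 /var/tmp/d2 /var/tmp/d3

# later: add a second raidz vdev; the pool then stripes across both groups,
# but the original raidz group itself is not widened
zpool add tank raidz /var/tmp/d4 /var/tmp/d5 /var/tmp/d6

zpool status tank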
> I want to set up a ZFS server with RAID-Z. Right now
> I have 3 disks. In 6 months, I want to add a 4th
> drive and still have everything under RAID-Z without
> a backup/wipe/restore scenario. Is this possible?

I am trying to figure out how to code this right now, as I see it being one of the most needed and ignored features of ZFS. Unfortunately, there exists precious little documentation of how the stripes are laid out, so I find myself studying the code.

In addition to having the ability to add/remove a data drive, I can see use cases for:

* Adding/removing arbitrary numbers of parity drives. Raidz2 uses Reed-Solomon codes for the 2nd parity, which implies that there is no practical limit on the number of parity drives.

* Maximizing the use of different disk sizes. Allowing the stripe geometry to vary throughout the vdev would allow maximal use of space for different size devices, while preserving the desired fault tolerance.

If such capabilities exist, you could start with a single-disk vdev and grow it to consume a large disk farm with any number of parity drives, all while the system is fully available.
[i]* Maximizing the use of different disk sizes[/i]

[i]If such capabilities exist, you could start with a single-disk vdev and grow it to consume a large disk farm with any number of parity drives, all while the system is fully available.[/i]

Now you're just teasing me ;-)
I agree: for non-enterprise users, the expansion of raidz vdevs is a critical missing feature.
> I agree: for non-enterprise users the expansion of
> raidz vdevs is a critical missing feature.

Now you've got me curious. I'm not trying to be inflammatory here, but how is online expansion a non-enterprise feature? From my perspective, enterprise users are the ones most likely to keep legacy filesystems for extended lengths of time, well past any rational usage plan. Enterprise users are also the ones most likely to need 24/7 availability. Any hacker-in-a-basement can take a storage pool offline to expand or contract it, while enterprise users lack this luxury.

Experience taught me that enterprise users most need future flexibility and zero downtime.

Again, I'm not arguing here, only interested in your contrasting viewpoint.
Hi Martin,

Martin wrote:
>> I agree: for non-enterprise users the expansion of
>> raidz vdevs is a critical missing feature.
>
> Now you've got me curious. I'm not trying to be inflammatory here, but how is online expansion a non-enterprise feature? From my perspective, enterprise users are the ones most likely to keep legacy filesystems for extended lengths of time, well past any rational usage plan. Enterprise users are also the ones most likely to need 24/7 availability. Any hacker-in-a-basement can take a storage pool offline to expand or contract it, while enterprise users lack this luxury.

Not exactly. All users would lack the ability to expand a raidz vdev (which in turn could require resilvering, so it comes with lots of other Enterprise feature questions), but it's possible now to expand a pool containing raidz vdevs -- and this is the more likely case with enterprise users:

# ls -lh /var/tmp/fakedisk/
total 1229568
-rw------T 1 root root 100M Jan 9 20:22 disk1
-rw------T 1 root root 100M Jan 9 20:22 disk2
-rw------T 1 root root 100M Jan 9 20:22 disk3
-rw------T 1 root root 100M Jan 9 20:22 disk4
-rw------T 1 root root 100M Jan 9 20:22 disk5
-rw------T 1 root root 100M Jan 9 20:22 disk6
# zpool create test raidz /var/tmp/fakedisk/disk1 /var/tmp/fakedisk/disk2 /var/tmp/fakedisk/disk3
# zpool list test
NAME   SIZE   USED   AVAIL   CAP   HEALTH   ALTROOT
test   286M   155K   286M    0%    ONLINE   -
# zpool add test raidz /var/tmp/fakedisk/disk4 /var/tmp/fakedisk/disk5 /var/tmp/fakedisk/disk6
# zpool list test
NAME   SIZE   USED   AVAIL   CAP   HEALTH   ALTROOT
test   572M   159K   572M    0%    ONLINE   -

Otherwise, you're absolutely correct. I think some enterprise users would probably like to have the ability to expand/contract even raidz groups. I'm sure it's possible to implement this, and luckily ZFS was designed with the ability to add these features over the course of time. Still, it's better to get ZFS out and in use sooner rather than later, right?

> Experience taught me that enterprise users most need future flexibility and zero downtime.

With respect to expanding a pool based on raidz vdevs (and definitely with respect to expanding a filesystem), that's available today, with the limitation that you can't expand a raidz group itself.

Regards,

- Matt

-- 
Matt Ingenthron - Web Infrastructure Solutions Architect
Sun Microsystems, Inc. - Client Solutions, Systems Practice
http://blogs.sun.com/mingenthron/
email: matt.ingenthron at sun.com Phone: 310-242-6439
Martin wrote:
>> I agree: for non-enterprise users the expansion of
>> raidz vdevs is a critical missing feature.
>
> Now you've got me curious. I'm not trying to be inflammatory here, but how is online expansion a non-enterprise feature? From my perspective, enterprise users are the ones most likely to keep legacy filesystems for extended lengths of time, well past any rational usage plan. Enterprise users are also the ones most likely to need 24/7 availability. Any hacker-in-a-basement can take a storage pool offline to expand or contract it, while enterprise users lack this luxury.
>
> Experience taught me that enterprise users most need future flexibility and zero downtime.
>
> Again, I'm not arguing here, only interested in your contrasting viewpoint.

I think the original poster was thinking that non-enterprise users would be most interested in only having to *purchase* one drive at a time. Enterprise users aren't likely to balk at purchasing 6-10 drives at a time, so for them adding an additional *new* RaidZ to stripe across is easier.

Remember though that it's been mathematically figured that the disadvantages to RaidZ start to show up after 9 or 10 drives. (That's been posted in this list earlier. I don't know that they're great disadvantages - and probably even less so to non-enterprise users - so I think this would be useful.)

So most enterprise users are going to go the 'new raidz' route.

-Kyle
[i]Enterprise feature questions), but it's possible now to expand a pool containing raidz vdevs -- and this is the more likely case with enterprise users:[/i]
<pre>
# ls -lh /var/tmp/fakedisk/
total 1229568
-rw------T 1 root root 100M Jan 9 20:22 disk1
-rw------T 1 root root 100M Jan 9 20:22 disk2
-rw------T 1 root root 100M Jan 9 20:22 disk3
-rw------T 1 root root 100M Jan 9 20:22 disk4
-rw------T 1 root root 100M Jan 9 20:22 disk5
-rw------T 1 root root 100M Jan 9 20:22 disk6
# zpool create test raidz /var/tmp/fakedisk/disk1 /var/tmp/fakedisk/disk2 /var/tmp/fakedisk/disk3
# zpool list test
NAME   SIZE   USED   AVAIL   CAP   HEALTH   ALTROOT
test   286M   155K   286M    0%    ONLINE   -
# zpool add test raidz /var/tmp/fakedisk/disk4 /var/tmp/fakedisk/disk5 /var/tmp/fakedisk/disk6
# zpool list test
NAME   SIZE   USED   AVAIL   CAP   HEALTH   ALTROOT
test   572M   159K   572M    0%    ONLINE   -
</pre>

Does this mean I can expand a raidz pool? That I could take my 3-disk raidz and add a 4th disk into the pool?
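A rough sketch of what happens in that case, continuing the example above with a hypothetical extra file-backed device. Exact messages vary by release, so the comments describe the expected behavior rather than quote it:

mkfile 100m /var/tmp/fakedisk/disk7

# You can grow the *pool*, but not widen the existing raidz group.

# Adding the lone disk creates a new, non-redundant top-level vdev; zpool
# normally refuses because its replication level does not match the raidz
# vdev, and only proceeds when forced:
zpool add test /var/tmp/fakedisk/disk7      # expect a mismatched-replication complaint
zpool add -f test /var/tmp/fakedisk/disk7   # proceeds, but disk7 carries no parity protection

# Attaching the disk to a member of the raidz group is not supported;
# attach only grows mirrors (or turns a single disk into a mirror):
zpool attach test /var/tmp/fakedisk/disk1 /var/tmp/fakedisk/disk7   # expected to fail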
[i]I think the original poster was thinking that non-enterprise users would be most interested in only having to *purchase* one drive at a time. Enterprise users aren't likely to balk at purchasing 6-10 drives at a time, so for them adding an additional *new* RaidZ to stripe across is easier.[/i]

Yes. I have $xxx to spend on disks and can afford 3. As my needs increase, I'll have saved enough to buy another disk.

Traditionally, you RAID your disks together, then use a volume manager (VM) to divvy it up into partitions that can grow/shrink as needed. The total size of the RAID isn't important until you've filled it. Then you want to increase the RAID. You could just add new RAID chunks and have a VM on each chunk, but you'd be wasting some of your space. The incremental cost of the added space is the same as for the original RAID: with disks of size n, a 3-disk RAID-5 gives 2n usable and a 4-disk RAID-5 gives 3n. Or, doubling the disks: a single 6-disk RAID-5 gives 5n, versus two 3-disk RAID-5 groups giving 2n + 2n = 4n (6 disks), or a 3-disk plus a 4-disk group giving 2n + 3n = 5n (7 disks).

The cost of scaling / loss of space is balanced against the cost of backup / wipe-and-re-raid / restore.
Hello Kyle,

Wednesday, January 10, 2007, 5:33:12 PM, you wrote:

KM> Remember though that it's been mathematically figured that the
KM> disadvantages to RaidZ start to show up after 9 or 10 drives. (That's

Well, nothing like this was proved, and definitely not mathematically.

It's just common-sense advice - for many users, keeping raidz groups below 9 disks should give good enough performance. However, if someone creates a raidz group of 48 disks, he/she probably also expects performance, and in general raid-z won't offer it.

-- 
Best regards,
Robert    mailto:rmilkowski at task.gda.pl
          http://milek.blogspot.com
Robert Milkowski wrote:
> Hello Kyle,
>
> Wednesday, January 10, 2007, 5:33:12 PM, you wrote:
>
> KM> Remember though that it's been mathematically figured that the
> KM> disadvantages to RaidZ start to show up after 9 or 10 drives. (That's
>
> Well, nothing like this was proved, and definitely not mathematically.
>
> It's just common-sense advice - for many users, keeping raidz groups
> below 9 disks should give good enough performance. However, if someone
> creates a raidz group of 48 disks, he/she probably also expects
> performance, and in general raid-z won't offer it.

It's very possible I misstated something. :)

I thought I had read, though, something like: over 9 or so disks would mean that each FS block would be written to less than a single disk block on each disk?

Or maybe it was that waiting to read from all drives for files less than a FS block would suffer?

Ahhh... I can't remember what the effects were thought to be. I thought there was some theoretical math involved though.

I do remember people advising against it, though. Not just on a performance basis, but also on an increased-risk-of-failure basis. I think it was just seen as a good balancing point.

-Kyle
Hi Kyle,

I think there was a lot of talk about this behavior on the RAIDZ2 vs. RAID-10 thread. My understanding from that discussion was that every write stripes the block across all disks in a RAIDZ/Z2 group, thereby making writing to the group no faster than writing to a single disk. However, reads are much faster, as all the disks are activated in the read process.

The default config on the X4500 we received recently was RAIDZ groups of 6 disks (across the 6 controllers) striped together into one large zpool.

Best Regards,
Jason
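For illustration, a sketch of that kind of layout, with each raidz group taking one disk from each of the 6 controllers. The cXtYd0 names here are hypothetical stand-ins; the actual controller numbering on an X4500 differs:

zpool create tank \
    raidz c0t0d0 c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 \
    raidz c0t1d0 c1t1d0 c2t1d0 c3t1d0 c4t1d0 c5t1d0 \
    raidz c0t2d0 c1t2d0 c2t2d0 c3t2d0 c4t2d0 c5t2d0
# ...and so on for the remaining disks.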
Hello Jason,

Wednesday, January 10, 2007, 10:54:29 PM, you wrote:

JJWW> Hi Kyle,
JJWW>
JJWW> I think there was a lot of talk about this behavior on the RAIDZ2 vs.
JJWW> RAID-10 thread. My understanding from that discussion was that every
JJWW> write stripes the block across all disks in a RAIDZ/Z2 group, thereby
JJWW> making writing to the group no faster than writing to a single disk.
JJWW> However, reads are much faster, as all the disks are activated in the
JJWW> read process.

The opposite, actually. Because of COW, writing (modifying as well) will give you up to N-1 disks' performance for raid-z1 and N-2 disks' performance for raid-z2. However, reading can be slow in the case of many small random reads, as to read each fs block you've got to wait for all data disks in a group.

JJWW> The default config on the X4500 we received recently was RAIDZ groups
JJWW> of 6 disks (across the 6 controllers) striped together into one large
JJWW> zpool.

However, the problem with that config is the lack of a hot spare. Of course it depends what you want (and there was no hot spare support in U2, which is the OS installed in the factory so far).

-- 
Best regards,
Robert    mailto:rmilkowski at task.gda.pl
          http://milek.blogspot.com
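For reference, on releases that do support hot spares, designating one is a single command; a sketch with hypothetical pool and device names:

# add an idle spare to an existing pool; it is pulled in automatically
# when a device in any vdev fails
zpool add tank spare c5t7d0

# spares can also be declared at pool creation time
zpool create tank raidz c0t0d0 c1t0d0 c2t0d0 c3t0d0 spare c5t7d0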
From: Wade.Stuart at fallon.com
Date: 2007-Jan-10 23:30 UTC
Subject: [zfs-discuss] Re: Adding disk to a RAID-Z?
zfs-discuss-bounces at opensolaris.org wrote on 01/10/2007 05:16:33 PM:

> Hello Jason,
>
> Wednesday, January 10, 2007, 10:54:29 PM, you wrote:
>
> JJWW> Hi Kyle,
>
> JJWW> I think there was a lot of talk about this behavior on the RAIDZ2 vs.
> JJWW> RAID-10 thread. My understanding from that discussion was that every
> JJWW> write stripes the block across all disks in a RAIDZ/Z2 group, thereby
> JJWW> making writing to the group no faster than writing to a single disk.
> JJWW> However, reads are much faster, as all the disks are activated in the
> JJWW> read process.
>
> The opposite, actually. Because of COW, writing (modifying as well)
> will give you up to N-1 disks' performance for raid-z1 and N-2 disks'
> performance for raid-z2. However, reading can be slow in the case of
> many small random reads, as to read each fs block you've got to wait
> for all data disks in a group.
>
> JJWW> The default config on the X4500 we received recently was RAIDZ groups
> JJWW> of 6 disks (across the 6 controllers) striped together into one large
> JJWW> zpool.
>
> However, the problem with that config is the lack of a hot spare.
> Of course it depends what you want (and there was no hot spare support
> in U2, which is the OS installed in the factory so far).

Yeah, this kinda ticked me off. The first thing I noticed is that the thumper that was on back order for 3 months waiting for U3 fixes was shipped with U2 + patches. I called support to try to track down whether U3 base was installable with/without patches and spent 3 days of off-and-on calling to get to someone who could find the info (Sun's internal documentation was locked down and unpublished to support at the time). 5 out of 6 support engineers I talked to did not even realize that U3 was released (three weeks after the fact). It also took 4 (long) calls to clarify that it did in fact need 220 power (at the time I ordered it, it was listed as 110, and it shipped with 110-rated cables).

Long story short, I wiped and reinstalled with U3 and raidz2 with hot spares, like it should have had in the first place.

-Wade
Hi Robert,

I read the following section from http://blogs.sun.com/roch/entry/when_to_and_not_to as indicating that random writes to a RAID-Z have the performance of a single disk regardless of the group size:

> Effectively, as a first approximation, an N-disk RAID-Z group will
> behave as a single device in terms of delivered random input
> IOPS. Thus a 10-disk group of devices each capable of 200-IOPS, will
> globally act as a 200-IOPS capable RAID-Z group.

Best Regards,
Jason
Hello Jason,

Thursday, January 11, 2007, 12:46:32 AM, you wrote:

JJWW> Hi Robert,
JJWW>
JJWW> I read the following section from
JJWW> http://blogs.sun.com/roch/entry/when_to_and_not_to as indicating
JJWW> that random writes to a RAID-Z have the performance of a single disk
JJWW> regardless of the group size:

>> Effectively, as a first approximation, an N-disk RAID-Z group will
>> behave as a single device in terms of delivered random input
>> IOPS. Thus a 10-disk group of devices each capable of 200-IOPS, will
>> globally act as a 200-IOPS capable RAID-Z group.

"Random input IOPS" means random reads, not writes.

-- 
Best regards,
Robert    mailto:rmilkowski at task.gda.pl
          http://milek.blogspot.com
Hello Wade,

Thursday, January 11, 2007, 12:30:40 AM, you wrote:

WSfc> Long story short, I wiped and reinstalled with U3 and raidz2 with
WSfc> hot spares, like it should have had in the first place.

The same here. Besides, I always install "my own" system and I'm not using preinstalled ones - except that when x4500s arrive, I run a small script (dd + scrubbing) for 2-3 days to see if everything works fine before putting them into production. Then I re-install.

-- 
Best regards,
Robert    mailto:rmilkowski at task.gda.pl
          http://milek.blogspot.com
> It's just common-sense advice - for many users, keeping raidz groups
> below 9 disks should give good enough performance. However, if someone
> creates a raidz group of 48 disks, he/she probably also expects
> performance, and in general raid-z won't offer it.

There is at least one reason for wanting more drives in the same raidz/raid5/etc: redundancy.

Suppose you have 18 drives. Having two raidz groups consisting of 9 drives each is going to mean you are more likely to fail than having a single raidz2 consisting of 18 drives, since in the former case, yes - two drives can go down, but only if they are the *right* two drives. In the latter case any two drives can go down.

The ZFS administration guide mentions this recommendation, but does not give any hint as to why. A reader may assume/believe it's just general advice, based on someone's opinion that with more than 9 drives, the statistical probability of failure is too high for raidz (or raid5). It's a shame the statement in the guide is not further qualified to actually explain that there is a concrete issue at play.

(I haven't looked into the archives to find the previously mentioned discussion.)

-- 
/ Peter Schuller, InfiDyne Technologies HB

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller at infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey at scode.org
E-Mail: peter.schuller at infidyne.com Web: http://www.scode.org
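As a concrete sketch of the two 18-disk layouts being compared (hypothetical device names):

# Layout A: two 9-disk raidz (single-parity) groups striped together.
# Survives one failure per group; two failures in the same group lose the pool.
zpool create tank \
    raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 c0t7d0 c0t8d0 \
    raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 c1t8d0

# Layout B: one 18-disk raidz2 group.
# Survives any two failures, at the cost of a much wider stripe (and the
# performance considerations discussed earlier in the thread).
zpool create tank raidz2 \
    c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 c0t7d0 c0t8d0 \
    c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 c1t8d0

Both layouts spend two disks on parity, so usable capacity is roughly the same; the difference is which combinations of failures they tolerate.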
Hello Peter,

Thursday, January 11, 2007, 1:08:38 AM, you wrote:

PS> There is at least one reason for wanting more drives in the same
PS> raidz/raid5/etc: redundancy.

PS> Suppose you have 18 drives. Having two raidz groups consisting of 9 drives each is
PS> going to mean you are more likely to fail than having a single raidz2
PS> consisting of 18 drives, since in the former case, yes - two drives can go
PS> down, but only if they are the *right* two drives. In the latter case any two
PS> drives can go down.

PS> The ZFS administration guide mentions this recommendation, but does not give
PS> any hint as to why. A reader may assume/believe it's just general advice,
PS> based on someone's opinion that with more than 9 drives, the statistical
PS> probability of failure is too high for raidz (or raid5). It's a shame the
PS> statement in the guide is not further qualified to actually explain that
PS> there is a concrete issue at play.

I don't know if the ZFS man pages should teach people about RAID.

If somebody doesn't understand RAID basics, then some kind of tool would be better, where you just specify a pool of disks and choose from: space-efficient, performance, non-redundant - and that's it; all the rest will be hidden.

-- 
Best regards,
Robert    mailto:rmilkowski at task.gda.pl
          http://milek.blogspot.com
> Hello Kyle,
>
> Wednesday, January 10, 2007, 5:33:12 PM, you wrote:
>
> KM> Remember though that it's been mathematically figured that the
> KM> disadvantages to RaidZ start to show up after 9 or 10 drives. (That's
>
> Well, nothing like this was proved, and definitely not mathematically.
>
> It's just common-sense advice - for many users, keeping raidz groups
> below 9 disks should give good enough performance. However, if someone
> creates a raidz group of 48 disks, he/she probably also expects
> performance, and in general raid-z won't offer it.

Wow, lots of good discussion here. I started the idea of allowing a RAIDZ group to grow to arbitrary drives because I was unaware of the downsides of massive pools. From my RAID5 experience, a perfect world would be large numbers of data spindles and a sufficient number of parity spindles, e.g. 99+17 (99 data drives and 17 parity drives). In RAID5 this would give massive IOPS and redundancy.

After studying the code and reading the blogs, a few things have jumped out, with some interesting (and sometimes goofy) implications. Since I am still learning, I could be wrong on any of the following.

RAIDZ pools operate with a storage granularity of one stripe. If you request a read of a block within the stripe, you get the whole stripe. If you modify a block within the stripe, the whole stripe is written to a different location (ala COW). This implies that ANY read will require the whole stripe, and therefore all spindles to seek and read a sector. All drives will return the sectors (mostly) simultaneously. For performance purposes, a RAIDZ pool seeks like a single drive would and has the throughput of multiple drives. Unlike traditional RAID5, adding more spindles does NOT increase read IOPS.

Another implication is that ZFS checksums the stripe, not the component sectors. If a drive silently returns a bad sector, all ZFS knows is that the whole stripe is bad (which could probably also be inferred from a bogus parity sector). ZFS has no clue which drive produced bad data, only that the whole stripe failed the checksum. ZFS finds the offending sector by process of elimination: going through the sectors one at a time, throwing away the data actually read, reconstructing the data from the parity, then determining if the stripe passes the checksum. Two parity drives make this a bigger problem still, almost squaring the number of computations needed. If a stripe has enough parity drives, then the cost of determining N bad data sectors in a stripe is roughly O(k^N), where k is some constant.

Another implication is that there is no RAID5 "write penalty." More accurately, the write penalty is incurred during the read operation, where an entire stripe is read.

Finally, there is no need to rotate parity. Rotating parity was introduced in RAID5 because every write of a single sector in a stripe also necessitated the read and subsequent write of the parity sector. Since there are no partial stripe writes in ZFS, there is no need to read and then write the parity sector.

For those in the know, where am I off base here?

Thanks!

Marty
Robert Milkowski wrote:
> I don't know if the ZFS man pages should teach people about RAID.
>
> If somebody doesn't understand RAID basics, then some kind of tool
> would be better, where you just specify a pool of disks and choose
> from: space-efficient, performance, non-redundant - and that's it;
> all the rest will be hidden.

Actually, this would be really nice to put into some sort of a low-level CLI tool, ala "mdadm" for Linux. That is, a nice little tool that presents you with a list of drives, allows you to select which ones you'd like to put into a ZFS pool, then lets you select from a couple of different options based on performance/space/redundancy. It would also prompt for answers to some of the common options. Figure a nice little perl/python script, or even a Bourne-shell script, would be sufficient. Target audience would be non-sysadmin folks, plus entry-level admins.

Also, on another note: adding drives to a RAIDZ[2] isn't that important for enterprise folks with massive disk arrays, since creating new pools (or adding to a stripe of RAIDZs) is the most likely action when acquiring new disk space. However, adding to a RAIDZ is a really, really, really common action in mid-size and small-size businesses, as well as at the department level for enterprises. These people tend to have setups of a couple of dozen disks at best, but do occasionally either add single disks or, rarely, add a whole disk shelf. Migrating data for these folks is a royal pain (they generally don't have the super-experienced staff, or their staff is severely overworked), so it would be really nice to provide them with this functionality.

Given that the x64 stuff is now a huge part of Sun's business, and that we're selling large chunks of them to mid-size companies or to large companies for department/remote office use, we should definitely consider this target audience as at least equal to our traditional enterprise market in terms of feature priority. :-)

-Erik
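A toy Bourne-shell sketch of the kind of wrapper being described. The script name, pool name, menu choices, and the use of format(1M) to list candidate disks are all assumptions for illustration, not an existing tool:

#!/bin/sh
# zpool-easy.sh -- hypothetical helper: pick disks, pick a goal, build a pool.
POOL=${1:-tank}

# One rough way to list candidate whole disks on Solaris.
echo "Available disks:"
format </dev/null 2>/dev/null | awk '/^ *[0-9]+\./ {print $2}'

printf "Enter the disks to use (space separated): "
read DISKS

echo "1) space efficient (raidz)  2) performance (mirror pairs)  3) non-redundant (stripe)"
printf "Choose a layout [1-3]: "
read CHOICE

case "$CHOICE" in
  1) zpool create "$POOL" raidz $DISKS ;;
  2) # pair up the listed disks into two-way mirrors
     set -- $DISKS
     CMD="zpool create $POOL"
     while [ $# -ge 2 ]; do CMD="$CMD mirror $1 $2"; shift 2; done
     eval "$CMD" ;;
  3) zpool create "$POOL" $DISKS ;;
  *) echo "unknown choice" >&2; exit 1 ;;
esac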
> PS> The ZFS administration guide mentions this recommendation, but does not give
> PS> any hint as to why. A reader may assume/believe it's just general advice,
> PS> based on someone's opinion that with more than 9 drives, the statistical
> PS> probability of failure is too high for raidz (or raid5). It's a shame the
> PS> statement in the guide is not further qualified to actually explain that
> PS> there is a concrete issue at play.
>
> I don't know if the ZFS man pages should teach people about RAID.
>
> If somebody doesn't understand RAID basics, then some kind of tool
> would be better, where you just specify a pool of disks and choose
> from: space-efficient, performance, non-redundant - and that's it;
> all the rest will be hidden.

But the guide *does* make a recommendation, yet does not qualify it. And if there is a problem specific to ZFS that is NOT just an obvious result of some general principle, that's very relevant for the ZFS administration guide IMO (and the man pages, for that matter).

-- 
/ Peter Schuller, InfiDyne Technologies HB

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller at infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey at scode.org
E-Mail: peter.schuller at infidyne.com Web: http://www.scode.org
Hi Peter,

I think you must be referring to this section in the ZFS admin guide:

http://docs.sun.com/app/docs/doc/819-5461/6n7ht6qrr?a=view

If you are creating a RAID-Z configuration with many disks, as in this example, a RAID-Z configuration with 14 disks is better split into two 7-disk groupings. RAID-Z configurations with single-digit groupings of disks should perform better.

This is a general recommendation about the performance of RAID and not a comment about disk failure probabilities. If this isn't the text that is causing you grief, please let me know and I'll fix that one.

Maintaining a balance in the admin guide between providing ZFS features and examples and a kitchen sink of everything you wanted to know or do with ZFS is a difficult task. :-)

I hope to include more links to blogs and our developing ZFS best practices site in the Admin Guide to provide more practical recommendations based on what you want to do with ZFS:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

Cindy