Hello. I have a question about how ZFS works with "dynamic striping".

Let's start with the following situation:
- 4 disks of 100MB each, striped under ZFS.
- 75% of the stripe is in use, so we have 100MB free. (easy)

Now we add a new 100MB disk to the pool. We then have 200MB free, but only 100MB of it will have the speed of 4 disks; the remaining 100MB will have the speed of 1 disk.

The questions are:
- Does ZFS have any kind of reorganization of the data in the stripe that changes this situation, so that we end up with 200MB free at the speed of 5 disks?
- If the answer is yes, how is it done? In the background?

Thanks for your time (and sorry for my English).

JLBG
Jose Luis Barquín Guerola wrote:
> Hello. I have a question about how ZFS works with "dynamic striping".
>
> Let's start with the following situation:
> - 4 disks of 100MB each, striped under ZFS.
> - 75% of the stripe is in use, so we have 100MB free. (easy)
>
> Now we add a new 100MB disk to the pool. We then have 200MB free, but only 100MB of it will have the speed of 4 disks; the remaining 100MB will have the speed of 1 disk.
>
> The questions are:
> - Does ZFS have any kind of reorganization of the data in the stripe that changes this situation, so that we end up with 200MB free at the speed of 5 disks?
> - If the answer is yes, how is it done? In the background?
>
> Thanks for your time (and sorry for my English).
>
> JLBG

When you add more vdevs to the zpool, NEW data is written at the new stripe width. That is, data written to the original pool went across 4 drives; new data will now be written across 5 drives. Existing data WILL NOT be changed.

So, for a zpool that is 75% full, you will NOT get to use the first 75% of the newly added vdev immediately.

Thus, in your case, you started with a 400MB zpool holding 300MB of data. You added another 100MB vdev, resulting in a 500MB zpool. The 300MB is written across 4 drives and will have the corresponding speed. 75% of the new vdev isn't immediately usable (as it corresponds to the 75% in use on the other 4 vdevs), so you have effectively added only 25MB of immediately usable space. Thus, you have:

300MB across 4 vdevs
125MB across 5 vdevs
75MB of "wasted" space on 1 vdev

To correct this - that is, to recover the 75MB of "wasted" space and to move the 300MB from spanning 4 vdevs to spanning 5 vdevs - you need to rewrite the entire existing data set. Right now, there is no background or other automatic method to do this; 'cp -rp' or 'rsync' is a good idea. We really should have something like 'zpool scrub' do this automatically.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
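A minimal sketch of that add-then-rewrite approach, assuming a hypothetical pool named 'tank' whose data lives under /tank/data (the pool, device, and path names below are invented for illustration):

# Grow the stripe by adding a fifth disk as a new top-level vdev
# (substitute your own device name).
zpool add tank c0t4d0

# ZFS will not rebalance existing blocks, so rewrite them: the new
# copies are allocated across all five vdevs, and removing the old
# directory frees the 4-wide blocks.
cp -rp /tank/data /tank/data.rewritten
mv /tank/data /tank/data.old
mv /tank/data.rewritten /tank/data
rm -rf /tank/data.old

# Inspect how allocations are now spread per vdev.
zpool iostat -v tank

Note that if snapshots still reference the old blocks, the space is not freed until those snapshots are destroyed; on a live dataset, 'rsync' or 'zfs send'/'zfs receive' into a new dataset may be a safer way to perform the same rewrite.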
Erik Trimble wrote:
> Jose Luis Barquín Guerola wrote:
>> Hello. I have a question about how ZFS works with "dynamic striping".
>>
>> Let's start with the following situation:
>> - 4 disks of 100MB each, striped under ZFS.
>> - 75% of the stripe is in use, so we have 100MB free. (easy)
>>
>> Now we add a new 100MB disk to the pool. We then have 200MB free, but only 100MB of it will have the speed of 4 disks; the remaining 100MB will have the speed of 1 disk.
>>
>> The questions are:
>> - Does ZFS have any kind of reorganization of the data in the stripe that changes this situation, so that we end up with 200MB free at the speed of 5 disks?
>> - If the answer is yes, how is it done? In the background?

Yes, new writes are biased towards the more-empty vdev.

>> Thanks for your time (and sorry for my English).
>>
>> JLBG
>
> When you add more vdevs to the zpool, NEW data is written at the new stripe width. That is, data written to the original pool went across 4 drives; new data will now be written across 5 drives. Existing data WILL NOT be changed.
>
> So, for a zpool that is 75% full, you will NOT get to use the first 75% of the newly added vdev immediately.
>
> Thus, in your case, you started with a 400MB zpool holding 300MB of data. You added another 100MB vdev, resulting in a 500MB zpool. The 300MB is written across 4 drives and will have the corresponding speed. 75% of the new vdev isn't immediately usable (as it corresponds to the 75% in use on the other 4 vdevs), so you have effectively added only 25MB of immediately usable space. Thus, you have:
>
> 300MB across 4 vdevs
> 125MB across 5 vdevs
> 75MB of "wasted" space on 1 vdev
>
> To correct this - that is, to recover the 75MB of "wasted" space and to move the 300MB from spanning 4 vdevs to spanning 5 vdevs - you need to rewrite the entire existing data set. Right now, there is no background or other automatic method to do this; 'cp -rp' or 'rsync' is a good idea. We really should have something like 'zpool scrub' do this automatically.

No. Dynamic striping is not RAID-0, which is what you are describing. In a dynamic stripe, the data written is not divided up among the current devices in the stripe. Rather, data is chunked and written to the vdevs: when about 500 kBytes has been written to a vdev, the next chunk is written to another vdev. The choice of which vdev to go to next is based, in part, on the amount of free space available on that vdev. So you get your cake (stochastic spreading of data across vdevs) and you get to eat it (use all the available space), too.
-- richard
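One way to watch the behaviour Richard describes is a throwaway pool built on files, sized like the original question. Everything below (the pool name 'demo', the /var/tmp paths, the exact file sizes) is a made-up sketch, and the ~500 kByte chunking comes from Richard's description rather than from anything this example measures:

# Create four 100MB backing files and a plain (non-redundant) striped pool.
mkfile 100m /var/tmp/d1 /var/tmp/d2 /var/tmp/d3 /var/tmp/d4
zpool create demo /var/tmp/d1 /var/tmp/d2 /var/tmp/d3 /var/tmp/d4

# Fill the pool to roughly 75%.
mkfile 280m /demo/bigfile

# Add a fifth 100MB vdev, then write some new data.
mkfile 100m /var/tmp/d5
zpool add demo /var/tmp/d5
mkfile 50m /demo/newfile

# Per-vdev capacity: the old data stays where it was written, while the
# new allocations are biased toward the emptier fifth vdev.
zpool iostat -v demo

# Clean up the throwaway pool.
zpool destroy demo
rm /var/tmp/d1 /var/tmp/d2 /var/tmp/d3 /var/tmp/d4 /var/tmp/d5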
Thank you "Relling" and "et151817" for your answers. So just to end the post: Relling supouse the next situation: One zpool in "Dinamic Stripe" with two disk, one of 100MB and the second with 200MB if the spread is "stochastic spreading of data across vdevs" you will have the double of possibilities of save one chunk in the second disk than in the first, right? Thanks for your time (and sorry for my english). JLBG -- This message posted from opensolaris.org
Jose Luis Barquín Guerola wrote:
> Thank you "Relling" and "et151817" for your answers.
>
> So, just to end the thread, Relling, suppose the following situation: one zpool with a dynamic stripe across two disks, one of 100MB and the second of 200MB.
>
> If the spread is a "stochastic spreading of data across vdevs", a given chunk is twice as likely to be saved on the second disk as on the first, right?

The simple answer is yes. The more complex answer is that copies will try to be spread across different vdevs. Metadata, by default, uses copies=2, so you could expect the metadata to be more evenly spread across the disks.
-- richard
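A small sketch of the two effects Richard mentions, on a throwaway pool shaped like the 100MB/200MB example in the question (the pool name, file paths, and data sizes below are invented for illustration; 'zfs set copies=2' and 'zpool iostat -v' are the actual commands involved):

# Two file-backed vdevs of different sizes in one dynamic stripe.
mkfile 100m /var/tmp/small
mkfile 200m /var/tmp/large
zpool create uneven /var/tmp/small /var/tmp/large

# With the default copies=1, allocation is weighted by free space, so
# roughly twice as much of this file should land on the larger vdev.
mkfile 60m /uneven/one-copy
zpool iostat -v uneven

# Metadata already keeps two copies by default; user data can opt in.
# With copies=2, ZFS tries to place the copies on different vdevs,
# which evens out the spread (and halves the usable space).
zfs set copies=2 uneven
mkfile 30m /uneven/two-copies
zpool iostat -v uneven

# Clean up the throwaway pool.
zpool destroy uneven
rm /var/tmp/small /var/tmp/large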
> try to be spread across different vdevs.

% zpool iostat -v
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
z            686G   434G     40      5  2.46M   271K
  c1t0d0s7   250G   194G     14      1   877K  94.2K
  c1t1d0s7   244G   200G     15      2   948K  96.5K
  c0d0       193G  39.1G     10      1   689K  80.2K

Note that c0d0 is basically full, but it is still serving 10 reads for every 15 on the other disks, and about 82% of their write bandwidth.

Rob