Short question:

I'm curious as to how ZFS manages space (free and used) and how its usage
interacts with thin provisioning provided by HDS arrays.  Is there any
effort to minimize the number of provisioned disk blocks that get writes so
as to not negate any space benefits that thin provisioning may give?

Background & more detailed questions:

In Jeff Bonwick's blog[1], he talks about free space management and
metaslabs.  Of particular interest is the statement: "ZFS divides the space
on each virtual device into a few hundred regions called metaslabs."

1. http://blogs.sun.com/bonwick/entry/space_maps

In Hu Yoshida's (CTO, Hitachi Data Systems) blog[2] there is a discussion
of thin provisioning at the enterprise array level.  Of particular interest
is the statement: "Dynamic Provisioning is not a panacea for all our
storage woes.  There are applications that do a hard format or write across
the volume when they do an allocation and that would negate the value of
thin provisioning."  In another entry[3] he goes on to say: "Capacity is
allocated to 'thin' volumes from this pool in units of 42 MB pages...."

2. http://blogs.hds.com/hu/2007/05/dynamic_or_thin_provisioning.html
3. http://blogs.hds.com/hu/2007/05/thin_and_wide_.html

This says that any time that a 42 MB region gets one sector written, 42 MB
of storage is permanently[4] allocated to the virtual LUN.

4. Until the LUN is destroyed, that is.

I know that ZFS does not do a write across all of the disk as part of
formatting.  Does it, however, drop some sort of metaslab data structures
on each of those "few hundred regions"?

When space is allocated, does it make an attempt to spread the allocations
across all of the metaslabs, or does it more or less fill up one metaslab
before moving to the next?

As data is deleted, do the freed blocks get reused before never used
blocks?

Is there any collaboration between the storage vendors and ZFS developers
to allow the file system to tell the storage array "this range of blocks is
unused" so that the array can reclaim the space?  I could see this as
useful when doing re-writes of data (e.g. crypto rekey) to concentrate data
that had become scattered into contiguous space.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
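To put rough numbers on that 42 MB granularity, here is a small standalone
C sketch.  It is illustrative only: the 100 GB LUN size and the write
offsets are invented for the example, and only the 42 MB page size comes
from the HDS blog cited above.  It counts how much physical capacity a
handful of scattered 4 KB writes would pin on such an array:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE (42ULL * 1024 * 1024)            /* HDS DP page: 42 MB */
#define LUN_SIZE  (100ULL * 1024 * 1024 * 1024)    /* hypothetical 100 GB LUN */

int
main(void)
{
        /* Ten scattered writes (byte offsets); each is only ~4 KB of data. */
        unsigned long long writes[] = {
                0, 1ULL << 30, 5ULL << 30, 10ULL << 30, 20ULL << 30,
                40ULL << 30, 60ULL << 30, 70ULL << 30, 80ULL << 30, 99ULL << 30
        };
        unsigned long long npages = LUN_SIZE / PAGE_SIZE + 1;
        bool *allocated = calloc(npages, sizeof (bool));
        unsigned long long used = 0;

        if (allocated == NULL)
                return (1);

        for (size_t i = 0; i < sizeof (writes) / sizeof (writes[0]); i++) {
                unsigned long long page = writes[i] / PAGE_SIZE;
                if (!allocated[page]) {
                        /* First touch of a 42 MB page allocates all of it. */
                        allocated[page] = true;
                        used += PAGE_SIZE;
                }
        }

        /* Ten 4 KB writes (40 KB of data) pin 10 * 42 MB = 420 MB. */
        printf("pages allocated: %llu (%llu MB)\n",
            used / PAGE_SIZE, used / (1024 * 1024));
        free(allocated);
        return (0);
}

Ten writes carrying about 40 KB of data end up holding 420 MB of pool
capacity on the array, which is exactly the amplification the question is
trying to avoid.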
Victor Latushkin
2007-Sep-14 18:08 UTC
[zfs-discuss] space allocation vs. thin provisioning
Mike Gerdts wrote:
> Short question:

Not so short really :-)  Answers to some questions are inline.  I think
others will correct me if I'm wrong.

> I'm curious as to how ZFS manages space (free and used) and how
> its usage interacts with thin provisioning provided by HDS
> arrays.  Is there any effort to minimize the number of provisioned
> disk blocks that get writes so as to not negate any space
> benefits that thin provisioning may give?
>
> Background & more detailed questions:
>
> In Jeff Bonwick's blog[1], he talks about free space management
> and metaslabs.  Of particular interest is the statement: "ZFS
> divides the space on each virtual device into a few hundred
> regions called metaslabs."
>
> 1. http://blogs.sun.com/bonwick/entry/space_maps
>
> In Hu Yoshida's (CTO, Hitachi Data Systems) blog[2] there is a
> discussion of thin provisioning at the enterprise array level.
> Of particular interest is the statement: "Dynamic Provisioning is
> not a panacea for all our storage woes.  There are applications
> that do a hard format or write across the volume when they do an
> allocation and that would negate the value of thin provisioning."
> In another entry[3] he goes on to say: "Capacity is allocated to
> 'thin' volumes from this pool in units of 42 MB pages...."
>
> 2. http://blogs.hds.com/hu/2007/05/dynamic_or_thin_provisioning.html
> 3. http://blogs.hds.com/hu/2007/05/thin_and_wide_.html
>
> This says that any time that a 42 MB region gets one sector
> written, 42 MB of storage is permanently[4] allocated to the
> virtual LUN.
>
> 4. Until the LUN is destroyed, that is.
>
> I know that ZFS does not do a write across all of the disk as
> part of formatting.  Does it, however, drop some sort of metaslab
> data structures on each of those "few hundred regions"?

No, it does not need to format the disk in any way, because metadata such
as space map information is kept in DMU objects, which do not differ in
nature from other DMU objects and may be stored anywhere.

> When space is allocated, does it make an attempt to spread the
> allocations across all of the metaslabs, or does it more or less
> fill up one metaslab before moving to the next?

The answer to this question is here:

http://blogs.sun.com/bonwick/entry/zfs_block_allocation

In short, outer metaslabs (lower DVAs) are assigned higher weight, and
previously used metaslabs also get a weight boost.

> As data is deleted, do the freed blocks get reused before never
> used blocks?

It depends.  The current implementation of the space map allocator moves
the cursor within a metaslab only when data is allocated; it does not touch
it when data is freed.  From that point of view the answer is "no".  But
when we reach the end of the space map, we start again from its beginning,
so we do get a chance to allocate previously freed space, so the answer is
also "yes".

> Is there any collaboration between the storage vendors and ZFS
> developers to allow the file system to tell the storage array
> "this range of blocks is unused" so that the array can reclaim
> the space?  I could see this as useful when doing re-writes of
> data (e.g. crypto rekey) to concentrate data that had become
> scattered into contiguous space.

I think there is currently no such mechanism, but that does not mean it
cannot be developed.

Hth,
Victor
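A simplified sketch of the selection logic described above may help.  This
is not the actual ZFS metaslab code; the names are invented and the boost
factor is arbitrary.  It only illustrates the idea that lower-offset (outer)
metaslabs get a higher base weight and metaslabs that have already been
allocated from get an extra boost:

#include <stdbool.h>
#include <stdint.h>

typedef struct metaslab {
        uint64_t ms_start;   /* offset of the metaslab on the vdev */
        bool     ms_used;    /* has anything been allocated here before? */
} metaslab_t;

/*
 * Weight a metaslab: lower offsets (outer tracks, lower DVAs) get a higher
 * base weight, and metaslabs we have already allocated from get a boost so
 * writes tend to stay concentrated instead of touching new regions.
 */
static uint64_t
metaslab_weight(const metaslab_t *ms, uint64_t vdev_size)
{
        uint64_t weight = vdev_size - ms->ms_start;

        if (ms->ms_used)
                weight *= 2;    /* the boost factor here is arbitrary */

        return (weight);
}

/* Pick the highest-weighted metaslab of the few hundred on a vdev. */
static metaslab_t *
metaslab_pick(metaslab_t *msv, int count, uint64_t vdev_size)
{
        metaslab_t *best = NULL;
        uint64_t best_weight = 0;

        for (int i = 0; i < count; i++) {
                uint64_t w = metaslab_weight(&msv[i], vdev_size);
                if (w > best_weight) {
                        best_weight = w;
                        best = &msv[i];
                }
        }
        return (best);
}

The concentration effect of the boost is what matters for thin
provisioning: staying inside already-used metaslabs avoids touching pages
the array has not yet allocated.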
Mike Gerdts wrote:
> I'm curious as to how ZFS manages space (free and used) and how
> its usage interacts with thin provisioning provided by HDS
> arrays.  Is there any effort to minimize the number of provisioned
> disk blocks that get writes so as to not negate any space
> benefits that thin provisioning may give?

I was trying to compose an email asking almost the exact same question, but
in the context of array-based replication.  They're similar in the sense
that you're asking about using already-written space rather than going off
into virgin sectors of the disks (in my case, in the hope that the previous
write is still waiting to be replicated and thus can be replaced by the
current data).

> Background & more detailed questions:
>
> In Jeff Bonwick's blog[1], he talks about free space management
> and metaslabs.  Of particular interest is the statement: "ZFS
> divides the space on each virtual device into a few hundred
> regions called metaslabs."
>
> 1. http://blogs.sun.com/bonwick/entry/space_maps

I wish I'd seen this blog while I was composing my question... it answers
some of my questions about how things work (plus Jeff's
zfs_block_allocation entry actually moots most of my comments, since
they've already been implemented).

(snip)

> As data is deleted, do the freed blocks get reused before never
> used blocks?

I didn't see any code where this would happen.  I would really love to see
a zpool setting where I can specify the reuse algorithm.  (For example:
zpool set block_reuse_policy=mru or =dense or =broad or =low)

MRU (most recently used) in the hopes that the storage replication hasn't
yet committed the previous write to the other side of the WAN

DENSE (reuse any previously-written space) in the thin-provisioning case

BROAD (venture off into new space when possible) for media that has rewrite
cycle limitations (flash drives), to spread the writes over as much of the
media as possible

LOW (prioritize low-block# space) would provide optimal rotational latency
for random I/O in the future and might be a special case of the above.  The
corresponding HIGH would improve sequential I/O.  (Implementation is left
as an exercise to the reader ;)

> Is there any collaboration between the storage vendors and ZFS
> developers to allow the file system to tell the storage array
> "this range of blocks is unused" so that the array can reclaim
> the space?  I could see this as useful when doing re-writes of
> data (e.g. crypto rekey) to concentrate data that had become
> scattered into contiguous space.

Deallocating storage space is something that nobody seems to be good at:
ever tried to shrink a filesystem?  Or a ZFS pool?  Or a SAN RAID group?

--Joe
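To make the proposal above a bit more concrete, here is a rough C sketch of
how such a policy knob might bias a weight-based allocator.  Neither the
block_reuse_policy property nor any of these names exist in ZFS; the struct
is a minimal stand-in and the weights are arbitrary:

#include <stdbool.h>
#include <stdint.h>

typedef struct metaslab {            /* minimal stand-in, not the real struct */
        uint64_t ms_start;           /* offset of the metaslab on the vdev */
        bool     ms_used;            /* has anything been written here before? */
        uint64_t ms_last_free_txg;   /* txg in which space was last freed here */
} metaslab_t;

typedef enum block_reuse_policy {
        REUSE_MRU,    /* favor the most recently freed space */
        REUSE_DENSE,  /* favor any previously written space */
        REUSE_BROAD,  /* favor never-written space (wear leveling) */
        REUSE_LOW     /* favor low block numbers */
} block_reuse_policy_t;

static uint64_t
policy_weight(block_reuse_policy_t policy, const metaslab_t *ms,
    uint64_t vdev_size)
{
        switch (policy) {
        case REUSE_MRU:
                /* metaslabs where space was freed most recently win */
                return (ms->ms_last_free_txg);
        case REUSE_DENSE:
                /* anything already written beats untouched regions */
                return (ms->ms_used ? vdev_size : 1);
        case REUSE_BROAD:
                /* untouched regions win, spreading wear across the media */
                return (ms->ms_used ? 1 : vdev_size);
        case REUSE_LOW:
        default:
                /* lower block numbers (outer tracks) win */
                return (vdev_size - ms->ms_start);
        }
}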
On 9/14/07, Moore, Joe <jmoore at ugs.com> wrote:
> I was trying to compose an email asking almost the exact same question,
> but in the context of array-based replication.  They're similar in the
> sense that you're asking about using already-written space rather than
> going off into virgin sectors of the disks (in my case, in the hope that
> the previous write is still waiting to be replicated and thus can be
> replaced by the current data)

At one point, I thought this was how data replication should happen too.
However, unless you have two consecutive writes to the same space,
coalescing the writes could make it so that the data (generically,
including fs metadata) on the replication target may be corrupt.  Generally
speaking, you need to have in-order writes to ensure that you maintain
"crash consistent" data integrity in the event of various failure modes.

Of course, I can see how writes could be batched, coalesced, and applied in
a journaled manner such that each batch fully applies or is rolled back on
the target.  I haven't heard of this being done.

Mike

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
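A minimal sketch of that batch-and-journal idea follows, assuming invented
structures and function names (no real replication product's interface is
shown).  Each batch is persisted to a journal and fsync'd before being
applied to the target, so the target is crash consistent at batch
boundaries even though writes inside a batch were coalesced:

#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct repl_write {
        uint64_t    rw_offset;    /* byte offset on the target LUN */
        uint32_t    rw_len;       /* length of the write */
        const void *rw_data;
} repl_write_t;

/*
 * Apply one batch of (possibly coalesced) writes to the target.  The whole
 * batch is recorded in a journal and fsync'd first, then applied to the
 * target device; a crash during the apply phase is recovered by replaying
 * the journal, and truncating the journal marks the batch committed.
 */
static int
repl_apply_batch(int journal_fd, int target_fd,
    const repl_write_t *writes, int count)
{
        /* Phase 1: record the entire batch in the journal. */
        for (int i = 0; i < count; i++) {
                if (write(journal_fd, &writes[i].rw_offset,
                    sizeof (writes[i].rw_offset)) < 0 ||
                    write(journal_fd, &writes[i].rw_len,
                    sizeof (writes[i].rw_len)) < 0 ||
                    write(journal_fd, writes[i].rw_data, writes[i].rw_len) < 0)
                        return (-1);
        }
        if (fsync(journal_fd) != 0)
                return (-1);

        /* Phase 2: apply the batch to the target.  Order within the batch
         * no longer matters for consistency, because an interrupted apply
         * is always re-run from the journal. */
        for (int i = 0; i < count; i++) {
                if (pwrite(target_fd, writes[i].rw_data, writes[i].rw_len,
                    (off_t)writes[i].rw_offset) < 0)
                        return (-1);
        }
        if (fsync(target_fd) != 0)
                return (-1);

        /* Phase 3: an empty journal means the batch is fully committed. */
        return (ftruncate(journal_fd, 0));
}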