I was wondering if someone could explain why the DDT is seemingly
(from empirical observation) kept in a huge number of individual blocks,
written randomly across the pool, rather than as one large contiguous chunk
somewhere.
Having been a victim of the really long time it takes to destroy a dataset
that has dedup=on, I was wondering why that is. From memory, while the
destroy was running, something like iopattern -r showed a constant 99%
random reads. This seems like a very wasteful approach to allocating blocks
for the DDT.
Having finally deleted the 900GB dataset, I now have only around 152GB
(allocated PSIZE) of deduped data left on that pool.
# zdb -DD tank
DDT-sha256-zap-duplicate: 310684 entries, size 578 on disk, 380 in core
DDT-sha256-zap-unique: 1155817 entries, size 2438 on disk, 1783 in core
So 1,466,501 DDT entries in total. For 152GB of data, that's around 108KB
per block on average, which seems sane.
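For reference, the back-of-envelope math (entry counts straight from the
zdb output above, with 152GB taken as the allocated PSIZE figure):

    # Average data block size implied by the DDT entry count.
    duplicate_entries = 310684
    unique_entries = 1155817
    total_entries = duplicate_entries + unique_entries    # 1,466,501

    deduped_bytes = 152 * 2**30                           # ~152GB allocated PSIZE

    avg_block = deduped_bytes / total_entries
    print(f"{total_entries} entries, ~{avg_block / 1024:.1f} KB per block")
    # -> 1466501 entries, ~108.7 KB per block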
To destroy the dataset holding the files that reference the DDT, I'm
looking at roughly 1.46 million random reads to complete the operation
(less whatever is already in ARC or L2ARC). That's a lot of read operations
for my poor spindles.
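Just to put a number on that, a rough sketch of the wall-clock cost (the
~100 random-read IOPS per spindle is an assumption I'm making for a 7200rpm
disk, not a measurement, and it ignores whatever ARC/L2ARC absorbs):

    # Rough destroy-time estimate if every DDT lookup turns into a random read.
    ddt_entries = 1466501
    iops_per_spindle = 100      # assumed figure for one 7200rpm disk
    spindles = 1

    seconds = ddt_entries / (iops_per_spindle * spindles)
    print(f"~{seconds / 3600:.1f} hours of pure random reads")
    # -> ~4.1 hours on a single spindle at 100 IOPS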
I've seen some people saying that DDT entries are around 270 bytes each,
but does it really matter, given that the smallest block ZFS can read/write
(for obvious reasons) is 512 bytes? Clearly 2x 270B > 512B, but couldn't
there be some way of grouping DDT entries together (in, say, 1MB blocks)?
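To put that grouping idea in perspective, a quick sketch (the 270-byte
figure is the hearsay number above, which I haven't verified):

    # IF entries really are ~270B on disk and were packed into 1MB blocks:
    import math

    entry_size = 270                 # bytes per DDT entry (unverified)
    ddt_entries = 1466501
    group_block = 1 * 2**20          # 1MB grouping block

    entries_per_block = group_block // entry_size               # 3883
    blocks_needed = math.ceil(ddt_entries / entries_per_block)  # 378

    print(f"{entries_per_block} entries per 1MB block, "
          f"~{blocks_needed} x 1MB blocks for the whole DDT")

If that held, the whole DDT would fit in a few hundred largish reads rather
than over a million tiny ones.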
Thoughts?
(Side note: can someone explain the "size xxx on disk, xxx in core"
figures in that zdb output for me? The numbers never seem related to the
number of entries or... anything.)