I was wondering if someone could explain why the DDT is seemingly
(from empirical observation) kept in a huge number of individual blocks,
written randomly across the pool, rather than as one large contiguous chunk
somewhere.
Having been a victim of the really long time it takes to destroy a dataset
that has dedup=on, I was wondering why that is. From memory, while the
destroy was running, something like iopattern -r showed a constant 99%
random reads. This seems like a very wasteful approach to allocating blocks
for the DDT.
Having finally deleted the 900GB dataset, I now have only around 152GB
(allocated PSIZE) of deduped data left on that pool.
# zdb -DD tank
DDT-sha256-zap-duplicate: 310684 entries, size 578 on disk, 380 in core
DDT-sha256-zap-unique: 1155817 entries, size 2438 on disk, 1783 in core
So 1,466,501 DDT entries in total. For 152GB of data, that's around 108KB
per block on average, which seems sane.
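For reference, the back-of-envelope math (entry counts straight from the
zdb output above, with 152GB taken as the allocated PSIZE figure):

    # Average data block size implied by the DDT entry count.
    duplicate_entries = 310684
    unique_entries = 1155817
    total_entries = duplicate_entries + unique_entries    # 1,466,501

    deduped_bytes = 152 * 2**30                           # ~152GB allocated PSIZE

    avg_block = deduped_bytes / total_entries
    print(f"{total_entries} entries, ~{avg_block / 1024:.1f} KB per block")
    # -> 1466501 entries, ~108.7 KB per block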
To destroy the dataset holding the files that reference the DDT, I'm
looking at roughly 1.46 million random reads to complete the operation
(less whatever is already in ARC or L2ARC). That's a lot of read operations
for my poor spindles.
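Just to put a number on that, a rough sketch of the wall-clock cost (the
~100 random-read IOPS per spindle is an assumption I'm making for a 7200rpm
disk, not a measurement, and it ignores whatever ARC/L2ARC absorbs):

    # Rough destroy-time estimate if every DDT lookup turns into a random read.
    ddt_entries = 1466501
    iops_per_spindle = 100      # assumed figure for one 7200rpm disk
    spindles = 1

    seconds = ddt_entries / (iops_per_spindle * spindles)
    print(f"~{seconds / 3600:.1f} hours of pure random reads")
    # -> ~4.1 hours on a single spindle at 100 IOPS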
I've seen some people saying that DDT entries are around 270 bytes each,
but does it really matter, given that the smallest block ZFS can read/write
(for obvious reasons) is 512 bytes? Clearly 2x 270B > 512B, but couldn't
there be some way of grouping DDT entries together (in, say, 1MB blocks)?
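To put that grouping idea in perspective, a quick sketch (the 270-byte
figure is the hearsay number above, which I haven't verified):

    # IF entries really are ~270B on disk and were packed into 1MB blocks:
    import math

    entry_size = 270                 # bytes per DDT entry (unverified)
    ddt_entries = 1466501
    group_block = 1 * 2**20          # 1MB grouping block

    entries_per_block = group_block // entry_size               # 3883
    blocks_needed = math.ceil(ddt_entries / entries_per_block)  # 378

    print(f"{entries_per_block} entries per 1MB block, "
          f"~{blocks_needed} x 1MB blocks for the whole DDT")

If that held, the whole DDT would fit in a few hundred largish reads rather
than over a million tiny ones.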
Thoughts?
(Side note: can someone explain the "size xxx on disk, xxx in core"
figures in that zdb output for me? The numbers never seem related to the
number of entries or... anything.)