Hi, I am loving the new dedup feature. A few questions:

If you enable it after data is already on the filesystem, will it find
the dupes on read as well as write? Would a scrub therefore make sure
the DDT is fully populated?

Re the DDT, can someone outline its structure please? Some sort of
hash table? The blogs I have read so far don't specify.

Re DDT size, is (data in use) / (average block size) * 256 bits right
as a worst case (i.e. all blocks non-identical)? What are average
block sizes?

Cheers,
Tom
Tom Hall <thattommyhall at gmail.com> writes:

> If you enable it after data is already on the filesystem, will it find
> the dupes on read as well as write? Would a scrub therefore make sure
> the DDT is fully populated?

no. only written data is added to the DDT, so you need to copy the data
somehow. zfs send/recv is the most convenient, but you could even do a
loop of commands like

cp -p "$file" "$file.tmp" && mv "$file.tmp" "$file"

> Re the DDT, can someone outline its structure please? Some sort of
> hash table? The blogs I have read so far don't specify.

I can't help here.

> Re DDT size, is (data in use) / (average block size) * 256 bits right
> as a worst case (i.e. all blocks non-identical)?

the size of an entry is much larger:

| From: Mertol Ozyoney <Mertol.Ozyoney at Sun.COM>
| Subject: Re: Dedup memory overhead
| Message-ID: <00cb01caa580$a3d6f110$eb84d330$%ozyoney at sun.com>
| Date: Thu, 04 Feb 2010 11:58:44 +0200
|
| Approximately it's 150 bytes per individual block.

> What are average block sizes?

as a start, look at your own data. divide the used size in "df" by the
used inodes in "df -i". example from my home directory:

$ /usr/gnu/bin/df -i ~
Filesystem           Inodes    IUsed      IFree IUse% Mounted on
tank/home         223349423  3412777  219936646    2% /volumes/home

$ df -k ~
Filesystem            kbytes       used      avail capacity  Mounted on
tank/home          573898752  257644703  109968254      71%  /volumes/home

so the average file size is 75 KiB, smaller than the recordsize of 128
KiB. extrapolating to a full filesystem, we'd get 4.9M files.
unfortunately, it's more complicated than that, since a file can consist
of many records even if the *average* is smaller than a single record.

a pessimistic estimate, then, is one record for each of those 4.9M
files, plus one record for each 128 KiB of diskspace (2.8M), for a total
of 7.7M records. the size of the DDT for this (quite small!) filesystem
would be something like 1.2 GB. perhaps a reasonable rule of thumb is 1
GB DDT per TB of storage.

(disclaimer: I'm not a kernel hacker, I just read this list :-)
--
Kjetil T. Homme
Redpill Linpro AS - Changing the game
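To make the estimate above easy to repeat, here is a rough sketch that
wraps the same arithmetic in one pipeline. It is only an illustration of
Kjetil's method, not a ZFS tool: it assumes Solaris "df -k" and GNU
"df -i" each print a single data line for the filesystem, and it
hard-codes the 128 KiB recordsize and Mertol's ~150 bytes per entry.

fs=/volumes/home    # the filesystem from the example above
{ df -k "$fs"; /usr/gnu/bin/df -i "$fs"; } | awk '
    NR == 2 { used_kb = $3; full_kb = $3 + $4 }   # df -k: used, used+avail
    NR == 4 { files = $3 }                        # GNU df -i: IUsed
    END {
        avg_kb     = used_kb / files             # ~75 KiB in the example
        files_full = files * full_kb / used_kb   # ~4.9M files when full
        records    = files_full + full_kb / 128  # 1 per file + 1 per 128 KiB
        printf "avg file %.0f KiB, pessimistic DDT about %.1f GB\n",
               avg_kb, records * 150 / 1e9
    }'

On the numbers quoted above this prints an average file size of 75 KiB
and a DDT of about 1.2 GB, matching the estimate in the post.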
On Feb 8, 2010, at 6:04 PM, Kjetil Torgrim Homme wrote:

> Tom Hall <thattommyhall at gmail.com> writes:
>
>> If you enable it after data is already on the filesystem, will it find
>> the dupes on read as well as write? Would a scrub therefore make sure
>> the DDT is fully populated?
>
> no. only written data is added to the DDT, so you need to copy the data
> somehow. zfs send/recv is the most convenient, but you could even do a
> loop of commands like
>
> cp -p "$file" "$file.tmp" && mv "$file.tmp" "$file"
>
>> Re the DDT, can someone outline its structure please? Some sort of
>> hash table? The blogs I have read so far don't specify.
>
> I can't help here.

UTSL

>> Re DDT size, is (data in use) / (average block size) * 256 bits right
>> as a worst case (i.e. all blocks non-identical)?
>
> the size of an entry is much larger:
>
> | From: Mertol Ozyoney <Mertol.Ozyoney at Sun.COM>
> | Subject: Re: Dedup memory overhead
> | Message-ID: <00cb01caa580$a3d6f110$eb84d330$%ozyoney at sun.com>
> | Date: Thu, 04 Feb 2010 11:58:44 +0200
> |
> | Approximately it's 150 bytes per individual block.
>
>> What are average block sizes?
>
> as a start, look at your own data. divide the used size in "df" by the
> used inodes in "df -i". example from my home directory:
>
> $ /usr/gnu/bin/df -i ~
> Filesystem           Inodes    IUsed      IFree IUse% Mounted on
> tank/home         223349423  3412777  219936646    2% /volumes/home
>
> $ df -k ~
> Filesystem            kbytes       used      avail capacity  Mounted on
> tank/home          573898752  257644703  109968254      71%  /volumes/home
>
> so the average file size is 75 KiB, smaller than the recordsize of 128
> KiB. extrapolating to a full filesystem, we'd get 4.9M files.
> unfortunately, it's more complicated than that, since a file can consist
> of many records even if the *average* is smaller than a single record.
>
> a pessimistic estimate, then, is one record for each of those 4.9M
> files, plus one record for each 128 KiB of diskspace (2.8M), for a total
> of 7.7M records. the size of the DDT for this (quite small!) filesystem
> would be something like 1.2 GB. perhaps a reasonable rule of thumb is 1
> GB DDT per TB of storage.

"zdb -D poolname" will provide details on the DDT size. FWIW, I have a
pool with 52M DDT entries and the DDT is around 26GB.

$ pfexec zdb -D tank
DDT-sha256-zap-duplicate: 19725 entries, size 270 on disk, 153 in core
DDT-sha256-zap-unique: 52284055 entries, size 284 on disk, 159 in core

dedup = 1.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.00

(you can tell by the stats that I'm not expecting much dedup :-)
 -- richard
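A rough way to turn that "zdb -D" output into byte totals, assuming the
"size N on disk, N in core" figures are average bytes per DDT entry (an
assumption; the units are not documented anywhere obvious):

pfexec zdb -D tank | awk '
    / entries, size / {
        ondisk += $2 * $5    # entries * bytes per entry on disk
        incore += $2 * $8    # entries * bytes per entry in core
    }
    END {
        printf "DDT approx %.1f GB on disk, %.1f GB in core\n",
               ondisk / 1e9, incore / 1e9
    }'

On the figures above that works out to roughly 15 GB on disk and 8 GB in
core rather than 26 GB, so take the interpretation with a grain of salt.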
Richard Elling <richard.elling at gmail.com> writes:

> On Feb 8, 2010, at 6:04 PM, Kjetil Torgrim Homme wrote:
>> the size of [a DDT] entry is much larger:
>>
>> | From: Mertol Ozyoney <Mertol.Ozyoney at Sun.COM>
>> |
>> | Approximately it's 150 bytes per individual block.
>
> "zdb -D poolname" will provide details on the DDT size. FWIW, I have a
> pool with 52M DDT entries and the DDT is around 26GB.

wow, that's much larger than Mertol's estimate: 500 bytes per block.

> $ pfexec zdb -D tank
> DDT-sha256-zap-duplicate: 19725 entries, size 270 on disk, 153 in core
> DDT-sha256-zap-unique: 52284055 entries, size 284 on disk, 159 in core
>
> dedup = 1.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.00

how do you calculate the 26 GB size from this?
--
Kjetil T. Homme
Redpill Linpro AS - Changing the game
On Feb 9, 2010, at 7:24 AM, Kjetil Torgrim Homme wrote:

> Richard Elling <richard.elling at gmail.com> writes:
>
>> On Feb 8, 2010, at 6:04 PM, Kjetil Torgrim Homme wrote:
>>> the size of [a DDT] entry is much larger:
>>>
>>> | From: Mertol Ozyoney <Mertol.Ozyoney at Sun.COM>
>>> |
>>> | Approximately it's 150 bytes per individual block.
>>
>> "zdb -D poolname" will provide details on the DDT size. FWIW, I have a
>> pool with 52M DDT entries and the DDT is around 26GB.
>
> wow, that's much larger than Mertol's estimate: 500 bytes per block.

argv! I miscalculated, the size is approximately 14.2GB, not 26GB. That
leads to approximately 270 bytes per record.

>> $ pfexec zdb -D tank
>> DDT-sha256-zap-duplicate: 19725 entries, size 270 on disk, 153 in core
>> DDT-sha256-zap-unique: 52284055 entries, size 284 on disk, 159 in core
>>
>> dedup = 1.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.00
>
> how do you calculate the 26 GB size from this?

The exact size is not accounted. I'm inferring the size by looking at the
difference between the space used for the (simple) pool and the sum of
the file systems under the pool, where the top-level file system (/tank)
is empty with mount points, but no snapshots.

$ zpool list tank
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
tank   100G  43.9G  56.1G    43%  1.00x  ONLINE  -

$ zfs list -r tank
NAME        USED  AVAIL  REFER  MOUNTPOINT
tank       44.0G  54.5G    25K  /tank
tank/d     18.4G  54.5G  18.4G  /tank/d
tank/d2    11.3G  54.5G  11.3G  /tank/d2
tank/test    22K  54.5G    22K  /tank/test

 -- richard
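That subtraction can be scripted. A minimal sketch, assuming a shell
with 64-bit $((...)) arithmetic (ksh93 or bash) and that "zfs get -Hp"
is available to print exact byte counts; the dataset names are the ones
from the example above, and the difference covers all pool metadata, not
just the DDT:

# space reported by the leaf file systems
sum=0
for ds in tank/d tank/d2 tank/test; do
    used=$(zfs get -Hp -o value used "$ds")
    sum=$((sum + used))
done

# space the pool itself accounts for, minus the file systems
total=$(zfs get -Hp -o value used tank)
echo "DDT and other pool metadata: roughly $(( (total - sum) / 1024 / 1024 )) MiB"

On the listing above that comes out around 14 GB, consistent with the
corrected estimate.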
On Tue, Feb 09, 2010 at 08:26:42AM -0800, Richard Elling wrote:

> >> "zdb -D poolname" will provide details on the DDT size. FWIW, I have a
> >> pool with 52M DDT entries and the DDT is around 26GB.

I wish -D was documented; I had forgotten about it and only found the
(expensive) -S variant, which wasn't what I was looking for. Well, I
wish zdb was documented, but in this case I wish -D was in the usage
message, which is all the documentation we get today.

> >> $ pfexec zdb -D tank
> >> DDT-sha256-zap-duplicate: 19725 entries, size 270 on disk, 153 in core
> >> DDT-sha256-zap-unique: 52284055 entries, size 284 on disk, 159 in core
> >>
> >> dedup = 1.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.00

What units are the "size X on disk, Y in core" figures? It's very hard
to make sense of them, given the vast difference in entries and the
small difference in size between the two rows. One can assume that the
duplicate entries have more block addresses in them and are bigger, I
suppose, but that isn't really enough to explain the gap.

At least the on disk / in core values give a roughly consistent ratio,
both for these and for a pool I have handy here - though I still don't
know what that means.

> > how do you calculate the 26 GB size from this?
>
> The exact size is not accounted. I'm inferring the size by looking at
> the difference between the space used for the (simple) pool and the sum
> of the file systems under the pool, where the top-level file system
> (/tank) is empty with mount points, but no snapshots.

Surely there has to be a better way. If the numbers above don't give it,
then this brings me back to the method I speculated about in a previous
question: I presume the DDT pool object can be found and inspected with
zdb, to reveal a size. If the ratio and guesswork interpretation above
holds true, we might derive the in-core memory requirement from there.

I don't know how to use zdb to do that for objects in general, nor how
to find or recognise the object in question. Could someone who does
please provide some hints?

I will go look at the zdb sources, but (without yet having done so) I
suspect that it will just be printing out figures from zfs data
structures, and I will still need help with interpretation.

--
Dan.
Tom Hall wrote:

> Re the DDT, can someone outline its structure please? Some sort of
> hash table? The blogs I have read so far don't specify.

It is stored in a ZAP object, which is an extensible hash table. See
zap.[ch], ddt_zap.c, ddt.h.

--matt