Hi, I am loving the new dedup feature. A few questions:

If you enable it after data is already on the filesystem, will it find
the dupes on read as well as write? Would a scrub therefore make sure
the DDT is fully populated?

Re the DDT, can someone outline its structure please? Some sort of
hash table? The blogs I have read so far don't specify.

Re DDT size, is (data in use) / (average block size) * 256 bits right
as a worst case (i.e. all blocks non-identical)? What are average
block sizes?

Cheers,
Tom
Tom Hall <thattommyhall at gmail.com> writes:

> If you enable it after data is already on the filesystem, will it find
> the dupes on read as well as write? Would a scrub therefore make sure
> the DDT is fully populated?

no. only written data is added to the DDT, so you need to copy the data
somehow. zfs send/recv is the most convenient, but you could even do a
loop of commands like

cp -p "$file" "$file.tmp" && mv "$file.tmp" "$file"

> Re the DDT, can someone outline its structure please? Some sort of
> hash table? The blogs I have read so far don't specify.

I can't help here.

> Re DDT size, is (data in use) / (average block size) * 256 bits right
> as a worst case (i.e. all blocks non-identical)?

the size of an entry is much larger:

| From: Mertol Ozyoney <Mertol.Ozyoney at Sun.COM>
| Subject: Re: Dedup memory overhead
| Message-ID: <00cb01caa580$a3d6f110$eb84d330$%ozyoney at sun.com>
| Date: Thu, 04 Feb 2010 11:58:44 +0200
|
| Approximately it's 150 bytes per individual block.

> What are average block sizes?

as a start, look at your own data. divide the used size in "df" by the
used inodes in "df -i". example from my home directory:

$ /usr/gnu/bin/df -i ~
Filesystem           Inodes    IUsed      IFree IUse% Mounted on
tank/home         223349423  3412777  219936646    2% /volumes/home

$ df -k ~
Filesystem            kbytes       used      avail capacity  Mounted on
tank/home          573898752  257644703  109968254      71%  /volumes/home

so the average file size is 75 KiB, smaller than the recordsize of 128
KiB. extrapolating to a full filesystem, we'd get 4.9M files.
unfortunately, it's more complicated than that, since a file can consist
of many records even if the *average* is smaller than a single record.

a pessimistic estimate, then, is one record for each of those 4.9M
files, plus one record for each 128 KiB of diskspace (2.8M), for a total
of 7.7M records. the size of the DDT for this (quite small!) filesystem
would be something like 1.2 GB. perhaps a reasonable rule of thumb is 1
GB DDT per TB of storage.

(disclaimer: I'm not a kernel hacker, I just read this list :-)
--
Kjetil T. Homme
Redpill Linpro AS - Changing the game
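To make the estimate above easy to repeat, here is a rough sketch that
wraps the same arithmetic in one pipeline. It is only an illustration of
Kjetil's method, not a ZFS tool: it assumes Solaris "df -k" and GNU
"df -i" each print a single data line for the filesystem, and it
hard-codes the 128 KiB recordsize and Mertol's ~150 bytes per entry.

fs=/volumes/home    # the filesystem from the example above
{ df -k "$fs"; /usr/gnu/bin/df -i "$fs"; } | awk '
    NR == 2 { used_kb = $3; full_kb = $3 + $4 }   # df -k: used, used+avail
    NR == 4 { files = $3 }                        # GNU df -i: IUsed
    END {
        avg_kb     = used_kb / files             # ~75 KiB in the example
        files_full = files * full_kb / used_kb   # ~4.9M files when full
        records    = files_full + full_kb / 128  # 1 per file + 1 per 128 KiB
        printf "avg file %.0f KiB, pessimistic DDT about %.1f GB\n",
               avg_kb, records * 150 / 1e9
    }'

On the numbers quoted above this prints an average file size of 75 KiB
and a DDT of about 1.2 GB, matching the estimate in the post.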
On Feb 8, 2010, at 6:04 PM, Kjetil Torgrim Homme wrote:

> Tom Hall <thattommyhall at gmail.com> writes:
>
>> If you enable it after data is already on the filesystem, will it find
>> the dupes on read as well as write? Would a scrub therefore make sure
>> the DDT is fully populated?
>
> no. only written data is added to the DDT, so you need to copy the data
> somehow. zfs send/recv is the most convenient, but you could even do a
> loop of commands like
>
> cp -p "$file" "$file.tmp" && mv "$file.tmp" "$file"
>
>> Re the DDT, can someone outline its structure please? Some sort of
>> hash table? The blogs I have read so far don't specify.
>
> I can't help here.

UTSL

>> Re DDT size, is (data in use) / (average block size) * 256 bits right
>> as a worst case (i.e. all blocks non-identical)?
>
> the size of an entry is much larger:
>
> | From: Mertol Ozyoney <Mertol.Ozyoney at Sun.COM>
> | Subject: Re: Dedup memory overhead
> | Message-ID: <00cb01caa580$a3d6f110$eb84d330$%ozyoney at sun.com>
> | Date: Thu, 04 Feb 2010 11:58:44 +0200
> |
> | Approximately it's 150 bytes per individual block.
>
>> What are average block sizes?
>
> as a start, look at your own data. divide the used size in "df" by the
> used inodes in "df -i". example from my home directory:
>
> $ /usr/gnu/bin/df -i ~
> Filesystem           Inodes    IUsed      IFree IUse% Mounted on
> tank/home         223349423  3412777  219936646    2% /volumes/home
>
> $ df -k ~
> Filesystem            kbytes       used      avail capacity  Mounted on
> tank/home          573898752  257644703  109968254      71%  /volumes/home
>
> so the average file size is 75 KiB, smaller than the recordsize of 128
> KiB. extrapolating to a full filesystem, we'd get 4.9M files.
> unfortunately, it's more complicated than that, since a file can consist
> of many records even if the *average* is smaller than a single record.
>
> a pessimistic estimate, then, is one record for each of those 4.9M
> files, plus one record for each 128 KiB of diskspace (2.8M), for a total
> of 7.7M records. the size of the DDT for this (quite small!) filesystem
> would be something like 1.2 GB. perhaps a reasonable rule of thumb is 1
> GB DDT per TB of storage.

"zdb -D poolname" will provide details on the DDT size. FWIW, I have a
pool with 52M DDT entries and the DDT is around 26GB.

$ pfexec zdb -D tank
DDT-sha256-zap-duplicate: 19725 entries, size 270 on disk, 153 in core
DDT-sha256-zap-unique: 52284055 entries, size 284 on disk, 159 in core

dedup = 1.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.00

(you can tell by the stats that I'm not expecting much dedup :-)
 -- richard
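A rough way to turn that "zdb -D" output into byte totals, assuming the
"size N on disk, N in core" figures are average bytes per DDT entry (an
assumption; the units are not documented anywhere obvious):

pfexec zdb -D tank | awk '
    / entries, size / {
        ondisk += $2 * $5    # entries * bytes per entry on disk
        incore += $2 * $8    # entries * bytes per entry in core
    }
    END {
        printf "DDT approx %.1f GB on disk, %.1f GB in core\n",
               ondisk / 1e9, incore / 1e9
    }'

On the figures above that works out to roughly 15 GB on disk and 8 GB in
core rather than 26 GB, so take the interpretation with a grain of salt.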
Richard Elling <richard.elling at gmail.com> writes:

> On Feb 8, 2010, at 6:04 PM, Kjetil Torgrim Homme wrote:
>> the size of [a DDT] entry is much larger:
>>
>> | From: Mertol Ozyoney <Mertol.Ozyoney at Sun.COM>
>> |
>> | Approximately it's 150 bytes per individual block.
>
> "zdb -D poolname" will provide details on the DDT size. FWIW, I have a
> pool with 52M DDT entries and the DDT is around 26GB.

wow, that's much larger than Mertol's estimate: 500 bytes per block.

> $ pfexec zdb -D tank
> DDT-sha256-zap-duplicate: 19725 entries, size 270 on disk, 153 in core
> DDT-sha256-zap-unique: 52284055 entries, size 284 on disk, 159 in core
>
> dedup = 1.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.00

how do you calculate the 26 GB size from this?
--
Kjetil T. Homme
Redpill Linpro AS - Changing the game
On Feb 9, 2010, at 7:24 AM, Kjetil Torgrim Homme wrote:

> Richard Elling <richard.elling at gmail.com> writes:
>
>> On Feb 8, 2010, at 6:04 PM, Kjetil Torgrim Homme wrote:
>>> the size of [a DDT] entry is much larger:
>>>
>>> | From: Mertol Ozyoney <Mertol.Ozyoney at Sun.COM>
>>> |
>>> | Approximately it's 150 bytes per individual block.
>>
>> "zdb -D poolname" will provide details on the DDT size. FWIW, I have a
>> pool with 52M DDT entries and the DDT is around 26GB.
>
> wow, that's much larger than Mertol's estimate: 500 bytes per block.

argv! I miscalculated, the size is approximately 14.2GB, not 26GB. That
leads to approximately 270 bytes per record.

>> $ pfexec zdb -D tank
>> DDT-sha256-zap-duplicate: 19725 entries, size 270 on disk, 153 in core
>> DDT-sha256-zap-unique: 52284055 entries, size 284 on disk, 159 in core
>>
>> dedup = 1.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.00
>
> how do you calculate the 26 GB size from this?

The exact size is not accounted. I'm inferring the size by looking at the
difference between the space used for the (simple) pool and the sum of
the file systems under the pool, where the top-level file system (/tank)
is empty with mount points, but no snapshots.

$ zpool list tank
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
tank   100G  43.9G  56.1G    43%  1.00x  ONLINE  -

$ zfs list -r tank
NAME        USED  AVAIL  REFER  MOUNTPOINT
tank       44.0G  54.5G    25K  /tank
tank/d     18.4G  54.5G  18.4G  /tank/d
tank/d2    11.3G  54.5G  11.3G  /tank/d2
tank/test    22K  54.5G    22K  /tank/test

 -- richard
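That subtraction can be scripted. A minimal sketch, assuming a shell
with 64-bit $((...)) arithmetic (ksh93 or bash) and that "zfs get -Hp"
is available to print exact byte counts; the dataset names are the ones
from the example above, and the difference covers all pool metadata, not
just the DDT:

# space reported by the leaf file systems
sum=0
for ds in tank/d tank/d2 tank/test; do
    used=$(zfs get -Hp -o value used "$ds")
    sum=$((sum + used))
done

# space the pool itself accounts for, minus the file systems
total=$(zfs get -Hp -o value used tank)
echo "DDT and other pool metadata: roughly $(( (total - sum) / 1024 / 1024 )) MiB"

On the listing above that comes out around 14 GB, consistent with the
corrected estimate.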
On Tue, Feb 09, 2010 at 08:26:42AM -0800, Richard Elling wrote:

> >> "zdb -D poolname" will provide details on the DDT size. FWIW, I have a
> >> pool with 52M DDT entries and the DDT is around 26GB.

I wish -D was documented; I had forgotten about it and only found the
(expensive) -S variant, which wasn't what I was looking for. Well, I
wish zdb was documented, but in this case I wish -D was in the usage
message, which is all the documentation we get today.

> >> $ pfexec zdb -D tank
> >> DDT-sha256-zap-duplicate: 19725 entries, size 270 on disk, 153 in core
> >> DDT-sha256-zap-unique: 52284055 entries, size 284 on disk, 159 in core
> >>
> >> dedup = 1.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.00

What units are the "size X on disk, Y in core" figures? It's very hard
to make sense of them, given the vast difference in entries and the
small difference in size between the two rows. One can assume that the
duplicate entries have more block addresses in them and are bigger, I
suppose, but that isn't really enough to explain the gap.

At least the on disk / in core values give a roughly consistent ratio,
both for these and for a pool I have handy here - though I still don't
know what that means.

> > how do you calculate the 26 GB size from this?
>
> The exact size is not accounted. I'm inferring the size by looking at
> the difference between the space used for the (simple) pool and the sum
> of the file systems under the pool, where the top-level file system
> (/tank) is empty with mount points, but no snapshots.

Surely there has to be a better way. If the numbers above don't give it,
then this brings me back to the method I speculated about in a previous
question: I presume the DDT pool object can be found and inspected with
zdb, to reveal a size. If the ratio and guesswork interpretation above
holds true, we might derive the in-core memory requirement from there.

I don't know how to use zdb to do that for objects in general, nor how
to find or recognise the object in question. Could someone who does
please provide some hints?

I will go look at the zdb sources, but (without yet having done so) I
suspect that it will just be printing out figures from zfs data
structures, and I will still need help with interpretation.

--
Dan.
Tom Hall wrote:

> Re the DDT, can someone outline its structure please? Some sort of
> hash table? The blogs I have read so far don't specify.

It is stored in a ZAP object, which is an extensible hash table. See
zap.[ch], ddt_zap.c, ddt.h.

--matt