Peter Taps
2010-Sep-23 22:36 UTC
[zfs-discuss] Dedup relationship between pool and filesystem
Folks,

I am a bit confused on the dedup relationship between the filesystem and its pool.

The dedup property is set on a filesystem, not on the pool. However, the dedup ratio is reported on the pool and not on the filesystem.

Why is it this way?

Thank you in advance for your help.

Regards,
Peter
--
This message posted from opensolaris.org
Darren J Moffat
2010-Sep-23 22:49 UTC
[zfs-discuss] Dedup relationship between pool and filesystem
On 09/23/10 15:36, Peter Taps wrote:
> I am a bit confused on the dedup relationship between the filesystem and its pool.
>
> The dedup property is set on a filesystem, not on the pool.

Dedup is a pool-wide concept; blocks from multiple filesystems may be deduplicated.

> However, the dedup ratio is reported on the pool and not on the filesystem.

The dedup property is on the dataset (filesystem | ZVOL) so that you can opt in or out on a per-dataset basis. For example, if you have one or two datasets you know will never have duplicate data, then don't enable dedup on those:

  zpool create tank ....
  zfs set dedup=on tank
  zfs create tank/1
  zfs create tank/1/1
  zfs create tank/2
  zfs create -o dedup=off tank/2/2
  zfs create tank/2/2/3

In this case all datasets in the pool will participate in deduplication with the exception of tank/2/2 and its descendants.

--
Darren J Moffat
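To check which datasets have opted in and what ratio the pool as a whole is getting, the standard property queries should be enough. A minimal sketch, using the pool name "tank" from the example above (output omitted):

  zfs get -r dedup tank        # per-dataset: shows which datasets participate
  zpool get dedupratio tank    # the pool-wide ratio the original question refers to
  zpool list tank              # same ratio, in the DEDUP column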
zfs user
2010-Sep-23 23:08 UTC
[zfs-discuss] Dedup relationship between pool and filesystem
I believe it goes something like this - ZFS filesystems with dedupe turned on can be thought of as hippie/socialist filesystems, wanting to "share", etc. Filesystems with dedupe turned off are a grey Randian landscape where sharing blocks between files is seen as a weakness/defect.

They all live together in a zpool, let's call it "San Francisco"... The hippies store their shared blocks together in a communal store at the pool level and everything works pretty well until one of the hippie filesystems wants to pull a large number of their blocks out of the communal store; then all hell breaks loose and the grey Randians laugh at the hippies and their chaos, but it is a joyless laughter.

That is the technical explanation; someone else may have a better explanation in layman's terms.

On 9/23/10 3:36 PM, Peter Taps wrote:
> Folks,
>
> I am a bit confused on the dedup relationship between the filesystem and its pool.
>
> The dedup property is set on a filesystem, not on the pool.
>
> However, the dedup ratio is reported on the pool and not on the filesystem.
>
> Why is it this way?
>
> Thank you in advance for your help.
>
> Regards,
> Peter
Scott Meilicke
2010-Sep-23 23:22 UTC
[zfs-discuss] Dedup relationship between pool and filesystem
Hi Peter,

Dedupe is pool-wide. File systems can opt in or out of dedupe. So if multiple file systems are set to dedupe, then they all benefit from using the same pool of deduped blocks. In this way, if two files share some of the same blocks, even if they are in different file systems, they will dedupe.

I am not sure why reporting is not done at the file system level. It may be an accounting issue, i.e. which file system owns the dedupe blocks. But it seems some fair estimate could be made. Maybe the overhead to keep a file system updated with these stats is too high?

-Scott
--
This message posted from opensolaris.org
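On the accounting question: the dedup table (DDT) is itself a pool-level structure, which is presumably why the ratio is only reported there. zdb can dump its pool-wide statistics; a minimal sketch, assuming a pool named "tank" (exact output wording varies between builds):

  zdb -D tank     # entry counts plus an overall dedup ratio for the pool
  zdb -DD tank    # adds a histogram of blocks by reference count

Nothing in that output is broken down per dataset, which matches the behaviour being asked about.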
Edward Ned Harvey
2010-Sep-24 01:14 UTC
[zfs-discuss] Dedup relationship between pool and filesystem
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Peter Taps
>
> The dedup property is set on a filesystem, not on the pool.
>
> However, the dedup ratio is reported on the pool and not on the
> filesystem.

As with most other ZFS concepts, the core functionality of ZFS is implemented in zpool. Hence, zpool is up to what ... version 25 or so now? Think of ZFS (the POSIX filesystem) as just an interface which tightly integrates the zpool features. ZFS is only up to what, version 4 now?

Perfect example: If you create a zvol and, instead of putting ZFS on it, format it ext3/4 from linux, then you can snapshot it, and I believe you can even "zfs send" and receive. And so on. The core functionality is mostly present. But if you want to access the snapshot, you have to create some mountpoint and mount the snapshot zvol read-only on it. It's not automatic. It's barely any better than the crappy "snapshot" concept linux has in LVM.

If you want good automatic snapshot creation & seamless mounting & automatic mounting, then you need the ZFS filesystem on top of the zpool. Cuz the ZFS filesystem knows about that underlying zpool feature, and turns it into a convenient, easy, good experience. ;-)
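A rough sketch of the manual workflow being described, with an assumed pool/zvol name and illustrative device paths (zvol and snapshot device paths differ between platforms, so treat these as placeholders rather than exact commands):

  zfs create -V 10G tank/vol1                 # create a zvol
  # format it ext3 from whatever host sees the zvol as a block device
  mkfs.ext3 /dev/zvol/dsk/tank/vol1
  zfs snapshot tank/vol1@monday               # snapshots work at the zvol level
  zfs send tank/vol1@monday > /backup/vol1-monday.zfs    # so does send/receive
  # but reading the snapshot contents is manual: mount its device read-only
  mkdir /mnt/vol1-monday
  mount -o ro /dev/zvol/dsk/tank/vol1@monday /mnt/vol1-monday

Contrast with a ZFS filesystem, where the same snapshot would simply appear under .zfs/snapshot/ with no extra mounting.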
Brad Stone
2010-Sep-25 04:26 UTC
[zfs-discuss] Dedup relationship between pool and filesystem
For de-duplication to perform well you need to be able to fit the de-dup table in memory. Is a good rule-of-thumb for needed RAM Size=(pool capacity/avg block size)*270 bytes? Or perhaps it's Size/expected_dedup_ratio?

And if you limit de-dup to certain datasets in the pool, how would this calculation change?
--
This message posted from opensolaris.org
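Plugging hypothetical numbers into that formula, for 1 TiB of pool capacity and a 64 KiB average block size (both picked purely for illustration, and the ~270 bytes per entry is itself only an estimate):

  (2^40 / 2^16) * 270 bytes = 16,777,216 entries * 270 bytes ~= 4.2 GiB

With 128 KiB blocks the same pool would need roughly half that, which is why the average block size matters so much in these estimates.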
Edward Ned Harvey
2010-Sep-25 12:33 UTC
[zfs-discuss] Dedup relationship between pool and filesystem
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Brad Stone
>
> For de-duplication to perform well you need to be able to fit the de-
> dup table in memory. Is a good rule-of-thumb for needed RAM Size=(pool
> capacity/avg block size)*270 bytes? Or perhaps it's
> Size/expected_dedup_ratio?

For now, the rule of thumb is 3G ram for every 1TB of unique data, including snapshots and vdevs.

After a system is running, I don't know how/if you can measure current mem usage, to gauge the results of your own predictions.
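One possible way to measure it on a running pool, sketched here with an assumed pool name (verify the output against your own build):

  zdb -D tank

The per-DDT summary lines report entry counts together with their size "on disk" and "in core", which gives at least a rough read on how much memory the table is actually consuming.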
Roy Sigurd Karlsbakk
2010-Sep-25 16:57 UTC
[zfs-discuss] Dedup relationship between pool and filesystem
> > For de-duplication to perform well you need to be able to fit the
> > de-dup table in memory. Is a good rule-of-thumb for needed RAM
> > Size=(pool capacity/avg block size)*270 bytes? Or perhaps it's
> > Size/expected_dedup_ratio?
>
> For now, the rule of thumb is 3G ram for every 1TB of unique data,
> including snapshots and vdevs.
>
> After a system is running, I don't know how/if you can measure current
> mem usage, to gauge the results of your own predictions.

3 gigs? Last I checked it was a little more than 1GB, perhaps 2 if you have small files.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases adequate and relevant synonyms exist in Norwegian.
Scott Meilicke
2010-Sep-25 17:28 UTC
[zfs-discuss] Dedup relationship between pool and filesystem
When I do the calculations, assuming 300 bytes per block to be conservative, with 128K blocks, I get 2.34G of cache (RAM, L2ARC) per terabyte of deduped data. But block size is dynamic, so you will need more than this.

Scott
--
This message posted from opensolaris.org
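Spelling out that calculation (same assumptions as above: 1 TiB of unique data, fixed 128 KiB blocks, 300 bytes per DDT entry):

  2^40 / 2^17 = 8,388,608 blocks
  8,388,608 * 300 bytes ~= 2,516,582,400 bytes ~= 2.34 GiB

Smaller records (zvols often default to an 8K volblocksize, for instance) push the entry count, and therefore the cache requirement, up sharply.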
Edward Ned Harvey
2010-Sep-26 02:51 UTC
[zfs-discuss] Dedup relationship between pool and filesystem
> From: Roy Sigurd Karlsbakk [mailto:roy at karlsbakk.net]
>
> > For now, the rule of thumb is 3G ram for every 1TB of unique data,
> > including snapshots and vdevs.
>
> 3 gigs? Last I checked it was a little more than 1GB, perhaps 2 if you
> have small files.

http://opensolaris.org/jive/thread.jspa?threadID=131761

The true answer is "it varies" depending on things like block size, etc, so whether you say 1G or 3G, despite sounding like a big difference, it's in the noise. We're only talking "rule of thumb" here, based on vague (vague) and widely variable estimates of your personal usage characteristics. It's just a rule of thumb, and slightly over 1G ~= slightly under 3G in this context.

Hence, the comment:

> After a system is running, I don't know how/if you can measure current
> mem usage, to gauge the results of your own predictions.