Peter Taps
2010-Sep-23 22:36 UTC
[zfs-discuss] Dedup relationship between pool and filesystem
Folks,

I am a bit confused on the dedup relationship between the filesystem and its pool.

The dedup property is set on a filesystem, not on the pool. However, the dedup ratio is reported on the pool and not on the filesystem.

Why is it this way?

Thank you in advance for your help.

Regards,
Peter
--
This message posted from opensolaris.org
Darren J Moffat
2010-Sep-23 22:49 UTC
[zfs-discuss] Dedup relationship between pool and filesystem
On 09/23/10 15:36, Peter Taps wrote:
> I am a bit confused on the dedup relationship between the filesystem and its pool.
>
> The dedup property is set on a filesystem, not on the pool.

Dedup is a pool-wide concept; blocks from multiple filesystems may be deduplicated.

> However, the dedup ratio is reported on the pool and not on the filesystem.

The dedup property is on the dataset (filesystem | ZVOL) so that you can opt in or out on a per-dataset basis. For example, if you have one or two datasets you know will never have duplicate data, then don't enable dedup on those:

  zpool create tank ....
  zfs set dedup=on tank
  zfs create tank/1
  zfs create tank/1/1
  zfs create tank/2
  zfs create -o dedup=off tank/2/2
  zfs create tank/2/2/3

In this case all datasets in the pool will participate in deduplication with the exception of tank/2/2 and its descendants.

--
Darren J Moffat
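To check which datasets have opted in and what ratio the pool as a whole is getting, the standard property queries should be enough. A minimal sketch, using the pool name "tank" from the example above (output omitted):

  zfs get -r dedup tank        # per-dataset: shows which datasets participate
  zpool get dedupratio tank    # the pool-wide ratio the original question refers to
  zpool list tank              # same ratio, in the DEDUP column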
zfs user
2010-Sep-23 23:08 UTC
[zfs-discuss] Dedup relationship between pool and filesystem
I believe it goes something like this - ZFS filesystems with dedupe turned on can be thought of as hippie/socialist filesystems, wanting to "share", etc. Filesystems with dedupe turned off are a grey Randian landscape where sharing blocks between files is seen as a weakness/defect.

They all live together in a zpool, let's call it "San Francisco"... The hippies store their shared blocks together in a communal store at the pool level and everything works pretty well until one of the hippie filesystems wants to pull a large number of their blocks out of the communal store; then all hell breaks loose and the grey Randians laugh at the hippies and their chaos, but it is a joyless laughter.

That is the technical explanation; someone else may have a better explanation in layman's terms.

On 9/23/10 3:36 PM, Peter Taps wrote:
> Folks,
>
> I am a bit confused on the dedup relationship between the filesystem and its pool.
>
> The dedup property is set on a filesystem, not on the pool.
>
> However, the dedup ratio is reported on the pool and not on the filesystem.
>
> Why is it this way?
>
> Thank you in advance for your help.
>
> Regards,
> Peter
Scott Meilicke
2010-Sep-23 23:22 UTC
[zfs-discuss] Dedup relationship between pool and filesystem
Hi Peter,

Dedupe is pool-wide. File systems can opt in or out of dedupe. So if multiple file systems are set to dedupe, then they all benefit from using the same pool of deduped blocks. In this way, if two files share some of the same blocks, even if they are in different file systems, they will dedupe.

I am not sure why reporting is not done at the file system level. It may be an accounting issue, i.e. which file system owns the dedupe blocks. But it seems some fair estimate could be made. Maybe the overhead to keep a file system updated with these stats is too high?

-Scott
--
This message posted from opensolaris.org
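On the accounting question: the dedup table (DDT) is itself a pool-level structure, which is presumably why the ratio is only reported there. zdb can dump its pool-wide statistics; a minimal sketch, assuming a pool named "tank" (exact output wording varies between builds):

  zdb -D tank     # entry counts plus an overall dedup ratio for the pool
  zdb -DD tank    # adds a histogram of blocks by reference count

Nothing in that output is broken down per dataset, which matches the behaviour being asked about.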
Edward Ned Harvey
2010-Sep-24 01:14 UTC
[zfs-discuss] Dedup relationship between pool and filesystem
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Peter Taps
>
> The dedup property is set on a filesystem, not on the pool.
>
> However, the dedup ratio is reported on the pool and not on the
> filesystem.

As with most other ZFS concepts, the core functionality of ZFS is implemented in zpool. Hence, zpool is up to what ... version 25 or so now? Think of ZFS (the POSIX filesystem) as just an interface which tightly integrates the zpool features. ZFS is only up to what, version 4 now?

Perfect example: If you create a zvol and, instead of putting ZFS on it, format it ext3/4 from linux, then you can snapshot it, and I believe you can even "zfs send" and receive. And so on. The core functionality is mostly present. But if you want to access the snapshot, you have to create some mountpoint and mount the snapshot zvol read-only on it. It's not automatic. It's barely any better than the crappy "snapshot" concept linux has in LVM.

If you want good automatic snapshot creation & seamless mounting & automatic mounting, then you need the ZFS filesystem on top of the zpool. Cuz the ZFS filesystem knows about that underlying zpool feature, and turns it into a convenient, easy, good experience. ;-)
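A rough sketch of the manual workflow being described, with an assumed pool/zvol name and illustrative device paths (zvol and snapshot device paths differ between platforms, so treat these as placeholders rather than exact commands):

  zfs create -V 10G tank/vol1                 # create a zvol
  # format it ext3 from whatever host sees the zvol as a block device
  mkfs.ext3 /dev/zvol/dsk/tank/vol1
  zfs snapshot tank/vol1@monday               # snapshots work at the zvol level
  zfs send tank/vol1@monday > /backup/vol1-monday.zfs    # so does send/receive
  # but reading the snapshot contents is manual: mount its device read-only
  mkdir /mnt/vol1-monday
  mount -o ro /dev/zvol/dsk/tank/vol1@monday /mnt/vol1-monday

Contrast with a ZFS filesystem, where the same snapshot would simply appear under .zfs/snapshot/ with no extra mounting.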
Brad Stone
2010-Sep-25 04:26 UTC
[zfs-discuss] Dedup relationship between pool and filesystem
For de-duplication to perform well you need to be able to fit the de-dup table in memory. Is a good rule-of-thumb for needed RAM Size=(pool capacity/avg block size)*270 bytes? Or perhaps it's Size/expected_dedup_ratio?

And if you limit de-dup to certain datasets in the pool, how would this calculation change?
--
This message posted from opensolaris.org
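Plugging hypothetical numbers into that formula, for 1 TiB of pool capacity and a 64 KiB average block size (both picked purely for illustration, and the ~270 bytes per entry is itself only an estimate):

  (2^40 / 2^16) * 270 bytes = 16,777,216 entries * 270 bytes ~= 4.2 GiB

With 128 KiB blocks the same pool would need roughly half that, which is why the average block size matters so much in these estimates.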
Edward Ned Harvey
2010-Sep-25 12:33 UTC
[zfs-discuss] Dedup relationship between pool and filesystem
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Brad Stone
>
> For de-duplication to perform well you need to be able to fit the de-
> dup table in memory. Is a good rule-of-thumb for needed RAM Size=(pool
> capacity/avg block size)*270 bytes? Or perhaps it's
> Size/expected_dedup_ratio?

For now, the rule of thumb is 3G ram for every 1TB of unique data, including snapshots and vdevs.

After a system is running, I don't know how/if you can measure current mem usage, to gauge the results of your own predictions.
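One possible way to measure it on a running pool, sketched here with an assumed pool name (verify the output against your own build):

  zdb -D tank

The per-DDT summary lines report entry counts together with their size "on disk" and "in core", which gives at least a rough read on how much memory the table is actually consuming.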
Roy Sigurd Karlsbakk
2010-Sep-25 16:57 UTC
[zfs-discuss] Dedup relationship between pool and filesystem
> > For de-duplication to perform well you need to be able to fit the
> > de-dup table in memory. Is a good rule-of-thumb for needed RAM
> > Size=(pool capacity/avg block size)*270 bytes? Or perhaps it's
> > Size/expected_dedup_ratio?
>
> For now, the rule of thumb is 3G ram for every 1TB of unique data,
> including snapshots and vdevs.
>
> After a system is running, I don't know how/if you can measure current
> mem usage, to gauge the results of your own predictions.

3 gigs? Last I checked it was a little more than 1GB, perhaps 2 if you have small files.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases adequate and relevant synonyms exist in Norwegian.
Scott Meilicke
2010-Sep-25 17:28 UTC
[zfs-discuss] Dedup relationship between pool and filesystem
When I do the calculations, assuming 300 bytes per block to be conservative, with 128K blocks, I get 2.34G of cache (RAM, L2ARC) per terabyte of deduped data. But block size is dynamic, so you will need more than this.

Scott
--
This message posted from opensolaris.org
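Spelling out that calculation (same assumptions as above: 1 TiB of unique data, fixed 128 KiB blocks, 300 bytes per DDT entry):

  2^40 / 2^17 = 8,388,608 blocks
  8,388,608 * 300 bytes ~= 2,516,582,400 bytes ~= 2.34 GiB

Smaller records (zvols often default to an 8K volblocksize, for instance) push the entry count, and therefore the cache requirement, up sharply.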
Edward Ned Harvey
2010-Sep-26 02:51 UTC
[zfs-discuss] Dedup relationship between pool and filesystem
> From: Roy Sigurd Karlsbakk [mailto:roy at karlsbakk.net]
>
> > For now, the rule of thumb is 3G ram for every 1TB of unique data,
> > including snapshots and vdevs.
>
> 3 gigs? Last I checked it was a little more than 1GB, perhaps 2 if you
> have small files.

http://opensolaris.org/jive/thread.jspa?threadID=131761

The true answer is "it varies" depending on things like block size, etc, so whether you say 1G or 3G, despite sounding like a big difference, it's in the noise. We're only talking "rule of thumb" here, based on vague (vague) and widely variable estimates of your personal usage characteristics. It's just a rule of thumb, and slightly over 1G ~= slightly under 3G in this context.

Hence, the comment:

> After a system is running, I don't know how/if you can measure current
> mem usage, to gauge the results of your own predictions.