Does dedup work at the pool level or the filesystem/dataset level?
For example, if I were to do this:

bash-3.2$ mkfile 100m /tmp/largefile
bash-3.2$ zfs set dedup=off tank
bash-3.2$ zfs set dedup=on tank/dir1
bash-3.2$ zfs set dedup=on tank/dir2
bash-3.2$ zfs set dedup=on tank/dir3
bash-3.2$ cp /tmp/largefile /tank/dir1/largefile
bash-3.2$ cp /tmp/largefile /tank/dir2/largefile
bash-3.2$ cp /tmp/largefile /tank/dir3/largefile

Would largefile get dedup'ed? Would I need to set dedup on for the
pool, and then disable where it isn't wanted/needed?

Also, will we need to move our data around (send/recv or whatever your
preferred method is) to take advantage of dedup? I was hoping the
blockpointer rewrite code would allow an admin to simply turn on dedup
and let ZFS process the pool, eliminating excess redundancy as it went.

-- 
Breandan Dezendorf
breandan at dezendorf.com
it works at a pool-wide level, with the ability to exclude at a dataset
level - or the converse: if set to off at the top-level dataset, you can
then set lower-level datasets to on. ie one can include and exclude
depending on the datasets' contents.

so largefile will get deduped in the example below.

Enda

Breandan Dezendorf wrote:
> Does dedup work at the pool level or the filesystem/dataset level?
> For example, if I were to do this:
>
> bash-3.2$ mkfile 100m /tmp/largefile
> bash-3.2$ zfs set dedup=off tank
> bash-3.2$ zfs set dedup=on tank/dir1
> bash-3.2$ zfs set dedup=on tank/dir2
> bash-3.2$ zfs set dedup=on tank/dir3
> bash-3.2$ cp /tmp/largefile /tank/dir1/largefile
> bash-3.2$ cp /tmp/largefile /tank/dir2/largefile
> bash-3.2$ cp /tmp/largefile /tank/dir3/largefile
>
> Would largefile get dedup'ed? Would I need to set dedup on for the
> pool, and then disable where it isn't wanted/needed?
>
> Also, will we need to move our data around (send/recv or whatever your
> preferred method is) to take advantage of dedup? I was hoping the
> blockpointer rewrite code would allow an admin to simply turn on dedup
> and let ZFS process the pool, eliminating excess redundancy as it
> went.

-- 
Enda O'Connor x19781 Software Product Engineering Patch System Test :
Ireland : x19781/353-1-8199718
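In other words, dedup is a per-dataset property (inherited by child
datasets), while the deduplication table itself is pool-wide, so the
three copies of largefile dedup against each other even though dedup is
off on tank itself. As an illustration, using the pool from the example
above (the output shown is just what one would expect from the settings,
not a captured run), you can check how the property resolves with:

bash-3.2$ zfs get -r dedup tank
NAME        PROPERTY  VALUE  SOURCE
tank        dedup     off    local
tank/dir1   dedup     on     local
tank/dir2   dedup     on     local
tank/dir3   dedup     on     local

Any dataset created later under tank/dir1 would inherit dedup=on from
its parent unless the property is set locally.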
On Mon, Nov 2, 2009 at 9:41 AM, Enda O'Connor <Enda.Oconnor at sun.com> wrote:
> it works at a pool-wide level, with the ability to exclude at a dataset
> level - or the converse: if set to off at the top-level dataset, you can
> then set lower-level datasets to on. ie one can include and exclude
> depending on the datasets' contents.

Great! I've been looking forward to this code for a long time. All the
work and energy is very much appreciated.

-- 
Breandan Dezendorf
breandan at dezendorf.com
Enda O'Connor wrote:
> it works at a pool-wide level, with the ability to exclude at a dataset
> level - or the converse: if set to off at the top-level dataset, you can
> then set lower-level datasets to on. ie one can include and exclude
> depending on the datasets' contents.
>
> so largefile will get deduped in the example below.

And you can use 'zdb -S' (which is a lot better now than it used to be
before dedup) to see how much benefit is there (without even turning
dedup on):

bash-3.2# zdb -S rpool
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1     625K    9.9G   7.90G   7.90G     625K    9.9G   7.90G   7.90G
     2     9.8K    184M    132M    132M    20.7K    386M    277M    277M
     4    1.21K   16.6M   10.8M   10.8M    5.71K   76.9M   48.6M   48.6M
     8      395    764K    745K    745K    3.75K   6.90M   6.69M   6.69M
    16      125   2.71M    888K    888K    2.60K   54.2M   17.9M   17.9M
    32       56   2.10M    750K    750K    2.33K   85.6M   29.8M   29.8M
    64        9   22.0K   22.0K   22.0K      778   2.04M   2.04M   2.04M
   128        4   6.00K   6.00K   6.00K      594    853K    853K    853K
   256        2      8K      8K      8K      711   2.78M   2.78M   2.78M
   512        2   4.50K   4.50K   4.50K    1.47K   3.52M   3.52M   3.52M
    8K        1    128K    128K    128K    15.9K   1.99G   1.99G   1.99G
   16K        2      8K      8K      8K    50.7K    203M    203M    203M
 Total     637K   10.1G   8.04G   8.04G     730K   12.7G   10.5G   10.5G

dedup = 1.30, compress = 1.22, copies = 1.00, dedup * compress / copies = 1.58
bash-3.2#

Be careful - can eat lots of RAM!

Many thanks to Jeff and all the team!

Regards,
Victor

> Enda
>
> Breandan Dezendorf wrote:
>> Does dedup work at the pool level or the filesystem/dataset level?
>> For example, if I were to do this:
>>
>> bash-3.2$ mkfile 100m /tmp/largefile
>> bash-3.2$ zfs set dedup=off tank
>> bash-3.2$ zfs set dedup=on tank/dir1
>> bash-3.2$ zfs set dedup=on tank/dir2
>> bash-3.2$ zfs set dedup=on tank/dir3
>> bash-3.2$ cp /tmp/largefile /tank/dir1/largefile
>> bash-3.2$ cp /tmp/largefile /tank/dir2/largefile
>> bash-3.2$ cp /tmp/largefile /tank/dir3/largefile
>>
>> Would largefile get dedup'ed? Would I need to set dedup on for the
>> pool, and then disable where it isn't wanted/needed?
>>
>> Also, will we need to move our data around (send/recv or whatever your
>> preferred method is) to take advantage of dedup? I was hoping the
>> blockpointer rewrite code would allow an admin to simply turn on dedup
>> and let ZFS process the pool, eliminating excess redundancy as it
>> went.
On Nov 2, 2009, at 9:07 AM, Victor Latushkin wrote:
> Enda O'Connor wrote:
>> it works at a pool-wide level, with the ability to exclude at a
>> dataset level - or the converse: if set to off at the top-level
>> dataset, you can then set lower-level datasets to on. ie one can
>> include and exclude depending on the datasets' contents.
>> so largefile will get deduped in the example below.
>
> And you can use 'zdb -S' (which is a lot better now than it used to
> be before dedup) to see how much benefit is there (without even
> turning dedup on):

forgive my ignorance, but what's the advantage of this new dedup over
the existing compression option?  Wouldn't full-filesystem compression
naturally de-dupe?

-Jeremy
On Mon, Nov 2, 2009 at 9:01 PM, Jeremy Kitchen
<kitchen at scriptkitchen.com> wrote:
>
> forgive my ignorance, but what's the advantage of this new dedup over the
> existing compression option?  Wouldn't full-filesystem compression naturally
> de-dupe?

No, compression works at the block level. If there are two identical
blocks, compression will reduce the number of bytes stored on disk for
each of them, but there will still be two identical copies of the
compressed data. Dedup removes the extra copy, so compression and dedup
complement each other.

-- 
Regards,
Cyril
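To make that concrete, here is a rough way to see the difference on a
test pool (the pool, dataset and file names are placeholders, and the
file is assumed to contain real, non-zero data - a file created with
mkfile is all zeros and compresses away almost entirely):

bash-3.2$ zfs create -o compression=on tank/componly
bash-3.2$ cp /tmp/datafile /tank/componly/a
bash-3.2$ cp /tmp/datafile /tank/componly/b
bash-3.2$ zfs get used tank/componly
# roughly twice the compressed size of datafile - both copies are stored

bash-3.2$ zfs create -o compression=on -o dedup=on tank/both
bash-3.2$ cp /tmp/datafile /tank/both/a
bash-3.2$ cp /tmp/datafile /tank/both/b
bash-3.2$ zpool get dedupratio tank
# should approach 2.00x for this data - the identical compressed blocks
# are stored only once

(Space accounting may lag until the writes have been synced to disk.)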
> forgive my ignorance, but what's the advantage of this new dedup over
> the existing compression option?

it may provide another space-saving advantage. depending on your data,
the savings can be very significant.

> Wouldn't full-filesystem compression
> naturally de-dupe?

no. compression doesn't look back and forth across blocks; only the
actual data block is compressed and the redundant information within it
removed.

compression != deduplication !
-- 
This message posted from opensolaris.org
On Mon, Nov 02, 2009 at 11:01:34AM -0800, Jeremy Kitchen wrote:
> forgive my ignorance, but what's the advantage of this new dedup over
> the existing compression option?  Wouldn't full-filesystem compression
> naturally de-dupe?

If you snapshot/clone as you go, then yes, dedup will do little for you
because you'll already have done the deduplication via snapshots and
clones.  But dedup will give you that benefit even if you don't
snapshot/clone all your data.  Not all data can be managed
hierarchically, with a single dataset at the root of a history tree.

For example, suppose you want to create two VirtualBox VMs running the
same guest OS, sharing as much on-disk storage as possible.  Before
dedup you had to: create one VM, then snapshot and clone that VM's VDI
files, use an undocumented command to change the UUID in the clones,
import them into VirtualBox, and set up the cloned VM using the cloned
VDI files.  (I know because that's how I manage my VMs; it's a pain,
really.)  With dedup you need only enable dedup and then install the
two VMs.

Clearly the dedup approach is far, far easier to use than the
snapshot/clone approach.  And since you can't always snapshot/clone...

There are many examples where snapshot/clone isn't feasible but dedup
can help.  For example: mail stores (though they can do dedup at the
application layer by using message IDs and hashes).  For example: home
directories (think of users saving documents sent via e-mail).  For
example: source code workspaces (ONNV, Xorg, Linux, whatever), where
users might not think ahead to snapshot/clone a local clone (I also tend
to maintain a local SCM clone that I then snapshot/clone to get
workspaces for bug fixes and projects; it's a pain, really).  I'm sure
there are many, many other examples.

The workspace example is particularly interesting: with the
snapshot/clone approach you get to deduplicate the _source code_, but
not the _object code_, while with dedup you get both dedup'ed
automatically.

As for compression, that helps whether you dedup or not, and it helps by
about the same factor either way -- dedup and compression are unrelated,
really.

Nico
-- 
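For readers who haven't used the snapshot/clone approach Nico describes,
a minimal sketch looks something like this (the dataset names are made
up, and the VirtualBox step is from memory - treat it as an assumption):

# install the first guest into its own dataset, then freeze that state
zfs create tank/vms/vm1
# ... install the guest OS, with its VDI file(s) under /tank/vms/vm1 ...
zfs snapshot tank/vms/vm1@installed

# clone the snapshot; vm2 initially shares all of vm1's on-disk blocks
zfs clone tank/vms/vm1@installed tank/vms/vm2

# the cloned VDI still carries vm1's UUID, so VirtualBox will refuse it
# until the UUID is changed (e.g. with the undocumented
# "VBoxManage internalcommands sethduuid <file>" Nico alludes to)

With dedup enabled you skip all of this: just install the second guest
and let ZFS collapse the identical blocks as they are written.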
Jeremy Kitchen wrote:
>
> On Nov 2, 2009, at 9:07 AM, Victor Latushkin wrote:
>
>> Enda O'Connor wrote:
>>> it works at a pool-wide level, with the ability to exclude at a
>>> dataset level - or the converse: if set to off at the top-level
>>> dataset, you can then set lower-level datasets to on. ie one can
>>> include and exclude depending on the datasets' contents.
>>> so largefile will get deduped in the example below.
>>
>> And you can use 'zdb -S' (which is a lot better now than it used to be
>> before dedup) to see how much benefit is there (without even turning
>> dedup on):
>
> forgive my ignorance, but what's the advantage of this new dedup over
> the existing compression option?  Wouldn't full-filesystem compression
> naturally de-dupe?

See this for example:

Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1     625K    9.9G   7.90G   7.90G     625K    9.9G   7.90G   7.90G
     2     9.8K    184M    132M    132M    20.7K    386M    277M    277M

"Allocated" means what is actually allocated on disk, "referenced" what
would be allocated on disk without deduplication; LSIZE denotes logical
size, PSIZE physical size after compression.

The row with reference count 1 shows the same figures in both
"allocated" and "referenced", and this is expected - there is only one
reference to each block. But the row with reference count 2 shows a good
difference: without deduplication it is 20.7 thousand blocks on disk,
with logical size totalling 386M and physical size after compression
277M. With deduplication there would be only 9.8 thousand blocks on disk
(a dedup factor of over 2x!), with logical size totalling 184M and
physical size 132M.

So with compression but without deduplication it is 277M on disk; with
deduplication it would be only 132M - good savings!

Hope this helps,
victor
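The Total line of the full histogram earlier in the thread can be read
the same way; roughly (the displayed figures are rounded, hence the
small discrepancies):

  compress = referenced LSIZE / referenced PSIZE = 12.7G / 10.5G ~ 1.22
  dedup    = referenced DSIZE / allocated DSIZE  = 10.5G / 8.04G ~ 1.30
  overall  = 12.7G / 8.04G ~ 1.58  (= dedup * compress, since copies = 1.00)

i.e. 12.7G of logical data would end up as about 8.04G on disk with both
compression and deduplication in effect.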
On Mon, Nov 2, 2009 at 2:16 PM, Nicolas Williams
<Nicolas.Williams at sun.com> wrote:
> On Mon, Nov 02, 2009 at 11:01:34AM -0800, Jeremy Kitchen wrote:
>> forgive my ignorance, but what's the advantage of this new dedup over
>> the existing compression option?  Wouldn't full-filesystem compression
>> naturally de-dupe?
>
> If you snapshot/clone as you go, then yes, dedup will do little for you
> because you'll already have done the deduplication via snapshots and
> clones.  But dedup will give you that benefit even if you don't
> snapshot/clone all your data.  Not all data can be managed
> hierarchically, with a single dataset at the root of a history tree.
>
> For example, suppose you want to create two VirtualBox VMs running the
> same guest OS, sharing as much on-disk storage as possible.  Before
> dedup you had to: create one VM, then snapshot and clone that VM's VDI
> files, use an undocumented command to change the UUID in the clones,
> import them into VirtualBox, and set up the cloned VM using the cloned
> VDI files.  (I know because that's how I manage my VMs; it's a pain,
> really.)  With dedup you need only enable dedup and then install the
> two VMs.

The big difference here is when you consider a life cycle that ends
long after provisioning is complete.  With clones, the images will
diverge.  If a year after you install each VM you decide to do an OS
upgrade, they will still be linked but are quite unlikely to both
reference many of the same blocks.  However, with deduplication, the
similar changes (e.g. same patch applied, multiple copies of the same
application installed, upgrade to the same newer OS) will result in
fewer stored copies.

This isn't a big deal if you have 2 VMs.  It becomes quite significant
if you have 5000 (e.g. on a ZFS-based file server).  Assuming that the
deduped blocks stay deduped in the ARC, it means that it is feasible
for every block that is accessed with any frequency to be in memory.
Oh yeah, and you save a lot of disk space.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
I just stumbled across a clever visual representation of deduplication:

http://loveallthis.tumblr.com/post/166124704

It's a flowchart of the lyrics to "Hey Jude". =-)

Nothing is compressed, so you can still read all of the words. Instead,
all of the duplicates have been folded together.

-cheers, CSB
-- 
This message posted from opensolaris.org
On 2-Nov-09, at 3:16 PM, Nicolas Williams wrote:
> On Mon, Nov 02, 2009 at 11:01:34AM -0800, Jeremy Kitchen wrote:
>> forgive my ignorance, but what's the advantage of this new dedup over
>> the existing compression option?  Wouldn't full-filesystem
>> compression naturally de-dupe?
> ...
> There are many examples where snapshot/clone isn't feasible but dedup
> can help.  For example: mail stores (though they can do dedup at the
> application layer by using message IDs and hashes).  For example: home
> directories (think of users saving documents sent via e-mail).  For
> example: source code workspaces (ONNV, Xorg, Linux, whatever), where
> users might not think ahead to snapshot/clone a local clone (I also
> tend to maintain a local SCM clone that I then snapshot/clone to get
> workspaces for bug fixes and projects; it's a pain, really).  I'm sure
> there are many, many other examples.

A couple that come to mind... Some patterns become much cheaper with
dedup:

- The Subversion working copy format, where you have the reference
  checked-out file alongside the working file
- QA/testing systems, where you might have dozens or hundreds of builds
  or iterations of an application, mostly identical

Exposing checksum metadata might have interesting implications for
operations like diff, cmp, rsync, even tar.

--Toby

> The workspace example is particularly interesting: with the
> snapshot/clone approach you get to deduplicate the _source code_, but
> not the _object code_, while with dedup you get both dedup'ed
> automatically.
>
> As for compression, that helps whether you dedup or not, and it helps
> by about the same factor either way -- dedup and compression are
> unrelated, really.
>
> Nico
On 11/ 2/09 07:42 PM, Craig S. Bell wrote:
> I just stumbled across a clever visual representation of deduplication:
>
> http://loveallthis.tumblr.com/post/166124704
>
> It's a flowchart of the lyrics to "Hey Jude". =-)
>
> Nothing is compressed, so you can still read all of the words. Instead,
> all of the duplicates have been folded together. -cheers, CSB

This should reference the prior (April 1, 1984) research by Donald Knuth at
http://www.cs.utexas.edu/users/arvindn/misc/knuth_song_complexity.pdf  :-)

Jeff

-- 
Jeff Savit
Principal Field Technologist       Sun Microsystems, Inc.
Phone: 732-537-3451 (x63451)       2398 E Camelback Rd
Email: jeff.savit at sun.com          Phoenix, AZ 85016
http://blogs.sun.com/jsavit/
Folks,
I've been reading Jeff Bonwick's fascinating dedup post. This is going
to sound like either the dumbest or the most obvious question ever
asked, but, if you don't know and can't produce meaningful RTFM
results....ask...so here goes:

Assuming you have a dataset in a zfs pool that's been deduplicated,
with pointers all nicely in place and so on.

Doesn't this mean that you're now always and forever tied to ZFS (and
why not? I'm certainly not saying that's a Bad Thing) because no other
"wannabe file system" will be able to read those ZFS pointers?

Or am I horribly misunderstanding the concept in some way?

Regards - and as always - TIA,
-Me
Colin Raven wrote:
> Folks,
> I've been reading Jeff Bonwick's fascinating dedup post. This is going
> to sound like either the dumbest or the most obvious question ever
> asked, but, if you don't know and can't produce meaningful RTFM
> results....ask...so here goes:
>
> Assuming you have a dataset in a zfs pool that's been deduplicated,
> with pointers all nicely in place and so on.
>
> Doesn't this mean that you're now always and forever tied to ZFS (and
> why not? I'm certainly not saying that's a Bad Thing) because no other
> "wannabe file system" will be able to read those ZFS pointers?

no other filesystem (unless it's ZFS-compatible ;-) will be able to read
any "zfs pointers" (or much of any zfs internal data) - and it is
completely independent of whether you use deduplication or not. If you
want to have your data on a different FS, you'll have to copy it off of
zfs and onto your other FS with something like cpio or tar, or maybe a
backup tool that understands both - ZFS and OFS (other ...).

> Or am I horribly misunderstanding the concept in some way?

maybe - OTOH, maybe I misread your question: is this about a different
FS *on top of* zpools/zvols? If so, I'll have to defer to Team ZFS.

HTH
Michael
-- 
Michael Schuster    http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
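As an aside, the copy-off step Michael mentions can be as simple as the
usual tar pipeline (the paths here are placeholders):

# copy the contents of a ZFS dataset onto some other filesystem,
# preserving permissions
cd /tank/data && tar cf - . | (cd /mnt/otherfs/data && tar xpf -)

zfs send/recv, by contrast, can only be received into another ZFS
filesystem, so it doesn't help for moving data off ZFS entirely.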