I'm trying to see if zfs dedupe is effective on our datasets, but I'm having a hard time figuring out how to measure the space saved.

When I sent one backup set to the filesystem, the usage reported by "zfs list" and "zfs get used <my zfs>" are the expected values based on the data size.

When I store a second copy, which should dedupe entirely, the zfs commands report the doubled used space that would be occupied if dedupe was turned off.

My question is: are the numbers being reported by the zfs command taking into account the deduplication, or is there some other way to see how much space we're saving?

Thanks,
Stacy
--
This message posted from opensolaris.org
On Thu, Dec 17, 2009 at 8:57 PM, Stacy Maydew <stacy.maydew at sun.com> wrote:
> I'm trying to see if zfs dedupe is effective on our datasets, but I'm having a hard time figuring out how to measure the space saved.
>
> When I sent one backup set to the filesystem, the usage reported by "zfs list" and "zfs get used <my zfs>" are the expected values based on the data size.
>
> When I store a second copy, which should dedupe entirely, the zfs commands report the doubled used space that would be occupied if dedupe was turned off.
>
> My question is, are the numbers being reported by the zfs command taking into account the deduplication, or is there some other way to see how much space we're saving.

Try "zpool list". For example:

$ zpool list
NAME    SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool    87G  76.3G  10.7G  87%  1.17x  ONLINE  -

In this case the dedup ratio is 1.17.

--
Regards,
Cyril
On Fri 18/12/09 07:57 , "Stacy Maydew" stacy.maydew at sun.com sent:
> I'm trying to see if zfs dedupe is effective on our datasets, but I'm
> having a hard time figuring out how to measure the space saved.
>
> When I sent one backup set to the filesystem, the usage reported by
> "zfs list" and "zfs get used <my zfs>" are the
> expected values based on the data size.
>
> When I store a second copy, which should dedupe entirely, the zfs commands
> report the doubled used space that would be occupied if dedupe was turned
> off.
>
> My question is, are the numbers being reported by the zfs command taking
> into account the deduplication, or is there some other way to see how much
> space we're saving.

What does zpool list show?

--
Ian
On Thu, Dec 17, 2009 at 10:57 AM, Stacy Maydew <stacy.maydew at sun.com> wrote:
> When I sent one backup set to the filesystem, the usage reported by "zfs list" and "zfs get used <my zfs>" are the expected values based on the data size.
>
> When I store a second copy, which should dedupe entirely, the zfs commands report the doubled used space that would be occupied if dedupe was turned off.

It's how zfs does accounting with dedupe. Even when the blocks are deduped, they still count toward the size of the volume. It's my understanding that this is done out of "fairness". If the space used were split between all duplicates and one of the copies were deleted, then the remaining copy could push the user over quota (or the fs past its limit, etc.)

> My question is, are the numbers being reported by the zfs command taking into account the deduplication, or is there some other way to see how much space we're saving.

'zpool list' or 'zpool get dedup ${zpool_name}'
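To turn the ratio into an actual amount of space saved, a rough rule of thumb is: logical (pre-dedup) data ~= ALLOC * DEDUP, so savings ~= ALLOC * (DEDUP - 1). A back-of-the-envelope sketch using Cyril's example figures above (this ignores metadata overhead, so treat it as an estimate):

$ echo '76.3 * (1.17 - 1)' | bc -l
12.971

So that hypothetical 87G pool is saving roughly 13G thanks to dedup.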
The commands "zpool list" and "zpool get dedup <pool>" both show a ratio of 1.10. So thanks for that answer. I''m a bit confused though if the dedup is applied per zfs filesystem, not zpool, why can I only see the dedup on a per pool basis rather than for each zfs filesystem? Seems to me there should be a way to get this information for a given zfs filesystem? Thanks again, Stacy -- This message posted from opensolaris.org
A Darren Dunham
2009-Dec-17 21:08 UTC
[zfs-discuss] How do I determine dedupe effectiveness?
On Thu, Dec 17, 2009 at 12:30:29PM -0800, Stacy Maydew wrote:
> So thanks for that answer. I'm a bit confused though if the dedup is
> applied per zfs filesystem, not zpool, why can I only see the dedup on
> a per pool basis rather than for each zfs filesystem?
>
> Seems to me there should be a way to get this information for a given
> zfs filesystem?

You can enable and disable it on a filesystem basis, but the dedup is across the entire pool.
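Something like this shows the split (dataset names are hypothetical) - the switch is a dataset property, the ratio is a pool property:

# the on/off switch is per dataset
$ zfs set dedup=on tank/backups
$ zfs get dedup tank/backups tank/home
NAME          PROPERTY  VALUE  SOURCE
tank/backups  dedup     on     local
tank/home     dedup     off    default

# but the dedup table, and therefore the ratio, is pool-wide
# (the read-only pool property is spelled dedupratio in the man page)
$ zpool get dedupratio tank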
--
Darren

Stacy Maydew wrote:
> The commands "zpool list" and "zpool get dedup <pool>" both show a ratio of 1.10.
>
> So thanks for that answer. I'm a bit confused though if the dedup is applied per zfs filesystem, not zpool, why can I only see the dedup on a per pool basis rather than for each zfs filesystem?
>
> Seems to me there should be a way to get this information for a given zfs filesystem?

The information, if present, would probably be meaningless. Consider: which filesystem holds the block and which the dupe? What happens if the original is removed?

--
Ian.
On Sat, Dec 19, 2009 at 05:25, Ian Collins <ian at ianshome.com> wrote:
> Stacy Maydew wrote:
>> The commands "zpool list" and "zpool get dedup <pool>" both show a ratio
>> of 1.10.
>> So thanks for that answer. I'm a bit confused though if the dedup is
>> applied per zfs filesystem, not zpool, why can I only see the dedup on a per
>> pool basis rather than for each zfs filesystem?
>>
>> Seems to me there should be a way to get this information for a given zfs
>> filesystem?
>
> The information, if present, would probably be meaningless. Consider which
> filesystem holds the block and which the dupe? What happens if the original
> is removed?

AHA - "original/copy" - I fell into the same trap. This is the question I had back in November. Michael Schuster (http://blogs.sun.com/recursion) helped me out, and that's my reference point. Here was my scenario:

> in /home/fred there's a photo collection
> another collection exists in /home/janet
> at some point in the past, fred sent janet a party picture, let's call
> it DSC4456.JPG
> In the dataset, there are now two copies of the file, which are
> genuinely identical.
>
> So then:
> - When you de-dupe, which copy of the file gets flung?

Michael provided the following really illuminating explanation:

> dedup (IIRC) operates at block level, not file level, so the question, as it
> stands, has no answer. what happens - again, from what I read in Jeff's blog
> - is this: zfs detects that a copy of a block with the same hash is being
> created, so instead of storing the block again, it just increments the
> reference count and makes sure whatever "thing" references this piece of
> data points to the "old" data.
>
> In that sense, you could probably argue that the "new" copy never gets
> created.

("Jeff's blog" referred to above is here: http://blogs.sun.com/bonwick/entry/zfs_dedup)

OK, fair enough, but I still couldn't quite get my head around what's actually happening, so I posed this followup question, in order to cement the idea in my silly head (because I still wasn't focused on "new copy never gets created"):

Fred has an image (DSC4456.JPG in my example) in his home directory, and he's sent it to Janet. Arguably - when Janet pulled the attachment out of the email and saved it to her $HOME - that copy never got written! Instead, the reference count was incremented by one. Fair enough, but what is Janet "seeing" when she does an ls and greps for that image? What is she seeing:
- a symlink?
- an "apparition" of some kind?
She sees the file, it's there, but what exactly is she seeing?

Michael stepped in and described this:

> they're going to see the same file (the blocks of which now have a ref.
> counter that is one more than it was before).
>
> think posix-style hard links: two directory entries pointing to the same
> inode - both "files" are actually one, but as long as you don't change it,
> it doesn't matter. when you "remove" one (by removing the name), the other
> remains, the ref. count in the inode is decremented by one.

So, coming around full circle to your question, "What happens if the original is removed?", it can be answered this way:

There is no original, there is no copy. There is one block with reference counters.

- Fred can rm his "file" (because clearly it isn't a file, it's a filename and that's all)
- result: the reference count is decremented by one - the data remains on disk.
OR
- Janet can rm her "filename"
- result: the reference count is decremented by one - the data remains on disk
OR
- both can rm the filename; the reference count is now decremented by two - but there were only two, so now it's really REALLY gone.

Or is it really REALLY gone? Nope - if you snapshotted the pool, it isn't! :)

For me, within the core of the explanation, the posix hard link reference somehow tipped the scales and made me understand, but we all have mental hooks into different parts of an explanation (the "aha" moment), so YMMV :)

Dedup is fascinating. I hope you don't mind me sharing this little list-anecdote, because it honestly made a huge difference to my understanding of the concept. Once again, many thanks to Michael Schuster at Sun for having the patience to walk a n00b through the steps towards enlightenment.
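If anyone wants to watch the reference counting happen, here is a throwaway sketch using a file-backed pool (all names are made up; requires a build with dedup, snv_128 or later):

# scratch pool with dedup enabled
$ mkfile 256m /var/tmp/ddpool.img
$ zpool create ddpool /var/tmp/ddpool.img
$ zfs set dedup=on ddpool

# "Fred" writes 64MB of unique data; "Janet" saves an identical copy
$ dd if=/dev/urandom of=/ddpool/fred-DSC4456.JPG bs=1024k count=64
$ cp /ddpool/fred-DSC4456.JPG /ddpool/janet-DSC4456.JPG

# ALLOC stays around 64M and DEDUP reads about 2.00x, because every
# block of Janet's copy is just a bumped reference count
# (rm'ing either name would simply drop each block's refcount by one)
$ zpool list ddpool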
--
-Me

Bob Friesenhahn
2009-Dec-19 16:20 UTC
[zfs-discuss] How do I determine dedupe effectiveness?
On Sat, 19 Dec 2009, Colin Raven wrote:
>
> There is no original, there is no copy. There is one block with reference counters.
>
> - Fred can rm his "file" (because clearly it isn't a file, it's a filename and that's all)
> - result: the reference count is decremented by one - the data remains on disk.

While the similarity to hard links is a good analogy, there really is a unique "file" in this case. If Fred does a 'rm' on the file then the reference count on all the file blocks is reduced by one, and each block is freed if its reference count goes to zero. Behavior is similar to the case where a snapshot references the file block. If Janet updates a block in the file, then that updated block becomes unique to her "copy" of the file (and the reference count on the original is reduced by one), and it remains unique unless it happens to match a block in some other existing file (or snapshot of a file).

When we are children, we are told that sharing is good. In the case of references, sharing is usually good, but if there is a huge amount of sharing, then it can take longer to delete a set of files, since the mutual references create a "hot spot" which must be updated sequentially. Files are usually created slowly, so we don't notice much impact from this sharing, but we expect (hope) that files will be deleted almost instantaneously.
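Continuing the hypothetical ddpool sketch from earlier in the thread (both copies still in place, default 128K recordsize assumed), the divergence is easy to see:

# rewrite the first 128K of Janet's copy in place; only that block diverges
$ dd if=/dev/urandom of=/ddpool/janet-DSC4456.JPG bs=128k count=1 conv=notrunc
$ zpool list ddpool     # DEDUP dips only slightly below 2.00x
$ zpool destroy ddpool  # clean up the scratch pool when done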
Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

On Sat, Dec 19, 2009 at 17:20, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
> On Sat, 19 Dec 2009, Colin Raven wrote:
>>
>> There is no original, there is no copy. There is one block with reference
>> counters.
>>
>> - Fred can rm his "file" (because clearly it isn't a file, it's a filename
>> and that's all)
>> - result: the reference count is decremented by one - the data remains on
>> disk.
>
> While the similarity to hard links is a good analogy, there really is a
> unique "file" in this case. If Fred does a 'rm' on the file then the
> reference count on all the file blocks is reduced by one, and the block is
> freed if the reference count goes to zero. Behavior is similar to the case
> where a snapshot references the file block. If Janet updates a block in the
> file, then that updated block becomes unique to her "copy" of the file (and
> the reference count on the original is reduced by one) and it remains unique
> unless it happens to match a block in some other existing file (or snapshot
> of a file).

Wait... whoah, hold on. If snapshots reside within the confines of the pool, are you saying that dedup will also count what's contained inside the snapshots? I'm not sure why, but that thought is vaguely disturbing on some level.

Then again (not sure how gurus feel on this point) but I have this probably naive and foolish belief that snapshots (mostly) oughtta reside on a separate physical box/disk_array... "someplace else" anyway. I say "mostly" because I s'pose keeping 15 minute snapshots on board is perfectly OK - and in fact handy. Hourly... ummm, maybe the same - but Daily/Monthly should reside "elsewhere".

> When we are children, we are told that sharing is good. In the case of
> references, sharing is usually good, but if there is a huge amount of
> sharing, then it can take longer to delete a set of files since the mutual
> references create a "hot spot" which must be updated sequentially.

Y'know, that is a GREAT point. Taking this one step further then - does that also imply that there's one "hot spot" physically on a disk that keeps getting read/written to? If so then your point has even greater merit for more reasons... disk wear for starters, and other stuff too, no doubt.

> Files are usually created slowly so we don't notice much impact from this
> sharing, but we expect (hope) that files will be deleted almost
> instantaneously.

Indeed, that's completely logical. Also, something most of us don't spend time thinking about.

Bob, thanks. Your thoughts and insights are always interesting - and usually most revealing!
Bob Friesenhahn
2009-Dec-19 16:47 UTC
[zfs-discuss] How do I determine dedupe effectiveness?
On Sat, 19 Dec 2009, Colin Raven wrote:
>
> Wait... whoah, hold on.
> If snapshots reside within the confines of the pool, are you saying that
> dedup will also count what's contained inside the snapshots? I'm not sure
> why, but that thought is vaguely disturbing on some level.

Yes, of course. Any block in the pool which came from a filesystem participating in dedup is a candidate for deduplication. This includes snapshots. In fact, the block in the snapshot may already have been deduped before the snapshot was even taken.
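For a look at the dedup table (DDT) itself - which spans everything in the pool, snapshots included - zdb can dump its statistics. A hedged sketch (pool name hypothetical; output format varies by build):

# dedup table statistics; -DD adds a refcount histogram
$ zdb -DD tank

# simulate dedup on a pool that isn't using it yet, to estimate the payoff
$ zdb -S tank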
Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

Andrey Kuzmin
2009-Dec-19 17:31 UTC
[zfs-discuss] How do I determine dedupe effectiveness?
On Sat, Dec 19, 2009 at 7:20 PM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
> On Sat, 19 Dec 2009, Colin Raven wrote:
>>
>> There is no original, there is no copy. There is one block with reference
>> counters.
>>
>> - Fred can rm his "file" (because clearly it isn't a file, it's a filename
>> and that's all)
>> - result: the reference count is decremented by one - the data remains on
>> disk.
>
> While the similarity to hard links is a good analogy, there really is a
> unique "file" in this case. If Fred does a 'rm' on the file then the
> reference count on all the file blocks is reduced by one, and the block is
> freed if the reference count goes to zero. Behavior is similar to the case
> where a snapshot references the file block. If Janet updates a block in the
> file, then that updated block becomes unique to her "copy" of the file (and
> the reference count on the original is reduced by one) and it remains unique
> unless it happens to match a block in some other existing file (or snapshot
> of a file).
>
> When we are children, we are told that sharing is good. In the case of
> references, sharing is usually good, but if there is a huge amount of
> sharing, then it can take longer to delete a set of files since the mutual
> references create a "hot spot" which must be updated sequentially. Files
> are usually created slowly so we don't notice much impact from this sharing,
> but we expect (hope) that files will be deleted almost instantaneously.

I believe this has been taken care of in the space maps design (http://blogs.sun.com/bonwick/entry/space_maps provides a nice overview).

Regards,
Andrey
On 19-Dec-09, at 4:35 AM, Colin Raven wrote:
> ...
> There is no original, there is no copy. There is one block with
> reference counters.

Many blocks, potentially shared, make up a de-dup'd file. Not sure why you write "one" here.

> - Fred can rm his "file" (because clearly it isn't a file, it's a
> filename and that's all)
> - result: the reference count is decremented by one - the data
> remains on disk.
> OR
> - Janet can rm her "filename"
> - result: the reference count is decremented by one - the data
> remains on disk
> OR
> - both can rm the filename; the reference count is now decremented by
> two - but there were only two so now it's really REALLY gone.

That explanation describes hard links.

--Toby
On 19-Dec-09, at 11:34 AM, Colin Raven wrote:
> ...
> Wait... whoah, hold on.
> If snapshots reside within the confines of the pool, are you saying
> that dedup will also count what's contained inside the snapshots?

Snapshots themselves are only references, so yes.

> I'm not sure why, but that thought is vaguely disturbing on some
> level.
>
> Then again (not sure how gurus feel on this point) but I have this
> probably naive and foolish belief that snapshots (mostly) oughtta
> reside on a separate physical box/disk_array...

That is not possible, except in the case of a mirror, where one side is recoverable separately. You seem to be confusing "snapshots" with "backup".

> "someplace else" anyway. I say "mostly" because I s'pose keeping 15
> minute snapshots on board is perfectly OK - and in fact handy.
> Hourly... ummm, maybe the same - but Daily/Monthly should reside
> "elsewhere".
>
> Y'know, that is a GREAT point. Taking this one step further then -
> does that also imply that there's one "hot spot" physically on a
> disk that keeps getting read/written to?
> if so then your point has even greater merit for more
> reasons... disk wear for starters,

That is not a problem. Disks don't "wear" - it is a non-contact medium.

--Toby
On Sat, Dec 19, 2009 at 19:08, Toby Thain <toby at telegraphics.com.au> wrote:
> On 19-Dec-09, at 11:34 AM, Colin Raven wrote:
>> Then again (not sure how gurus feel on this point) but I have this probably
>> naive and foolish belief that snapshots (mostly) oughtta reside on a
>> separate physical box/disk_array...
>
> That is not possible, except in the case of a mirror, where one side is
> recoverable separately.

I was referring to zipping up a snapshot and getting it outta Dodge onto another physical box, or separate array.

> You seem to be confusing "snapshots" with "backup".

No, I wasn't confusing them at all. Backups are backups. Snapshots, however, do have some limited value as backups. They're no substitute, but augment a planned backup schedule rather nicely in many situations.
On 19-Dec-09, at 2:01 PM, Colin Raven wrote:
>
> On Sat, Dec 19, 2009 at 19:08, Toby Thain <toby at telegraphics.com.au> wrote:
>> On 19-Dec-09, at 11:34 AM, Colin Raven wrote:
>>> Then again (not sure how gurus feel on this point) but I have this
>>> probably naive and foolish belief that snapshots (mostly) oughtta
>>> reside on a separate physical box/disk_array...
>>
>> That is not possible, except in the case of a mirror, where one
>> side is recoverable separately.
>
> I was referring to zipping up a snapshot and getting it outta Dodge
> onto another physical box, or separate array.

or zfs send
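For example - a full stream of a daily snapshot, then incrementals to keep the off-box copy current (host, pool, and snapshot names are hypothetical):

# full stream of a daily snapshot to another box
$ zfs send tank/home@daily-20091219 | ssh backuphost zfs recv -d backuppool

# thereafter send only the delta between consecutive snapshots
$ zfs send -i tank/home@daily-20091219 tank/home@daily-20091220 | \
      ssh backuphost zfs recv -d backuppool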
--T

On 19-Dec-09, at 11:34 AM, Colin Raven wrote:
> ...
> When we are children, we are told that sharing is good. In the
> case of references, sharing is usually good, but if there is a huge
> amount of sharing, then it can take longer to delete a set of files
> since the mutual references create a "hot spot" which must be
> updated sequentially.
>
> Y'know, that is a GREAT point. Taking this one step further then -
> does that also imply that there's one "hot spot" physically on a
> disk that keeps getting read/written to?

Also, copy-on-write generally means that the physical location of updates is ever-changing.

--T
> Wait... whoah, hold on.
> If snapshots reside within the confines of the pool, are you saying
> that dedup will also count what's contained inside the snapshots?
> I'm not sure why, but that thought is vaguely disturbing on some level.
>
> Then again (not sure how gurus feel on this point) but I have this
> probably naive and foolish belief that snapshots (mostly) oughtta
> reside on a separate physical box/disk_array... "someplace else"
> anyway. I say "mostly" because I s'pose keeping 15 minute snapshots
> on board is perfectly OK - and in fact handy. Hourly... ummm, maybe
> the same - but Daily/Monthly should reside "elsewhere".

IMHO, snapshots are not a replacement for backups. Backups should definitely reside outside the system, so that if you lose your entire array, SAN, controller, etc., you can recover somewhere else. Snapshots, on the other hand, give you the ability to quickly recover to a point in time when something not-so-catastrophic happens - like a user deletes a file, an O/S update fails and hoses your system, etc. - without going to a backup system. Snapshots are nice, but they're no replacement for backups.

--
This message posted from opensolaris.org
On Sun, Dec 20, 2009 at 16:23, Nick <nick.couchman at seakr.com> wrote:
> IMHO, snapshots are not a replacement for backups. Backups should
> definitely reside outside the system, so that if you lose your entire array,
> SAN, controller, etc., you can recover somewhere else. Snapshots, on the
> other hand, give you the ability to quickly recover to a point in time when
> something not-so-catastrophic happens - like a user deletes a file, an O/S
> update fails and hoses your system, etc. - without going to a backup system.
> Snapshots are nice, but they're no replacement for backups.

I agree, and said so, in response to:

> You seem to be confusing "snapshots" with "backup".

To which I replied:

> No, I wasn't confusing them at all. Backups are backups. Snapshots however,
> do have some limited value as backups. They're no substitute, but augment a
> planned backup schedule rather nicely in many situations.

Please note that I said that snapshots AUGMENT a well planned backup schedule, and in no way are they - nor should they be - considered a replacement. Your quoted scenario is the perfect illustration: a user-deleted file, a rollback for that update that "didn't quite work out as you hoped", and so forth. Agreed, no argument.

The (one and only) point that I was making was that - like backups - snapshots should be kept "elsewhere", whether by using zfs send, or zipping up the whole shebang and ssh'ing it someplace... "elsewhere" meaning beyond the pool. Rolling 15 minute and hourly snapshots... no, they stay local, but daily/weekly/monthly snapshots get stashed "offsite" (off-box). Apart from anything else, it's one heck of a spacesaver - in the long run.
Nick Couchman
2009-Dec-20 22:09 UTC
[zfs-discuss] How do I determine dedupe effectiveness?
> The (one and only) point that I was making was that - like backups -
> snapshots should be kept "elsewhere", whether by using zfs send, or zipping
> up the whole shebang and ssh'ing it someplace... "elsewhere" meaning beyond
> the pool. Rolling 15 minute and hourly snapshots... no, they stay local, but
> daily/weekly/monthly snapshots get stashed "offsite" (off-box). Apart from
> anything else, it's one heck of a spacesaver - in the long run.

I guess that depends on what you're doing with them and how big a part they play in your operations. On my SAN, I don't roll my snapshots off-site, because I'm comfortable losing those snapshots and still being able to recover backup data, and I'd rather not duplicate storage infrastructure just to have snapshots around. I consider the snapshots a "nice-to-have" that saves me time periodically, but not critical to my infrastructure, therefore it doesn't make sense to spend the time/money to send them off-site - if something bad happens where I cannot recover snapshots, I'm probably going to be spending a lot of time recovering, and the snapshots probably aren't that useful to me. However, if the snapshots are critical to your operations and your ability to service user requests, then, yes, putting them onto a secondary storage location is a good idea.

-Nick
On Sat, Dec 19, 2009 at 8:34 AM, Colin Raven <colin at clearcutnetworks.com> wrote:
> If snapshots reside within the confines of the pool, are you saying that
> dedup will also count what's contained inside the snapshots? I'm not sure
> why, but that thought is vaguely disturbing on some level.

Sure, why not? Let's say you have snapshots enabled on a dataset with 1TB of files in it, and then decide to move 500GB to a new dataset for other sharing options, or what have you.

If dedup didn't count the snapshots you'd wind up with 500GB in your original live dataset, an additional 500GB in the snapshots, and an additional 500GB in the new dataset.

For instance, tank/export/samba/backups used to be a directory in tank/export/samba/public. Snapshots being used in dedup saved me 700+GB:

NAME                       USED  AVAIL  REFER  MOUNTPOINT
tank/export/samba/backups  704G  3.35T   704G  /export/samba/backups
tank/export/samba/public   816G  3.35T   101G  /export/samba/public

> in fact handy. Hourly... ummm, maybe the same - but Daily/Monthly should
> reside "elsewhere".

That's what replication to another system via send/recv is for. See backups, DR.

> Y'know, that is a GREAT point. Taking this one step further then - does that
> also imply that there's one "hot spot" physically on a disk that keeps
> getting read/written to? if so then your point has even greater merit for
> more reasons... disk wear for starters, and other stuff too, no doubt.

I believe I read that there is a max ref count for blocks, and beyond that the data is written out once again. This is for resilience and to avoid hot spots.

-B

--
Brandon High : bhigh at freaks.com
Indecision is the key to flexibility.
Brandon High wrote:
> On Sat, Dec 19, 2009 at 8:34 AM, Colin Raven <colin at clearcutnetworks.com> wrote:
>> If snapshots reside within the confines of the pool, are you saying that
>> dedup will also count what's contained inside the snapshots? I'm not sure
>> why, but that thought is vaguely disturbing on some level.
>
> Sure, why not? Let's say you have snapshots enabled on a dataset with
> 1TB of files in it, and then decide to move 500GB to a new dataset for
> other sharing options, or what have you.
>
> If dedup didn't count the snapshots you'd wind up with 500GB in your
> original live dataset, an additional 500GB in the snapshots, and an
> additional 500GB in the new dataset.
>
> For instance, tank/export/samba/backups used to be a directory in
> tank/export/samba/public. Snapshots being used in dedup saved me
> 700+GB.

Architecturally, it is madness NOT to store (known) common data within the same local concept, in this case, a pool. Snapshots need to be retained close to their original parent (as do clones, et al.), and the abstract concept that holds them in ZFS is the pool. Frankly, I'd have a hard time thinking up another structure (abstract or concrete) where it would make sense to store such an item (i.e. snapshots).

Remember that a snapshot is A POINT IN TIME PICTURE of the filesystem/volume. No more, no less. As such, it makes logical sense to retain it "close" to its originator. People tend to slap all sorts of other inferences onto what snapshots "mean", which is incorrect, both from a conceptual standpoint (a rose is a rose, not a pig, just because you want to call it a pig) and at an implementation level.

As for exactly what is meant by "counting" something inside a snapshot: remember, a snapshot is already a form of dedup - that is, it is nothing more than a list of block pointers to blocks which existed at the time the snapshot was taken. I'll have to check, but since I believe that the dedup metric is counting blocks which have more than one reference to them, it currently DOES influence the dedup count if you have a snapshot. I'm not in front of a sufficiently late-version install to check this; please, would someone check whether taking a snapshot does or does not influence the dedup metric. (It's a simple test: create a pool with 1 dataset, turn on dedup, then copy X amount of data to that dataset. Check the dedup ratio. Then take a snapshot of the dataset, and re-check the dedup ratio.)
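A sketch of that test, for anyone with a scratch box handy (file-backed pool; all names hypothetical):

$ mkfile 256m /var/tmp/snaptest.img
$ zpool create snaptest /var/tmp/snaptest.img
$ zfs create -o dedup=on snaptest/data
$ dd if=/dev/urandom of=/snaptest/data/blob bs=1024k count=64
$ zpool list snaptest       # note the DEDUP ratio...
$ zfs snapshot snaptest/data@now
$ zpool list snaptest       # ...and whether the snapshot moved it
$ zpool destroy snaptest; rm /var/tmp/snaptest.img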
Conceptually speaking, it would be nice to exclude snapshots when computing the dedup ratio; implementation wise, I'm not sure how the ratio is really computed, so I can't say if that's simple or impossible.

>> in fact handy. Hourly... ummm, maybe the same - but Daily/Monthly should
>> reside "elsewhere".
>
> That's what replication to another system via send/recv is for. See backups, DR.

Once again, these are concepts that have no bearing on what a snapshot /IS/. What one wants to /do/ with a snapshot is up to the user, but that's not a decision to be made at the architecture level. That's a decision for further up the application abstraction stack.

>> Y'know, that is a GREAT point. Taking this one step further then - does that
>> also imply that there's one "hot spot" physically on a disk that keeps
>> getting read/written to? if so then your point has even greater merit for
>> more reasons... disk wear for starters, and other stuff too, no doubt.
>
> I believe I read that there is a max ref count for blocks, and beyond
> that the data is written out once again. This is for resilience and to
> avoid hot spots.

Various ZFS metadata blocks are far more "hot" than anything associated with dedup. Brandon is correct in that ZFS will tend to re-write such frequently-WRITTEN blocks (whether meta or real data) after a certain point. In the dedup case, this is irrelevant, since dedup is READ-only (if you change the block, by definition, it is no longer a dedup of its former "mates"). If anything, dedup blocks are /far/ more likely to end up in the L2ARC (read cache) than a typical block, everything else being equal.

Now, if we can get a defrag utility/feature implemented (possibly after the BP rewrite stuff is committed), it would make sense to put frequently ACCESSED blocks at the highest-performing portions of the underlying media. This of course means that such a utility would have to be informed as to the characteristics of the underlying media (SSD, hard drive, RAM disk, etc.) and understand each of the limitations therein; case in point: for HDs, the highest-performing location is the outer sectors, while for MLC SSDs it is the "least used" ones, and it's irrelevant for solid-state (NVRAM) drives. Honestly, now that I've considered it, I'm thinking that it's not worth any real effort to do this kind of optimization.

One further thing to remember: ZFS dedup is a block-level action, so it is entirely possible for a FILE to "share" portions of itself with others, while still having other blocks unique to it. As such, it differs from hard links, which are "file pointers". For example: if I write a new file B, which ZFS determines is entirely identical to another file A, then I have a x2 dedup ratio. However, it is still very possible for me to change 1 single bit in file B. File A remains the same, while file B consists of all dedup'd blocks pointing to those shared with A, EXCEPT the block where I changed the single bit. This is the same process that happens when updates are made after a snapshot.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)