Kjetil Torgrim Homme
2009-Dec-09 11:40 UTC
[zfs-discuss] will deduplication know about old blocks?
I'm planning to try out deduplication in the near future, but started
wondering if I can prepare for it on my servers. one thing which struck
me was that I should change the checksum algorithm to sha256 as soon as
possible. but I wonder -- is that sufficient? will the dedup code know
about old blocks when I store new data?

let's say I have an existing file img0.jpg. I turn on dedup, and copy
it twice, to img0a.jpg and img0b.jpg. will all three files refer to the
same block(s), or will only img0a and img0b share blocks?

-- 
Kjetil T. Homme
Redpill Linpro AS - Changing the game
Adam Leventhal
2009-Dec-09 17:43 UTC
[zfs-discuss] will deduplication know about old blocks?
Hi Kjetil,

Unfortunately, dedup will only apply to data written after the setting
is enabled. That also means that new blocks cannot dedup against old
blocks regardless of how they were written. There is therefore no way
to "prepare" your pool for dedup -- you just have to enable it when
you have the new bits.

Adam

On Dec 9, 2009, at 3:40 AM, Kjetil Torgrim Homme wrote:

> I'm planning to try out deduplication in the near future, but started
> wondering if I can prepare for it on my servers. one thing which struck
> me was that I should change the checksum algorithm to sha256 as soon as
> possible. but I wonder -- is that sufficient? will the dedup code know
> about old blocks when I store new data?
>
> let's say I have an existing file img0.jpg. I turn on dedup, and copy
> it twice, to img0a.jpg and img0b.jpg. will all three files refer to the
> same block(s), or will only img0a and img0b share blocks?

-- 
Adam Leventhal, Fishworks    http://blogs.sun.com/ahl
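[For the archive, the behaviour Adam describes can be illustrated at the
command line. This is a sketch only: "tank" is a hypothetical pool name,
the commands assume a live pool, and the dedup ratio you see depends
entirely on your data.]

```shell
# img0.jpg already exists; its blocks were written before dedup was
# enabled, so they were never entered into the dedup table (DDT).

# Enable dedup on the pool's root dataset. This also implies a strong
# checksum (sha256) for newly written blocks.
zfs set dedup=on tank

# Both copies are written after dedup was enabled, so they dedup
# against each other -- but NOT against the original img0.jpg,
# whose blocks predate the setting.
cp /tank/img0.jpg /tank/img0a.jpg
cp /tank/img0.jpg /tank/img0b.jpg

# The pool-wide dedup ratio reflects only post-enable writes.
zpool get dedupratio tank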
Kjetil Torgrim Homme
2009-Dec-09 18:01 UTC
[zfs-discuss] will deduplication know about old blocks?
Adam Leventhal <ahl at eng.sun.com> writes:

> Unfortunately, dedup will only apply to data written after the setting
> is enabled. That also means that new blocks cannot dedup against old
> blocks regardless of how they were written. There is therefore no way
> to "prepare" your pool for dedup -- you just have to enable it when
> you have the new bits.

thank you for the clarification!

-- 
Kjetil T. Homme
Redpill Linpro AS - Changing the game
Adam,

So therefore, the best way is to set this at pool creation time.... OK,
that makes sense: it operates only on fresh data that's coming over the
fence.

BUT

What happens if you snapshot, send, destroy, recreate (with dedup on
this time around) and then write the contents of the cloned snapshot to
the various places in the pool -- which properties are in the ascendancy
here? The "host pool" or the contents of the clone? The host pool, I
assume, because clone contents are (in this scenario) "just some new
data"?

-Me

On Wed, Dec 9, 2009 at 18:43, Adam Leventhal <ahl at eng.sun.com> wrote:

> Hi Kjetil,
>
> Unfortunately, dedup will only apply to data written after the setting
> is enabled. That also means that new blocks cannot dedup against old
> blocks regardless of how they were written. There is therefore no way
> to "prepare" your pool for dedup -- you just have to enable it when
> you have the new bits.
>
> On Dec 9, 2009, at 3:40 AM, Kjetil Torgrim Homme wrote:
>
>> I'm planning to try out deduplication in the near future, but started
>> wondering if I can prepare for it on my servers. one thing which
>> struck me was that I should change the checksum algorithm to sha256
>> as soon as possible. but I wonder -- is that sufficient? will the
>> dedup code know about old blocks when I store new data?
>>
>> let's say I have an existing file img0.jpg. I turn on dedup, and copy
>> it twice, to img0a.jpg and img0b.jpg. will all three files refer to
>> the same block(s), or will only img0a and img0b share blocks?
Adam Leventhal
2009-Dec-09 19:36 UTC
[zfs-discuss] will deduplication know about old blocks?
> What happens if you snapshot, send, destroy, recreate (with dedup on
> this time around) and then write the contents of the cloned snapshot
> to the various places in the pool - which properties are in the
> ascendancy here? the "host pool" or the contents of the clone? The
> host pool I assume, because clone contents are (in this scenario)
> "just some new data"?

The dedup property applies to all writes, so the settings for the pool
of origin don't matter, just those on the destination pool.

Adam

-- 
Adam Leventhal, Fishworks    http://blogs.sun.com/ahl
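[For the archive, the snapshot/send/recreate workflow being discussed
might look like the sketch below. Pool name "tank", the disk name, and
the backup path are hypothetical, and the destroy step is irreversibly
destructive -- verify the saved stream before destroying anything.]

```shell
# Snapshot the existing (pre-dedup) data and save the stream elsewhere.
zfs snapshot -r tank@migrate
zfs send -R tank@migrate > /backup/tank.zfs

# Destroy and recreate the pool with dedup on from the start.
zpool destroy tank
zpool create -O dedup=on tank c0t1d0

# Receiving rewrites every block, so all of it passes through the dedup
# code on the destination; the origin pool's settings don't matter.
zfs receive -dF tank < /backup/tank.zfs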
On 10/12/2009, at 5:36 AM, Adam Leventhal wrote:

> The dedup property applies to all writes so the settings for the pool
> of origin don't matter, just those on the destination pool.

Just a quick related question I've not seen answered anywhere else:

Is it safe to have dedup running on your rpool? (at install time, or if
you need to migrate your rpool to new media)

cheers,
James
Cyril Plisko
2009-Dec-10 05:37 UTC
[zfs-discuss] will deduplication know about old blocks?
On Thu, Dec 10, 2009 at 12:37 AM, James Lever <j at jamver.id.au> wrote:

> On 10/12/2009, at 5:36 AM, Adam Leventhal wrote:
>
>> The dedup property applies to all writes so the settings for the pool
>> of origin don't matter, just those on the destination pool.
>
> Just a quick related question I've not seen answered anywhere else:
>
> Is it safe to have dedup running on your rpool? (at install time, or
> if you need to migrate your rpool to new media)

I have it on on my laptop and on a couple of other machines. I also
have a number of fresh installations (albeit in VirtualBox) where dedup
is on from the very beginning. So far it works OK.

BTW, are there any implications of having dedup=on on rpool/dump? I
know that compression is turned off explicitly for rpool/dump.

-- 
Regards,
        Cyril
Darren J Moffat
2009-Dec-10 09:20 UTC
[zfs-discuss] will deduplication know about old blocks?
Cyril Plisko wrote:

> BTW, are there any implications of having dedup=on on rpool/dump? I
> know that compression is turned off explicitly for rpool/dump.

It will be ignored, because when you write to the dump ZVOL it doesn't
go through the normal ZIO pipeline, so the deduplication code is never
run in that case.

-- 
Darren J Moffat
Cyril Plisko
2009-Dec-10 09:47 UTC
[zfs-discuss] will deduplication know about old blocks?
>> BTW, are there any implications of having dedup=on on rpool/dump? I
>> know that compression is turned off explicitly for rpool/dump.
>
> It will be ignored because when you write to the dump ZVOL it doesn't
> go through the normal ZIO pipeline so the deduplication code is never
> run in that case.

Yeah, that's what I thought. I would imagine that the same is true for
compression as well. If so, what is the reason for setting
compression=off explicitly on rpool/dump?

-- 
Regards,
        Cyril
Darren J Moffat
2009-Dec-10 10:10 UTC
[zfs-discuss] will deduplication know about old blocks?
Cyril Plisko wrote:

>>> It will be ignored because when you write to the dump ZVOL it
>>> doesn't go through the normal ZIO pipeline so the deduplication code
>>> is never run in that case.
>
> Yeah, that's what I thought. I would imagine that the same is true for
> compression as well. If so, what is the reason for setting
> compression=off explicitly on rpool/dump?

I believe it is so that it is obvious that ZFS isn't doing the
compression. The dump system does the compression -- and may actually
use a compression algorithm that ZFS doesn't use (bzip2).

-- 
Darren J Moffat
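[For the archive: you can see how these properties are set on the dump
zvol yourself. A sketch -- the exact values and SOURCE column depend on
your system and release.]

```shell
# SOURCE shows whether each value was set locally (as with the
# explicit compression=off on rpool/dump) or inherited from rpool.
zfs get compression,dedup rpool/dump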