Hello;

How do we dedup existing data?

Will a ZFS send to an output file in a temporary staging area in the same pool, and a subsequent reconstruct (zfs receive) from the file, be sufficient? Or do I have to completely move the data out of the pool and back in again?

Warmest Regards
Steven Sim
Steven Sim wrote:
> Hello;
>
> How do we dedup existing data?

Currently by running a zfs send | zfs recv.

> Will a ZFS send to an output file in a temporary staging area in the
> same pool and a subsequent reconstruct (zfs receive) from the file be
> sufficient?

Yes, but you can avoid the temp file and just do zfs send | zfs recv.

> Or do I have to completely move the data out of the pool and back in again?

That is what zfs send and recv actually do.

--
Darren J Moffat
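[A minimal sketch of the rewrite Darren describes, using a hypothetical dataset name tank/data. Dedup only applies to blocks written after the property is turned on, which is why existing data must be rewritten through send/receive:

  zfs set dedup=on tank/data        # hypothetical dataset; existing blocks are not deduped retroactively
  zfs snapshot tank/data@before
  zfs send tank/data@before | zfs receive tank/data_deduped   # rewrites every block through the dedup table
]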
On Wed, Dec 16, 2009 at 3:32 PM, Darren J Moffat <darrenm at opensolaris.org> wrote:
> Steven Sim wrote:
>>
>> Hello;
>>
>> How do we dedup existing data?
>
> Currently by running a zfs send | zfs recv.
>
>> Will a ZFS send to an output file in a temporary staging area in the same
>> pool and a subsequent reconstruct (zfs receive) from the file be sufficient?
>
> Yes but you can avoid the temp file and just do zfs send | zfs recv.

This kinda assumes you have enough capacity to accommodate up to twice (worst case) your current data, correct? My home server is ~80% full, so I am moving data into a new filesystem like this: rsync --remove-source-files

--
Regards,
Cyril
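[Cyril's rsync command appears truncated in the archive; a minimal sketch of the approach he describes, with hypothetical dataset and path names. The new filesystem has dedup enabled, and --remove-source-files deletes each file from the source as it is transferred, so space is freed as the copy proceeds (empty source directories are left behind):

  zfs create -o dedup=on tank/data_new                          # hypothetical new filesystem
  rsync -aH --remove-source-files /tank/data/ /tank/data_new/
]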
Darren;

Ahhhh.... zfs send | zfs receive onto the same filesystem???

er... I tried the following...

# zfs snapshot myplace/mydata@preDedup

The above created the following...

admin at sunlight:~$ zfs list -t snapshot -r myplace/mydata
NAME                         USED  AVAIL  REFER  MOUNTPOINT
myplace/mydata@scriptsnap1   675K      -   289M  -
myplace/mydata@scriptsnap2   675K      -   289M  -
myplace/mydata@scriptsnap3   476K      -   289M  -
myplace/mydata@preDedup         0      -   289M  -    <---- Snapshot created manually
admin at sunlight:~$

The snapshots named scriptsnap were created by a rotating zfs snapshot script I wrote, which I cron to run every night...

er... then...

root at sunlight:/root# zfs send myplace/mydata@preDedup | zfs receive -v myplace/mydata
cannot receive new filesystem stream: destination 'myplace/fujitsu' exists
must specify -F to overwrite it
root at sunlight:/root# zfs send myplace/mydata@preDedup | zfs receive -Fv myplace/mydata
cannot receive new filesystem stream: destination has snapshots (eg. myplace/mydata@scriptsnap3)
must destroy them to overwrite it

Could you advise... I am guessing I must destroy all related snapshots first?

Warmest Regards
Steven Sim

Darren J Moffat wrote:
> Steven Sim wrote:
>> Hello;
>>
>> How do we dedup existing data?
>
> Currently by running a zfs send | zfs recv.
>
>> Will a ZFS send to an output file in a temporary staging area in the
>> same pool and a subsequent reconstruct (zfs receive) from the file be
>> sufficient?
>
> Yes but you can avoid the temp file and just do zfs send | zfs recv.
>
>> Or do I have to completely move the data out of the pool and back in
>> again?
>
> That is what zfs send and recv actually does.
On Wed, Dec 16, 2009 at 6:17 AM, Steven Sim <unixandme at gmail.com> wrote:
> root at sunlight:/root# zfs send myplace/mydata@preDedup | zfs receive -v
> myplace/mydata
> cannot receive new filesystem stream: destination 'myplace/fujitsu' exists
> must specify -F to overwrite it

Try something like this:

zfs create -o mountpoint=none myplace/dedup
zfs unmount myplace/mydata       # Make sure the source isn't changing anymore
zfs snapshot myplace/mydata@preDedup
zfs send -R myplace/mydata@preDedup | zfs receive -du myplace/dedup

It'll create a new filesystem myplace/dedup/mydata.

zfs rename myplace/mydata myplace/mydata_old
zfs rename myplace/dedup/mydata myplace/mydata
zfs mount myplace/mydata

You can now destroy the old dataset.

I'm also adding a user property to the dedup'd copy so I don't accidentally do it again, eg:

zfs set com.freaks:deduped=yes myplace/dedup/mydata

prior to the 'zfs rename'.

There's a little more finesse you can use to limit the time your source dataset is unmounted. Do a snapshot and send|receive to get most of the data over, then unmount and create a new snapshot and send|receive to catch any changes since the first.

-B

--
Brandon High : bhigh at freaks.com
Mistakes are often the stepping stones to utter failure.
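[A minimal sketch of the two-pass finesse Brandon describes, reusing his myplace/mydata and myplace/dedup names; the @pass1 and @pass2 snapshot names are hypothetical:

  # Pass 1: copy the bulk of the data while the source stays mounted.
  zfs snapshot myplace/mydata@pass1
  zfs send -R myplace/mydata@pass1 | zfs receive -du myplace/dedup

  # Pass 2: unmount briefly, then send only the changes since pass 1.
  zfs unmount myplace/mydata
  zfs snapshot myplace/mydata@pass2
  zfs send -R -I @pass1 myplace/mydata@pass2 | zfs receive -du myplace/dedup

  # Swap the new copy into place, as in the steps above.
  zfs rename myplace/mydata myplace/mydata_old
  zfs rename myplace/dedup/mydata myplace/mydata
  zfs mount myplace/mydata
]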
If you have another partition with enough space, you could technically just do:

mv src /some/other/place
mv /some/other/place src

Anyone see a problem with that? Might be the best way to get it de-duped.

--
This message posted from opensolaris.org
On Thu, Dec 17, 2009 at 3:10 PM, Anil <anilj at entic.net> wrote:
> If you have another partition with enough space, you could technically just do:
>
> mv src /some/other/place
> mv /some/other/place src
>
> Anyone see a problem with that? Might be the best way to get it de-duped.

You'd lose any existing snapshots. You may lose ACLs. If you have snapshots of the source, the space will still be used until you destroy the snapshots.

-B

--
Brandon High : bhigh at freaks.com
Indecision is the key to flexibility.
Anil <anilj at entic.net> writes:
> If you have another partition with enough space, you could technically
> just do:
>
> mv src /some/other/place
> mv /some/other/place src
>
> Anyone see a problem with that? Might be the best way to get it
> de-duped.

I get uneasy whenever I see mv(1) used to move directory trees between filesystems, that is, whenever mv(1) can't do a simple rename(2) but has to do a recursive copy of files. It is essentially not restartable: if mv(1) is interrupted, you must clean up the mess with rsync or similar tools. So why not use rsync from the get-go? (Or zfs send/recv, of course.)

--
Kjetil T. Homme
Redpill Linpro AS - Changing the game
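[A minimal sketch of the restartable pattern Kjetil argues for, with hypothetical paths; unlike an interrupted mv, an interrupted run can simply be repeated, and the source is removed only after the copy completes:

  rsync -aH /tank/data/ /tank/data_new/                          # safe to re-run after an interruption
  rsync -aH /tank/data/ /tank/data_new/ && rm -rf /tank/data     # final pass, then remove the source
]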