Hello;

How do we dedup existing data?

Will a ZFS send to an output file in a temporary staging area in the same pool, and a subsequent reconstruct (zfs receive) from the file, be sufficient? Or do I have to completely move the data out of the pool and back in again?

Warmest Regards
Steven Sim
Steven Sim wrote:
> Hello;
>
> How do we dedup existing data?

Currently by running a zfs send | zfs recv.

> Will a ZFS send to an output file in a temporary staging area in the
> same pool and a subsequent reconstruct (zfs receive) from the file be
> sufficient?

Yes, but you can avoid the temp file and just do zfs send | zfs recv.

> Or do I have to completely move the data out of the pool and back in again?

That is what zfs send and recv actually do.

--
Darren J Moffat
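[A minimal sketch of the rewrite Darren describes, using a hypothetical dataset name tank/data. Dedup only applies to blocks written after the property is turned on, which is why existing data must be rewritten through send/receive:

  zfs set dedup=on tank/data        # hypothetical dataset; existing blocks are not deduped retroactively
  zfs snapshot tank/data@before
  zfs send tank/data@before | zfs receive tank/data_deduped   # rewrites every block through the dedup table
]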
On Wed, Dec 16, 2009 at 3:32 PM, Darren J Moffat <darrenm at opensolaris.org> wrote:
> Steven Sim wrote:
>>
>> Hello;
>>
>> How do we dedup existing data?
>
> Currently by running a zfs send | zfs recv.
>
>> Will a ZFS send to an output file in a temporary staging area in the same
>> pool and a subsequent reconstruct (zfs receive) from the file be sufficient?
>
> Yes but you can avoid the temp file and just do zfs send | zfs recv.

This kinda assumes you have enough capacity to accommodate up to twice (worst case) your current data, correct? My home server is ~80% full, so I am moving data into a new filesystem like this: rsync --remove-source-files

--
Regards,
Cyril
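[Cyril's rsync command appears truncated in the archive; a minimal sketch of the approach he describes, with hypothetical dataset and path names. The new filesystem has dedup enabled, and --remove-source-files deletes each file from the source as it is transferred, so space is freed as the copy proceeds (empty source directories are left behind):

  zfs create -o dedup=on tank/data_new                          # hypothetical new filesystem
  rsync -aH --remove-source-files /tank/data/ /tank/data_new/
]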
Darren;

Ahhhh.... zfs send | zfs receive onto the same filesystem???

er... I tried the following...

# zfs snapshot myplace/mydata@preDedup

The above created the following...

admin at sunlight:~$ zfs list -t snapshot -r myplace/mydata
NAME                         USED  AVAIL  REFER  MOUNTPOINT
myplace/mydata@scriptsnap1   675K      -   289M  -
myplace/mydata@scriptsnap2   675K      -   289M  -
myplace/mydata@scriptsnap3   476K      -   289M  -
myplace/mydata@preDedup         0      -   289M  -    <---- Snapshot created manually
admin at sunlight:~$

The snapshots named scriptsnap were created by a rotating zfs snapshot script I wrote, which I cron to run every night...

er... then...

root at sunlight:/root# zfs send myplace/mydata@preDedup | zfs receive -v myplace/mydata
cannot receive new filesystem stream: destination 'myplace/fujitsu' exists
must specify -F to overwrite it
root at sunlight:/root# zfs send myplace/mydata@preDedup | zfs receive -Fv myplace/mydata
cannot receive new filesystem stream: destination has snapshots (eg. myplace/mydata@scriptsnap3)
must destroy them to overwrite it

Could you advise... I am guessing I must destroy all related snapshots first?

Warmest Regards
Steven Sim

Darren J Moffat wrote:
> Steven Sim wrote:
>> Hello;
>>
>> How do we dedup existing data?
>
> Currently by running a zfs send | zfs recv.
>
>> Will a ZFS send to an output file in a temporary staging area in the
>> same pool and a subsequent reconstruct (zfs receive) from the file be
>> sufficient?
>
> Yes but you can avoid the temp file and just do zfs send | zfs recv.
>
>> Or do I have to completely move the data out of the pool and back in
>> again?
>
> That is what zfs send and recv actually does.
On Wed, Dec 16, 2009 at 6:17 AM, Steven Sim <unixandme at gmail.com> wrote:
> root at sunlight:/root# zfs send myplace/mydata@preDedup | zfs receive -v
> myplace/mydata
> cannot receive new filesystem stream: destination 'myplace/fujitsu' exists
> must specify -F to overwrite it

Try something like this:

zfs create -o mountpoint=none myplace/dedup
zfs unmount myplace/mydata       # Make sure the source isn't changing anymore
zfs snapshot myplace/mydata@preDedup
zfs send -R myplace/mydata@preDedup | zfs receive -du myplace/dedup

It'll create a new filesystem myplace/dedup/mydata.

zfs rename myplace/mydata myplace/mydata_old
zfs rename myplace/dedup/mydata myplace/mydata
zfs mount myplace/mydata

You can now destroy the old dataset.

I'm also adding a user property to the dedup'd copy so I don't accidentally do it again, eg:

zfs set com.freaks:deduped=yes myplace/dedup/mydata

prior to the 'zfs rename'.

There's a little more finesse you can use to limit the time your source dataset is unmounted. Do a snapshot and send|receive to get most of the data over, then unmount and create a new snapshot and send|receive to catch any changes since the first.

-B

--
Brandon High : bhigh at freaks.com
Mistakes are often the stepping stones to utter failure.
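[A minimal sketch of the two-pass finesse Brandon describes, reusing his myplace/mydata and myplace/dedup names; the @pass1 and @pass2 snapshot names are hypothetical:

  # Pass 1: copy the bulk of the data while the source stays mounted.
  zfs snapshot myplace/mydata@pass1
  zfs send -R myplace/mydata@pass1 | zfs receive -du myplace/dedup

  # Pass 2: unmount briefly, then send only the changes since pass 1.
  zfs unmount myplace/mydata
  zfs snapshot myplace/mydata@pass2
  zfs send -R -I @pass1 myplace/mydata@pass2 | zfs receive -du myplace/dedup

  # Swap the new copy into place, as in the steps above.
  zfs rename myplace/mydata myplace/mydata_old
  zfs rename myplace/dedup/mydata myplace/mydata
  zfs mount myplace/mydata
]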
If you have another partition with enough space, you could technically just do:

mv src /some/other/place
mv /some/other/place src

Anyone see a problem with that? Might be the best way to get it de-duped.

--
This message posted from opensolaris.org
On Thu, Dec 17, 2009 at 3:10 PM, Anil <anilj at entic.net> wrote:
> If you have another partition with enough space, you could technically just do:
>
> mv src /some/other/place
> mv /some/other/place src
>
> Anyone see a problem with that? Might be the best way to get it de-duped.

You'd lose any existing snapshots. You may lose ACLs. If you have snapshots of the source, the space will still be used until you destroy the snapshots.

-B

--
Brandon High : bhigh at freaks.com
Indecision is the key to flexibility.
Anil <anilj at entic.net> writes:
> If you have another partition with enough space, you could technically
> just do:
>
> mv src /some/other/place
> mv /some/other/place src
>
> Anyone see a problem with that? Might be the best way to get it
> de-duped.

I get uneasy whenever I see mv(1) used to move directory trees between filesystems, that is, whenever mv(1) can't do a simple rename(2) but has to do a recursive copy of files. It is essentially not restartable: if mv(1) is interrupted, you must clean up the mess with rsync or similar tools. So why not use rsync from the get-go? (Or zfs send/recv, of course.)

--
Kjetil T. Homme
Redpill Linpro AS - Changing the game
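[A minimal sketch of the restartable pattern Kjetil argues for, with hypothetical paths; unlike an interrupted mv, an interrupted run can simply be repeated, and the source is removed only after the copy completes:

  rsync -aH /tank/data/ /tank/data_new/                          # safe to re-run after an interruption
  rsync -aH /tank/data/ /tank/data_new/ && rm -rf /tank/data     # final pass, then remove the source
]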