Orvar Korvar
2009-Jul-14 21:18 UTC
[zfs-discuss] De-duplication: possible to identify duplicate files?
With dedup, will it be possible somehow to identify files that are identical but have different names? Then I can find and remove all duplicates. I know that with dedup, removal is not really needed, because the duplicate will just be a reference to an existing file. Nevertheless, I want to keep the file count down.
Toby Thain
2009-Jul-15 01:09 UTC
On 14-Jul-09, at 5:18 PM, Orvar Korvar wrote:
> With dedup, will it be possible somehow to identify files that are
> identical but have different names? Then I can find and remove all
> duplicates. I know that with dedup, removal is not really needed
> because the duplicate will just be a reference to an existing file.
> But nevertheless I want to keep down the file count.

You can do this on any filesystem easily enough by taking hashes.

--Toby
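Toby's hash-based approach can be sketched as a short Python script. This is a minimal sketch that works on any filesystem and does not use ZFS dedup at all: it groups files by size first (files of different sizes cannot be identical), then hashes only the size collisions and reports groups of two or more files with identical contents. SHA-256 is an assumed choice of hash, and the names `file_digest` and `find_duplicates` are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return sorted groups of regular files under root with identical contents."""
    # Pass 1: bucket regular files (symlinks skipped) by size.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files that share their size with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[file_digest(path)].append(path)

    # Keep only hashes seen more than once, i.e. actual duplicate groups.
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

For example, `find_duplicates("/tank/home")` (the path is hypothetical) returns a list of path groups. Hard links to the same file will be reported as duplicates, and since a hash match is strong but not absolute evidence, comparing candidates byte for byte with `filecmp.cmp(a, b, shallow=False)` before deleting anything is a cautious extra step.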
James Lever
2009-Jul-15 01:30 UTC
On 15/07/2009, at 7:18 AM, Orvar Korvar wrote:
> With dedup, will it be possible somehow to identify files that are
> identical but have different names? Then I can find and remove all
> duplicates. I know that with dedup, removal is not really needed
> because the duplicate will just be a reference to an existing file.
> But nevertheless I want to keep down the file count.

Based on Jeff and Bill's talk this morning, dedup (v1.0) is initially
based on the block-level hashes already used within ZFS. So, regardless
of which ZFS filesystem or zvol within a pool the data lives in (for
those filesystems and zvols that have dedup enabled), the pool will keep
a single copy of each block that has the same hash.

I'm sure more detailed information will come as it is put back into ON
and published on docs.sun.com and other places.

cheers,
James
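The block-level scheme James describes can be illustrated with a toy in-memory store. This sketch assumes fixed-size blocks and SHA-256 as the block hash; `DedupStore` and its methods are hypothetical and are not how ZFS actually lays out its dedup table. Each distinct block is stored once with a reference count, a "file" is just the list of block hashes that point at it, and freeing a duplicate only drops references.

```python
import hashlib


class DedupStore:
    """Toy block-level dedup store: one physical copy per distinct block hash.

    Illustration only; not ZFS's on-disk dedup table.
    """

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}     # digest -> the single stored copy of the block
        self.refcounts = {}  # digest -> number of logical references to it

    def write(self, data):
        """Store data and return the list of block digests that reference it."""
        pointers = []
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            digest = hashlib.sha256(block).digest()
            if digest not in self.blocks:
                self.blocks[digest] = block  # first occurrence: store it once
            self.refcounts[digest] = self.refcounts.get(digest, 0) + 1
            pointers.append(digest)
        return pointers

    def read(self, pointers):
        """Reassemble data from its block digests."""
        return b"".join(self.blocks[d] for d in pointers)

    def free(self, pointers):
        """Drop one reference per block; discard blocks nothing points at."""
        for d in pointers:
            self.refcounts[d] -= 1
            if self.refcounts[d] == 0:
                del self.refcounts[d]
                del self.blocks[d]

    def physical_bytes(self):
        """Bytes actually held, counting each distinct block once."""
        return sum(len(b) for b in self.blocks.values())
```

Writing the same bytes twice adds references but no new blocks, which is why removing a duplicate on a dedup-enabled pool mostly reduces the file count rather than the space used.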
Richard Elling
2009-Jul-15 05:30 UTC
James Lever wrote:
>
> On 15/07/2009, at 7:18 AM, Orvar Korvar wrote:
>
>> With dedup, will it be possible somehow to identify files that are
>> identical but have different names? Then I can find and remove all
>> duplicates. I know that with dedup, removal is not really needed
>> because the duplicate will just be a reference to an existing file.
>> But nevertheless I want to keep down the file count.
>
> Based on Jeff and Bill's talk this morning, dedup (v1.0) is initially
> based on the block level hashes used within zfs - so data, regardless
> of zfs filesystem or zvol within a pool (for those vols/fs that have
> dedup enabled) will keep single copies of each block that has the same
> hash.

Yes, you'd rather not dedup at the file level :-)
 -- richard
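One way to see why block-level dedup is generally preferable to file-level dedup: two files that differ in a single byte share nothing under whole-file dedup, but share every block except one under block-level dedup. The sketch below compares the bytes stored under each scheme; the 1024-byte block size and the generated sample data are arbitrary assumptions for illustration, and the function names are hypothetical.

```python
import hashlib


def file_level_bytes(files):
    """Bytes stored if only whole, byte-identical files are shared."""
    unique = {hashlib.sha256(f).digest(): len(f) for f in files}
    return sum(unique.values())


def block_level_bytes(files, block_size):
    """Bytes stored if identical fixed-size blocks are shared across files."""
    unique = {}
    for f in files:
        for off in range(0, len(f), block_size):
            block = f[off:off + block_size]
            unique[hashlib.sha256(block).digest()] = len(block)
    return sum(unique.values())


# 4096 bytes of non-repeating sample data (128 distinct 32-byte digests).
base = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(128))
# The same "file" with only its last byte changed.
edited = base[:-1] + bytes([base[-1] ^ 0xFF])
files = [base, edited]

print(file_level_bytes(files))         # 8192
print(block_level_bytes(files, 1024))  # 5120
```

Whole-file dedup stores both 4096-byte files in full, while block-level dedup stores the four blocks of the first file plus only the one changed block of the second.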