thr3ads.net - zfs discuss - [zfs-discuss] De-duplication: possible to identify duplicate files? [Jul 2009]


Orvar Korvar

2009-Jul-14 21:18 UTC

[zfs-discuss] De-duplication: possible to identify duplicate files?

With dedup, will it be possible somehow to identify files that are identical but
have different names? Then I can find and remove all duplicates. I know that with
dedup, removal is not really needed because the duplicate will just be a
reference to an existing file. But nevertheless I want to keep down the file
count.
-- 
This message posted from opensolaris.org

Toby Thain

2009-Jul-15 01:09 UTC


[zfs-discuss] De-duplication: possible to identify duplicate files?

On 14-Jul-09, at 5:18 PM, Orvar Korvar wrote:
> With dedup, will it be possible somehow to identify files that are  
> identical but have different names? Then I can find and remove all
> duplicates. I know that with dedup, removal is not really needed  
> because the duplicate will just be a reference to an existing file.  
> But nevertheless I want to keep down the file count.
You can do this on any filesystem easily enough by taking hashes.

--Toby
> -- 
> This message posted from opensolaris.org
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
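
Toby's hash-based approach can be sketched in Python. This is a minimal sketch that works on any filesystem and does not use ZFS dedup at all: it assumes plain read access to the files, groups them by size first so only same-size candidates get hashed, and uses SHA-256 from the standard library. The names `file_digest` and `find_duplicates` are hypothetical. Equal SHA-256 digests mean identical contents with overwhelming probability; to be certain before deleting anything, compare each candidate byte for byte (for example with `filecmp.cmp(a, b, shallow=False)`).

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group regular files under root whose contents are identical.

    Files are bucketed by size first (cheap); only buckets with two or
    more files are hashed. Returns a sorted list of sorted path lists,
    one list per group of two or more identical files.
    """
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

Each returned group lists paths with identical contents, so you can keep one and remove the rest. Hard links to the same inode also show up as a group, because they share contents.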

James Lever

2009-Jul-15 01:30 UTC


[zfs-discuss] De-duplication: possible to identify duplicate files?

On 15/07/2009, at 7:18 AM, Orvar Korvar wrote:
> With dedup, will it be possible somehow to identify files that are  
> identical but have different names? Then I can find and remove all
> duplicates. I know that with dedup, removal is not really needed  
> because the duplicate will just be a reference to an existing file.  
> But nevertheless I want to keep down the file count.
Based on Jeff and Bill's talk this morning, dedup (v1.0) is initially
based on the block level hashes used within zfs - so data, regardless
of zfs filesystem or zvol within a pool (for those vols/fs that have dedup
enabled) will keep single copies of each block that has the same hash.

I'm sure more detailed information will come as it is put back into ON
and information is published on docs.sun.com and other places.

cheers,
James
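
The block-level scheme James describes can be illustrated with a toy model in Python. This is a hypothetical sketch, not how ZFS stores its dedup table: it assumes fixed-size blocks (4 bytes here so the numbers stay readable, where real ZFS blocks are far larger), identifies blocks by SHA-256, and keeps a reference count per unique block. The class `DedupPool` and its methods are illustrative names only.

```python
import hashlib


class DedupPool:
    """Toy model of block-level dedup: one stored copy per unique block.

    Hypothetical illustration only; ZFS's real dedup table, block sizes,
    and checksum handling differ.
    """

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.blocks = {}    # checksum -> the single stored copy of that block
        self.refcount = {}  # checksum -> how many file blocks point at it
        self.files = {}     # file name -> ordered list of block checksums

    def write(self, name, data):
        """Store data under name, sharing any block whose hash already exists."""
        if name in self.files:
            self.remove(name)  # overwriting: drop the old block references first
        pointers = []
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            key = hashlib.sha256(block).hexdigest()
            if key not in self.blocks:
                self.blocks[key] = block  # first occurrence: keep the data
            self.refcount[key] = self.refcount.get(key, 0) + 1
            pointers.append(key)
        self.files[name] = pointers

    def read(self, name):
        """Reassemble a file from its block pointers."""
        return b"".join(self.blocks[key] for key in self.files[name])

    def remove(self, name):
        """Delete a file; a block is freed only when its last reference goes."""
        for key in self.files.pop(name):
            self.refcount[key] -= 1
            if self.refcount[key] == 0:
                del self.refcount[key]
                del self.blocks[key]

    def stored_bytes(self):
        """Physical bytes actually held, counting each unique block once."""
        return sum(len(block) for block in self.blocks.values())


pool = DedupPool(block_size=4)
pool.write("report.txt", b"AAAABBBBCCCC")
pool.write("copy of report.txt", b"AAAABBBBCCCC")  # same data, different name
print(pool.stored_bytes())  # 12: three unique 4-byte blocks, stored once
```

In this model, removing "copy of report.txt" frees no data blocks, only references, which matches Orvar's point that removal is about file count rather than space once dedup is on.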

Richard Elling

2009-Jul-15 05:30 UTC


[zfs-discuss] De-duplication: possible to identify duplicate files?

James Lever wrote:
> On 15/07/2009, at 7:18 AM, Orvar Korvar wrote:
>
>> With dedup, will it be possible somehow to identify files that are
>> identical but have different names? Then I can find and remove all
>> duplicates. I know that with dedup, removal is not really needed
>> because the duplicate will just be a reference to an existing file.
>> But nevertheless I want to keep down the file count.
>
> Based on Jeff and Bill's talk this morning, dedup (v1.0) is initially
> based on the block level hashes used within zfs - so data, regardless
> of zfs filesystem or zvol within a pool (for those vols/fs that have dedup
> enabled) will keep single copies of each block that has the same hash.
Yes, you'd rather not dedup at the file level :-)
 -- richard
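
Richard's preference for block-level over file-level dedup can be illustrated with a toy comparison in Python. This is a hypothetical sketch, not a model of ZFS internals: it assumes fixed 4-byte blocks and SHA-256 identity, and the function names are illustrative. Two files that differ in a single block share nothing under file-level dedup, while block-level dedup still shares every unchanged block.

```python
import hashlib


def file_level_bytes(files):
    """Bytes stored if only whole, byte-identical files are shared."""
    unique = {hashlib.sha256(data).digest(): len(data) for data in files}
    return sum(unique.values())


def block_level_bytes(files, block_size):
    """Bytes stored if identical fixed-size blocks are shared."""
    unique = {}
    for data in files:
        for off in range(0, len(data), block_size):
            block = data[off:off + block_size]
            unique[hashlib.sha256(block).digest()] = len(block)
    return sum(unique.values())


# Two 16-byte "files" that differ only in their last 4-byte block,
# as after a small in-place edit.
original = b"AAAABBBBCCCCDDDD"
edited = b"AAAABBBBCCCCEEEE"
files = [original, edited]

print(file_level_bytes(files))      # 32: the files differ, nothing is shared
print(block_level_bytes(files, 4))  # 20: AAAA, BBBB, CCCC once, plus DDDD and EEEE
```

The gain depends on block alignment: inserting a single byte at the front of `original` shifts every block boundary, so fixed-size block dedup would then share nothing either.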
