I don't believe that what you are asking for can be done with rsync. At
first glance, you can't combine --ignore-existing with
--ignore-non-existing, because between them they ignore everything: the
first skips files that already exist in the destination and the second
skips files that don't. Something would have to at least exist and not
be ignored for rsync to link to it.
Anyway, for a laugh, I asked ChatGPT to make something to do this.
After I got my laugh I cleaned up some of the silly stuff it did and
came up with this:
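#!/bin/bash
# Usage: script dir1 dir2
# Finds files in dir1 that have a counterpart at the same relative path
# in dir2 with identical mtime and size. Uses GNU stat syntax.

# Define the directories to compare (strip any trailing slash so the
# relative paths line up)
dir1="${1%/}"
dir2="${2%/}"

# Walk all regular files in the first directory. The list is
# NUL-delimited so names containing spaces or newlines survive.
find "$dir1" -type f -print0 | while IFS= read -r -d '' file1; do
    # Get the path of file1 relative to dir1
    rel_path="${file1#"$dir1"}"
    file2="$dir2$rel_path"

    # Check if the file exists in the second directory
    if [ -f "$file2" ]; then
        # Skip pairs that are already hard linked to each other
        if [ "$file1" -ef "$file2" ]; then
            continue
        fi

        # Get metadata (mtime and size) of both files
        metadata1=$(stat -c "%Y %s" "$file1")
        metadata2=$(stat -c "%Y %s" "$file2")

        # Compare metadata as strings
        if [ "$metadata1" = "$metadata2" ]; then
            # Delete file1 and create a hard link to file2
            # rm "$file1"
            # ln "$file2" "$file1"
            echo "Hard linked: $file2 to $file1"
        # else
        #     echo "Different: $file1"
        fi
    fi
done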
Note that I only tested it a little bit, which is why everything
actually destructive is commented out.
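
If you do enable the destructive part, one thing to be aware of is that
rm followed by ln leaves a window where the file is gone if ln fails
(for example, when the two directories are on different filesystems).
Below is a rough sketch of a safer variant, not something I have run
against real data. The function names same_mtime_size and relink are
made up for illustration, and it assumes GNU stat and a shell that
supports "local" and the -ef test, such as bash.

```shell
# Hypothetical helpers; the names are illustrative and not part of rsync
# or of the script above.

# Succeed when both files have the same mtime and size (GNU stat syntax).
same_mtime_size() {
    [ "$(stat -c '%Y %s' -- "$1")" = "$(stat -c '%Y %s' -- "$2")" ]
}

# Replace "$2" with a hard link to "$1" without ever leaving "$2" missing.
relink() {
    local src="$1" dst="$2" tmp
    # Nothing to do if both names already refer to the same inode.
    [ "$src" -ef "$dst" ] && return 0
    tmp="$dst.relink.$$"
    # Create the link under a temporary name next to the destination.
    # This fails harmlessly if src and dst are on different filesystems.
    ln -- "$src" "$tmp" || return 1
    # Rename the temporary link over the destination in a single step.
    mv -f -- "$tmp" "$dst" || { rm -f -- "$tmp"; return 1; }
}
```

In the loop, the commented rm/ln pair could then become something like
relink "$file2" "$file1", with same_mtime_size "$file1" "$file2" as the
condition. Try it on a scratch copy first.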
On 5/1/24 19:34, B via rsync wrote:
> Recently I was thinking about --link-dest= and if it was possible to use
> rsync to de-duplicate two nearly-identical directory structures.
>
> Normally I would use a tool like hardlink, jdupes, or rdfind, but in
> this case the files are huge and numerous, so hashing them would take
> forever. I did a test run and these tools mostly choked to death after a
> few hours.
>
> These directories were made using rsync in the first place, so I know
> the files are duplicate and I would be willing to use rsync's
> quick-check (path/filename, mtime, size) to assume uniqueness of the files.
>
> My objective is to hard-link files with the same relative path/filename,
> mtime, and size. Nothing more. Files which are different should not be
> touched. Files which exist in the destination but not the source should
> not be deleted. Files which exist in the source but not the destination
> should not be transferred.
>
> The problem is that I don't want to create any new files in the
> destination. That's the sticking point.
>
> I thought maybe I could do something wacky like 'rsync -a
> --ignore-existing --ignore-non-existing --link-dest="../new/" old/ new',
> but that doesn't work. The existing files get ignored and nothing is
> linked.
>
> Is there a way to do this with rsync?