Hello everyone,
Imagine you rsynced your mp3 archive. Later you do some cleanup: you
rename files, split the directory up into a hierarchy, and move some
files around.
Data-wise you did nothing, meta-data-wise you did a lot. --fuzzy comes
to mind for the next rsync. Unfortunately, fuzzy matching does not
look into other (sub-)directories and cares a little too much about
modification times for this case.
I was thinking about introducing a superset of the current fuzzy
matching (works initially like the original, but tries more base files
if nothing matched so far), and/or two new threshold values with e.g.
--fuzzy-thresholds 1000:20000
where the numbers refer to the file size on the sender side, the first
meaning "below this size, don't even consider fuzzy matching" and the
second number meaning "above this size, try harder to find a base file".
This could default to --fuzzy-thresholds 0:<unlimited>, the old
behaviour.
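To make the threshold semantics concrete, here is a minimal sketch in
Python (not rsync's C code). --fuzzy-thresholds is only the option
proposed in this mail, it does not exist in rsync; the function names
and the rule that an empty upper bound means "unlimited" are my own
assumptions for illustration.

```python
import math


def parse_thresholds(spec):
    """Parse a hypothetical 'LOW:HIGH' value of the proposed option.

    Assumption: an empty HIGH (e.g. '0:') stands for 'unlimited'.
    """
    low_text, sep, high_text = spec.partition(":")
    if not sep:
        raise ValueError("expected LOW:HIGH, got %r" % spec)
    low = int(low_text) if low_text else 0
    high = int(high_text) if high_text else math.inf
    if low < 0 or high < low:
        raise ValueError("thresholds must satisfy 0 <= LOW <= HIGH")
    return low, high


def fuzzy_mode(sender_size, low, high):
    """Decide how hard to look for a base file for one sender-side file."""
    if sender_size < low:
        return "skip"        # below LOW: don't even consider fuzzy matching
    if sender_size > high:
        return "aggressive"  # above HIGH: try harder to find a base file
    return "normal"          # in between: behave like the current --fuzzy


low, high = parse_thresholds("1000:20000")
for size in (500, 5000, 50000):
    print(size, fuzzy_mode(size, low, high))
```

With the suggested default of 0:<unlimited>, no file is below the
first value or above the second, so every file gets the "normal"
treatment, i.e. the old behaviour.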
In case of the more aggressive search: when the original algorithm
runs out of base files, try _all_ files in the destination hierarchy
that have exactly the same size, possibly sorted by the Levenshtein
distance between their full path names and the target's.
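As a rough sketch of that fallback search, again in Python and not
rsync's actual code: collect all files under the destination, keep
those whose size equals the sender-side size, and order them by edit
distance of the path. The helper names (scan_tree, rank_candidates)
are hypothetical, and the sketch ignores how rsync would get at the
destination file list in practice.

```python
import os


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def scan_tree(root):
    """Yield (path relative to root, size) for every file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            yield os.path.relpath(full, root), os.path.getsize(full)


def rank_candidates(target_path, target_size, dest_files):
    """Return same-size destination paths, closest path name first."""
    same_size = [path for path, size in dest_files if size == target_size]
    return sorted(same_size,
                  key=lambda path: (levenshtein(path, target_path), path))


# Example with a hand-written file list instead of a real directory scan.
dest = [("music/a.mp3", 5000),
        ("music/rock/song.mp3", 4000),
        ("old/song.mp3", 4000),
        ("x.mp3", 4001)]
print(rank_candidates("music/rock/song1.mp3", 4000, dest))
```

For a real tree one would pass scan_tree(dest_root) as dest_files.
Only the files that survive the size filter would ever be handed to
the checksum matching, which is what keeps the extra cost bounded.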
The idea is to catch simple copies and moves, while still keeping
unreasonable base files away. Especially with bigger files, the
likelihood of two unrelated files having exactly the same size is
pretty small. The risk is unnecessary checksum calculations against a
wrong base file. If you think that risk is too high, don't use that
option...
Is there a good reason why this functionality is not in rsync yet?
Regards,
Robert