On 11.02.2015 14:03, QUBE RUBBIK wrote:
> Hello
>
> I was just thinking about a killer feature for rsync: the ability to detect
> file name changes or moves within the source and destination.
> At this time rsync has to re-transfer a file if it has been renamed or
> moved inside a subfolder, with a heavy waste of resources and bandwidth.
>
> It could be smarter:
> with a --smart switch, rsync could take a hash of every file within the
> source and destination BEFORE TRANSFERRING;
> then for existing (matching-hash) files it would only need to alter
> metadata (name, location, chmod etc.), saving plenty of bandwidth.
Imagine doing that for a couple of GB of data. The hashing might take
longer than the time saved by not copying it.
This would only work with a persistence layer that remembers the hashes
of unchanged files. This has been a topic in the past, although I don't
remember the details. (And I'm too lazy to google for it.)
Otherwise the only time it really saves time is when you have really
asymmetric bandwidths:
fast local access on both sides (to create the hashes), and a terrible
bandwidth on the link in between (for the copying of new/changed files).
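To make that matching step concrete, here is a rough sketch of what the
proposal implies; sha256sum and join from GNU coreutils are assumed, and
the directory names are just examples:

```shell
#!/bin/sh
# Illustrative only: hash both trees, then pair identical content that
# lives under different names. Note that this reads every byte on both
# sides before any transfer happens - that is the cost being discussed.
mkdir -p src dest
echo "big payload" > src/new-name
echo "big payload" > dest/old-name

# One "<hash>  <path>" line per file, sorted by hash for join:
(cd src  && find . -type f -exec sha256sum {} +) | sort > src.sums
(cd dest && find . -type f -exec sha256sum {} +) | sort > dest.sums

# One line per content match: <hash> <src path> <dest path>
join src.sums dest.sums
```

(A real implementation would also have to handle hash collisions,
paths with whitespace, and files that change mid-scan.)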
> Okay, the destination has to handle this; I expect the rsync daemon
> would have to handle server-side file hashing.
>
> We would have a clever tool to replicate data that has only been
> reorganised, with no changes to the files themselves.
> No need to resync the whole structure if you added a dir to the path, or
> someone renamed that particular heavy file.
>
> This may save a lot of data on automatic backups, FTP mirrors, etc.
>
>
> What do you think about it?
The 'workaround' I personally use is hardlinks: just hardlink all files
into a directory that sorts alphabetically before everything else. I
personally use a '.z' directory in the root of each tree I treat
that way.
The reason for that name is that rsync has to work through that directory
first, otherwise the trick wouldn't work as intended.
After that you can move the files around, and when you run:
rsync ... -H --delete ... ...
rsync just deletes and re-hardlinks the moved file(s) on the destination
instead of re-transferring them.
If you remove a file:
find .z -type f -links 1 -delete
removes the 'dangling' file(s) with only 1 link remaining.
(And in the meantime you have a backup, in case you accidentally deleted
a file.)
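Put together, the whole workflow looks roughly like this (illustrative
paths, GNU find assumed; the rsync calls are commented out because the
point here is how the link counts behave):

```shell
#!/bin/sh
set -e
# Set up a tree with a .z hardlink farm (example layout).
rm -rf demo && mkdir -p demo/src/.z
echo "payload" > demo/src/bigfile
ln demo/src/bigfile demo/src/.z/bigfile    # second name, same inode: 2 links

# rsync -aH --delete demo/src/ demo/dest/  # initial sync: data sent once

mv demo/src/bigfile demo/src/renamed       # rename: still 2 links, no data moved
# rsync -aH --delete demo/src/ demo/dest/  # rsync re-links under the new name

rm demo/src/renamed                        # real deletion: .z entry drops to 1 link
find demo/src/.z -type f -links 1 -delete  # prune the now-dangling .z entry
```

After the last step, demo/src/.z/bigfile is gone, because its only
remaining name was the one in the .z directory.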
You would also need a plan for maintaining the .z directory:
initial creation, adding new files, what happens if files change, and so on.
The solution has these caveats, but it works fine for me.
--
Matthias