I use rsync for backing up user data, profiles, important network shares,
etc. (from several locations over the WAN).
Overall it works flawlessly, as it transfers only changes, but sometimes
there are some serious hiccups.
Consider this scenario, and suppose it's 1 GB of files:
user shares:
/home/joe/data/file1
/file2
/...
/file1000
Now the user _moves_ that data to some other folder:
/home/joe/WAN_goes_crazy/file1
/file2
/...
/file1000
...and we start a backup process.
rsync will first transfer the data from
"/home/joe/WAN_goes_crazy/file...",
and then delete "/home/joe/data/file...".
Basically, this is how rsync works, but in the end we transfer 1 GB of
files over the WAN that we already have locally - the only thing that
changed was the folder the data is in.
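
Purely for illustration, a backup job of this kind could be driven from a
small script like the sketch below; the host name, paths, and exact rsync
options are assumptions, not the actual setup:

    # Rough sketch of the backup run described above; host name, paths
    # and the option set are illustrative assumptions only.
    import subprocess

    subprocess.run(
        ["rsync", "-a", "--delete",      # mirror the share, removing files gone at the source
         "joe@fileserver:/home/joe/",    # user share pulled over the WAN (hypothetical host)
         "/backup/joe/"],                # local backup copy
        check=True,
    )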
Is there some workaround for this (some intelligent script etc.)?
--
Tomek
http://wpkg.org
WPKG - software deployment and upgrades with Samba
Hi,

On Wed, 26 Oct 2005, Tomasz Chmielewski wrote:

> [original scenario snipped]

I guess it needs some intelligent users.

If you can teach your clients to process their moves in three steps:

1. hard link the old file to the new location
2. wait until the next rsync has run
3. delete the file at the old location

then rsync with -H will detect the hard link and not fetch the file over
the net.

Cheers
-e
--
Eberhard Moenkeberg (emoenke@gwdg.de, em@kki.org)
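
The three-step move Eberhard describes might look like the following in
practice; a minimal sketch, with purely illustrative paths and assuming a
user (or a helper script on the file server) performs the steps:

    # Minimal sketch of the hard-link-first move; paths are illustrative.
    import os

    old_path = "/home/joe/data/file1"
    new_path = "/home/joe/WAN_goes_crazy/file1"

    # Step 1: hard link the old file to the new location
    # (both names now point at the same data on disk).
    os.makedirs(os.path.dirname(new_path), exist_ok=True)
    os.link(old_path, new_path)

    # Step 2: wait until the next rsync run; with -H (--hard-links) rsync
    # notices the two names refer to one file and does not re-send it.

    # Step 3: only after that backup has completed, drop the old name:
    # os.remove(old_path)

The backup itself would then just need -H added to its usual options,
for example something like rsync -aH --delete.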
On Wed, Oct 26, 2005 at 03:02:51PM +0200, Tomasz Chmielewski wrote:

> [original scenario snipped]

ISTM it would be quite useful to make rsync "rename-aware".

Caveat: I haven't hacked on rsync for quite a while, so my understanding
may be wrong or outdated. But I think this could be implemented thusly:

You'd want to make this optional, say --detect-renames, because it does
incur an extra processing cost. That option should imply at least
--checksum, and --delete-after if --delete is used at all.

Then you just need the generator to be slightly more clever. For each
file on the sender which is *missing* from the receiver, it needs to
search the checksums of all of the receiver's existing files for a
checksum match. If it finds a match, it can simply use that matched file
and either copy or move it to the new filename. Then that file just gets
skipped. I don't think this would require any changes to the sender, the
receiver, or the protocol.

What I described would only handle rename-without-modification, but its
cost is not very high. I think it's O(N*M), where N = # of files on the
sender that are missing on the receiver and M = # of files already on
the receiver. That's the cost over and above whatever --checksum costs.

I don't see how rename-with-modification could be handled efficiently,
though. Better not to go there.

If nobody says I'm way off base here, I might be inspired to try to
implement this. Unless someone else has the time and inclination...

-chris
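
To make Chris's idea concrete, here is a rough Python sketch of the kind
of pass the proposed --detect-renames option could perform on the
receiving side. The function names, the (relative path, checksum) file
list format, and the use of MD5 are assumptions for illustration; this is
not rsync's actual code, and --detect-renames is only a proposed option.

    # Rough sketch of the proposed rename-detection pass, run on the
    # receiver before the normal transfer; NOT rsync's implementation.
    import hashlib
    import os
    import shutil

    def file_checksum(path, bufsize=1 << 16):
        """Whole-file checksum (MD5 here, purely for illustration)."""
        digest = hashlib.md5()
        with open(path, "rb") as f:
            while True:
                chunk = f.read(bufsize)
                if not chunk:
                    break
                digest.update(chunk)
        return digest.hexdigest()

    def detect_renames(sender_files, receiver_root):
        """sender_files: iterable of (relative_path, checksum) pairs, as
        they might come from the sender's file list (an assumption)."""
        # Index every file already present on the receiver by checksum.
        have = {}
        for dirpath, _dirs, names in os.walk(receiver_root):
            for name in names:
                path = os.path.join(dirpath, name)
                have.setdefault(file_checksum(path), path)

        # For each sender file missing on the receiver, reuse an identical
        # local file if one exists, so the real transfer can skip it.
        for rel_path, csum in sender_files:
            dest = os.path.join(receiver_root, rel_path)
            if os.path.exists(dest) or csum not in have:
                continue
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            # Copy rather than move: --delete-after would remove the old
            # name at the end of the run anyway.
            shutil.copy2(have[csum], dest)

Using a dictionary for the receiver-side index makes the matching itself
roughly O(N + M) in this sketch; the dominant extra cost is checksumming
every existing file on the receiver, which is what making the option
imply --checksum pays for.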