I use rsync for backing up user data, profiles, important network shares,
etc. (from several locations over the WAN).
Overall it works flawlessly, as it transfers only changes, but sometimes
there are some serious hiccups.
Consider this scenario, and suppose it's 1 GB of files:
user shares:
/home/joe/data/file1
/file2
/...
/file1000
Now the user _moves_ that data to some other folder:
/home/joe/WAN_goes_crazy/file1
/file2
/...
/file1000
...and we start a backup process.
rsync will first transfer the data from
"/home/joe/WAN_goes_crazy/file...",
and then delete "/home/joe/data/file...".
Basically, this is how rsync works, but in the end we transfer 1 GB of
files over the WAN that we already have locally - the only thing that
changed was the folder the data is in.
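
Purely for illustration, a backup job of this kind could be driven from a
small script like the sketch below; the host name, paths, and exact rsync
options are assumptions, not the actual setup:

    # Rough sketch of the backup run described above; host name, paths
    # and the option set are illustrative assumptions only.
    import subprocess

    subprocess.run(
        ["rsync", "-a", "--delete",      # mirror the share, removing files gone at the source
         "joe@fileserver:/home/joe/",    # user share pulled over the WAN (hypothetical host)
         "/backup/joe/"],                # local backup copy
        check=True,
    )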
Is there some workaround for this (some intelligent script etc.)?
--
Tomek
http://wpkg.org
WPKG - software deployment and upgrades with Samba
Hi,

On Wed, 26 Oct 2005, Tomasz Chmielewski wrote:

> [original scenario snipped]

I guess it needs some intelligent users.

If you can teach your clients to process their moves in three steps:

1. hard link the old file to the new location
2. wait until the next rsync has run
3. delete the file at the old location

then rsync with -H will detect the hard link and not fetch the file over
the net.

Cheers
-e
--
Eberhard Moenkeberg (emoenke@gwdg.de, em@kki.org)
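
The three-step move Eberhard describes might look like the following in
practice; a minimal sketch, with purely illustrative paths and assuming a
user (or a helper script on the file server) performs the steps:

    # Minimal sketch of the hard-link-first move; paths are illustrative.
    import os

    old_path = "/home/joe/data/file1"
    new_path = "/home/joe/WAN_goes_crazy/file1"

    # Step 1: hard link the old file to the new location
    # (both names now point at the same data on disk).
    os.makedirs(os.path.dirname(new_path), exist_ok=True)
    os.link(old_path, new_path)

    # Step 2: wait until the next rsync run; with -H (--hard-links) rsync
    # notices the two names refer to one file and does not re-send it.

    # Step 3: only after that backup has completed, drop the old name:
    # os.remove(old_path)

The backup itself would then just need -H added to its usual options,
for example something like rsync -aH --delete.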
On Wed, Oct 26, 2005 at 03:02:51PM +0200, Tomasz Chmielewski wrote:

> [original scenario snipped]

ISTM it would be quite useful to make rsync "rename-aware".

Caveat: I haven't hacked on rsync for quite a while, so my understanding
may be wrong or outdated. But I think this could be implemented thusly:

You'd want to make this optional, say --detect-renames, because it does
incur an extra processing cost. That option should imply at least
--checksum, and --delete-after if --delete is used at all.

Then you just need the generator to be slightly more clever. For each
file on the sender which is *missing* from the receiver, it needs to
search the checksums of all of the receiver's existing files for a
checksum match. If it finds a match, it can simply use that matched file
and either copy or move it to the new filename. Then that file just gets
skipped. I don't think this would require any changes to the sender, the
receiver, or the protocol.

What I described would only handle rename-without-modification, but its
cost is not very high. I think it's O(N*M), where N = # of files on the
sender that are missing on the receiver and M = # of files already on
the receiver. That's the cost over and above whatever --checksum costs.

I don't see how rename-with-modification could be handled efficiently,
though. Better not to go there.

If nobody says I'm way off base here, I might be inspired to try to
implement this. Unless someone else has the time and inclination...

-chris
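
To make Chris's idea concrete, here is a rough Python sketch of the kind
of pass the proposed --detect-renames option could perform on the
receiving side. The function names, the (relative path, checksum) file
list format, and the use of MD5 are assumptions for illustration; this is
not rsync's actual code, and --detect-renames is only a proposed option.

    # Rough sketch of the proposed rename-detection pass, run on the
    # receiver before the normal transfer; NOT rsync's implementation.
    import hashlib
    import os
    import shutil

    def file_checksum(path, bufsize=1 << 16):
        """Whole-file checksum (MD5 here, purely for illustration)."""
        digest = hashlib.md5()
        with open(path, "rb") as f:
            while True:
                chunk = f.read(bufsize)
                if not chunk:
                    break
                digest.update(chunk)
        return digest.hexdigest()

    def detect_renames(sender_files, receiver_root):
        """sender_files: iterable of (relative_path, checksum) pairs, as
        they might come from the sender's file list (an assumption)."""
        # Index every file already present on the receiver by checksum.
        have = {}
        for dirpath, _dirs, names in os.walk(receiver_root):
            for name in names:
                path = os.path.join(dirpath, name)
                have.setdefault(file_checksum(path), path)

        # For each sender file missing on the receiver, reuse an identical
        # local file if one exists, so the real transfer can skip it.
        for rel_path, csum in sender_files:
            dest = os.path.join(receiver_root, rel_path)
            if os.path.exists(dest) or csum not in have:
                continue
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            # Copy rather than move: --delete-after would remove the old
            # name at the end of the run anyway.
            shutil.copy2(have[csum], dest)

Using a dictionary for the receiver-side index makes the matching itself
roughly O(N + M) in this sketch; the dominant extra cost is checksumming
every existing file on the receiver, which is what making the option
imply --checksum pays for.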