Hello rsyncers,

I have long wished for a feature in rsync to detect files that have been renamed on the sender side since the last sync, and to avoid transferring those files to the destination, the same way it avoids transferring files that haven't changed.

Example 1: a log directory (like /var/log) is backed up every day. Most of the time, rsync transfers very little data, but once a week it basically performs a full copy without finding any basis files on the destination copy. This is because the logs have been rotated: what was /var/log/example.1.gz has been renamed to /var/log/example.2.gz, /var/log/example.2.gz has become /var/log/example.3.gz, and so on. rsync has no way to know this.

Example 2: a home directory is similarly backed up every day. One day, the user decides to clean house, creating new directories and moving hundreds of files among them. Once again, rsync is going to have to retransfer each of those files.

My solution is to add a stable, unchanging name for each file in the transfer. As long as the --hard-links (-H) option is used, this stable name will already exist on the receiver side, and the receiver can create a hard link to it even when the file appears under a completely new name. These stable names do not actually exist on the sender side; they are synthesized by rsync from the unchanging attributes that give the file its identity: its device and inode number.

I have attached a patch for a proof of concept of this feature. With it, I can start with this directory structure on the sender side:

    testfiles/
    testfiles/one
    testfiles/two

...and rsync it to the destination. On the destination there is an extra directory called "byinode" which contains hard links to both regular files. Then I rename testfiles/one to testfiles/oneone.
When I rsync again, instead of deleting the file called "one" and transferring the full contents of an apparently new file called "oneone", the file "one" is deleted and "oneone" is created by hard-linking to the stable filename.

I would like to know the following:

- Are people interested in a feature like this?
- Is there a better way to do it?
- It only works with protocol version 30 at the moment. Would there be any interest in making it work with older protocol versions? (It would have to be done very differently in older versions.)
- Could this patch be committed once I take it beyond the proof-of-concept state?

Notes, if you want to test it:

- It is hardcoded to always enable the feature, synthesize a directory called "byinode", and add it to the file list. The final version will make this a command-line option, of course.
- It only works with protocol version 30.
- Use with at least --delete --no-i-r -r
- Only the sender side requires the patch.
- The patch is against rsync-3.0.3pre2.

Thank you for your feedback

-Phil

-------------- next part --------------
A non-text attachment was scrubbed...
Name: rsync-3.0.3pre2.byinode.patch
Type: text/x-diff
Size: 6138 bytes
Desc: not available
Url : http://lists.samba.org/archive/rsync/attachments/20080622/50552978/rsync-3.0.3pre2.byinode.bin
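
For readers who want to see the stable-name idea from the message above in miniature, here is a rough Python sketch. It is not taken from the attached patch: the function names and the "byinode/<dev>-<ino>" layout are assumptions made for illustration. It only shows that a name derived from the device and inode number survives a rename on the same filesystem.

```python
import os
import tempfile


def stable_name(path):
    """Return a name derived only from the file's device and inode number.

    The "byinode/<dev>-<ino>" layout is a hypothetical format chosen for
    this sketch; the actual patch may encode the name differently.
    """
    st = os.stat(path)
    return os.path.join("byinode", f"{st.st_dev:x}-{st.st_ino:x}")


def demo():
    """Rename a file and report its synthesized name before and after."""
    with tempfile.TemporaryDirectory() as src:
        old = os.path.join(src, "one")
        with open(old, "w") as f:
            f.write("payload\n")
        before = stable_name(old)

        new = os.path.join(src, "oneone")
        os.rename(old, new)  # rename within the same filesystem
        after = stable_name(new)

        # The visible name changed but the synthesized name did not, so a
        # receiver that kept a hard link under that name needs no new data.
        return before, after
```

Calling `demo()` returns the two synthesized names, which should be identical as long as the rename stays on one filesystem.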
Wayne Davison
2008-Sep-09 14:49 UTC
New feature: detect and avoid transferring renamed files
Sorry for the slow reply -- I marked your message for more in-depth study, and failed to get back to it until now.

On Sun, Jun 22, 2008 at 08:01:16PM -0400, Phil Vandry wrote:
> I have long wished for a feature in rsync to detect files that have
> been renamed on the sender side since the last time a sync was
> performed, and avoid transferring those files to the destination the
> same way it avoids transferring files that haven't changed.

The detect-renamed patch in the patches directory has one possible implementation of this, but it fails to handle things like the /var/log rotation where files get renamed over the top of other files in the transfer.

Your solution is quite an interesting one, but it does have some minor drawbacks:

- It creates a single (potentially really big) directory of files on the receiver for the byinode/* files.
- The file list increases in size significantly (around double).
- The transfer must remain identical to prior transfers, or the synthesized directory will not match (and could be truncated with --delete).
- It disables incremental recursion (as does the detect-renamed patch, but it would be nice to avoid this).
- While it avoids doing an extra scan of the destination files (unlike the detect-renamed patch), the processing of all the files in the synthesized directory is akin to an extra scan pass.

However, as long as those trade-offs are acceptable, it does do a great job of finding renamed files.

I'd like something a little more flexible for a future rsync, though. I had been thinking of extending the db patch to add the ability to track files by checksum in a database. This would allow a run that used the DB to be an efficient checksum run (reading the checksums from the DB, not slowly generating them) and to look up matching checksums in the DB on the receiving side to facilitate renaming and/or efficient copying.
Using a simple DB for the data (such as SQLite) would be easy to support, would work regardless of how much of a hierarchy was being copied, and would not require an extra hierarchy scan for each transfer. It would, however, require that the DB info be double-checked and ignored if not accurate, and it would be most efficient if the receiving side was not prone to being reorganized without updating the DB.

To facilitate the typical log-dir rotation idiom, I was thinking of doing delayed updates one directory at a time (unless the user asked for a whole-transfer delayed update).

That idea is my current favorite for adding rename support. What do folks think?

..wayne..
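
To make the checksum-database idea above a bit more concrete, here is a rough Python sketch using the standard sqlite3 module. It is only an illustration, not the db patch: the table layout, the function names, and the use of SHA-256 are all assumptions. It records a checksum per file and, on lookup, re-checks size and mtime so that a stale DB entry is ignored rather than trusted.

```python
import hashlib
import os
import sqlite3
import tempfile


def open_db(db_path=":memory:"):
    """Open (or create) the hypothetical checksum table."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " path TEXT PRIMARY KEY, size INTEGER,"
        " mtime_ns INTEGER, checksum TEXT)"
    )
    return db


def file_checksum(path):
    """Hash the file contents; SHA-256 is a stand-in chosen for this sketch."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def record(db, path):
    """Store the file's size, mtime and checksum in the DB."""
    st = os.stat(path)
    db.execute(
        "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
        (path, st.st_size, st.st_mtime_ns, file_checksum(path)),
    )
    db.commit()


def find_basis(db, checksum):
    """Return a path with this checksum whose size and mtime still match
    the DB entry, or None if every candidate is missing or stale."""
    rows = db.execute(
        "SELECT path, size, mtime_ns FROM files WHERE checksum = ?",
        (checksum,),
    ).fetchall()
    for path, size, mtime_ns in rows:
        try:
            st = os.stat(path)
        except OSError:
            continue  # file vanished since it was recorded
        if st.st_size == size and st.st_mtime_ns == mtime_ns:
            return path
    return None


def demo():
    """Index one file, look it up by checksum, then look again after removal."""
    db = open_db()
    with tempfile.TemporaryDirectory() as dest:
        path = os.path.join(dest, "example.1.gz")
        with open(path, "wb") as f:
            f.write(b"log data")
        record(db, path)
        wanted = file_checksum(path)
        hit = find_basis(db, wanted) == path
        os.remove(path)
        stale = find_basis(db, wanted)
    return hit, stale
```

A receiver that found a match this way could rename or locally copy the existing file instead of requesting its data again; the size and mtime re-check is the "double-check" step, and a stricter version might re-hash the candidate before using it.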