I'm running a mirror of several repositories that are fetched using
separate rsync runs. Since some of those repositories host related
files, I'm using the hardlink utility [1] to save disk space.

However, I've noticed an issue that may lead to file metadata
inconsistencies when using hardlink. Consider the following scenario:

- Two repositories (rep_a and rep_b) are mirrored that initially have a
  file in common.
- After rsync has mirrored the files, there are two identical copies in
  my mirror.
- hardlink detects those copies and links them together.
- The file attributes (permissions, ownership, mtime, etc.) of a
  hardlinked file in rep_a change upstream.
- rsync detects this change and updates the destination file in place
  (it doesn't break the hardlink), which leads to an inconsistent view
  of the file attributes in rep_b.
- Depending on the order of the mirroring commands, the second rsync job
  for rep_b might reverse the attribute change (again for both
  repositories).

To recap: once two files are hardlinked, rsync will break the hardlink
only if one file's _data_ changes. If only the metadata (mode,
ownership, times) changes, the file is updated in place, leading to an
inconsistent mirror.

Unfortunately I couldn't find an option for rsync to apply even
metadata changes to a new copy of the file. (Another option could be to
check the link count of the inode and create a new copy of the file
only if it is greater than one.)

Is there any workaround for this issue?

Thanks in advance,
--leo

[1] http://code.google.com/p/hardlinkpy/

-- 
e-mail   ::: Alexander.Bergolth (at) wu-wien.ac.at
fax      ::: +43-1-31336-906050
location ::: Computer Center | Vienna University of Economics | Austria
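The underlying effect can be reproduced without rsync or the hardlink utility. The following is a minimal sketch, assuming a POSIX shell; the directory names mirror the scenario above, while pkg.txt and the temporary working directory are placeholders invented for the demonstration. A plain `ln` stands in for what hardlink does to identical files, and `chmod` stands in for a metadata-only update.

```shell
#!/bin/sh
# Demonstration only: pkg.txt and the temp directory are placeholder names.
set -eu

work=$(mktemp -d)
cd "$work"
mkdir rep_a rep_b

# One file that both repositories have in common.
echo "same content" > rep_a/pkg.txt
chmod 644 rep_a/pkg.txt

# Link the rep_b copy to the rep_a copy, as a deduplication tool would.
ln rep_a/pkg.txt rep_b/pkg.txt

# A metadata-only change applied through the rep_a name ...
chmod 600 rep_a/pkg.txt

# ... is visible through the rep_b name too, because mode, ownership and
# timestamps live in the inode that both names share.
mode_a=$(ls -l rep_a/pkg.txt | cut -c1-10)
mode_b=$(ls -l rep_b/pkg.txt | cut -c1-10)
echo "rep_a: $mode_a  rep_b: $mode_b"

# Clean up the scratch directory.
cd /
rm -rf "$work"
```

Both names report the same mode afterwards, which is the inconsistency described above: rep_b now shows attributes that only changed upstream in rep_a.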
On 4/26/07, Alexander 'Leo' Bergolth <leo@strike.wu-wien.ac.at> wrote:
> Recapitulating, once two files are hardlinked, rsync will break the
> hardlink only if one files _data_ changes, if only the metadata (mode,
> ownership, times) changes, the file will be updated in-place, leading to
> an inconsistent mirror.
>
> Unfortunately I couldn't find an option for rsync to apply even
> metadata-changes to a new copy of the file. (Another option could be
> checking the link-count of the inode and create a new copy of the file
> only if it is greater than one.)

C Sights and I have been discussing the possibility of adding
--no-tweak and --no-tweak-hlinked options that would do those two
things:

http://lists.samba.org/archive/rsync/2007-April/017613.html

> Is there any workaround for this issue?

Receive into a new, temporary destination, specifying the original
destination as a --link-dest basis dir. Files in the original
destination that match the source in both data and preserved attributes
will be hard-linked directly into the temporary destination, while
unmatched source files will be written anew. Then move the temporary
destination over the real one. In other words, replace this:

    rsync -a /path/to/src/ rep_a/

with this:

    rsync -a --link-dest=../rep_a/ /path/to/src/ rep_a.new/
    rm -rf rep_a
    mv rep_a.new rep_a

(Note: when you adapt these commands for your setup, remember that if
the link-dest path is relative, rsync interprets it relative to the
destination directory.)

If you're concerned about concurrent access to the repository while the
mirroring is in progress, you can use a symlink to do an atomic
cutover:

    # one-time setup (not safe for concurrent access)
    mv rep_a rep_a.0
    ln -s rep_a.0 rep_a

    # sync
    oldr=$(readlink rep_a)               # currently live dir: rep_a.0 or rep_a.1
    newr=${oldr%.*}.$((1-${oldr##*.}))   # the other slot: .0 <-> .1
    rsync -a --link-dest="../$oldr/" src/ "$newr/"
    ln -s "$newr" rep_a.tmp
    mv -T rep_a.tmp rep_a                # rename over the old symlink
    rm -r "$oldr"

This is a slight improvement on the technique used by the script
"support/atomic-rsync" in the rsync source tree.
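If you want to try the slot naming and the symlink cutover before pointing them at a real mirror, here is a rough sketch that leaves rsync out entirely. It assumes a shell with `mktemp` and `readlink` available and GNU `mv` (for `-T`). The helper name `next_slot` and the scratch directory are made up for illustration, and a plain `mkdir` stands in for the rsync --link-dest run.

```shell
#!/bin/sh
# Sketch only: next_slot is a hypothetical helper, not part of rsync.
set -eu

# Given "name.0" or "name.1", print the name of the other slot.
next_slot() {
    base=${1%.*}     # drop the shortest trailing ".suffix", e.g. "rep_a"
    slot=${1##*.}    # keep what follows the last ".", i.e. "0" or "1"
    echo "$base.$((1 - $slot))"
}

work=$(mktemp -d)
cd "$work"

# One-time setup: the live tree is a directory behind a symlink.
mkdir rep_a.0
ln -s rep_a.0 rep_a

# One sync cycle.
oldr=$(readlink rep_a)
newr=$(next_slot "$oldr")
mkdir "$newr"            # placeholder for: rsync -a --link-dest=... src/ "$newr/"
ln -s "$newr" rep_a.tmp
mv -T rep_a.tmp rep_a    # GNU mv: rename the new symlink over the old one
rm -r "$oldr"

after=$(readlink rep_a)
echo "switched $oldr -> $after"

# Clean up the scratch directory.
cd /
rm -rf "$work"
```

Running the sync cycle a second time would flip back from rep_a.1 to rep_a.0, so only two directory names are ever used.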
Please feel free to reply if you need any more help.

Matt