Hi,
I have to synchronize a directory tree on one machine with a tree
on a remote machine. The tree on the remote machine is used
read-only. In addition, before each synchronization I clone the
current target tree using hardlinks, so that I can switch back to
any previous version if the synchronization failed or brought in
bad data. This has worked since 2002 using tar and a Perl script.
Now that I have a direct network connection via ssh, I want to
switch to rsync instead.
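The hardlink clone of the target tree can be sketched in Python. This is a hypothetical stand-in, not the tar/Perl script mentioned above: `clone_tree` is an illustrative name, and the approach is similar in spirit to GNU `cp -al`. It recreates the directories, keeps symlinks as symlinks, and gives every regular file a second name for the same inode, so no file data is copied. Source and clone must live on the same filesystem.

```python
import os


def clone_tree(src: str, dst: str) -> None:
    """Hypothetical helper: recreate the directory structure of src under dst,
    hardlinking regular files instead of copying their data.
    src and dst must be on the same filesystem (hardlinks cannot cross filesystems)."""
    for root, dirs, files in os.walk(src):
        target_dir = os.path.normpath(os.path.join(dst, os.path.relpath(root, src)))
        os.makedirs(target_dir, exist_ok=True)
        # Symlinks to directories are listed in `dirs`; recreate them as symlinks
        # and drop them from `dirs` so os.walk does not descend into them.
        for name in list(dirs):
            path = os.path.join(root, name)
            if os.path.islink(path):
                os.symlink(os.readlink(path), os.path.join(target_dir, name))
                dirs.remove(name)
        for name in files:
            path = os.path.join(root, name)
            target = os.path.join(target_dir, name)
            if os.path.islink(path):
                # Keep file symlinks (including dangling ones) as symlinks.
                os.symlink(os.readlink(path), target)
            else:
                # New name for the same inode: no file data is copied.
                os.link(path, target)
```

Directory permissions and timestamps are not carried over in this sketch; a real snapshot tool would restore them as well.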
At first sight rsync seems to be ideal, but a few questions
remain:
a) avoiding in-place modifications:
As each target file may be hardlinked to some backup version,
I never want an incremental update to modify an existing file.
Is this guaranteed as long as I don't use "--inplace"?
Or is there something like "--no-inplace"?
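The distinction behind this question can be shown with a small Python sketch (plain file operations, not rsync code; both function names are hypothetical). Writing into an existing file changes the shared inode, so every hardlinked backup sees the change. Writing a temporary file and renaming it over the target gives the name a new inode and leaves older hardlinks with their old content.

```python
import os
import shutil
import tempfile


def replace_in_place(path: str, data: bytes) -> None:
    # Overwrites the existing inode: every hardlink to it (e.g. a backup copy)
    # sees the new content as well.
    with open(path, "r+b") as f:
        f.write(data)
        f.truncate()


def replace_by_rename(path: str, data: bytes) -> None:
    # Writes the new content to a temporary file in the same directory and
    # renames it over the target. The name then refers to a new inode, while
    # existing hardlinks keep referring to the old inode and its old content.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        if os.path.exists(path):
            shutil.copymode(path, tmp)  # mkstemp creates mode 0600; keep the old mode
        os.replace(tmp, path)  # atomic rename within one filesystem on POSIX
    except BaseException:
        os.unlink(tmp)
        raise
```

Only the second pattern is safe for a target tree whose files are shared with hardlinked snapshots.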
b) mixing symlinks and hardlinks:
There may also be hardlinks within the transfer set. These are
handled properly by "--hard-links", but my case is a bit more
complicated: the source tree also contains symlinks, which are to
be resolved using "--copy-links".
If three symlinks point to the same source file, the file should
be copied once and hardlinked on the target system.
Unfortunately rsync does not recognize this situation as long as
the (hard) link count of the source inode is 1. In that case the
file is not considered a candidate for a target hardlink and is
transferred three times instead. (In my case this does not happen
often, as almost all of my files are extensively hardlinked
anyway.)
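The behaviour asked for here can be sketched in Python as a grouping step (an illustration of the idea, not how rsync works internally; `group_by_target` and `transfer_plan` are hypothetical names). Every path is resolved with `os.stat`, which follows symlinks the way "--copy-links" would, and paths are grouped by the `(st_dev, st_ino)` of the file they finally reach. Each group is then copied once and the other names are hardlinked to it, regardless of the source link count. Directory symlinks are not followed in this sketch.

```python
import os
import stat
from collections import defaultdict


def group_by_target(root: str) -> dict:
    """Hypothetical helper: map the (st_dev, st_ino) of every regular file reachable
    under root, following file symlinks, to the sorted relative paths that reach it."""
    groups = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)  # os.stat follows symlinks
            except OSError:
                continue  # dangling or looping symlink: nothing to transfer
            if stat.S_ISREG(st.st_mode):
                groups[(st.st_dev, st.st_ino)].append(os.path.relpath(path, root))
    return {key: sorted(paths) for key, paths in groups.items()}


def transfer_plan(root: str) -> list:
    """Copy the first path of each group once; hardlink the remaining paths to it."""
    plan = []
    for paths in group_by_target(root).values():
        first, *others = paths
        plan.append(("copy", first, None))
        plan.extend(("hardlink", p, first) for p in others)
    return plan
```

Because grouping is by inode rather than by link count, ordinary hardlinks inside the tree end up in the same groups as symlinked duplicates.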
c) using timestamps:
Another problem arises with timestamps: most of my files have
synchronized timestamps, but a few may differ. Files with equal
size and timestamp are considered equal, so I want to use
"--times". If, however, the timestamp differs, the file is
synchronized (transferred) without any further check. Would it be
possible to do an additional content check to verify whether the
file contents really differ, and to leave the target file
untouched if it differs only in its timestamp, not in content?
This is not only a matter of performance, but of the integrity
of my target tree.
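The requested decision can be sketched locally in Python (an illustration only, not rsync's implementation; rsync's own "--checksum" option, described in the rsync man page, is the closest built-in). The function names are hypothetical. Same size and mtime means "skip"; same size but different mtime triggers a byte comparison; if the bytes match, only the timestamp is copied with `os.utime`, so the target keeps its inode and its hardlinks.

```python
import filecmp
import os


def sync_decision(src: str, dst: str) -> str:
    """Hypothetical helper: return "skip", "touch" (contents equal, only the
    timestamp differs) or "transfer" for the target file dst."""
    if not os.path.exists(dst):
        return "transfer"
    s, d = os.stat(src), os.stat(dst)
    if s.st_size != d.st_size:
        return "transfer"
    if s.st_mtime_ns == d.st_mtime_ns:
        return "skip"  # quick check: same size and same timestamp
    # Sizes match but timestamps differ: compare the actual bytes.
    if filecmp.cmp(src, dst, shallow=False):
        return "touch"
    return "transfer"


def apply_touch(src: str, dst: str) -> None:
    # Copy src's timestamps to dst without rewriting its data, so dst keeps
    # its inode (and therefore its hardlinks to the backup versions).
    st = os.stat(src)
    os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))
```

The content comparison costs a full read of both files, so it only pays off when differing timestamps are rare, as described above.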
Thanks for any comment,
Dieter.