Michael Tokarev
2010-May-14 07:54 UTC
skipping and not finding batched updates with extra --link-dest?
Hello. This is my first post here ;)

I've been using rsync heavily for quite some time. For one, we use it to perform offline backups of numerous machines into a single location. Since the systems are basically identical, it's wise to keep only the files that differ, hard-linking identical ones. That worked well for many years, until rsync 3 came out.

The procedure is like this: on the server to be backed up there's a "previous" directory, which gets compared with "current" using rsync to generate a batched update. That update, or "delta", is sent to the central server (the backup destination). There, it is applied to this system's "previous" state to get the "current" one. This alone works as expected.

The only difference between this version (which is the intended usage) and my actual usage is that I add an extra --link-dest argument on the receiving end, to simplify finding identical files. So the receiving end, unlike the sending one, has two --link-dest options: the usual one for the "previous" state, and an extra one for the current state of another machine.

Example command lines:

sender:
  rsync -aHRSx \
    --link-dest=$last \
    --write-batch=batch \
    directory... \
    $cur

receiver:
  rsync -aHRS \
    --read-batch=batch \
    --link-dest=$other \
    --link-dest=$last \
    $cur

So only one additional option is given: the extra --link-dest=$other, pointing at the current backup dir of another machine. With this option, it fails on a regular basis, at the same place in the batches. Like this:

  (Skipping batched update for "etc/samba/smbpasswd")
  (No batched update for "etc/samba/smbpasswd")
  rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1526) [generator=3.0.7]

Here are the files, with md5sums:

  -rw------- 2 root root 6118 Mar 16 07:22 $other/etc/samba/smbpasswd
    e473d16a8abad10a3f08e3fa96b5ed37
  -rw------- 1 root root 6039 May 12 17:50 $last/etc/samba/smbpasswd
    d7bc141404920cd330f1ae0aff96389f
  -rw------- 1 root root 6039 ??? 14 10:35 $NEW/etc/samba/smbpasswd
    d7bc141404920cd330f1ae0aff96389f

where $NEW is the actual file being backed up, which should be in the batch being received. Note that the new file is of the same _size_ as in $other, but with different content.

As you can see, $last and $NEW have the same content but different timestamps. I suspect this is due to some bug in an older rsync, or maybe some inaccuracy of mine. I don't know, but the fact is that yes, some dates mismatch.

When I set the date of the file in $last to match the one in the batch, it works. But what is more interesting is that it also works when I drop the extra --link-dest, even with the wrong timestamp: it updates the timestamp and moves on. With the extra --link-dest in place, it fails as shown above.

To me it looks like the two places in the code, the one which checks whether the "batched update" is actually necessary and the one which actually applies the update, somehow do not agree with each other.

I would call it a local operator/environment error, since the destination is not what the sender expects, were it not for the fact that it works without the extra --link-dest.

And what I can say for sure is that it always worked with rsync version 2. I have tried switching to v3 several times, always running into this and reverting to v2. Initially I updated rsync only on the receiving side (where the immediate problem was trivial to fix by reverting to v2, which processes the "bad" batches just fine). Now I have updated it on both ends, so v2 on the receiver does not work anymore, but the problem is still the same.

Thanks!

/mjt
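
For anyone who wants to reproduce the comparison in the message above (size, mtime and md5 of the same relative path under $other, $last and $NEW), here is a minimal Python sketch. The function names `describe`, `compare` and `relation` are made up for illustration and are not part of rsync. It only reports how the copies relate to each other; it makes no claim about how rsync itself chooses a basis file.

```python
import hashlib
import os


def describe(root, rel):
    """Return (size, mtime in whole seconds, md5 hex digest) for root/rel."""
    path = os.path.join(root, rel)
    st = os.stat(path)
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    return st.st_size, int(st.st_mtime), digest


def compare(roots, rel):
    """Map each root directory to the (size, mtime, md5) of the same relative path."""
    return {root: describe(root, rel) for root in roots}


def relation(a, b):
    """Describe how two (size, mtime, md5) tuples relate to each other."""
    size_a, mtime_a, md5_a = a
    size_b, mtime_b, md5_b = b
    if md5_a == md5_b:
        if mtime_a == mtime_b:
            return "same content, same mtime"
        return "same content, different mtime"
    if size_a == size_b:
        return "same size, different content"
    return "different size and content"
```

Calling `compare([other, last, new], "etc/samba/smbpasswd")` with the three backup roots and then `relation()` on pairs of the results shows at a glance which copies differ only in timestamp.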
Wayne Davison
2010-May-29 17:03 UTC
skipping and not finding batched updates with extra --link-dest?
On Fri, May 14, 2010 at 12:54 AM, Michael Tokarev <mjt at tls.msk.ru> wrote:

> I add extra --link-dest argument on the receiving end, to simplify finding
> identical files.

That is not supported because it can radically change what rsync is doing. You may have an update in the batch file that is based on a different basis file than the one that rsync found on the receiving side, which could (attempt to) corrupt the file or cause other problems. With the --hard-links (-H) option, the extra directory can also move a file's update around, since it can change the point at which rsync figures out that a file with multiple hard links is not going to be found elsewhere in the transfer. For a batch update, you should have an identical set of files for the batch-creation and batch-replay hosts.

If you want to find more files to hard-link together, you should probably do some kind of post-processing work. For instance, in 3.1.0dev (the upcoming release) there is a %C escape for the --out-format option that would output the md5 sum of every file in the transfer. If you had a list of the current md5 sums on the disk, you could match them and hard-link them together, updating the list with the new checksums. Or you could create something that scans through the parallel hierarchies looking for matching name+mtime+size entries.

As for rsync 2 working with that idiom, it might be caused by a bug-fix in the hard-linking code, or perhaps by some bug-fixing to the batch code, or something similar. Rsync 2 might just be getting lucky, or it might be silently doing the wrong thing.

..wayne..
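
The last suggestion, scanning parallel hierarchies for matching name+mtime+size entries, could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not something shipped with rsync: the function `link_matching`, its parameters and the `.lnk-tmp` suffix are invented here. Because hard-linked names share one inode, the sketch also requires mode and ownership to match, and by default it compares file contents before linking, since equal name, size and mtime do not guarantee equal data.

```python
import filecmp
import os


def link_matching(src_root, dst_root, dry_run=True, verify=True):
    """Hard-link files under dst_root to same-named files under src_root.

    A pair qualifies when relative path, size, mtime (whole seconds), mode
    and ownership all match. Returns the relative paths that were linked,
    or that would be linked when dry_run is True.
    """
    linked = []
    for dirpath, dirs, files in os.walk(dst_root):
        dirs.sort()  # deterministic traversal order
        for name in sorted(files):
            dst = os.path.join(dirpath, name)
            rel = os.path.relpath(dst, dst_root)
            src = os.path.join(src_root, rel)
            # Only plain files on both sides; leave symlinks alone.
            if os.path.islink(dst) or not os.path.isfile(dst):
                continue
            if os.path.islink(src) or not os.path.isfile(src):
                continue
            s, d = os.stat(src), os.stat(dst)
            if s.st_dev != d.st_dev:
                continue  # hard links cannot cross filesystems
            if s.st_ino == d.st_ino:
                continue  # already the same inode
            if s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime):
                continue
            if (s.st_mode, s.st_uid, s.st_gid) != (d.st_mode, d.st_uid, d.st_gid):
                continue  # linked names would share this metadata
            if verify and not filecmp.cmp(src, dst, shallow=False):
                continue  # same size and mtime, but different bytes
            linked.append(rel)
            if not dry_run:
                # Link under a temporary name, then rename over the target,
                # so dst is never missing if the process is interrupted.
                tmp = dst + ".lnk-tmp"
                os.link(src, tmp)
                os.replace(tmp, dst)
    return linked
```

A cautious way to use it would be `link_matching(other, cur)` first, to review the candidate list, and only then `link_matching(other, cur, dry_run=False)`, run after the batch has been applied so that the batch replay itself sees the same files as the sender.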