samba-bugs@samba.org
2006-Sep-28 00:49 UTC
DO NOT REPLY [Bug 4128] New: ignore-times with link-dest behaves unexpected / sematics not clear
https://bugzilla.samba.org/show_bug.cgi?id=4128

           Summary: ignore-times with link-dest behaves unexpected / sematics not clear
           Product: rsync
           Version: 2.6.8
          Platform: x86
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: core
        AssignedTo: wayned@samba.org
        ReportedBy: jwagner@computing.dcu.ie
         QAContact: rsync-qa@samba.org

Hi,

I checked the following with rsync 2.6.8 from Fedora Core 5 (updated this week) and the current 2.6.9cvs. The behaviour is different, but still not as expected.

The man page says that --ignore-times switches off any quick checks. Therefore, I concluded that this option makes sure that target data is correct in any case. Elsewhere, I even read that --ignore-times is an alternative to --checksum and that one or the other can be preferred depending on how many files are expected to be the same.

However, when used together with --link-dest, the following happens. In 2.6.8, --ignore-times with --link-dest doesn't identify at all that files on the receiver can be hard-linked. This problem is known to the developers, as revision 1.273 of generator.c attempts to fix it. At least rsync 2.6.8 uses the files in the link-dest directory to reduce network traffic, basically copying the file within the receiver.

With 2.6.9cvs (2006-09-27, generator.c revision 1.285), the same options cause the files to be hard-linked without being verified to have identical content.

(Note: I installed rsync locally on both machines (configure --prefix=$HOME) and used the option --rsync-path=$HOME/bin/rsync, see below.)

Test script:

Two machines A and B, same user and numeric IDs. (I used Pentium 4 PCs with Fedora Core 5, updated from the default repositories.)

export B=192.168.0.20   # <-- set this to the 2nd machine to be able to copy and paste from here
# (you might also want to configure ssh to avoid typing login passwords again and again)

# step 1 - prepare data
echo "one" >test1.txt
echo "two" >test2.txt
mkdir -p ref/data
cp test1.txt ref/data/text101.txt
cp test1.txt ref/data/text102.txt
touch -d 060927 ref/data/*
mkdir data
cp test1.txt data/text101.txt
cp test2.txt data/text102.txt
touch -d 060927 data/*
mkdir dst
rsync -av ref dst `whoami`@$B:./

# step 2 - test rsync
rsync -av --ignore-times --link-dest=../ref/ data `whoami`@$B:dst/
# note: dest=ref/ would be relative to dst/
# note2: if you had to type in a password for the first rsync,
#        copying'n'pasting the 2nd rsync might not have worked in one go

# step 3 - analyze results
ssh `whoami`@$B 'ls -li dst/data/ ; cat dst/data/*'
# note: 3rd column gives the hard-link count

# cleaning up for next run
ssh `whoami`@$B 'rm -f dst/data/*'

# test newest version
$HOME/bin/rsync -av --rsync-path=$HOME/bin/rsync --ignore-times \
    --link-dest=../ref/ data `whoami`@$B:dst/
ssh `whoami`@$B 'ls -li dst/data/ ; cat dst/data/*'

Of course, it can be argued that my conclusion is wrong and that the long description in the man page is misleading: --ignore-times simply does what it says, it ignores time stamps. However, the consequences when it is used with other options should be reasonable, or at least be documented.

Motivation for combining these options:

Machine B is a mirror of machine A. Unfortunately, machine A turned out to have a hardware defect that causes sporadic read errors. Files on B are likely to be damaged. Files on A might also be permanently damaged. For further analysis, I'd like to have all files on B. Without --link-dest, I don't have enough space. Without --ignore-times, files with the same stat data but with a bit error somewhere in the middle will not be detected.
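
The last point, that a comparison of size and time stamp cannot see a same-size content change, can be reproduced locally without rsync. The following is only a minimal sketch, assuming GNU coreutils (stat -c, touch -r); the file names are invented for the demonstration and it does not run rsync's own quick check.

```shell
#!/bin/sh
# Sketch: two files that agree in size and mtime but differ in content.
# All file names here are made up for the demonstration.
workdir=$(mktemp -d)
printf 'one\n' > "$workdir/good.txt"
printf 'two\n' > "$workdir/bad.txt"                 # same length, different bytes
touch -r "$workdir/good.txt" "$workdir/bad.txt"     # copy the time stamps

# What a size+mtime comparison sees (GNU stat: %s = size, %Y = mtime in seconds).
meta_good=$(stat -c '%s %Y' "$workdir/good.txt")
meta_bad=$(stat -c '%s %Y' "$workdir/bad.txt")
if [ "$meta_good" = "$meta_bad" ]; then quick=same; else quick=differ; fi

# What a byte-for-byte comparison sees.
if cmp -s "$workdir/good.txt" "$workdir/bad.txt"; then content=same; else content=differ; fi

echo "size+mtime: $quick, content: $content"
rm -rf "$workdir"
```

The two files pass the metadata comparison and fail the byte comparison, which is the situation the test data above (text102.txt with "one" versus "two" and identical dates) is built to provoke.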

I'll now reconsider using --checksum, although it seems to waste a lot of time: it calculates checksums sequentially, first on machine A while B is idle, then presumably on machine B while A is idle (I didn't get this far, as I got impatient after 6 hours of high CPU and disk I/O load on A), and only then applies the normal rsync algorithm to those files that are not identical. But this is a different story.

Regards,
Joachim

--
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug, or are watching the QA contact.
samba-bugs@samba.org
2006-Sep-28 09:44 UTC
DO NOT REPLY [Bug 4128] ignore-times with link-dest behaves unexpected / sematics not clear
https://bugzilla.samba.org/show_bug.cgi?id=4128

------- Comment #1 from jwagner@computing.dcu.ie 2006-09-28 04:44 MST -------
Workaround
==========

For those who find this report while searching for a solution:

# clean up previous experiment
export B=192.168.0.20
ssh `whoami`@$B 'rm -f dst/data/*'

# run rsync with (default) quick check
rsync -av --link-dest=../ref/ data `whoami`@$B:dst/

# find out which files are wrong
md5sum data/*
ssh `whoami`@$B 'md5sum dst/data/*'

# delete these files on B
ssh `whoami`@$B 'rm dst/data/text102.txt'

# rsync again without --link-dest to fill gaps
rsync -av data `whoami`@$B:dst/

# everything is now fine:
ssh `whoami`@$B 'ls -li dst/data/ ; cat dst/data/*'

Notes:

* log in on B in a 2nd terminal to calculate checksums in parallel
* if there is a risk that file differences are hand-crafted to be invisible to MD5 (feasible ways of doing this have been published in recent years), you should use the slower but more secure sha1sum or sha512sum
* for more than just a small directory, use something like
  find -type f -print0 | xargs --null md5sum | sort -k2 >B.md5
  (see also "! -links 1" in 2nd bullet point below)
* diff --speed-large-files A.md5 B.md5 should do fine in most cases to identify the differences
* the only advantage over rsync --checksum is parallel checksum calculation; we are still wasting time on files that failed the quick check (time + size); improvement: generate the file list with
  find -type f ! -links 1 -print0
  on machine B and copy it to A (! means "not" for find)

BTW: I can confirm that rsync --checksum would work as expected:

ssh `whoami`@$B 'rm -f dst/data/*'
rsync -av --checksum --link-dest=../ref/ data `whoami`@$B:dst/
ssh `whoami`@$B 'ls -li dst/data/ ; cat dst/data/*'

Have fun,
Joachim
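
The checksum-list comparison from the notes can be tried out on one machine before using it across two. This is a rough sketch only: the directories A/ and B/ are local stand-ins for the two hosts, sum_tree is a hypothetical helper name, and GNU find, xargs and md5sum are assumed.

```shell
#!/bin/sh
# Sketch: compare two trees by checksum list. A/ and B/ are local stand-ins
# for the two machines; in practice each list would be generated on its own
# host and one of them copied over.
workdir=$(mktemp -d)
mkdir -p "$workdir/A/data" "$workdir/B/data"
printf 'one\n' > "$workdir/A/data/text101.txt"
printf 'two\n' > "$workdir/A/data/text102.txt"
printf 'one\n' > "$workdir/B/data/text101.txt"
printf 'one\n' > "$workdir/B/data/text102.txt"   # wrong content on B

# Hypothetical helper: print "checksum  path" lines, sorted by path so that
# the two lists line up for diff.
sum_tree() {
    (cd "$1" && find . -type f -print0 | xargs -0 md5sum | sort -k2)
}

sum_tree "$workdir/A" > "$workdir/A.md5"
sum_tree "$workdir/B" > "$workdir/B.md5"

# diff lines starting with "<" are A's entries that have no match on B;
# strip the marker and the checksum to keep only the path.
changed=$(diff "$workdir/A.md5" "$workdir/B.md5" | sed -n 's/^< [0-9a-f]* [ *]//p')
echo "$changed"
rm -rf "$workdir"
```

Running find from inside each tree keeps the paths relative, so the two lists can be compared even when the trees live under different parent directories.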
samba-bugs@samba.org
2006-Sep-30 15:34 UTC
DO NOT REPLY [Bug 4128] Make --ignore-times work better with --link-dest by adding an after-transfer check
https://bugzilla.samba.org/show_bug.cgi?id=4128

wayned@samba.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
             Status|NEW                         |ASSIGNED
            Summary|ignore-times with link-dest |Make --ignore-times work
                   |behaves unexpected /        |better with --link-dest by
                   |sematics not clear          |adding an after-transfer
                   |                            |check

------- Comment #2 from wayned@samba.org 2006-09-30 10:34 MST -------
The --link-dest option only links files together that are found to be identical during the pre-transfer check for identical files, never as an extra check after a file has been updated (it would be nice to add that as an improvement at some point, but it doesn't work that way at present). So, I think the new behavior in CVS is misguided in its attempt to make the --link-dest option play nice with the --ignore-times option. I've removed that code and also added a mention to the --link-dest section that --ignore-times will prevent any hard-linking from occurring (though it will use the hierarchy to make the transfers more efficient).

I'll turn this bug report into a feature request: It would be nice if the receiver would notice that it copied 100% of the data from a --link-dest basis file to the destination temp file while also having all preserved attributes the same. Such a file would get its temp file dropped and the destination file hard-linked to the --link-dest basis file.
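
The requested after-transfer check can be pictured with ordinary shell tools. This is a sketch of the idea only, not rsync code: link_if_identical is a hypothetical helper, the set of attributes compared (mode, owner, group, mtime) is an assumption about what "preserved attributes" would cover, and GNU stat is assumed.

```shell
#!/bin/sh
# Sketch of the proposed after-transfer check; this is NOT rsync code.
# link_if_identical is a hypothetical helper: if the freshly written temp
# file matches the basis file in data and in the attributes compared below,
# drop the temp file and hard-link the destination to the basis file.
link_if_identical() {
    basis=$1 tmp=$2 dest=$3
    # GNU stat: %a = mode, %u/%g = owner/group ids, %Y = mtime in seconds.
    if cmp -s "$basis" "$tmp" &&
       [ "$(stat -c '%a %u %g %Y' "$basis")" = "$(stat -c '%a %u %g %Y' "$tmp")" ]
    then
        rm -f "$tmp"
        ln -f "$basis" "$dest"
    else
        mv -f "$tmp" "$dest"
    fi
}

workdir=$(mktemp -d)
mkdir -p "$workdir/ref" "$workdir/dst"
printf 'one\n' > "$workdir/ref/text101.txt"
# cp -p keeps mode and time stamps, standing in for a finished transfer.
cp -p "$workdir/ref/text101.txt" "$workdir/dst/.text101.tmp"
link_if_identical "$workdir/ref/text101.txt" "$workdir/dst/.text101.tmp" "$workdir/dst/text101.txt"

# Identical inode numbers mean the two names share one file.
inode_ref=$(stat -c '%i' "$workdir/ref/text101.txt")
inode_dst=$(stat -c '%i' "$workdir/dst/text101.txt")
links=$(stat -c '%h' "$workdir/dst/text101.txt")
echo "ref inode $inode_ref, dst inode $inode_dst, link count $links"
rm -rf "$workdir"
```

When the data or any compared attribute differs, the helper falls back to moving the temp file into place, so the destination never ends up linked to a file it does not match.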
samba-bugs@samba.org
2006-Sep-30 15:48 UTC
DO NOT REPLY [Bug 4128] Make --ignore-times work better with --link-dest by adding an after-transfer check
https://bugzilla.samba.org/show_bug.cgi?id=4128

------- Comment #3 from wayned@samba.org 2006-09-30 10:47 MST -------
Let me comment about the checksum slowness: There is a diff in the patches dir named early-checksum.diff that makes the receiver do its checksumming at the same time that the sender is doing its checksumming. I'm considering including this patch, but I need to do some more performance testing first.
samba-bugs@samba.org
2006-Sep-30 16:30 UTC
DO NOT REPLY [Bug 4128] Make --ignore-times work better with --link-dest by adding an after-transfer check
https://bugzilla.samba.org/show_bug.cgi?id=4128

------- Comment #4 from hashproduct+rsync@gmail.com 2006-09-30 11:30 MST -------
(In reply to comment #2)
> It would be nice if the receiver would notice that it copied 100% of the data
> from a --link-dest basis file to the destination temp file while also having
> all preserved attributes the same.

Along the same lines, if rsync notices that the temp file has the same data as the existing destination file, it could discard the temp file and instead tweak attributes of the existing destination file as it would have if the quick check had passed. This way, if I wished to copy into a destination in which make is being used, I could disable --times, and the mtime of a destination file would only be hit if its data actually changed.
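
The effect described in comment #4 can be imitated outside rsync. The sketch below is an illustration under assumptions, not rsync's behaviour: keep_if_unchanged is a hypothetical helper, it leaves out the attribute tweaks the comment mentions, and GNU stat and touch -d are assumed.

```shell
#!/bin/sh
# Sketch of the idea in comment #4; NOT rsync code. keep_if_unchanged is a
# hypothetical helper: when the temp file carries the same data as the
# existing destination file, discard the temp file so the destination (and
# its mtime) is left alone; otherwise move the temp file into place.
keep_if_unchanged() {
    tmp=$1 dest=$2
    if [ -f "$dest" ] && cmp -s "$tmp" "$dest"; then
        rm -f "$tmp"
    else
        mv -f "$tmp" "$dest"
    fi
}

workdir=$(mktemp -d)
printf 'one\n' > "$workdir/out.txt"
touch -d '2006-09-27 00:00:00' "$workdir/out.txt"    # an old, known mtime
before=$(stat -c '%Y' "$workdir/out.txt")

printf 'one\n' > "$workdir/.out.tmp"                 # same data, new mtime
keep_if_unchanged "$workdir/.out.tmp" "$workdir/out.txt"
after_same=$(stat -c '%Y' "$workdir/out.txt")

printf 'two\n' > "$workdir/.out.tmp"                 # changed data
keep_if_unchanged "$workdir/.out.tmp" "$workdir/out.txt"
after_changed=$(stat -c '%Y' "$workdir/out.txt")

echo "before=$before after_same=$after_same after_changed=$after_changed"
rm -rf "$workdir"
```

With this behaviour a make run in the destination would only see a newer mtime on files whose data really changed.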
samba-bugs@samba.org
2008-Aug-18 14:00 UTC
DO NOT REPLY [Bug 4128] Make --ignore-times work better with --link-dest by adding an after-transfer check
https://bugzilla.samba.org/show_bug.cgi?id=4128

matt@mattmccutchen.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |5583

------- Comment #5 from matt@mattmccutchen.net 2008-08-18 09:00 CST -------
The post-transfer tweaks or --link-dest checks discussed here would be based on the identical-data check of bug 5583.