samba-bugs@samba.org
2004-Jul-03 01:12 UTC
[Bug 1489] New: Corrupt transfer with the fuzzy option
https://bugzilla.samba.org/show_bug.cgi?id=1489

Summary: Corrupt transfer with the fuzzy option
Product: rsync
Version: 2.6.2
Platform: All
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P3
Component: core
AssignedTo: wayned@samba.org
ReportedBy: egmont@uhulinux.hu
QAContact: rsync-qa@samba.org

I've been using the rusty-fuzzy patch of rsync for a long time without problems, but now I've found a special circumstance where the --fuzzy option of the patched rsync 2.6.2 leads to data corruption.

How to reproduce:

Create a /tmp/foo1 directory with four files. foobar-1.0.txt contains one megabyte of '1' characters followed by one megabyte of '0' characters. One way to create this file:

    { dd if=/dev/zero bs=1k count=1k | tr '\0' '1'; dd if=/dev/zero bs=1k count=1k | tr '\0' '0'; } > /tmp/foo1/foobar-1.0.txt

Then create foobar-2.0.txt, which has a megabyte of '2' followed by a megabyte of '0'. Similarly, create foobar-2.1.txt, which has a megabyte of '2' followed by a megabyte of '1'. (So the contents of these three files reflect their filenames.) The md5sums are:

    f6535bdc24b1074a704ef0166f93b4f0 foobar-1.0.txt
    a1574b29877c570cb44436f7a404aa71 foobar-2.0.txt
    80c3df2d255f201f842a89ebcb3f078c foobar-2.1.txt

Finally, create a zzz.txt with any content; it doesn't influence anything, it only makes the example clearer.

Create /tmp/foo2 and copy only foobar-1.0.txt into it; the other files must not be copied yet:

    cp -a /tmp/foo1/foobar-1.0.txt /tmp/foo2/

Here we go (--bwlimit is not important):

    $ rsync -a --fuzzy --bwlimit=100 localhost:/tmp/foo1/ /tmp/foo2/
    receiving file list ... done
    ./
    foobar-2.0.txt
    foobar-2.1.txt
    zzz.txt
    foobar-2.1.txt
    wrote 46464 bytes read 3155804 bytes 98531.32 bytes/sec
    total size is 6291460 speedup is 1.96

As the 'screenshot' shows, foobar-2.1.txt is transferred twice. If rsync is not interrupted, foobar-2.1.txt is okay at the end. However, after the first transfer its content is invalid: 2096928 '2' characters followed by 224 '1' characters (md5sum 795cb4c484c711d902acd3011e70832e).
The file size and the timestamp are correct. Hence, if rsync is interrupted, or if someone else mirrors us (using rsync without -c) during this process, the result is a file with incorrect content but correct metadata, so further rsync runs will not repair it.

First note: based on this example, rsync seems to have some self-protecting mechanism (a naive program would most likely not even notice that the transfer was incorrect and wouldn't restart it). However, as it stands, this mechanism isn't perfect. It should check whether the whole file is okay before renaming it to its final name, so that an interrupt cannot leave a corrupt file on the disk. If that is not trivial to solve for technical reasons, then at least the timestamp should be set to some fake value to force a recheck of this file if rsync gets interrupted.

Second note: I think rsync --fuzzy starts to misbehave when the closest filename changes during the operation. In my example, foobar-1.0.txt was initially the closest filename to foobar-2.1.txt; however, during the operation foobar-2.0.txt appeared, which is even closer to foobar-2.1.txt. My guess is that rsync cannot consistently decide which of these two files to use as the reference, and this might be the cause of the problem.

--
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug, or are watching the QA contact.
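The guess in the second note can be checked against the reported numbers with a toy model of the rsync delta algorithm. This is a minimal Python sketch, not rsync's or the fuzzy patch's code, and it rests on stated assumptions: (a) a block length of sqrt(file size) rounded down to a multiple of 8, with an assumed minimum of 700 bytes, which is meant to approximate rsync 2.6.2's heuristic and gives 1448 bytes for a 2 MiB file; (b) exact block comparison in place of rsync's rolling and strong checksums; (c) MD5 in place of rsync's whole-file checksum. The names block_length, make_delta, apply_delta and runs are hypothetical. Under the hypothesis that the block checksums came from foobar-1.0.txt while the receiver rebuilt the file from foobar-2.0.txt, the model yields 1048576 literal '2' bytes plus 724 copied blocks of 1448 bytes (1048352 bytes, all '2' in foobar-2.0.txt), i.e. 2096928 '2' bytes, followed by 224 literal '1' bytes. That matches the corrupted first pass, though it only shows the guess is consistent with the numbers; it does not prove what the patch actually does.

```python
import hashlib
import math
from itertools import groupby

MIB = 1024 * 1024
BLOCK_MIN = 700  # assumed lower bound on the block length


def block_length(size: int) -> int:
    """Assumed heuristic: sqrt(size) rounded down to a multiple of 8, at least BLOCK_MIN."""
    return max(BLOCK_MIN, math.isqrt(size) & ~7)


def make_delta(target: bytes, basis: bytes, blen: int) -> list:
    """Toy sender: describe `target` as ('copy', block_index) and ('literal', bytes)
    tokens relative to the full-length blocks of `basis`. Exact byte comparison
    stands in for rsync's rolling and strong checksums."""
    index = {}
    for i in range(len(basis) // blen):
        index.setdefault(basis[i * blen:(i + 1) * blen], i)  # first block wins on duplicates
    tokens, literal, pos = [], bytearray(), 0
    while pos < len(target):
        window = target[pos:pos + blen]
        block = index.get(window) if len(window) == blen else None
        if block is None:
            literal.append(target[pos])  # no match: this byte is sent literally
            pos += 1
            continue
        if literal:
            tokens.append(("literal", bytes(literal)))
            literal = bytearray()
        tokens.append(("copy", block))  # match: reference a block of the basis
        pos += blen
    if literal:
        tokens.append(("literal", bytes(literal)))
    return tokens


def apply_delta(tokens: list, basis: bytes, blen: int) -> bytes:
    """Toy receiver: rebuild the file, copying referenced blocks out of `basis`."""
    out = bytearray()
    for kind, value in tokens:
        out += basis[value * blen:(value + 1) * blen] if kind == "copy" else value
    return bytes(out)


def runs(data: bytes) -> list:
    """Collapse data into (character, run length) pairs."""
    return [(chr(b), sum(1 for _ in g)) for b, g in groupby(data)]


foobar_10 = b"1" * MIB + b"0" * MIB
foobar_20 = b"2" * MIB + b"0" * MIB
foobar_21 = b"2" * MIB + b"1" * MIB

blen = block_length(len(foobar_21))
tokens = make_delta(foobar_21, foobar_10, blen)  # checksums taken from foobar-1.0.txt
good = apply_delta(tokens, foobar_10, blen)      # receiver uses the same basis
bad = apply_delta(tokens, foobar_20, blen)       # hypothetical switch to foobar-2.0.txt

# Whole-file check before committing the file (MD5 stands in for rsync's checksum).
commit_ok = hashlib.md5(bad).digest() == hashlib.md5(foobar_21).digest()

print(blen)                 # 1448
print(good == foobar_21)    # True
print(len(bad), runs(bad))  # 2097152 [('2', 2096928), ('1', 224)]
print(commit_ok)            # False
```

The last check relates to the first note: the mismatch is detectable from the whole-file checksum alone, so a receiver that refused to rename the file into place (or left a fake mtime) whenever commit_ok is false would not leave a corrupt file with correct metadata behind.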