samba-bugs@samba.org
2004-Jul-03  01:12 UTC
[Bug 1489] New: Corrupt transfer with the fuzzy option
https://bugzilla.samba.org/show_bug.cgi?id=1489
           Summary: Corrupt transfer with the fuzzy option
           Product: rsync
           Version: 2.6.2
          Platform: All
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: core
        AssignedTo: wayned@samba.org
        ReportedBy: egmont@uhulinux.hu
         QAContact: rsync-qa@samba.org
I've been using the rusty-fuzzy patch for rsync for a long time without
problems, but I have now found a special circumstance in which the --fuzzy
option of the patched rsync 2.6.2 leads to data corruption.
How to reproduce:
Create a /tmp/foo1 directory with four files.
foobar-1.0.txt contains one megabyte of '1' chars followed by one megabyte
of '0' chars.
A possible way to create this file:
{ dd if=/dev/zero bs=1k count=1k | tr '\0' '1'; dd if=/dev/zero bs=1k count=1k | tr '\0' '0'; } > /tmp/foo1/foobar-1.0.txt
Then create foobar-2.0.txt, which has a megabyte of '2' and then a megabyte
of '0' chars.
Similarly, create foobar-2.1.txt, which has a megabyte of '2' and then a
megabyte of '1' chars. (So the contents of these three files reflect their
filenames.)
The md5sums are:
f6535bdc24b1074a704ef0166f93b4f0  foobar-1.0.txt
a1574b29877c570cb44436f7a404aa71  foobar-2.0.txt
80c3df2d255f201f842a89ebcb3f078c  foobar-2.1.txt
Also create a zzz.txt with any content. It doesn't influence anything; it
only makes the example clearer.
Create /tmp/foo2 and run cp -a /tmp/foo1/foobar-1.0.txt /tmp/foo2/ but do
not copy the other files yet.
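The setup steps above can be collected into one script. This is only a sketch: make_file is a hypothetical helper, SRC and DST are variable names chosen here (defaulting to the paths used in this report), and head -c replaces the dd invocation shown earlier.

```shell
# Sketch of the reproduction setup. SRC, DST and make_file are names
# introduced for this example, not part of rsync or the patch.
SRC="${SRC:-/tmp/foo1}"
DST="${DST:-/tmp/foo2}"
MEG=1048576   # one megabyte, same as bs=1k count=1k

# make_file NAME FIRST SECOND
# Writes one megabyte of FIRST followed by one megabyte of SECOND.
make_file() {
    {
        head -c "$MEG" /dev/zero | tr '\0' "$2"
        head -c "$MEG" /dev/zero | tr '\0' "$3"
    } > "$SRC/$1"
}

mkdir -p "$SRC" "$DST"
make_file foobar-1.0.txt 1 0
make_file foobar-2.0.txt 2 0
make_file foobar-2.1.txt 2 1
echo "any content" > "$SRC/zzz.txt"

# Only foobar-1.0.txt is present on the receiving side before the transfer.
cp -a "$SRC/foobar-1.0.txt" "$DST/"
```

Running md5sum on the three foobar files in $SRC afterwards allows a comparison with the sums listed above.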
Here we go (bwlimit is not important):
$ rsync -a --fuzzy --bwlimit=100 localhost:/tmp/foo1/ /tmp/foo2/
receiving file list ... done
./
foobar-2.0.txt
foobar-2.1.txt
zzz.txt
foobar-2.1.txt
wrote 46464 bytes  read 3155804 bytes  98531.32 bytes/sec
total size is 6291460  speedup is 1.96
As the 'screenshot' shows, foobar-2.1.txt is transferred twice. If rsync
is not interrupted, foobar-2.1.txt is okay at the end.
However, after the first transfer its content is invalid (2096928 '2' chars
followed by 224 '1' chars; md5sum is 795cb4c484c711d902acd3011e70832e).
The file size and the timestamp are correct.
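One way to inspect the intermediate file is to count how many bytes of each digit it holds. The sketch below assumes a hypothetical helper named count_char; it is not part of rsync.

```shell
# count_char FILE CHAR
# Prints how many bytes of FILE are equal to CHAR (hypothetical helper).
count_char() {
    # tr -cd keeps only CHAR and deletes everything else
    tr -cd "$2" < "$1" | wc -c | tr -d ' '
}
```

For example, count_char /tmp/foo2/foobar-2.1.txt 2 should print 1048576 for a correct copy, whereas the corrupt intermediate file described above would give 2096928.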
Hence if either rsync is interrupted or someone else mirrors us (using rsync
without -c) during this process, the result will be a file with incorrect
content but correct metadata, so further rsyncing will not repair it.
First note: judging from this example, rsync seems to have some
self-protecting mechanism (a naive program would most likely not even notice
that the transfer was incorrect and would not restart it). However, this
mechanism is not perfect as it stands. It should check that the whole file
is okay before renaming it to its final name, so that an interrupt cannot
leave a corrupt file on the disk. If that is not trivial to solve because
of technical difficulties, then at least the timestamp should be set to
some fake value to force a recheck of this file if rsync gets interrupted.
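The "check before renaming" idea from the first note can be sketched in shell. This is an illustration only and says nothing about how rsync is implemented: install_verified is a hypothetical helper, and it assumes the expected MD5 sum is already known and that md5sum (GNU coreutils) is available.

```shell
# install_verified TMP FINAL EXPECTED_MD5
# Hypothetical helper: move TMP to FINAL only if its MD5 sum matches.
install_verified() {
    actual=$(md5sum < "$1" | cut -d ' ' -f 1)
    if [ "$actual" = "$3" ]; then
        # A rename within one filesystem replaces FINAL in a single step.
        mv -f "$1" "$2"
    else
        # Discard the temporary file so no corrupt data reaches FINAL.
        rm -f "$1"
        return 1
    fi
}
```

With this ordering, an interrupt before the mv leaves at worst a stray temporary file, never a final file with wrong content and correct metadata.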
Second note: I think rsync --fuzzy starts to misbehave when the closest
filename changes during the operation. In my example, foobar-1.0.txt was
initially the closest filename to foobar-2.1.txt; during the operation,
however, foobar-2.0.txt appeared, which is even closer to foobar-2.1.txt.
My guess is that rsync cannot clearly decide which of these two files to use
as the reference, and that this might be the cause of the problem.
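To illustrate how the "closest" name can change, here is a sketch that scores names by the length of their common prefix. The patch's real scoring is not described in this report, so this metric and the helper common_prefix_len are assumptions made purely for illustration.

```shell
# common_prefix_len A B
# Prints the length of the longest common prefix of A and B.
# Hypothetical stand-in metric, not the scoring used by the fuzzy patch.
common_prefix_len() {
    a=$1 b=$2 n=0
    while [ -n "$a" ] && [ -n "$b" ]; do
        ca=${a%"${a#?}"}   # first character of a
        cb=${b%"${b#?}"}   # first character of b
        [ "$ca" = "$cb" ] || break
        a=${a#?}
        b=${b#?}
        n=$((n + 1))
    done
    echo "$n"
}

common_prefix_len foobar-2.1.txt foobar-1.0.txt   # prints 7
common_prefix_len foobar-2.1.txt foobar-2.0.txt   # prints 9
```

Under this assumed metric, foobar-1.0.txt is the best candidate while it is the only file in /tmp/foo2, and foobar-2.0.txt overtakes it as soon as it arrives.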
