samba-bugs at samba.org
2010-Nov-05 20:38 UTC
DO NOT REPLY [Bug 7778] New: --inplace does extra WRITE operations
https://bugzilla.samba.org/show_bug.cgi?id=7778

Summary: --inplace does extra WRITE operations
Product: rsync
Version: 3.0.7
Platform: Other
OS/Version: Linux
Status: NEW
Severity: minor
Priority: P3
Component: core
AssignedTo: wayned at samba.org
ReportedBy: ildar at altlinux.ru
QAContact: rsync-qa at samba.org

Even if a block's contents in dst are the same as in src, the block gets written anyway. That is fine without --inplace. But with --inplace it is:
1. Excessive
2. Unexpected.
Very troublesome if dst is (partially) sparse.

Is it possible to fix this? (I guess it's trivial.)

--
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug, or are watching the QA contact.
samba-bugs at samba.org
2010-Nov-06 15:23 UTC
DO NOT REPLY [Bug 7778] --inplace does extra WRITE operations
https://bugzilla.samba.org/show_bug.cgi?id=7778

------- Comment #1 from wayned at samba.org 2010-11-06 10:23 CST -------
What makes you think matching locations are being written? In the verbose output, a matching offset is when a seek happens, e.g.:

    chunk[391] of size 920 at 359720 offset=359720

I'm adding a " (seek)" suffix to that line for 3.1.0, just to make it clearer.
samba-bugs at samba.org
2010-Nov-06 19:05 UTC
DO NOT REPLY [Bug 7778] --inplace does extra WRITE operations
https://bugzilla.samba.org/show_bug.cgi?id=7778

------- Comment #2 from ildar at altlinux.ru 2010-11-06 14:05 CST -------
I am sorry if I was wrong. Here's my testcase:

    $ dd bs=1M seek=1 count=0 of=f1
    $ dd bs=1M seek=1 count=0 of=f2
    $ du f?
    0       f1
    0       f2
    $ rsync --inplace f1 f2
    $ du f?
    0       f1
    1,1M    f2

Since the files are identical, I expect nothing to be written. But a sparse file became filled, so something goes wrong. Any idea what?
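The testcase relies on the difference between a file's apparent size and the space actually allocated for it, which is what du reports. A minimal sketch of that distinction in Python, assuming a Unix filesystem that supports sparse files; the helper names `extend_sparse` and `sizes` are made up for illustration and are not part of rsync or dd:

```python
import os
import tempfile

MIB = 1 << 20


def extend_sparse(path, size):
    """Set the file length to `size` without writing data (like dd seek=1 count=0)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.ftruncate(fd, size)  # any existing leading bytes are kept
    finally:
        os.close(fd)


def sizes(path):
    """Return (apparent_bytes, allocated_bytes); du reports the second value."""
    st = os.stat(path)
    # Assumption: st_blocks counts 512-byte units, as on Linux.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ("f1", "f2"):
            path = os.path.join(d, name)
            extend_sparse(path, MIB)
            apparent, allocated = sizes(path)
            print(f"{name}: apparent={apparent} allocated={allocated}")
```

On a filesystem with sparse-file support the allocated figure should stay far below the apparent 1 MiB until something writes real data into the file, zeros included.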
samba-bugs at samba.org
2010-Nov-06 22:04 UTC
DO NOT REPLY [Bug 7778] --inplace does extra WRITE operations
https://bugzilla.samba.org/show_bug.cgi?id=7778

------- Comment #3 from matt at mattmccutchen.net 2010-11-06 17:04 CST -------
--inplace only avoids rewriting unchanged blocks when the delta-transfer algorithm is on, and it is off by default for a local run. You can turn it on explicitly with --no-whole-file.

I'm not sure whether adding another mode where the receiver checks for unchanged blocks would be worth the effort.
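For illustration only, here is roughly what a receiver-side check for unchanged blocks could look like. This is a sketch of the idea, not an rsync mode or rsync code: the function name `copy_inplace_skip_same` is hypothetical, and the 1024-byte block size is an assumption borrowed from later comments in this thread.

```python
def copy_inplace_skip_same(src_path, dst_path, blen=1024):
    """Update dst in place from src, writing only blocks that differ.

    Returns the number of bytes actually written. dst must already exist.
    """
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        offset = 0
        while True:
            new = src.read(blen)
            if not new:
                break
            dst.seek(offset)
            old = dst.read(len(new))
            if old != new:
                # Only a differing block costs a write; equal blocks are skipped.
                dst.seek(offset)
                dst.write(new)
                written += len(new)
            offset += len(new)
        dst.truncate(offset)  # make dst the same length as src
    return written
```

With two identical 1 MiB sparse files this returns 0, so the holes in the destination are never touched. The cost is that every destination block is read and compared.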
samba-bugs at samba.org
2010-Nov-07 18:56 UTC
DO NOT REPLY [Bug 7778] --inplace does extra WRITE operations
https://bugzilla.samba.org/show_bug.cgi?id=7778

------- Comment #4 from ildar at altlinux.ru 2010-11-07 12:56 CST -------
Hey, Matt! I think you're right. I never knew rsync has two different protocols. But I still have a problem if I do this:

    $ echo > f1
    $ dd bs=1M seek=1 count=0 of=f1
    $ dd bs=1M seek=1 count=0 of=f2
    $ du -h f?
    4,0K    f1
    0       f2
    $ rsync --inplace --no-whole-file f1 f2
    $ du -h f?
    4,0K    f1
    1,1M    f2

I still get the target filled. That is, rsync writes 1M bytes into the target while 1 byte would be enough.
samba-bugs at samba.org
2010-Nov-07 20:34 UTC
DO NOT REPLY [Bug 7778] --inplace does extra WRITE operations
https://bugzilla.samba.org/show_bug.cgi?id=7778

------- Comment #5 from matt at mattmccutchen.net 2010-11-07 14:34 CST -------
(In reply to comment #4)
> But I still have a problem: if I do this:
> $ echo > f1
> $ dd bs=1M seek=1 count=0 of=f1
> $ dd bs=1M seek=1 count=0 of=f2
> $ du -h f?
> 4,0K f1
> 0 f2
> $ rsync --inplace --no-whole-file f1 f2
> $ du -h f?
> 4,0K f1
> 1,1M f2
>
> I still get target filled.

I see what is happening. As the sender goes through the source file, it always matches the data against a basis file block as soon as possible and then skips the entire matched region of the source file. So in this case, it skips the '\n' and makes matches at offsets 1, 1025, ... of the source file against arbitrary basis file blocks; it never gets to an aligned offset k*1024 where it could match against basis data at the same offset.

To fix this, when updating_basis_file is on, the sender would have to postpone making a nonaligned match until it checks whether the next "block" of the source file matches the basis file at the same offset.
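The behaviour described above can be reproduced with a toy model. The sketch below is not rsync's code: it compares bytes directly instead of using checksums, uses a block length of 4 so the trace stays short, and the names `plan_inplace` and `postpone` are invented for illustration. With `postpone=True` it models the proposed fix of checking the next aligned block before accepting a nonaligned match.

```python
def plan_inplace(src, basis, blen, postpone=False):
    """Return (bytes_written, bytes_skipped) for updating basis to src in place."""
    # The basis file is cut into blocks that start at multiples of blen.
    blocks = {basis[o:o + blen] for o in range(0, len(basis) - blen + 1, blen)}
    written = skipped = 0
    i = 0
    while i + blen <= len(src):
        chunk = src[i:i + blen]
        if i % blen == 0 and chunk == basis[i:i + blen]:
            # Same data at the same offset: the receiver only seeks.
            skipped += blen
            i += blen
        elif chunk in blocks:
            nxt = (i // blen + 1) * blen  # next aligned source offset
            if (postpone and i % blen
                    and nxt + blen <= len(src)
                    and src[nxt:nxt + blen] == basis[nxt:nxt + blen]):
                # Send literal bytes up to the boundary, then resume aligned.
                written += nxt - i
                i = nxt
            else:
                # Match against a block at another offset: a real write.
                written += blen
                i += blen
        else:
            written += 1  # no match: one literal byte
            i += 1
    written += len(src) - i  # tail shorter than a block goes as literal data
    return written, skipped


basis = bytes(16)                 # destination: all zeros
src = b"\n" + bytes(15)           # source: one changed byte, then zeros
print(plan_inplace(src, basis, 4))                 # (16, 0)
print(plan_inplace(src, basis, 4, postpone=True))  # (4, 12)
```

In the greedy case every match lands one byte off the block boundary, so the whole file is rewritten. With the postponed match only the first block is written.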
samba-bugs at samba.org
2010-Nov-07 20:47 UTC
DO NOT REPLY [Bug 7778] --inplace does extra WRITE operations
https://bugzilla.samba.org/show_bug.cgi?id=7778

------- Comment #6 from ildar at altlinux.ru 2010-11-07 14:47 CST -------
(In reply to comment #5)
OK, but from the second block onward it should see only blocks of zeros in both src and tgt. I can accept the 1st and 2nd 1k-blocks being written. But it writes the whole file, while the difference was just 1 byte! Is it possible to limit the affected area?
samba-bugs at samba.org
2010-Nov-09 02:24 UTC
DO NOT REPLY [Bug 7778] --inplace does extra WRITE operations
https://bugzilla.samba.org/show_bug.cgi?id=7778

wayned at samba.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

------- Comment #7 from wayned at samba.org 2010-11-08 20:24 CST -------
This is caused by the repetition of the file's data. When rsync checks at offset 1 in the receiving file for a matching block, it finds a match (because all the blocks are identical after the first byte), and rsync never gets back to the 1024-byte aligned blocks on the sender to notice that the data is identical again. If your data was not so repetitive, rsync would quickly sync up and skip the rest of the writes. (You can see what it is doing via either the 3.1.0dev option --debug=deltasum3 or via -vvvv.)

I'm not sure how best the code could be improved to try to avoid this. Matt's idea of block-aligned checks could be made to work (given enough read-ahead), but I'm not sure it's worth it, since it only affects very repetitive files.

I do note that the code that is looking for a (preferential) identical-position block is wasting time when the receiving side block is not aligned with the sending-side's blocks. That is something that should be optimized.
samba-bugs at samba.org
2010-Nov-11 19:30 UTC
DO NOT REPLY [Bug 7778] --inplace does extra WRITE operations
https://bugzilla.samba.org/show_bug.cgi?id=7778

------- Comment #8 from ildar at altlinux.ru 2010-11-11 13:30 CST -------
(In reply to comment #7)
> I'm not sure how best the code could be improved to try to avoid this. Matt's
> idea of block-aligned checks could be made to work (given enough read-ahead),
> but I'm not sure it's worth it, since it only affects very repetitive files.

Sorry, I can't agree. This issue may be crucial in some cases. Overwriting blocks is bad: not just for sparse targets but also for filesystems like JFFS2, where "overwriting" means enlarging the space the target uses. This means a 1M target overwritten 10 times takes 11M of space.
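The 11M figure follows from a simple append-only model: the first write stores the file once, and each full overwrite appends another copy instead of replacing the old one. A small worked version of that arithmetic, assuming no garbage collection and no compression has happened yet (the function name is made up for illustration):

```python
MIB = 1 << 20


def log_structured_usage(file_size, overwrites):
    """Space consumed when every full overwrite appends a new copy of the data."""
    return file_size * (1 + overwrites)


print(log_structured_usage(MIB, 10) // MIB)  # 11
```

If the unchanged blocks are skipped instead, the appended data shrinks to the blocks that really differ.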
samba-bugs at samba.org
2011-Jan-18 07:10 UTC
DO NOT REPLY [Bug 7778] Extra writes with --inplace due to misaligned block matching
https://bugzilla.samba.org/show_bug.cgi?id=7778

------- Comment #9 from wayned at samba.org 2011-01-18 01:10 CST -------
The latest 3.1.0dev version now re-aligns for sequences of zeros. I toyed with generalizing it for any repetitive blocks, but that would have caused extra (useless) checksumming for any inplace file update where the data moved toward the start of the file -- it doesn't seem worthwhile to slow things down in the more common cases to try to optimize the more rare data cases. So, the current code will re-sync for non-repetitive data (as it always would) and also (now) for zeros (the most common repetitive data). Further improvements may yet be possible, but I'm not looking for any at the moment.

I've also optimized away the search loop that used to be there to find the right sum record for the current position in the file. That will especially help files that have a lot of identical sum records in a particular hash chain.
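The trade-off described here can be seen in a toy model that re-aligns only when the matched block is all zeros. This is a sketch under simplifying assumptions (direct byte comparison instead of checksums, a block length of 4) and is not the code that went into 3.1.0dev; `bytes_written` and `realign_zeros` are hypothetical names.

```python
def bytes_written(src, basis, blen, realign_zeros):
    """Count bytes an in-place receiver would write to turn basis into src."""
    known = {basis[o:o + blen] for o in range(0, len(basis) - blen + 1, blen)}
    zeros = bytes(blen)
    written = i = 0
    while i + blen <= len(src):
        chunk = src[i:i + blen]
        aligned = i % blen == 0
        if aligned and chunk == basis[i:i + blen]:
            i += blen  # identical data at the same offset: seek, no write
        elif chunk in known:
            nxt = -(-i // blen) * blen  # round i up to a block boundary
            if (realign_zeros and not aligned and chunk == zeros
                    and src[nxt:nxt + blen] == zeros == basis[nxt:nxt + blen]):
                written += nxt - i  # literal zeros up to the boundary
                i = nxt
            else:
                written += blen  # block copied from a different offset
                i += blen
        else:
            written += 1  # literal byte
            i += 1
    return written + len(src) - i  # short tail is sent as literal data


zero_src, zero_basis = b"\n" + bytes(15), bytes(16)
rep_src, rep_basis = b"X" + b"a" * 15, b"a" * 16
print(bytes_written(zero_src, zero_basis, 4, False))  # 16
print(bytes_written(zero_src, zero_basis, 4, True))   # 4
print(bytes_written(rep_src, rep_basis, 4, True))     # 16
```

In this model zeros re-align after the first block, while other repetitive data is still rewritten in full, which mirrors the choice to handle only the most common repetitive case.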
samba-bugs at samba.org
2011-Jun-04 19:57 UTC
[Bug 7778] Extra writes with --inplace due to misaligned block matching
https://bugzilla.samba.org/show_bug.cgi?id=7778

Wayne Davison <wayned at samba.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

--- Comment #10 from Wayne Davison <wayned at samba.org> 2011-06-04 19:57:56 UTC ---
Closing due to already deployed fixes.