samba-bugs at samba.org
2009-Oct-15 15:24 UTC
DO NOT REPLY [Bug 6816] New: Delta-transfer algorithm does not reuse already transmitted identical blocks
https://bugzilla.samba.org/show_bug.cgi?id=6816

Summary: Delta-transfer algorithm does not reuse already transmitted identical blocks
Product: rsync
Version: 3.0.5
Platform: Other
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P3
Component: core
AssignedTo: wayned at samba.org
ReportedBy: martin at scharrer-online.de
QAContact: rsync-qa at samba.org

Hi,

I observed the following behavior of rsync: if a file contains identical blocks (e.g. all-zero blocks), these blocks are not re-transferred but reused by the delta-transfer algorithm, BUT only if one of these blocks is already present in the destination file. If it is not, or if the destination file does not exist yet, every identical block is transferred again in full. In some special cases (e.g. large sparse files that are rsynced with --inplace, so -S can't be used) it is much faster to interrupt the rsync run after a while and restart it, so that the identical blocks are reused rather than re-transferred.

A good (but kind of trivial) example would be a big file (say 1GB) containing only zeros (dd if=/dev/zero of=file bs=1M count=1k) which is transferred without the -S option. If the file does not exist at the destination, it is copied as a whole, just as e.g. 'scp' would do it. In my case it is copied at about 2MB/s. But if the file already exists, even with only a very small size, the identical blocks are reused and the "transfer speed" is around the destination hard drive I/O speed (in my case 60-120MB/s; the target is a tmpfs ramdisk).

I also tested this with a file with pseudo-random but repeating content (dd if=/dev/urandom of=temp bs=1M count=10; cat temp temp ... temp > file). If the first rsync process is aborted and restarted after the first repeating block has been transferred, the second rsync process sends only metadata, because the existing content is simply replicated.

It would be great if the delta-transfer algorithm were extended to account for identical to-be-sent data blocks, i.e. send the first occurrence of such a block and then simply reuse it during the same rsync process. IMHO this should not be so difficult to implement, because most of the needed functionality is already there.

--
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug, or are watching the QA contact.
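The behavior described in the report above, and the requested enhancement, can be illustrated with a small model. This is only a sketch under strong simplifying assumptions, not rsync's implementation: matching is block-aligned (no rolling checksum), SHA-256 stands in for rsync's checksums, and the function name `literal_bytes` and the `reuse_sent` flag are hypothetical names introduced for illustration.

```python
import hashlib


def literal_bytes(src: bytes, dest: bytes, block_size: int, reuse_sent: bool) -> int:
    """Count the bytes that must be sent literally in a simplified,
    block-aligned delta transfer (illustrative model only)."""
    # Checksums of the blocks the receiver already holds in its basis file.
    known = {
        hashlib.sha256(dest[i:i + block_size]).digest()
        for i in range(0, len(dest), block_size)
    }
    sent = 0
    for i in range(0, len(src), block_size):
        block = src[i:i + block_size]
        digest = hashlib.sha256(block).digest()
        if digest in known:
            # The receiver can copy this block locally; only a reference is sent.
            continue
        sent += len(block)
        if reuse_sent:
            # Assumed enhancement: remember blocks transmitted during this
            # run so that later identical blocks become references too.
            known.add(digest)
    return sent


if __name__ == "__main__":
    block = 4096
    zeros = bytes(block * 256)  # 1 MiB all-zero file
    print(literal_bytes(zeros, b"", block, reuse_sent=False))
    print(literal_bytes(zeros, bytes(block), block, reuse_sent=False))
    print(literal_bytes(zeros, b"", block, reuse_sent=True))
```

In this model, a 1 MiB all-zero source with 4 KiB blocks costs 1048576 literal bytes against an empty destination, 0 when the destination already holds a single zero block (the "interrupt and restart" case), and 4096 when blocks sent earlier in the same run may be reused.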
samba-bugs at samba.org
2009-Oct-15 15:29 UTC
DO NOT REPLY [Bug 6816] Delta-transfer algorithm does not reuse already transmitted identical blocks
https://bugzilla.samba.org/show_bug.cgi?id=6816

------- Comment #1 from martin at scharrer-online.de 2009-10-15 10:29 CST -------
This enhancement would also effectively solve bug #5801, also reported by me.