On Mon, 2008-11-10 at 09:57 -0600, Steve Bergman wrote:
> In the 3.0.4 version of the man page, dated June 29, 2008, it still
> states:
>
> """
> (5) the efficiency of rsync's delta-transfer algorithm may be reduced if
> some data in the destination file is overwritten before it can be copied
> to a position later in the file
> """
>
> Also, I know I have read somewhere in the past that the limitation stems
> from the fact that "rsync does not yet sort the blocks to be
> updated". I
> presume that the current manual is accurate, but would be interested in
> confirmation.
Yes, this is still the case.
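A toy model may help show where the limitation comes from. It is a simplification, not rsync's actual matching code (real rsync matches at any byte offset using rolling checksums): files are treated as lists of equal-sized blocks, one word per block, and with --inplace a destination block is assumed to be reusable only if the receiver has not yet written over it, i.e. if its index is at least that of the block currently being written. The function name `simulate` and the block labels are made up for illustration.

```shell
#!/usr/bin/env bash
# Toy model of --inplace block reuse (a simplification, not rsync's code).
# A file is a space-separated list of equal-sized blocks, one word each.
# With --inplace the receiver rewrites the destination front to back, so
# when it writes source block i, destination blocks 0..i-1 are already gone.

simulate() {
  local mode=$1                  # "inplace" or "normal"
  local -a dest src
  read -ra dest <<< "$2"         # old destination file
  read -ra src <<< "$3"          # new source file
  local matched=0 literal=0 i j found
  for i in "${!src[@]}"; do
    found=0
    for j in "${!dest[@]}"; do
      # Normal mode builds a temporary file from an untouched basis, so any
      # block can be reused; inplace mode can only reuse blocks j >= i.
      if [[ $mode == inplace ]] && (( j < i )); then
        continue
      fi
      if [[ ${dest[j]} == "${src[i]}" ]]; then
        found=1
        break
      fi
    done
    if (( found )); then
      matched=$((matched + 1))
    else
      literal=$((literal + 1))
    fi
  done
  echo "matched=$matched literal=$literal"
}

simulate inplace "A B C D E" "A B X C D E"   # insertion: matched=2 literal=4
simulate normal  "A B C D E" "A B X C D E"   # insertion: matched=5 literal=1
simulate inplace "A B C D E" "A B X D E"     # overwrite: matched=4 literal=1
```

In this model an insertion turns every later block into literal data with --inplace, while an overwrite that moves nothing costs only the changed block.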
> Also, how significant is this effect? I have some
> possible uses for --inplace which could benefit from not having to copy
> all that data locally every time, but which also require good network
> efficiency.
It depends on how the source file is changing. With --inplace, an insertion
in the middle of the file shifts all the data after it to a later offset,
so rsync retransmits all of that data: each block is overwritten on the
destination before it can be copied to its new, later offset. This is
easily demonstrated:
$ cp PATH/TO/eclipse-SDK-3.4-linux-gtk.tar.gz dest
$ cat <(head -c 100000000 dest) <(echo "INSERTED DATA") \
<(tail -c +100000001 dest) >src
$ rsync --only-write-batch=batch --no-whole-file --inplace --stats src dest
Number of files: 1
Number of files transferred: 1
Total file size: 158375423 bytes
Total transferred file size: 158375423 bytes
Literal data: 58382959 bytes # everything after insertion retransmitted
Matched data: 99992464 bytes
File list size: 18
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 28
Total bytes received: 88133
sent 28 bytes received 88133 bytes 16029.27 bytes/sec
total size is 158375423 speedup is 1796.43 (BATCH ONLY)
Compare to the same transfer without --inplace:
$ rsync --only-write-batch=batch --no-whole-file --stats src dest
Number of files: 1
Number of files transferred: 1
Total file size: 158375423 bytes
Total transferred file size: 158375423 bytes
Literal data: 12598 bytes # only the affected block retransmitted
Matched data: 158362825 bytes
File list size: 18
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 28
Total bytes received: 88133
sent 28 bytes received 88133 bytes 35264.40 bytes/sec
total size is 158375423 speedup is 1796.43 (BATCH ONLY)
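To put a number on "how significant", the share of literal data can be read off the --stats output. The helper below is hypothetical (`literal_share` is not an rsync feature); it simply divides "Literal data" by "Literal data" plus "Matched data", and strips commas in case your rsync version prints thousands separators. For the two runs above it gives about 36.86% with --inplace and about 0.01% without.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rsync): reads "rsync --stats" output on
# stdin and prints the share of the file that was sent as literal data.
literal_share() {
  awk '
    /^Literal data:/ { gsub(/,/, "", $3); lit = $3 + 0 }
    /^Matched data:/ { gsub(/,/, "", $3); mat = $3 + 0 }
    END {
      if (lit + mat > 0) printf "%.2f%%\n", 100 * lit / (lit + mat)
    }'
}

# Figures from the --inplace run above:
printf 'Literal data: 58382959 bytes\nMatched data: 99992464 bytes\n' | literal_share
# On a live transfer, pipe the stats straight in, e.g.:
#   rsync --only-write-batch=batch --no-whole-file --inplace --stats src dest | literal_share
```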
On the other hand, in-place changes (common for databases) do not move
any data and thus do not incur any efficiency loss with --inplace:
# The "echo" replaces bytes 100000001 through 100000013
# (counting from 1, like tail does).
$ cat <(head -c 100000000 dest) <(echo "CHANGED DATA") \
<(tail -c +100000014 dest) >src
$ rsync --only-write-batch=batch --no-whole-file --inplace --stats src dest
Number of files: 1
Number of files transferred: 1
Total file size: 158375409 bytes
Total transferred file size: 158375409 bytes
Literal data: 12584 bytes # only the affected block retransmitted
Matched data: 158362825 bytes
File list size: 18
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 28
Total bytes received: 88133
sent 28 bytes received 88133 bytes 35264.40 bytes/sec
total size is 158375409 speedup is 1796.43 (BATCH ONLY)
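If the Eclipse tarball isn't handy, both kinds of change can be reproduced on a small synthetic file. This is a sketch: the 1 MB random file, the offset, and the file names src-insert and src-change are arbitrary choices for illustration. For the in-place change it uses dd with conv=notrunc, which overwrites the 13 bytes without rewriting the rest of the file, roughly the way a database updates a page.

```shell
#!/usr/bin/env bash
# Sketch: build an insertion and an in-place change from a synthetic file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
head -c 1000000 /dev/urandom > dest    # 1 MB of random data
off=500000                             # arbitrary change point (0-based)

# Insertion: every byte after $off ends up 14 bytes later in the file.
{ head -c "$off" dest; echo "INSERTED DATA"; tail -c +"$((off + 1))" dest; } > src-insert

# In-place change: overwrite 13 bytes at offset $off, leaving the rest of
# the file (and its length) untouched.
cp dest src-change
printf 'CHANGED DATA\n' | dd of=src-change bs=1 seek="$off" conv=notrunc 2>/dev/null

ls -l dest src-insert src-change
```

Running the two rsync commands from the demonstration with src-insert or src-change in place of src should show the same pattern on a smaller scale.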
Matt