Hi,
I have a query regarding rsync's rolling checksum algorithm:
Suppose fileA is a 100 MB database file on my local machine. I back it up
to the server for the first time (a full backup) using rsync, with a
block_size of 30 KB and the --compress option so data is compressed as it
is transferred.
Later, I append another 100 MB of new content to fileA (assuming my
database appends new data at the end of the physical file).
I then run rsync again to back up fileA to the server. rsync performs an
incremental transfer of fileA using its rolling checksum and, whenever a
match is found, verifies it with the stronger checksum.
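
As I understand it, the weak checksum is designed so that sliding the
window forward by one byte costs only a couple of additions. A rough
sketch of that idea (not the actual checksum.c code; weak_init, weak_roll
and weak_digest are names I made up for illustration):

    #include <stddef.h>
    #include <stdint.h>

    /* Running sums over an initial window of "len" bytes. */
    static void weak_init(const unsigned char *buf, size_t len,
                          uint32_t *s1, uint32_t *s2)
    {
        uint32_t a = 0, b = 0;
        for (size_t i = 0; i < len; i++) {
            a += buf[i];
            b += (uint32_t)(len - i) * buf[i];
        }
        *s1 = a & 0xffff;
        *s2 = b & 0xffff;
    }

    /* Slide the window one byte: drop "out", add "in".  This is O(1),
     * which is what keeps per-byte rolling cheap. */
    static void weak_roll(uint32_t *s1, uint32_t *s2, size_t len,
                          unsigned char out, unsigned char in)
    {
        *s1 = (*s1 - out + in) & 0xffff;
        *s2 = (*s2 - (uint32_t)(len * out) + *s1) & 0xffff;
    }

    /* Combined 32-bit weak checksum, used as the hash-table key. */
    static uint32_t weak_digest(uint32_t s1, uint32_t s2)
    {
        return (s2 << 16) | s1;
    }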
My query: during the rolling checksum pass, the initial 100 MB is found to
match, so those blocks are not actually transferred over the network.
However, when rsync reaches the newly appended 100 MB towards the end of
the physical fileA, it starts rolling a byte at a time to see whether it
can find a matching block in the hash table, repeating the rolling
checksum step at each offset. Since all of that content is new, it never
finds a match, and the literal data is transmitted over the network.
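
My mental model of that matching loop is roughly the following, heavily
simplified: a linear search stands in for the real 16-bit hash table in
match.c, the strong checksum (MD4 in rsync) is only hinted at in a
comment, and find_matches/block_sum are names I invented. It assumes the
weak_init/weak_roll/weak_digest helpers sketched above:

    #include <stdio.h>

    struct block_sum {
        uint32_t weak;            /* weak checksum of one receiver block */
        unsigned char strong[16]; /* strong checksum (MD4 in rsync)      */
    };

    static void find_matches(const unsigned char *buf, size_t len,
                             const struct block_sum *sums, size_t nsums,
                             size_t blen)
    {
        size_t off = 0;
        uint32_t s1, s2;

        if (len < blen)
            return;                     /* whole file is literal data */

        weak_init(buf, blen, &s1, &s2);

        while (off + blen <= len) {
            uint32_t w = weak_digest(s1, s2);
            int matched = 0;

            /* Look for a receiver block with the same weak sum; real
             * rsync then confirms it with the strong checksum. */
            for (size_t i = 0; i < nsums; i++) {
                if (sums[i].weak == w /* && strong checksums match */) {
                    printf("match at offset %zu -> block %zu\n", off, i);
                    off += blen;        /* jump ahead by a whole block */
                    if (off + blen <= len)
                        weak_init(buf + off, blen, &s1, &s2);
                    matched = 1;
                    break;
                }
            }

            if (!matched) {
                if (off + blen == len)
                    break;              /* short tail: rest is literal */
                /* No match: buf[off] goes out as literal data and the
                 * window rolls forward one byte in O(1). */
                weak_roll(&s1, &s2, blen, buf[off], buf[off + blen]);
                off++;
            }
        }
    }

In the appended-100 MB case, once the last matching block has been
consumed, a loop like this would take the no-match branch for every
remaining byte of the file.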
Does this per-byte rolling (adding one byte, removing another at each
step) slow rsync down and introduce latency in incremental backups? If
not, how is this case handled in match.c or the other associated files?
Should block_size be adjusted for different file sizes to optimize this
case?
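
For context, my understanding (possibly wrong) is that when --block-size
is not given, recent rsync versions already scale the default block
length roughly with the square root of the file size, somewhere in
generator.c. A simplified sketch of that kind of heuristic, with made-up
bounds rather than rsync's exact constants:

    #include <stdint.h>

    #define MIN_BLOCK_LEN   700            /* illustrative lower bound */
    #define MAX_BLOCK_LEN   (128 * 1024)   /* illustrative upper bound */

    /* Pick a block length that grows roughly as sqrt(file_len), so the
     * number of blocks and the size of each block stay balanced. */
    static uint32_t pick_block_len(int64_t file_len)
    {
        int64_t blen;

        if (file_len <= (int64_t)MIN_BLOCK_LEN * MIN_BLOCK_LEN)
            return MIN_BLOCK_LEN;

        /* crude integer square root, rounded up in steps of 8 */
        for (blen = MIN_BLOCK_LEN; blen * blen < file_len; blen += 8)
            ;

        return (uint32_t)(blen > MAX_BLOCK_LEN ? MAX_BLOCK_LEN : blen);
    }

The tradeoff, as far as I can tell, is that a larger block means fewer
checksums to generate, send and look up, but coarser-grained matching
when only small parts of the file change.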
Any help is appreciated.
Thanks in anticipation,
Regards,
Naveen