Hi all-
My test results so far indicate a pretty decent improvement in overall rsync
performance when using a slightly more sophisticated checksum calculation.
The attached patch has the required changes (in hindsight, I should have
compressed this using zlib with the new algorithm :-) ).
Some things to know about the patch:
First, it is against the zlib library - NOT the gzip application.
By default, rsyncable computations are turned on, and the default behavior is to
use the new rolling checksum algorithm. The window and reset block sizes are
set to 30 bytes and 4096 bytes respectively. I've found that this gets much
better rsync performance when used with the Z_RSYNCABLE_RSSUM checksum
algorithm. If you want to play with the Z_RSYNCABLE_SIMPLESUM, and you want to
keep your window sizes small, be sure you run several different window sizes -
you'll be amazed at how much the compression ratio and rsync performance
vary for small window sizes with that algorithm. With Z_RSYNCABLE_RSSUM, the
compression ratios and rsync performance are quite well behaved, even for window
sizes down to 10 or 15 bytes - but 30 seems like a safe value for the time being.
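For anyone who has not looked at how a rolling checksum picks reset points,
here is a minimal sketch in C. It is not code from the patch: find_resets,
resets_survive_insert, SUM_SIMPLE and SUM_RS are made-up names, and the
(a + b) trigger for the rsync-style sum is an arbitrary assumption, since the
way the patch combines its sums is not described here. The demo also uses a
reset modulus of 64 rather than 4096, purely so that a short test buffer
produces plenty of reset points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; these are not the patch's constants. */
enum { SUM_SIMPLE = 0, SUM_RS = 1 };

/*
 * Scan buf and record every offset at which a block reset would be triggered.
 * Returns the total number of triggers; the first max_out offsets go to out.
 */
size_t find_resets(const unsigned char *buf, size_t len,
                   size_t window, uint32_t block, int kind,
                   size_t *out, size_t max_out)
{
    uint32_t a = 0;   /* plain sum of the bytes in the window */
    uint32_t b = 0;   /* position-weighted sum (rsync-style second term) */
    size_t count = 0;

    assert(window > 0 && block > 0);

    for (size_t i = 0; i < len; i++) {
        if (i >= window) {
            /* Drop the byte that just left the window. */
            uint32_t old = buf[i - window];
            a -= old;
            b -= (uint32_t)window * old;
        }
        a += buf[i];
        b += a;

        if (i + 1 < window)
            continue;   /* window not full yet */

        /* Assumed trigger; the real patch may combine the sums differently. */
        uint32_t digest = (kind == SUM_RS) ? a + b : a;
        if (digest % block == 0) {
            if (count < max_out)
                out[count] = i + 1;   /* reset happens after byte i */
            count++;
        }
    }
    return count;
}

/*
 * Returns 1 if every reset point found in the original data reappears,
 * shifted by one, after a single byte is inserted at the front.
 */
int resets_survive_insert(int kind)
{
    enum { N = 20000, WINDOW = 30, BLOCK = 64 };
    static unsigned char orig[N], shifted[N + 1];
    static size_t r1[N + 1], r2[N + 1];
    uint32_t seed = 12345;

    for (size_t i = 0; i < N; i++) {
        seed = seed * 1103515245u + 12345u;   /* small LCG for test data */
        orig[i] = (unsigned char)(seed >> 16);
        shifted[i + 1] = orig[i];
    }
    shifted[0] = 'X';

    size_t n1 = find_resets(orig, N, WINDOW, BLOCK, kind, r1, N + 1);
    size_t n2 = find_resets(shifted, N + 1, WINDOW, BLOCK, kind, r2, N + 1);

    /* Only the first full window of the shifted data contains the new byte,
     * so it can add at most one extra reset point, at offset WINDOW. */
    size_t skip = (n2 > 0 && r2[0] == WINDOW) ? 1 : 0;

    if (n1 == 0 || n2 - skip != n1)
        return 0;
    for (size_t k = 0; k < n1; k++) {
        if (r2[k + skip] != r1[k] + 1)
            return 0;
    }
    return 1;
}
```

Calling resets_survive_insert(SUM_SIMPLE) or resets_survive_insert(SUM_RS)
checks the property that makes the output rsync-friendly: because a reset
depends only on the last `window` bytes, an insertion early in the input
shifts the reset points but does not change where they fall relative to the
data.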
In my test runs, I'm seeing an approximately 20-30% reduction in the total
number of bytes that the rsync algorithm identifies as changed, without any
impact on the zlib compression ratio, as compared to the simpler rolling
checksum algorithm. Your results, of course, may vary :-)
This patch includes the patch for adding rsyncable behavior, plus my changes.
If you just want the basic patch without my changes, it is located at
https://svn.uhulinux.hu/packages/dev/zlib/patches/02-rsync.patch
You can configure the rsyncable behavior (which checksum to use, window size and
block size) dynamically (instead of adjusting the #define lines at the beginning
of deflate.c) by calling the deflateSetRsyncParameters() function immediately
following stream initialization, and before writing anything to the stream.
This is good if you want to play with parametric studies, etc...
If you set the rolling checksum algorithm to Z_RSYNCABLE_OFF, you will get
exactly the same behavior as zlib without the patch - it will be a hair slower, but
compared to the rest of what's going on in zlib, the overhead of this should
be quite negligible.
I'd love to hear feedback/comments!
Cheers,
- Kevin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rsyncable_checksum.patch
Type: application/octet-stream
Size: 28163 bytes
Desc: not available
Url : http://lists.samba.org/archive/rsync/attachments/20050217/5b6afa8d/rsyncable_checksum.obj