For anyone who'd like to check out my "rzync" [sic] test app, I've just released a new version. For those that might not have time to look at the code but could provide some feedback based on a rough description, I've created the following simple web page:

    http://www.clari.net/~wayne/new-protocol.html

Here's the tar file of the new release:

    http://www.clari.net/~wayne/rzync-0.03.tar.gz

Changes in this version: I've optimized the protocol to make the transferred-byte overhead smaller; I've used an rsync-like file-list compression to make the directory data smaller; I've gotten rid of some previous limitations (such as the 4-byte file-size limit and the lack of reallocating various buffers for really large file-count transfers); I've re-enabled the "move" versions of the various get/put commands (which were disabled in the last release); and I've fixed several bugs. The resulting program seems to be working quite well in my limited testing.

The count of transferred bytes in the latest protocol is now below what rsync sends for many commands -- both a start-from-scratch update and a fully-up-to-date update are usually smaller, for instance. This is mainly because my file-list data is smaller, but it's also because I reduced the protocol overhead quite a bit. Transferred bytes for partially-changed files are still bigger than rsync because librsync creates unusually large delta sizes (though there's a patch that makes it work much better, it's still not as good as rsync).

In my speed testing, one test was sending around 8.5 meg of data on a local system, and while rsync took only .5 seconds, my rzync app took around 2 seconds. A quick gprof run reveals that 98% of the runtime is being spent in 2 librsync routines, so it looks like librsync needs to be optimized a bit.

One potential next step might be optimizing rsync to make the transferred file-list size a little smaller (e.g. making the transfer of the "size" attribute only as long as needed to store the number would save ~4-5 bytes per file entry on typical files).

It looks like work needs to be done on making librsync more efficient. Until I can get some better speed tests, I'm unsure if I should attempt to make rsync talk my new protocol. Opinions welcomed.

..wayne..
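As an illustration of the idea of sending the "size" attribute in only as many bytes as the number needs, here is a minimal sketch using a 7-bits-per-byte variable-length integer. This is not the encoding rzync or rsync actually uses; the function names `encode_size` and `decode_size` are made up for the example, and the scheme is just one common way to get "as long as needed" behavior.

```python
from typing import Tuple


def encode_size(n: int) -> bytes:
    """Encode a non-negative file size using 7 data bits per byte.

    The high bit of each byte is set when more bytes follow, so small
    sizes take fewer bytes than a fixed-width field would.
    (Hypothetical format, for illustration only.)
    """
    if n < 0:
        raise ValueError("file size must be non-negative")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)         # last byte: high bit clear
            return bytes(out)


def decode_size(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one size starting at buf[pos]; return (size, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


if __name__ == "__main__":
    for size in (0, 300, 42000, 2**32):
        print(size, len(encode_size(size)))
```

With a scheme like this there is no built-in upper limit on the file size, and the cost per entry grows only when the size actually needs the extra bytes.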
On Fri, Jun 21, 2002 at 03:46:39AM -0700, Wayne Davison wrote:
> The count of transferred bytes in the latest protocol is now below what
> rsync sends for many commands -- both a start-from-scratch update or a
> fully-up-to-date update are usually smaller, for instance. This is
> mainly because my file-list data is smaller, but it's also because I
> reduced the protocol overhead quite a bit. Transferred bytes for
> partially-changed files are still bigger than rsync because librsync
> creates unusually large delta sizes (though there's a patch that makes
> it work much better, it's still not as good as rsync).

I believe that the remaining difference is that rsync does "context compression" using zlib. I believe librsync does no compression at all yet. Even if you zlib compress librsync's deltas, they will still be bigger than rsync's because of the "context" it uses... it compresses the whole file, hits and misses, but only sends the compressed output for the misses. This means the compressor is "primed" with data from the hits.

I think that the best solution for this is to do what xdelta is planning to do... toss zlib and include target references as well as source references in the delta instruction stream; do the compression yourself. One way to do this is to implement xdelta-style non-block-aligned matches against the target, building a rollsum hash-tree as you go through it, and run it alongside the rsync block match algorithm. However, this might not work well in practice...

> In my speed testing, one test was sending around 8.5 meg of data on a
> local system, and while rsync took only .5 seconds, my rzync app took
> around 2 seconds. A quick gprof run reveals that 98% of the runtime is
> being spent in 2 librsync routines, so it looks like librsync needs to
> be optimized a bit.
>
> One potential next steps might include optimizing rsync to make the
> transferred file-list size a little smaller (e.g. making the transfer of
> the "size" attribute only as long as needed to store the number would
> save ~4-5 bytes per file entry on typical files).
>
> It looks like work needs to be done on making librsync more efficient.

I'm going to get onto this after this weekend. I know what needs to be done... I just need the time to do it.

--
----------------------------------------------------------------------
ABO: finger abo@minkirri.apana.org.au for more info, including pgp key
----------------------------------------------------------------------
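The "priming" effect described in this reply can be approximated with a small sketch. It does not reproduce rsync's actual implementation; instead it uses zlib's preset-dictionary feature (the `zdict` argument in Python's `zlib` module) to stand in for a compressor that has already seen the matched ("hit") data. The helper names `compress_miss` and `decompress_miss` and the sample data are assumptions made for the example.

```python
import zlib

WINDOW = 32768  # deflate can only refer back this many bytes


def compress_miss(miss: bytes, context: bytes = b"") -> bytes:
    """Compress unmatched data, optionally priming zlib with `context`.

    `context` stands in for data from the hits that both sides already
    have. This is an approximation of context compression, not rsync's
    own code path.
    """
    if context:
        comp = zlib.compressobj(level=9, zdict=context[-WINDOW:])
    else:
        comp = zlib.compressobj(level=9)
    return comp.compress(miss) + comp.flush()


def decompress_miss(blob: bytes, context: bytes = b"") -> bytes:
    """Reverse compress_miss(); the receiver must supply the same context."""
    if context:
        decomp = zlib.decompressobj(zdict=context[-WINDOW:])
    else:
        decomp = zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


if __name__ == "__main__":
    # Made-up sample: the hit is data both ends share, the miss is new
    # data that resembles it.
    hit = b"static int send_file_entry(struct file_struct *file, int f)\n" * 20
    miss = b"static int recv_file_entry(struct file_struct *file, int f)\n"
    print("plain :", len(compress_miss(miss)))
    print("primed:", len(compress_miss(miss, hit)))
```

When the miss resembles the surrounding hits, the primed output is noticeably smaller than compressing the miss in isolation, which is the gap a separately compressed delta cannot close.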
Wayne Davison
2002-Jun-21 17:58 UTC
rZync 0.04 -- a faster next-generation protocol test app
FYI, I decided to release a new version of my next-generation protocol test app because I created an optimized transfer mode for when files are being sent whole (it bypasses all calls to librsync). This makes my "rZync" test app faster than rsync for sending whole files (rather than 4x slower, like it was). This is significant because it helps to assure me that my single-process generator/receiver will be able to keep up with rsync's dual-process implementation.

A full-file transfer appears to be faster than rsync, even on a dual-processor system. For instance, this test was 775 files in 126 directories:

---------------------------------- rsync ----------------------------------
wrote 32920749 bytes  read 12420 bytes  9409476.86 bytes/sec
total size is 32869747  speedup is 1.00
rsync -av foo /tmp  2.23s user 1.54s system 162% cpu 2.314 total

wrote 32920749 bytes  read 12420 bytes  7318482.00 bytes/sec
total size is 32869747  speedup is 1.00
rsync -av foo /tmp  2.23s user 1.55s system 105% cpu 3.588 total
---------------------------------- rZync ----------------------------------
wrote 32900189 bytes (16813)  read 5534 bytes (5534)  13162289.20 bytes/sec
total size is 32869700  speedup is 1.00
rs -av foo /tmp  0.34s user 0.56s system 39% cpu 2.274 total

wrote 32900064 bytes (16688)  read 5534 bytes (5534)  13162239.20 bytes/sec
total size is 32869700  speedup is 1.00
rs -av foo /tmp  0.42s user 0.69s system 58% cpu 1.910 total
---------------------------------------------------------------------------

I've also updated my new-protocol web page to explain what I'm trying to accomplish (which some folks probably missed the first time around):

    http://www.clari.net/~wayne/new-protocol.html

Here's the tar file of the new release:

    http://www.clari.net/~wayne/rzync-0.04.tar.gz

For those that want to try this out, use the "rs" perl script to control rZync in an rsync-like manner (a temporary, test-mode situation), or control it yourself by sending it commands on stdin.

..wayne..