I've been having a lot of fun improving my new-protocol testing app.
It seems to be in pretty good shape (for test code), so I figured I'd
announce another release for those brave souls that may want to help me
in my thinking about a (potential) new rsync protocol. It's a tar.gz
file this time because I broke up the code into multiple files. I named
it "rzync" just for fun (a very confusing name, no?):
http://www.clari.net/~wayne/rzync.tar.gz
The new stuff in this release is that it can get/put an entire directory
tree of files via getd/putd, and it has conditional get/put commands
that handle both files and directories (cget/cput). (For those that
missed the first announcement, the program can be totally controlled by
an external application via a simple set of commands on stdin.)
I've included a perl script named "rs" that will take an rsync-like
command line (as long as the destination is a directory and not a file)
and drive rzync with it. Keep in mind that rzync still has the -a
option hard-wired to on, so "rs -v /path/foo remote:/path" works like
"rsync -av /path/foo remote:/path".
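
For anyone curious what the first step of such a wrapper involves, here is a rough sketch (in Python rather than perl) of working out which side is remote and which way the data flows. It is not the code in "rs": the function names, the returned fields, and the "host:path" test are all assumptions made for illustration, and it stops short of emitting any actual rzync commands since their syntax isn't described here.

```python
def split_remote(arg):
    """Return (host, path); host is None for a local path."""
    # Assumption: a colon that appears before any slash marks "host:path".
    head, sep, tail = arg.partition(":")
    if sep and head and "/" not in head:
        return head, tail
    return None, arg


def plan_transfer(argv):
    """Map rsync-like arguments to a direction, a host, and a path pair."""
    opts = [a for a in argv if a.startswith("-")]
    paths = [a for a in argv if not a.startswith("-")]
    if len(paths) != 2:
        raise ValueError("expected exactly one source and one destination")
    src_host, src = split_remote(paths[0])
    dst_host, dst = split_remote(paths[1])
    if (src_host is None) == (dst_host is None):
        raise ValueError("exactly one side must be remote")
    return {
        # Remote destination means we push; remote source means we pull.
        "direction": "put" if dst_host else "get",
        "host": dst_host or src_host,
        "source": src,
        "dest_dir": dst,  # the destination is assumed to be a directory
        "verbose": "-v" in opts,
    }
```

With the example above, plan_transfer(["-v", "/path/foo", "remote:/path"]) reports a "put" to host "remote", which a driver could then turn into whatever commands it feeds the program on stdin.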
Things I've noticed so far:
- My single-proc generator/receiver seems to perform well when I send
data over my DSL connection, but it goes much slower than rsync when
sending data over a local pipe. I'm guessing that this is because a
multi-process setup can keep the generator pipeline filled to a
greater degree. If this is true, one solution would be to add a
thread that would be responsible for handling all the generator tasks
(perhaps using the GNU Portable Threads library if we want to be
compatible with systems that lack native thread support).
- The deltas produced by librsync are sometimes considerably larger than
those produced by rsync, so the speedup of rzync sometimes suffers
compared to rsync. I believe that this is because (even without -z)
rsync does some compression of the delta data that librsync does not
do.
- The incremental directory scanning seems to work quite well. I have
not fleshed out all the areas that would need to grow dynamically for
_really_ large jobs, so if someone wants to try to send some huge
directory trees, we'll have to flesh out some more of the code first.
- My directory-scanning code does not attempt to handle symlinks,
devices, or named sockets yet (it just skips them).
- Since the directory-scan data is shared between the two sides using
the rsync algorithm, it has the potential to save a lot of transfer
bytes when the directories on each side are similar.
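
To make the last point concrete, here is a toy sketch of the rsync idea applied to a directory listing. It is not rzync's or librsync's code: the one-name-per-line listing format, the 4-byte block size, the use of MD5 as the strong checksum, and all the function names are assumptions chosen to keep the example short. The side holding the old listing sends block signatures; the other side slides a rolling checksum over its new listing and sends only the bytes that don't match an existing block.

```python
import hashlib

BLOCK = 4  # unrealistically small, so the example is easy to trace


def weak(data):
    """Cheap rsync-style checksum pair (a, b) for one block."""
    a = sum(data) & 0xFFFF
    b = sum((len(data) - i) * x for i, x in enumerate(data)) & 0xFFFF
    return a, b


def signatures(old):
    """Checksum every full block of the old data, keyed by weak sum."""
    sigs = {}
    for i in range(0, len(old) - BLOCK + 1, BLOCK):
        block = old[i:i + BLOCK]
        sigs.setdefault(weak(block), {})[hashlib.md5(block).digest()] = i // BLOCK
    return sigs


def delta(new, sigs):
    """Emit ("copy", block_index) and ("data", bytes) operations."""
    ops, lit, i = [], bytearray(), 0
    a, b = weak(new[:BLOCK])
    while i + BLOCK <= len(new):
        # Only compute the strong hash when the weak checksum matches.
        strong = sigs.get((a, b))
        idx = strong.get(hashlib.md5(new[i:i + BLOCK]).digest()) if strong else None
        if idx is not None:
            if lit:
                ops.append(("data", bytes(lit)))
                lit = bytearray()
            ops.append(("copy", idx))
            i += BLOCK
            a, b = weak(new[i:i + BLOCK])
        else:
            lit.append(new[i])
            if i + BLOCK < len(new):
                # Roll the window forward by one byte.
                out, inn = new[i], new[i + BLOCK]
                a = (a - out + inn) & 0xFFFF
                b = (b - BLOCK * out + a) & 0xFFFF
            i += 1
    lit.extend(new[i:])  # trailing bytes shorter than one block
    if lit:
        ops.append(("data", bytes(lit)))
    return ops


def patch(old, ops):
    """Rebuild the new data from the old data plus the delta."""
    out = bytearray()
    for kind, val in ops:
        out += old[val * BLOCK:(val + 1) * BLOCK] if kind == "copy" else val
    return bytes(out)
```

If the two listings differ by a single added name, patch(old, delta(new, signatures(old))) reproduces the new listing while only the added entry travels as literal data; everything else is sent as block references.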
Feel free to let me know what you think.
..wayne..