Jeff Kowalczyk
2003-Jun-08 00:31 UTC
state of the rsync nation? (revisited 6/2003 from 11/2000)
I'm interested in these very questions (the librsync-rsync relationship, the remaining limitations of rsync, active prospects for ground-up rewrites). Google searches for rsync info have proved a little too vague due to the program's ubiquity. Much has certainly changed since this was written; could some people with knowledge in these areas update Martin's response for the state of rsync, June 2003? Thanks.

On 13 Nov 2000, Jason Ozolins wrote:
http://lists.samba.org/pipermail/rsync/2000-November/003147.html

> Just a quick question: is the librsync contained within the rproxy
> source code meant to be tracking the development of the mainstream
> rsync, or is it a stripped-down thing meant only to support rproxy?

On 13 Nov 2000, Martin Pool responded:

Here's a quick history:

In the beginning was rsync, which is a file transfer protocol. At the moment I look after the day-to-day stuff, and tridge watches the evolution.

rsync gave rise to Josh Macdonald's XDelta, which is optimized for the case where old and new versions are on the same machine, and so it can generate more efficient deltas.

tridge extracted the algorithm into librsync, which I renamed to libhsync when I changed the wire format. The code currently checked in as librsync is in my opinion not very good. It tries to make the algorithm available at various levels to programs that would like to use it, though the only user at the moment is rproxy.

rsync doesn't use libhsync -- possibly it never will, as we care enough about rsync performance that tighter integration is justified. Well, if we were starting from scratch it might be separated out, but it's not worth doing it retrospectively now.

The problems with rsync at the moment are basically:

* Quirks of design ('triangular' TCP sockets, etc) tend to provoke bugs in operating systems or remote shells.
* Useful features have been added ad hoc, and so the code is fairly crufty in places.
* People still want even more features for special cases.
  To avoid feature hell, my opinion is that we need a clean scripting or plugin mechanism.
* rsync is optimized for transferring relatively small trees (e.g. the rsync source tree) across slow links (e.g. 56kbps ppp). This is fine and important, but people want to use it for different situations (10GB, 100Mbps, 50 in parallel) where some design decisions (e.g. traverse the whole tree up front) are no longer optimal or even adequate.

rproxy uses the rsync algorithm to improve HTTP caching -- it's not rsync-over-HTTP. I'm the lead developer for it, and it's in beta. Completely unrelated to rproxy, sfr has added a small feature to tunnel rsync through HTTP CONNECT proxies.

Therefore, some people at Linuxcare (primarily rusty, tridge and myself) are looking at a ground-up rewrite with new code and a new network protocol. (Of course we will have a fallback mode.) This might be called rsync-3.0, or rsync-tng, or tsync, or something else.

This will likely be a more traditional client-server protocol, somewhat similar to FTP and HTTP in that the client sends commands to the server to put or get files. However, commands will be pipelined, network-independent binary, and using only a single TCP connection. In general we hope that there will be fewer special cases, and probably that there will be less application-level intelligence in the server and more in the client.

This should be a firmer foundation for building things such as

* implementations in different languages/platforms (Java, Win32 native, INTERCAL, ...)
* interactive rsync (like ftp(1))
* two-way rsync (controlled by the client, which could be automatic or even have a GUI.)
* rsync as a transport for things such as CVS

Discussion about either feature requests or implementation ideas would be very welcome.
It's probably best to send them to the rsync mailing list.

> The reason I ask is that I am thinking of extending Bob Edwards'
> rsync-based backup server architecture here at DCS, using a database to
> hold file metadata, doing binary deltas for history, and doing block
> compression on backed up data. This is a fair amount of stuff to
> change, and I was wondering which source base would be better to start
> with.

You might like to look at the XDelta work on XDFS and PCVS, or in the longer term to work on rsync 3.0.
Donovan Baarda
2003-Jun-08 15:43 UTC
state of the rsync nation? (revisited 6/2003 from 11/2000)
On Sun, 2003-06-08 at 00:31, Jeff Kowalczyk wrote:

> I'm interested in these very questions (librsync-rsync relationship,
> remaining limitations of rsync, active prospects for ground-up rewrites),
> Google searches for rsync info have proved a little too vague due to the
> program's ubiquity. Much has certainly changed since this was written,
> could some people with knowledge in these areas update martin's
> response for the state of rsync, June 2003? Thanks.

Regarding librsync... It is still in sort-of-active development on SourceForge by a variety of developers... a new release is waiting in CVS for me to finally get around to releasing it, but I'm busy on a big contract at the moment so it's currently on hold pending some more cygwin/win32 testing. It is in active use by projects like rdiff-backup.

AFAIK, rproxy is pretty much dead, and the only version that exists depends on a very old version of libhsync. The closest thing to this available now is the http proxy "proof of concept" with xdelta, but it's radically different in many ways from the old rproxy (due to xdelta not using signatures).

> On 13 Nov 2000, Jason Ozolins wrote:
> http://lists.samba.org/pipermail/rsync/2000-November/003147.html
> > Just a quick question: is the librsync contained within the rproxy
> > source code meant to be tracking the development of the mainstream
> > rsync, or is it a stripped-down thing meant only to support rproxy?
>
> On 13 Nov 2000, Martin Pool Responded: Here's a quick history:
[...]
> rsync gave rise to Josh Macdonald's XDelta, which is optimized for the
> case where old and new versions are on the same machine, and so it can
> generate more efficient deltas.

xdelta is still under active development by Josh, and is evolving into a fancy versioning virtual file system... an ideal back-end for something like subversion.
Josh tends to develop stuff with little fanfare, but his code tends to be _very_ clean.

> tridge extracted the algorithm into librsync, which I renamed to libhsync
> when I changed the wire format. The code currently checked in as librsync
> is in my opinion not very good. It tries to make the algorithm available
> at various levels to programs that would like to use it, though the only
> user at the moment is rproxy. rsync doesn't use libhsync -- possibly it
> never will, as we care enough about rsync performance that tighter
> integration is justified. Well, if we were starting from scratch it might
> be separated out, but it's not worth doing it retrospectively now.
[...]

This is largely still true, except libhsync changed back to librsync and now has its own project on SourceForge, separate from the mostly defunct rproxy. librsync itself has no "wire format", being just a general purpose signature/delta/patch library implementing the rsync algorithm.

The comments about rsync never using libhsync/librsync are still true for the foreseeable future. There are many things rsync includes that are still missing from librsync, and the rsync implementation is very tightly coupled, with many backwards compatibility issues. Even when librsync reaches the point of being as good as or better than rsync at signature/delta/patch calculation, it would be a major task to "fit it into" rsync.

rsync also has more active development, mostly in the form of incremental feature additions and the resulting "bugfix fire-fighting", all of which lead to an even more tangled implementation. Occasionally there are efforts to re-write and clean up sections of the code, but they are (rightly) regarded cautiously because of the breakage risk involved for little immediate gain.

The librsync code in CVS is still largely "not very good". It is pretty messy and needs a good cleanup. The API is mostly OK though, and it _does_ work quite well, with no known bugs.
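For readers who have not seen the signature/delta/patch split spelled out, here is a minimal sketch of the idea in Python. It is not librsync's API or rsync's wire format: the function names, the 4-byte block size, the tuple-based instruction list, and the use of MD5 as the strong hash are all assumptions chosen to keep the example short and runnable.

```python
import hashlib

BLOCK = 4        # assumed toy block size; real tools use far larger blocks
MOD = 1 << 16


def weak_sum(data):
    """Adler-style weak checksum of a byte string, returned as (a, b)."""
    a = b = 0
    n = len(data)
    for i, byte in enumerate(data):
        a = (a + byte) % MOD
        b = (b + (n - i) * byte) % MOD
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window one byte to the right without rescanning it."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b


def signature(old):
    """Map weak checksum -> [(strong hash, block index)] for each full block."""
    sig = {}
    for start in range(0, len(old) - BLOCK + 1, BLOCK):
        block = old[start:start + BLOCK]
        entry = (hashlib.md5(block).digest(), start // BLOCK)
        sig.setdefault(weak_sum(block), []).append(entry)
    return sig


def delta(sig, new):
    """Describe `new` as ('copy', block_index) and ('literal', bytes) steps."""
    ops, literal = [], bytearray()
    i, window = 0, None
    while i + BLOCK <= len(new):
        if window is None:
            window = weak_sum(new[i:i + BLOCK])
        match = None
        for strong, idx in sig.get(window, ()):
            # Only pay for the strong hash when the weak checksum hits.
            if hashlib.md5(new[i:i + BLOCK]).digest() == strong:
                match = idx
                break
        if match is not None:
            if literal:
                ops.append(("literal", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            i += BLOCK
            window = None
        else:
            literal.append(new[i])
            if i + BLOCK < len(new):
                window = roll(*window, new[i], new[i + BLOCK], BLOCK)
            i += 1
    literal.extend(new[i:])
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def patch(old, ops):
    """Rebuild the new data from the old data plus the delta."""
    out = bytearray()
    for kind, arg in ops:
        if kind == "copy":
            out += old[arg * BLOCK:(arg + 1) * BLOCK]
        else:
            out += arg
    return bytes(out)


old = b"the quick brown fox jumps"
new = b"the quick red fox jumps high"
ops = delta(signature(old), new)
assert patch(old, ops) == new
```

The point of the split is that only the signature has to travel to the side holding the new data, and only the delta has to travel back, which is why a library can offer these three steps without committing to any particular network protocol.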
I have some plans for a major cleanup and optimisation of the code based on my experiences with pysync. I have a patch submitted that I plan to commit after the next release that optimises and cleans up the delta calculation code quite a bit.

The "next big thing" in delta calculation is probably going to be the vcdiff encoding format, which should allow a common delta format for various applications and supports "self-referencing deltas", which makes it capable of compression. According to the xdelta project this has already been implemented, and I'm keen to see Josh's code, as it could be used as the basis for a cleanup/replacement of at least the "patch" component of librsync.

Also possibly worth mentioning are things like pysync, which is a demonstration implementation of rsync and xdelta, as well as a wrapper for librsync. I'm kind of embarrassed though that at the moment rdiff-backup probably has a better python wrapper of librsync than pysync does.

I believe there have also been some implementations of rsync in Perl (one that claims to talk to rsync, which is an amazing achievement), but I'm not up to date on those. I think someone has a Perl wrapper for librsync that was being used as a test bed for rsync 3 type development (superlifter?).

For the future I can see continued support of the existing rsync code. I would also like to see librsync adopt vcdiff as its delta format, and get a major cleanup, possibly by re-using some xdelta code. There are many common elements to the xdelta and rsync algorithms, and I see no reason why a single library couldn't support both (as pysync does). It would be nice if librsync and/or xdelta could become _the_ delta library.

--
----------------------------------------------------------------
Donovan Baarda                http://minkirri.apana.org.au/~abo/
----------------------------------------------------------------
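To illustrate why "self-referencing" deltas double as compression, here is a small sketch loosely modelled on vcdiff's ADD and COPY instructions. The tuple encoding and the name `apply_delta` are hypothetical and are not the vcdiff byte format; the one property borrowed from vcdiff is that a COPY address may point past the end of the source, into the target produced so far.

```python
def apply_delta(source, instructions):
    """Apply ('add', bytes) and ('copy', addr, length) steps to build a target.

    Addresses index the source followed by the target written so far, so a
    copy may read bytes that the same copy has only just produced.
    """
    target = bytearray()
    for op in instructions:
        if op[0] == "add":
            target += op[1]
        elif op[0] == "copy":
            _, addr, length = op
            # Copy one byte at a time so overlapping copies behave like
            # run-length expansion instead of failing.
            for k in range(length):
                pos = addr + k
                if pos < len(source):
                    target.append(source[pos])
                else:
                    target.append(target[pos - len(source)])
        else:
            raise ValueError("unknown instruction: %r" % (op[0],))
    return bytes(target)


# With an empty source the delta is plain compression: two literal bytes
# followed by a copy that overlaps its own output.
repeated = apply_delta(b"", [("add", b"ab"), ("copy", 0, 6)])
assert repeated == b"abababab"

# With a source, the same instruction set expresses an ordinary delta.
greeting = apply_delta(b"hello ", [("copy", 0, 6), ("add", b"world")])
assert greeting == b"hello world"
```

Because one instruction set covers both cases, a library built around such a format could serve rsync-style and xdelta-style delta generators with a single patch routine, which is the kind of sharing suggested above.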
Brad Hards
2003-Jun-09 22:23 UTC
state of the rsync nation? (revisited 6/2003 from 11/2000)
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sun, 8 Jun 2003 15:43 pm, Donovan Baarda wrote:

> The comments about rsync never using libhsync/librsync are still true
> for the foreseeable future. There are many things rsync includes that
> are still missing from librsync, and the rsync implementation is very
> tightly coupled, with many backwards compatibility issues. Even when
> librsync reaches the point of being as good or better than rsync at
> signature/delta/patch calculation, it would be a major task to "fit it
> into" rsync.

The downside to not having a library that is wire-compatible with rsync --daemon is that it is damn difficult to write something that works as a VFS / kioslave type device. I had a hack at this, by wrapping the rsync executable, and it worked a bit, but it was way too fragile for any real use:
http://www.cuneata.net/rsync-kio.html

Brad

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)

iD8DBQE+5Hw9W6pHgIdAuOMRAr7xAJ4j8mRta8NziilLSc39hguut+8guQCeIJ5R
+wZ/EDtAfZm4baxESxzBcIE=HhLE
-----END PGP SIGNATURE-----