Hello Rsync devs,

We're investigating ways to provide large-scale software updates for multi-gigabyte games, and have recently begun to explore whether rsync may fit the bill. In particular, the checksum-updating patch looks like it might be able to solve our biggest concern about CPU load on the update server, since the actual content being served will change quite rarely.

Would an rsync server running 3.0 CVS + the checksum-updating patch still retain the precomputed-checksum advantage when talking to an older 2.6.9 client? Or would a 3.0 client be required on both sides? Alternatively, would it be difficult to backport the checksum-updating patch to a 2.6.9 server?

Lastly, does anyone have any empirical data on how well an rsync server with checksum-updating works with a large number (e.g. hundreds to thousands) of simultaneous clients? If not, we'll be happy to provide some data from our own testing if we do end up going down this path.

Thanks,

 -Gav

-- 
Gavriel State
Founder & CTO
TransGaming Inc.
gav@transgaming.com
http://www.transgaming.com
Martin Schröder
2007-Jul-09 09:28 UTC
Mass software update distribution + checksum-updating
2007/7/9, Gavriel State <gav@transgaming.com>:
> We're investigating ways to provide large scale software updates for
> multi-gigabyte games, and have recently begun to explore

IMHO bittorrent is a better way to do this.

Best
   Martin
On Mon, Jul 09, 2007 at 02:13:36AM -0400, Gavriel State wrote:
> In particular, the checksum-updating patch looks like it might be able
> to solve our biggest concerns about CPU load on the update server,
> since the actual content being served will change quite rarely.

The only checksum that is cached is the one the user can optionally request for a pre-transfer check. It's not usually needed unless the "quick check" algorithm (size + mtime) has a chance of being wrong.

A better update strategy would be some kind of binary-patch algorithm. Since the user should be starting with a limited set of initial files, you only need a limited set of updates. One way to do this with rsync is to use its batch processing: a batch saves off the data that was used to update a file to a new version. You could deploy an update script that identifies which version of the program the user has, checks the server to see what the latest version is, and then downloads a batch file for changing the old version into the new one (applying it via rsync's batch processing). As long as you keep a copy of each released version on the server, it would be easy to create these update files via the --only-write-batch=NAME option whenever a new version is released.

I could even imagine a custom rsync server that used the data from the generator to identify which version of a file the user had, and chose which pre-recorded data stream to send to the user to effect the update, instead of computing the binary patch "live". You may also want to check into other binary-patching software to see what your options are (I haven't looked into it).

> Would an rsync server running 3.0 CVS + the checksum-updating patch
> still retain the precomputed checksum advantage when talking to an older
> 2.6.9 client?

Sure, it works when talking to any rsync client.

> Alternatively, would it be difficult to backport the checksum-updating
> patch to a 2.6.9 server?

It wouldn't be difficult.
The checksum-xattr patch would be even easier to port (as long as you have xattrs on your server), since it doesn't even need an rsync built with xattr support (it just needs to use an extended-attribute read function).

> Lastly, does anyone have any empirical data on how well an rsync server
> with checksum-updating works with large number (eg: hundreds to
> thousands) of simultaneous clients?

Not that I know of. For really large files, that is likely to be quite a memory and CPU hog: each client will be sending you checksum data for the whole file, and then the server will be doing its own checksumming and block comparisons using this in-memory checksum cache.

..wayne..
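To give a sense of the per-client work described above, here is a small Python sketch of an rsync-style weak rolling checksum (the function names are mine; rsync's real implementation is in C and differs in detail). The point of the rolling form is that the server can slide the window one byte at a time in O(1) while scanning a large file for block matches:

```python
def weak_checksum(block: bytes) -> int:
    """Weak checksum in the rsync style: two 16-bit sums packed into one int."""
    a = sum(block) & 0xFFFF
    b = sum((len(block) - i) * x for i, x in enumerate(block)) & 0xFFFF
    return (b << 16) | a

def roll(csum: int, old: int, new: int, block_len: int) -> int:
    """Slide the checksum window one byte: drop `old`, add `new`, in O(1)."""
    a = (csum - old + new) & 0xFFFF
    b = (((csum >> 16) & 0xFFFF) - block_len * old + a) & 0xFFFF
    return (b << 16) | a

data = b"abcdefgh"
n = 4
c = weak_checksum(data[:n])
# Rolling from data[0:4] to data[1:5] matches recomputing from scratch.
assert roll(c, data[0], data[n], n) == weak_checksum(data[1 : n + 1])
```

Even with the O(1) roll, the server still touches every byte of every large file once per client session, which is where the CPU and memory cost of thousands of simultaneous clients comes from.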