Hello Rsync devs,

We're investigating ways to provide large-scale software updates for multi-gigabyte games, and have recently begun to explore whether rsync may fit the bill. In particular, the checksum-updating patch looks like it might be able to solve our biggest concern about CPU load on the update server, since the actual content being served will change quite rarely.

Would an rsync server running 3.0 CVS + the checksum-updating patch still retain the precomputed-checksum advantage when talking to an older 2.6.9 client? Or would a 3.0 client be required on both sides? Alternatively, would it be difficult to backport the checksum-updating patch to a 2.6.9 server?

Lastly, does anyone have any empirical data on how well an rsync server with checksum-updating works with a large number (e.g. hundreds to thousands) of simultaneous clients? If not, we'll be happy to provide some data from our own testing if we do end up going down this path.

Thanks,

 -Gav

-- 
Gavriel State
Founder & CTO
TransGaming Inc.
gav@transgaming.com
http://www.transgaming.com
Martin Schröder
2007-Jul-09 09:28 UTC
Mass software update distribution + checksum-updating
2007/7/9, Gavriel State <gav@transgaming.com>:
> We're investigating ways to provide large scale software updates for
> multi-gigabyte games, and have recently begun to explore

IMHO bittorrent is a better way to do this.

Best
   Martin
On Mon, Jul 09, 2007 at 02:13:36AM -0400, Gavriel State wrote:
> In particular, the checksum-updating patch looks like it might be able
> to solve our biggest concerns about CPU load on the update server,
> since the actual content being served will change quite rarely.

The only checksum that is cached is the one the user can optionally request for a pre-transfer check. It's not usually needed unless the "quick check" algorithm (size + mtime) has a chance of being wrong.

A better update strategy would be some kind of binary-patch algorithm. Since the user should be starting with a limited set of initial files, you only need a limited set of updates. One way to do this with rsync is to use its batch processing: a batch saves off the data that was used to update a file to a new version. You could deploy an update script that identifies which version of the program the user has, checks the server to see what the latest version is, and then downloads a batch file for changing the old version into the new one (applying it via rsync's batch processing). As long as you keep a copy of each released version on the server, it would be easy to create these update files via the --only-write-batch=NAME option whenever a new version is released.

I could even imagine a custom rsync server that used the data from the generator to identify which version of a file the user had, and chose which pre-recorded data stream to send to the user to effect the update, instead of computing the binary patch "live". You may also want to check into other binary-patching software to see what your options are (I haven't looked into it).

> Would an rsync server running 3.0 CVS + the checksum-updating patch
> still retain the precomputed checksum advantage when talking to an older
> 2.6.9 client?

Sure, it works when talking to any rsync client.

> Alternatively, would it be difficult to backport the checksum-updating
> patch to a 2.6.9 server?

It wouldn't be difficult.
The checksum-xattr patch would be even easier to port (as long as you have xattrs on your server), since it doesn't even need an rsync built with xattr support (it just needs to use an extended-attribute read function).

> Lastly, does anyone have any empirical data on how well an rsync server
> with checksum-updating works with large number (eg: hundreds to
> thousands) of simultaneous clients?

Not that I know of. For really large files, that is likely to be quite a memory and CPU hog: each client will be sending you checksum data for the whole file, and then the server will be doing its own checksumming and block comparisons using this in-memory checksum cache.

..wayne..
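To give a sense of the per-client work described above, here is a small Python sketch of an rsync-style weak rolling checksum (the function names are mine; rsync's real implementation is in C and differs in detail). The point of the rolling form is that the server can slide the window one byte at a time in O(1) while scanning a large file for block matches:

```python
def weak_checksum(block: bytes) -> int:
    """Weak checksum in the rsync style: two 16-bit sums packed into one int."""
    a = sum(block) & 0xFFFF
    b = sum((len(block) - i) * x for i, x in enumerate(block)) & 0xFFFF
    return (b << 16) | a

def roll(csum: int, old: int, new: int, block_len: int) -> int:
    """Slide the checksum window one byte: drop `old`, add `new`, in O(1)."""
    a = (csum - old + new) & 0xFFFF
    b = (((csum >> 16) & 0xFFFF) - block_len * old + a) & 0xFFFF
    return (b << 16) | a

data = b"abcdefgh"
n = 4
c = weak_checksum(data[:n])
# Rolling from data[0:4] to data[1:5] matches recomputing from scratch.
assert roll(c, data[0], data[n], n) == weak_checksum(data[1 : n + 1])
```

Even with the O(1) roll, the server still touches every byte of every large file once per client session, which is where the CPU and memory cost of thousands of simultaneous clients comes from.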