Keating, Tim
2001-Nov-30 05:41 UTC
Rsync: Re: patch to enable faster mirroring of large filesystems
I, too, was disappointed with rsync's performance when no changes were required (23 minutes to verify that a system of about 3200 files was identical). I wrote a little client/server python app which does the verification, and then hands rsync the list of files to update. This reduced the optimal case compare time to under 30 seconds. Here's what it does, and forgive me if these sound similar to the stuff you're doing:

- The client and server cache checksums (MD5, since there is no MD4 implementation conveniently available for Python that I know of) on a per-directory basis. These are kept in a .checksum file in the directory, so they persist from session to session. This is especially handy for the server, where (in my particular case) the files don't change very often.

- On the initial compare the client sends the checksum of each .checksum file; if they match, it's not necessary to send the .checksum file, and we just culled an entire directory for a cost of about a 32 byte transfer.

- If there's a mismatch, the client sends over the entire .checksum file. The server does the compare and sends back a list of files to delete and a list of files to update. (And now I think of it, it would probably be better if the server just sent the client back the list of files and let the client figure out what it needed, since this would distribute the work better.)

- The client deletes the delete files, and uses rsync to update the update files.

The ideal case is when all checksums are up to date. The worst-case is when the checksum cache needs to be built completely -- but this still only takes a couple of minutes, easily an order of magnitude better than the best-case I experienced with raw rsync.

> -----Original Message-----
> From: Alberto Accomazzi [mailto:aaccomazzi@cfa.harvard.edu]
> Sent: Thursday, November 29, 2001 10:02 AM
> To: Dave Dykstra
> Cc: rsync@samba.org
> Subject: Re: Rsync: Re: patch to enable faster mirroring of large filesystems
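The scheme described in this message can be sketched roughly as follows. This is a minimal illustration, not the author's app: the function names, the one-line-per-file cache format, and the unconditional cache rebuild are all assumptions, and the client/server transport is left out entirely. Only the `.checksum` file name, the use of MD5, and the delete/update lists come from the post. The hex MD5 digest of the cache file is 32 characters, which matches the "about a 32 byte transfer" per directory.

```python
import hashlib
import os

CACHE_NAME = ".checksum"  # name from the post; the line format below is assumed


def file_md5(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_cache(directory):
    """Checksum every regular file in one directory and write the cache file.

    Hypothetical helper: it always rebuilds from scratch and does not try to
    detect which cached entries are stale.
    """
    entries = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name != CACHE_NAME and os.path.isfile(path):
            entries[name] = file_md5(path)
    with open(os.path.join(directory, CACHE_NAME), "w") as handle:
        for name, digest in entries.items():
            handle.write(f"{digest}  {name}\n")
    return entries


def cache_digest(directory):
    """Checksum of the cache file itself: the single value sent per directory."""
    return file_md5(os.path.join(directory, CACHE_NAME))


def compare(client_entries, server_entries):
    """Return (to_delete, to_update) for the client, treating the server as the source."""
    to_delete = sorted(set(client_entries) - set(server_entries))
    to_update = sorted(
        name
        for name, digest in server_entries.items()
        if client_entries.get(name) != digest
    )
    return to_delete, to_update
```

If `cache_digest` matches on both sides, the directory can be skipped. Otherwise `compare` yields the two lists, and the update list is what would be handed to rsync.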
Dave Dykstra
2001-Nov-30 05:52 UTC
Rsync: Re: patch to enable faster mirroring of large filesystems
Were you using the -c option of rsync? It sounds like you were and it's extremely slow. I knew somebody who once went to extraordinary lengths to avoid the overhead of -c, making a big patch to rsync to cache checksums, when all he had to do was not use -c.

- Dave Dykstra

On Thu, Nov 29, 2001 at 12:41:46PM -0600, Keating, Tim wrote:
> I, too, was disappointed with rsync's performance when no changes were required (23 minutes to verify that a system of about 3200 files was identical). I wrote a little client/server python app which does the verification, and then hands rsync the list of files to update. This reduced the optimal case compare time to under 30 seconds. Here's what it does, and forgive me if these sound similar to the stuff you're doing:
>
> - The client and server cache checksums (MD5, since there is no MD4 implementation conveniently available for Python that I know of) on a per-directory basis. These are kept in a .checksum file in the directory, so they persist from session to session. This is especially handy for the server, where (in my particular case) the files don't change very often.
>
> - On the initial compare the client sends the checksum of each .checksum file; if they match, it's not necessary to send the .checksum file, and we just culled an entire directory for a cost of about a 32 byte transfer.
>
> - If there's a mismatch, the client sends over the entire .checksum file. The server does the compare and sends back a list of files to delete and a list of files to update. (And now I think of it, it would probably be better if the server just sent the client back the list of files and let the client figure out what it needed, since this would distribute the work better.)
>
> - The client deletes the delete files, and uses rsync to update the update files.
>
> The ideal case is when all checksums are up to date. The worst-case is when the checksum cache needs to be built completely -- but this still only takes a couple of minutes, easily an order of magnitude better than the best-case I experienced with raw rsync.
>
> > -----Original Message-----
> > From: Alberto Accomazzi [mailto:aaccomazzi@cfa.harvard.edu]
> > Sent: Thursday, November 29, 2001 10:02 AM
> > To: Dave Dykstra
> > Cc: rsync@samba.org
> > Subject: Re: Rsync: Re: patch to enable faster mirroring of large filesystems
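For context on why dropping -c matters: without -c, rsync decides whether a file needs attention by comparing size and modification time, so it never reads file contents for files that look unchanged. The sketch below illustrates that kind of metadata-only check. It is not rsync's code; the function name and the whole-second mtime comparison are assumptions made for illustration.

```python
import os


def quick_check_differs(src_path, dst_path):
    """Metadata-only comparison: True if dst is missing or differs in size or mtime.

    Illustrative only: no file contents are read, which is why this kind of
    check is far cheaper than checksumming every file.
    """
    try:
        dst = os.stat(dst_path)
    except FileNotFoundError:
        return True
    src = os.stat(src_path)
    # Whole-second mtime comparison is an assumption of this sketch.
    return src.st_size != dst.st_size or int(src.st_mtime) != int(dst.st_mtime)
```

A check like this costs one stat call per file on each side, whereas a checksum comparison has to read every byte of every file on both ends.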