We would like to use rsync to deploy large data files over an (all too often) faulty WAN connection. In the past we used scp and, when a transfer was interrupted, it had to be restarted from scratch. This is why we've begun experimenting with rsync.

The problem we're having with rsync stems from its use of checksums. The target systems we're deploying data to are highly sensitive to I/O load. When an rsync resumes, the performance of those systems is considerably degraded for several minutes while the checksum runs, driving up I/O wait and invalidating the system disk cache.

To throttle the load on the target systems, we're currently using the following options:

    --bwlimit=17 --partial --append

--bwlimit is the primary mechanism for limiting the I/O, --partial resumes failed transfers (the primary reason for using rsync), and --append prevents the computation of the initial checksum when resuming a failed transfer.

Given this background information, I have two questions:

1. With --append, I understand the final post-transfer checksum is still computed. Would it be better to take the hit up front and compute the checksum on the partial chunk (which is smaller than the whole file)? That would be the best choice if rsync maintains a running checksum as data is transferred, negating the need to re-read the entire file for a post-transfer checksum. From the docs, it's not entirely clear how this works.

2. Are there other options I could or should be using for this specific application?

Thanks,
Jeff
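[Editor's note: the invocation described above might look like the following sketch. The host name and file paths are placeholders, not taken from the original post; the arithmetic at the end only illustrates how slow 17 KB/s is for a large file.]

```shell
#!/bin/sh
# Hypothetical deploy command matching the options described above;
# "target-host" and "/data/payload.bin" are placeholders.
#   --bwlimit=17  cap bandwidth at ~17 KB/s to limit I/O on the target
#   --partial     keep partially transferred files so a resume can continue
#   --append      resume by appending, skipping the initial checksum pass
deploy() {
    rsync --bwlimit=17 --partial --append \
        /data/payload.bin deploy@target-host:/data/payload.bin
}

# At 17 KB/s, transferring 1 GiB takes roughly 1048576 / 17 seconds:
echo $((1048576 / 17))   # -> 61680 seconds, about 17 hours
```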
On Fri, Jul 11, 2008 at 10:10:16AM -0400, Jeff Woods wrote:

> --bwlimit=17 --partial --append

You only want to use --append if you can guarantee that the files will not have any changes in the existing data on the receiving side. A modern rsync does not compute a full-file checksum for --append unless you use a second --append, since that slows down the appending.

If you're using -c (--checksum), you probably don't want to do that, since that's super slow. Just use -t (--times) and let the normal size+mtime check look over things for you.

If you find that you really need checksumming, you might want to look at the db.diff in the patches dir that lets you cache checksums in a DB (i.e. SQLite or MySQL) and associate them with unchanged files (since it matches a file's size, mtime, ctime, and inode, it is safe).

If the source of the I/O is rsync's scanning of the directories (not the checksumming of the files), you may want to look into the slow-down.diff file in the patches dir, as that provides a way to get rsync to do its directory scanning more slowly.

> negating the need to re-read the entire file for a post-transfer
> checksum

There is no such thing in rsync, since it computes the checksum as the file is written.

..wayne..
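[Editor's note: a sketch of the setup suggested in the reply above. Host and path names are hypothetical, and the ionice variant is an editorial addition (assuming a Linux receiver with util-linux installed), not part of the original advice.]

```shell
#!/bin/sh
# Sketch of the recommended invocation: drop --append and -c, keep
# --partial for resumability, and use -t so the default size+mtime
# quick check can skip unchanged files on the next run.
# "target-host" and "/data/payload.bin" are placeholders.
resume_deploy() {
    rsync -t --bwlimit=17 --partial \
        /data/payload.bin deploy@target-host:/data/payload.bin
}

# Editorial addition: further reduce disk I/O pressure on the receiver
# by running the remote rsync in the idle I/O scheduling class via
# --rsync-path (assumes ionice from util-linux on the target).
resume_deploy_ionice() {
    rsync -t --bwlimit=17 --partial \
        --rsync-path="ionice -c 3 rsync" \
        /data/payload.bin deploy@target-host:/data/payload.bin
}
```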