We would like to use rsync to deploy large data files over an (all too often) faulty WAN connection. In the past we used scp and, when a transfer was interrupted, it had to be restarted from scratch. This is why we've begun experimenting with rsync.

The problem we're having with rsync stems from its use of checksums. The target systems we're deploying data to are highly sensitive to I/O load. When an rsync resumes, the performance of those systems is considerably degraded for several minutes while the checksum runs, driving up I/O wait and invalidating the system disk cache.

To throttle the load on the target systems, we're currently using the following options:

    --bwlimit=17 --partial --append

--bwlimit is the primary mechanism for limiting the I/O, --partial resumes failed transfers (the primary reason for using rsync), and --append prevents the computation of the initial checksum when resuming a failed transfer.

Given this background information, I have two questions:

1. With --append, I understand the final post-transfer checksum is still computed. Would it be better to take the hit up front and compute the checksum on the partial chunk (which is smaller than the whole file)? That would be the best choice if rsync maintains a running checksum as data is transferred, negating the need to re-read the entire file for a post-transfer checksum. From the docs, it's not entirely clear how this works.

2. Are there other options I could or should be using for this specific application?

Thanks,
Jeff
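[Editor's note: the invocation described above might look like the following sketch. The host name and file paths are placeholders, not taken from the original post; the arithmetic at the end only illustrates how slow 17 KB/s is for a large file.]

```shell
#!/bin/sh
# Hypothetical deploy command matching the options described above;
# "target-host" and "/data/payload.bin" are placeholders.
#   --bwlimit=17  cap bandwidth at ~17 KB/s to limit I/O on the target
#   --partial     keep partially transferred files so a resume can continue
#   --append      resume by appending, skipping the initial checksum pass
deploy() {
    rsync --bwlimit=17 --partial --append \
        /data/payload.bin deploy@target-host:/data/payload.bin
}

# At 17 KB/s, transferring 1 GiB takes roughly 1048576 / 17 seconds:
echo $((1048576 / 17))   # -> 61680 seconds, about 17 hours
```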
On Fri, Jul 11, 2008 at 10:10:16AM -0400, Jeff Woods wrote:

> --bwlimit=17 --partial --append

You only want to use --append if you can guarantee that the files will not have any changes in the existing data on the receiving side. A modern rsync does not compute a full-file checksum for --append unless you use a second --append, since that slows down the appending.

If you're using -c (--checksum), you probably don't want to do that, since that's super slow. Just use -t (--times) and let the normal size+mtime check look over things for you.

If you find that you really need checksumming, you might want to look at the db.diff in the patches dir that lets you cache checksums in a DB (i.e. SQLite or MySQL) and associate them with unchanged files (since it matches a file's size, mtime, ctime, and inode, it is safe).

If the source of the I/O is rsync's scanning of the directories (not the checksumming of the files), you may want to look into the slow-down.diff file in the patches dir, as that provides a way to get rsync to do its directory scanning more slowly.

> negating the need to re-read the entire file for a post-transfer
> checksum

There is no such thing in rsync, since it computes the checksum as the file is written.

..wayne..
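[Editor's note: a sketch of the setup suggested in the reply above. Host and path names are hypothetical, and the ionice variant is an editorial addition (assuming a Linux receiver with util-linux installed), not part of the original advice.]

```shell
#!/bin/sh
# Sketch of the recommended invocation: drop --append and -c, keep
# --partial for resumability, and use -t so the default size+mtime
# quick check can skip unchanged files on the next run.
# "target-host" and "/data/payload.bin" are placeholders.
resume_deploy() {
    rsync -t --bwlimit=17 --partial \
        /data/payload.bin deploy@target-host:/data/payload.bin
}

# Editorial addition: further reduce disk I/O pressure on the receiver
# by running the remote rsync in the idle I/O scheduling class via
# --rsync-path (assumes ionice from util-linux on the target).
resume_deploy_ionice() {
    rsync -t --bwlimit=17 --partial \
        --rsync-path="ionice -c 3 rsync" \
        /data/payload.bin deploy@target-host:/data/payload.bin
}
```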