Hello,

I'm using rsync to sync large virtual machine files from one ESX server to another. rsync is running inside the so-called "ESX console", which is basically a specially crafted Linux VM with some restrictions. The speed is reasonable, but I guess it's not the optimum - at least I don't know where the bottleneck is.

I'm not using ssh as transport but run rsync in daemon mode on the target, which speeds things up when large amounts of data go over the wire.

I have read that rsync is not very efficient with ultra-large files (I'm syncing files of up to 80 GB).

Regarding the bottleneck: neither CPU, network nor disk is at its limit, on either the source or the destination system. I don't see 100% CPU, 100% network or 100% disk I/O usage.

Furthermore, I wonder: isn't rsync just too intelligent for this kind of transfer? The position of the data inside these files (they contain harddisk images) won't really change, i.e. we don't need to check for relocation of data; we only need to know whether something changed inside a block of size "x", and if it did, we could transfer that whole block again. So I wonder whether we need a "rolling checksum" at all. Wouldn't checksums over fixed block sizes be sufficient for this task?

regards
roland
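For illustration, a minimal sketch of the daemon-mode setup described above; the module name, paths and host are made-up placeholders, not taken from the post:

# on the target: a bare-bones rsyncd.conf plus daemon start
cat > /etc/rsyncd.conf <<'EOF'
[vms]
    path = /vmfs/volumes/datastore1/backup
    read only = false
    uid = root
    gid = root
EOF
rsync --daemon

# on the source: push a VM directory to the daemon module
# (the double colon selects the rsync protocol instead of ssh)
rsync -av /vmfs/volumes/datastore1/myvm/ target-host::vms/myvm/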
One way I've been trying to speed up rsync may not apply in every situation. In my situation, when files change they usually change completely. This is especially true for large files, so the rsync delta algorithm does me no good, and I've been using the "-W" flag (e.g. rsync -avzW) to turn it off.

I don't know objectively how much difference this makes, but it seems reasonable. Comments?

--
Jon Forrest
Research Computing Support
College of Chemistry
173 Tan Hall
University of California Berkeley
Berkeley, CA 94720-1460
510-643-1032
jlforrest at berkeley.edu
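For reference, what the whole-file variant looks like on the wire; the paths and host below are placeholders. -W / --whole-file simply disables the delta-transfer algorithm:

# copy changed files in full instead of computing block deltas
rsync -avzW /vmfs/volumes/datastore1/myvm/ target-host::vms/myvm/

# -z compresses the stream; on large disk images the compression can cost
# more CPU than the bandwidth it saves, so it may be worth testing both ways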
So, instead of 500 MB I would transfer 100 GB over the network. That's no option. Besides that, for transferring complete files I know faster methods than rsync.

One more question: how safe is transferring a 100 GB file? Since rsync uses checksums internally to compare the contents of two files, how can I calculate the risk of the two files NOT being perfectly in sync after the rsync run? I assume there IS a risk, just as there is a risk that two files may have the same MD5 checksum by chance...

regards
roland

> Jon Forrest <jlforrest () berkeley ! edu> wrote:
>
> One way I've been trying to speed up rsync may not apply in every
> situation. In my situation, when files change they usually change
> completely. This is especially true for large files, so the rsync
> algorithm does me no good, and I've been using the "-W" flag
> (e.g. rsync -avzW) to turn it off.
>
> I don't know objectively how much difference this makes, but it
> seems reasonable. Comments?
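As a rough back-of-envelope for the collision question (not rsync's actual failure analysis, which as far as I know layers a 32-bit rolling checksum, per-block strong checksums and a whole-file verification pass), the birthday bound gives the order of magnitude for accidental collisions of a 128-bit checksum such as MD5. The file size and block size below are assumptions:

# probability that any two of n random 128-bit checksums collide is
# roughly n^2 / 2^129 (birthday approximation)
awk 'BEGIN {
    filesize  = 100 * 2^30          # 100 GB file (assumption)
    blocksize = 4 * 2^20            # 4 MiB blocks (assumption)
    n = filesize / blocksize        # 25600 blocks
    p = n^2 / 2^129
    printf "blocks = %d, P(accidental collision) ~ %.3g\n", n, p
}'
# prints a probability on the order of 1e-30, i.e. vanishingly small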
> devzero at web.de wrote:
> > So, instead of 500 MB I would transfer 100 GB over the network.
> > That's no option.
>
> I don't see how you came up with such numbers.
> If files change completely then I don't see why
> you would transfer more (or less) over the network.
> The difference that I'm thinking of is that
> by not using the rsync algorithm you're
> substantially reducing the number of disk I/Os.

Let me explain: all the files are HUGE data files of constant size. They are harddisk images, and only their contents change, i.e. specific blocks inside the files are accessed and rewritten. So the question is: is rsync's rolling-checksum algorithm the perfect (i.e. fastest) algorithm to match changed blocks at fixed locations between source and destination files? I'm not sure, because I have no in-depth knowledge of the mathematical background of the rsync algorithm. I assume: no - but it's only a guess...

> The reason I say this, and I could be wrong since
> I'm no rsync algorithm expert, is because when the
> local version of a file and the remote version of
> a file are completely different, and the rsync
> algorithm is being used, the amount of I/O
> that must be done consists of the I/Os that
> compare the two files, plus the actual transfer
> of the bits from the source file to the destination
> file. Please correct this thinking if it's wrong.

Yes, that's correct. But what I'm unsure about is whether rsync isn't doing too much work detecting the differences. It doesn't need to "look back and forth" (as I read somewhere it would); it just needs to check whether block 1 in file A differs from block 1 in file B. A rather simple comparison, with no need for complex math or any real "intelligence" to detect relocation of data. See this post:
http://www.mail-archive.com/backuppc-users at lists.sourceforge.net/msg08998.html

> > Besides that, for transferring complete files I know faster methods than rsync.
>
> Maybe so (I'd like to hear what you're referring to) but one reason
> I like to use rsync is that using the '-avzW' flags
> results in a perfect mirror on the destination, which is
> my goal. Do your faster methods have a way of doing that?

No, I have no faster replacement that is as good at perfect mirroring as rsync, but there are faster methods for transferring files. Here is one example:
http://communities.vmware.com/thread/29721

> > One more question: how safe is transferring a 100 GB file? Since rsync
> > is using checksums internally to compare the contents of two files,
> > how can I calculate the risk of the two files NOT being perfectly in
> > sync after the rsync run?
>
> Assuming the rsync algorithm works correctly, I don't
> see any difference between the end result of copying
> a 100gb file with the rsync algorithm or without it.
> The only difference is the amount of disk and network
> I/O that must occur.

The rsync algorithm uses checksumming to find the differences. Checksums are a sort of "data reduction" that creates a small hash from a larger amount of data. I just want to understand what makes sure there are no hash collisions that break the algorithm. Mind that rsync has existed for quite some time, and in that time the file sizes transferred with rsync may have grown by a factor of 100 or even 1000.

regards
roland
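To make the fixed-block idea above concrete, a minimal sketch, assuming the image is reachable as a plain file; the script name, paths and the 4 MiB block size are illustrative only, not an existing tool:

#!/bin/sh
# blockmd5.sh (hypothetical) - print one MD5 per fixed-size block of a file
IMG=$1                        # e.g. /vmfs/volumes/datastore1/myvm/myvm-flat.vmdk
BS=$((4 * 1024 * 1024))       # fixed block size "x" (assumption: 4 MiB)
SIZE=$(stat -c %s "$IMG")
BLOCKS=$(( (SIZE + BS - 1) / BS ))
i=0
while [ $i -lt $BLOCKS ]; do
    printf '%08d ' $i
    # read exactly one block at offset i*BS and hash it
    dd if="$IMG" bs=$BS skip=$i count=1 2>/dev/null | md5sum | awk '{print $1}'
    i=$((i + 1))
done

Run it against the same image on both hosts, redirect the output to a file on each side, and diff the two files; the differing lines name exactly the blocks that would have to be re-copied.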
> I really don't think it's a good idea to sync large data files that are
> in use and modified frequently, e.g. SQL database files or VMware image
> files. rsync does not have an algorithm to keep such frequently modified
> data files in sync with the source file, and this can leave the copy
> corrupted. If I'm wrong, please correct me. Thanks.

They are not in use, as I take a snapshot before running rsync, so the files won't change during the transfer. In other words, I'm making a sort of "crash consistent" copy.

roland
On Thu, Aug 06, 2009 at 08:15:39PM +0200, devzero at web.de wrote:
> I have read that rsync is not very efficient with ultra-large files
> (I'm syncing files of up to 80 GB).

Things to try:

- Be sure you're using rsync 3.x, as it has a better hash algorithm for
  the large numbers of checksum blocks that need to be scanned on the
  sending side.

- The --inplace option might help, since it can reduce the amount of
  write I/O when the file is being modified (though it does reduce the
  amount of backward matching). In a really large file where most of the
  data stays the same, this could be a big win.

- Try setting the --block-size option. This will only help if the block
  size is so large that it is failing to find matching data. In a huge
  file that is mostly unchanged, this may not be an issue. Note that
  decreasing the block size increases the amount of checksum data and
  the number of blocks in the matching algorithm.

- The best thing you could do would be to mount the virtual drives
  (source read-only, dest read/write) and copy within the file systems.
  That would allow rsync to use its size+mtime fast-check to skip most
  of the files. It would not, however, result in truly identical disk
  images, so it may not be a solution for you.

Keep in mind that the checksumming as it currently works requires the
receiving side to read the whole file (sending its checksums), and then
(after that is done) the sending side reads the whole file (generating
differences), which allows the receiving side to reconstruct the file
while the sender is sending in the changes. Sadly, this means that the
transfer serializes this file-reading time (since the sender wants to be
able to find moved blocks from anywhere in the file).

An interesting new option might be one that tells the sender to
immediately start comparing the received checksums to the source file,
and only check whether the data matches (with no movement) or whether it
needs to send the changed data (i.e. this would skip scanning for moved
data). For mostly unchanged, large files, that would allow concurrent
reading of the receiving and sending files. Combined with --inplace,
this might be a pretty large speedup for mostly-unchanged files.

..wayne..
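Putting the first three suggestions above into one concrete command line; the paths, module name and the 128 KiB block size are placeholders for illustration, not values from the post:

# assumes rsync >= 3.0 on both ends and a daemon module like the one
# sketched earlier in the thread; --inplace rewrites changed blocks inside
# the existing destination file, --block-size sets the checksum block size
# (131072 bytes = 128 KiB is only a guess, to be tuned by measurement)
rsync -av --inplace --block-size=131072 \
    /vmfs/volumes/datastore1/myvm/ target-host::vms/myvm/

# check the installed version on both sides first
rsync --version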