xuehai zhang
2005-Aug-31 21:44 UTC
performance problem of using parallel rsync to stage data from 1 source to multiple destination
Hi all,

I am new to rsync and I apologize in advance if my question is shallow. I wrote a simple script that uses rsync to transfer a big file (~600MB) from a single source to a variable number of destinations in parallel. When I transfer the file to 4 destination machines, the overall transfer time is X. When I transfer the same file to 8 destination machines, the overall transfer time is Y. In my experiments, however, Y is smaller than 2 * X. Why is the time to transfer the file to 2N nodes shorter than twice the time to transfer the same file to N nodes? Does it make sense to you? What could be the reason if it does make sense in some way?

Thank you so much for your help in advance!

Xuehai

P.S. the script to do the parallel rsync:

#!/bin/sh
LIST="ccn2"
if [ "$#" -gt "0" ] ; then
    if [ "$1" -eq "2" ] ; then
        LIST="ccn2"
    fi
    if [ "$1" -eq "4" ] ; then
        LIST="ccn2 ccn3 ccn4"
    fi
    if [ "$1" -eq "6" ] ; then
        LIST="ccn2 ccn3 ccn4 ccn7 ccn6"
    fi
    if [ "$1" -eq "8" ] ; then
        LIST="ccn2 ccn3 ccn4 ccn5 ccn6 ccn7 ccn8"
    fi
fi
echo "nodes: $LIST"
date
for dest in $LIST
do
    time rsync -az /tmp/disk.img $dest:/tmp/ &
done
wait
date
Paul Slootman
2005-Sep-01 10:32 UTC
performance problem of using parallel rsync to stage data from 1 source to multiple destination
On Wed 31 Aug 2005, xuehai zhang wrote:

> results. Why the time of transferring the file to 2N nodes is shorter than
> twice of the time of transferring the same file to N nodes? Does it make

If the network is not the bottleneck, then the CPU or the disks are. If (similar) tasks are run in parallel, the data of the files being handled may still be in the buffer cache, so it doesn't need to be read in from disk again. This will save time...

Paul Slootman
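A rough way to observe the buffer-cache effect described above on the sending host (a sketch, assuming /tmp/disk.img is the ~600MB file from the original script):

    # The first read may have to come from disk; a repeated read of the same
    # file is typically served from the kernel's buffer cache and is much faster.
    time cat /tmp/disk.img > /dev/null
    time cat /tmp/disk.img > /dev/null

If the second read is dramatically faster, the parallel rsync processes after the first one are likely reading the source file from memory rather than from disk, which would help explain why 2N destinations take less than twice the time of N.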
Wayne Davison
2005-Sep-02 16:46 UTC
performance problem of using parallel rsync to stage data from 1 source to multiple destination
Since it sounds like disk I/O is your limiting factor, you may wish to look into updating multiple systems using a batch file. This requires all the receiving systems to have identical files in the destination hierarchy.

You would first create a batch file by performing the synchronization either to one of the destination systems, or even to a second physical disk on the sending system in an effort to create the batch file more quickly (if you avoid a copy to the same physical disk*, it will avoid having both the sending and receiving disk I/O hitting a single disk). The batch-writing command would be:

    rsync -av --write-batch=xfer1 /src/ /dest/

Then, you would update all the (remaining) destinations by reading that batch file:

    rsync -av --read-batch=xfer1 $dest:/tmp/ &

As long as you're using at least 2.6.3, batch mode should work quite well (older versions used an experimental implementation that is not recommended).

* One other thing you might try in order to create the batch file quickly using a local copy on a machine that doesn't have a second physical disk: use the option --only-write-batch instead of --write-batch. This would use an extra copy of the destination files somewhere on the sending system, but not update it right away, which _might_ save some elapsed time in the creation of the batch file (you'd have to try it and see). Then, at the end of the updating of the remote systems, you would use the batch file to update your local destination mirror.

..wayne..
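A minimal sketch of how the original script might be adapted to batch mode, under the assumptions above (rsync >= 2.6.3 everywhere and identical destination trees on every host); the host names and file paths are carried over from the thread, and the --read-batch=- over ssh pattern follows the batch-mode example in the rsync manual:

    #!/bin/sh
    # ccn2 is updated directly while the batch file is written;
    # the remaining hosts are updated by replaying the batch.
    LIST="ccn3 ccn4 ccn5 ccn6 ccn7 ccn8"

    # Create the batch file xfer1 while synchronizing the first destination.
    rsync -az --write-batch=xfer1 /tmp/disk.img ccn2:/tmp/

    # Replay the batch on each remaining destination in parallel.
    # --read-batch=- reads the batch data from stdin over the ssh connection,
    # so the source file itself is not re-read for every destination.
    # The read-batch options should match the ones used with --write-batch.
    for dest in $LIST
    do
        ssh $dest rsync -az --read-batch=- /tmp/ < xfer1 &
    done
    wait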