xuehai zhang
2005-Aug-31 21:44 UTC
performance problem of using parallel rsync to stage data from 1 source to multiple destination
Hi all,

I am new to rsync and I apologize in advance if my question is shallow. I wrote a simple script that uses rsync to transfer a big file (~600MB) from a single source to a variable number of destinations in parallel. When I transfer the file to 4 destination machines, the overall transfer time is X. When I transfer the same file to 8 destination machines, the overall transfer time is Y. In my experiments, however, Y is smaller than 2 * X. Why is the time to transfer the file to 2N nodes shorter than twice the time to transfer the same file to N nodes? Does it make sense to you? What could be the reason if it does make sense in some way?

Thank you so much for your help in advance!

Xuehai

P.S. the script to do the parallel rsync:

#!/bin/sh
LIST="ccn2"
if [ "$#" -gt "0" ] ; then
    if [ "$1" -eq "2" ] ; then
        LIST="ccn2"
    fi
    if [ "$1" -eq "4" ] ; then
        LIST="ccn2 ccn3 ccn4"
    fi
    if [ "$1" -eq "6" ] ; then
        LIST="ccn2 ccn3 ccn4 ccn7 ccn6"
    fi
    if [ "$1" -eq "8" ] ; then
        LIST="ccn2 ccn3 ccn4 ccn5 ccn6 ccn7 ccn8"
    fi
fi
echo "nodes: $LIST"
date
for dest in $LIST
do
    time rsync -az /tmp/disk.img $dest:/tmp/ &
done
wait
date
Paul Slootman
2005-Sep-01 10:32 UTC
performance problem of using parallel rsync to stage data from 1 source to multiple destination
On Wed 31 Aug 2005, xuehai zhang wrote:

> results. Why the time of transferring the file to 2N nodes is shorter than
> twice of the time of transferring the same file to N nodes? Does it make

If the network is not the bottleneck, then the CPU or the disks are. If (similar) tasks are run in parallel, the data of the files being handled may still be in the buffer cache, so it doesn't need to be read in from disk again. This will save time...

Paul Slootman
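A rough way to observe the buffer-cache effect described above on the sending host (a sketch, assuming /tmp/disk.img is the ~600MB file from the original script):

    # The first read may have to come from disk; a repeated read of the same
    # file is typically served from the kernel's buffer cache and is much faster.
    time cat /tmp/disk.img > /dev/null
    time cat /tmp/disk.img > /dev/null

If the second read is dramatically faster, the parallel rsync processes after the first one are likely reading the source file from memory rather than from disk, which would help explain why 2N destinations take less than twice the time of N.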
Wayne Davison
2005-Sep-02 16:46 UTC
performance problem of using parallel rsync to stage data from 1 source to multiple destination
Since it sounds like disk I/O is your limiting factor, you may wish to look into updating multiple systems using a batch file. This requires all the receiving systems to have identical files in the destination hierarchy.

You would first create a batch file by performing the synchronization either to one of the destination systems, or even to a second physical disk on the sending system in an effort to create the batch file more quickly (if you avoid a copy to the same physical disk*, it will avoid having both the sending and receiving disk I/O hitting a single disk). The batch-writing command would be:

    rsync -av --write-batch=xfer1 /src/ /dest/

Then, you would update all the (remaining) destinations by reading that batch file:

    rsync -av --read-batch=xfer1 $dest:/tmp/ &

As long as you're using at least 2.6.3, batch mode should work quite well (older versions used an experimental implementation that is not recommended).

* One other thing you might try in order to create the batch file quickly using a local copy on a machine that doesn't have a second physical disk: use the option --only-write-batch instead of --write-batch. This would use an extra copy of the destination files somewhere on the sending system, but not update it right away, which _might_ save some elapsed time in the creation of the batch file (you'd have to try it and see). Then, at the end of the updating of the remote systems, you would use the batch file to update your local destination mirror.

..wayne..
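A minimal sketch of how the original script might be adapted to batch mode, under the assumptions above (rsync >= 2.6.3 everywhere and identical destination trees on every host); the host names and file paths are carried over from the thread, and the --read-batch=- over ssh pattern follows the batch-mode example in the rsync manual:

    #!/bin/sh
    # ccn2 is updated directly while the batch file is written;
    # the remaining hosts are updated by replaying the batch.
    LIST="ccn3 ccn4 ccn5 ccn6 ccn7 ccn8"

    # Create the batch file xfer1 while synchronizing the first destination.
    rsync -az --write-batch=xfer1 /tmp/disk.img ccn2:/tmp/

    # Replay the batch on each remaining destination in parallel.
    # --read-batch=- reads the batch data from stdin over the ssh connection,
    # so the source file itself is not re-read for every destination.
    # The read-batch options should match the ones used with --write-batch.
    for dest in $LIST
    do
        ssh $dest rsync -az --read-batch=- /tmp/ < xfer1 &
    done
    wait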