Matt Olson
2003-Oct-22 11:25 UTC
Feature Request - Recursive Rsync Parameter - Example Script
I wanted to flag a problem and offer a possible solution.

The problem:

Large rsync operations fail on machines with modest amounts of memory.

Proposal:

Add a parameter to rsync's recursive mode to specify a recursion level (see the example bash wrapper below). This works with recursive file system rsyncs only, i.e. -a or -r. The logic goes:

if recursion switch true and recursion_level > 0

  -rsync this directory only
  -call rsync for each subdirectory with a decremented recursion_level and pass the same switches along

else (recursion_level really is 0)

  -perform the full rsync (from this level)

What this does is help break up the job into smaller pieces. Otherwise rsync can consume hundreds of megabytes of memory attempting to perform a single operation. In this scenario you'll see one rsync process for each level of recursion.

Here's an example bash script that is an attempt at this idea (it supports the -n option so you can see the calls it makes). The script calls itself as lrsync, so it needs to be installed under that name somewhere in the PATH.

My bash scripting skills need some work, but you get the idea. If someone wants to further develop this script, feel free.

Cheers.

#!/bin/bash
# copyright 2003 Matt Olson, Kavi Corporation (molson@kavi.com)
# Licence: General Public License

# set our environment
IFS=$'\n' # This keeps bash from breaking up file names with spaces in them.

if [ ! -n "$1" ]
then
  echo "Usage: `basename $0` <recursion_level> \"quoted_rsync_parameters\" <source path> <destination_path>"
  echo "Note: argument parsing is order dependent."
  exit 1
fi

# Some debugging help, fifth parm will echo args
if [ -n "$5" -a "$5" = "args" ]
then
  index=1
  for arg in "$@"
  do
    echo "Arg #$index = $arg"
    index=$(($index + 1))
  done
fi

# Assign parameters to some variables.
r_level=$1
rsync_options=$2
source_path_parm=$3
dest_path_parm=$4
rsync_no_r_options=`echo $rsync_options | sed -e "s/r//" | sed -e "s/a/lptgoD/"`

# Let's support the rsync test mode.
test_run=`echo $rsync_options | grep n`

# We need to decide if the source is a remote host.
# Parse out the <source path> and if it is remote, capture the hostname
if [ `echo $source_path_parm | grep ":"` ]
then
  remote_source_host=${source_path_parm%:*}
  remote_source_path=${source_path_parm#*:}
fi

# We need to also decide if the destination is a remote host.
# Parse out the <destination path> and if it is remote, capture the hostname
if [ `echo $dest_path_parm | grep ":"` ]
then
  remote_dest_host=${dest_path_parm%:*}
  remote_dest_path=${dest_path_parm#*:}
fi

# At this point we need to see if there are additional directories to
# call with lrsync, as long as our recursion level is > 0.
# To build a list of targets, we need to determine if the host is remote.
if [ $remote_source_host ]
then
  # If host is remote, get a file list via rsh
  directory_object=`rsh $remote_source_host ls -1p $remote_source_path | grep /`
else
  # If host is local, get a file list
  directory_object=`ls -1p $source_path_parm | grep /`
fi

if [ $test_run ]
then
  echo "lrsync: directory_object: $directory_object"
fi

# With these results walk through list returned and call rsync/lrsync
# Testing the recursion level.
# At this point if we are at recursion level 0 then do some rsyncs.
# If not at r_level 0 then call lrsync with a decremented r_level.
# If no additional directory objects to recurse, do the rsync.
if [ $1 = 0 ] || [ -z "$directory_object" ]
then
  # Do rsync(s)
  # If this is a test run, echo some extra info.
  if [ $test_run ]
  then
    echo "lrsync: rsync $rsync_options $source_path_parm $dest_path_parm"
    rsync $rsync_options $source_path_parm $dest_path_parm
  else
    rsync $rsync_options $source_path_parm $dest_path_parm
  fi
else
  # Do lrsync(s)
  # Set some variables
  next_r_level=$(($r_level - 1))

  # Next we have to rsync the top level files in the directory we are going to recurse.
  if [ $test_run ]
  then
    echo "lrsync: rsync $rsync_no_r_options $source_path_parm/* $dest_path_parm/."
    rsync $rsync_no_r_options $source_path_parm/* $dest_path_parm/.
  else
    rsync $rsync_no_r_options $source_path_parm/* $dest_path_parm/.
  fi

  # Walk through the directories at this level.
  for file_or_dir in $directory_object
  do
    if [ $remote_dest_host ]
    then
      if [ $test_run ]
      then
        echo "lrsync: rsh $remote_dest_host mkdir $dest_path_parm/$file_or_dir"
      else
        # If host is remote, make directory on remote host via rsh
        rsh $remote_dest_host mkdir $dest_path_parm/$file_or_dir
      fi
    else
      if [ $test_run ]
      then
        echo "lrsync: mkdir $dest_path_parm/$file_or_dir"
      else
        # If host is local, make the directory
        mkdir $dest_path_parm/$file_or_dir
      fi
    fi

    # If this is a test run, echo some extra info.
    if [ $test_run ]
    then
      echo "lrsync: lrsync $next_r_level $rsync_options '$source_path_parm/$file_or_dir' '$dest_path_parm/$file_or_dir'"
      lrsync $next_r_level $rsync_options $source_path_parm/$file_or_dir $dest_path_parm/$file_or_dir $5
    else
      lrsync $next_r_level $rsync_options $source_path_parm/$file_or_dir $dest_path_parm/$file_or_dir $5
    fi
  done

  unset IFS # Just doing the right thing.
fi

--
Matt Olson
Platform Engineer
Kavi Corporation
Phone 503.813.9383
e-mail molson@kavi.com
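A note on deriving the non-recursive option string: a sed substitution applied to the whole option string rewrites the first "r" or "a" it finds, which may sit inside a long option such as --progress. The sketch below is one possible alternative, not part of the original script. The function name no_recursion_options is made up for illustration, and it assumes the options arrive as one space-separated string with no short option carrying an attached argument (for example -essh would be mangled).

```shell
#!/bin/bash
# Sketch only: strip recursion from single-dash option clusters.
# Assumption: options are space separated; long options (--foo) are
# passed through untouched.
no_recursion_options() {
    local IFS=' '          # split and join on spaces, whatever the caller's IFS is
    local opt
    local -a opts out=()
    read -r -a opts <<< "$1"
    for opt in "${opts[@]}"; do
        case $opt in
            --*) ;;                        # long option: leave as is
            -*)  opt=${opt//a/lptgoD}      # -a is shorthand for -rlptgoD
                 opt=${opt//r/} ;;         # drop the recursion flag
        esac
        # Skip a cluster that became empty (e.g. a lone -r).
        if [ "$opt" != "-" ]; then out+=("$opt"); fi
    done
    printf '%s\n' "${out[*]}"
}
```

For example, no_recursion_options "-avz --progress" leaves --progress alone and only expands the -avz cluster.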
jw schultz
2003-Oct-22 11:46 UTC
Feature Request - Recursive Rsync Parameter - Example Script
On Tue, Oct 21, 2003 at 06:25:51PM -0700, Matt Olson wrote:
> I wanted to flag a problem and offer a possible solution.
>
> The problem:
>
> large rsync operation fails on machines with modest amounts of memory.
>
> Proposal:
>
> Add a parameter to rsync recursive to specify a recursion level (See
> example bash wrapper below). (works with recursive file system rsyncs
> only, i.e. -a or -r) The logic goes:

Limiting the depth of recursion is already supported, just not in an intuitive way.

	rsync -r --exclude='/*/*/*/'

Your idea for a shell script to automate picking up the lower levels is good, and it could compose the --exclude pattern.

The next step would be to set the job partition level based on path count, as in "find $subtree -print|wc -l".

> if recursion switch true and recursion_level > 0
>
> -rsync this directory only
> -call rsync for each subdirectory with a decremented recursion_level and
> pass the same switches along
>
> else (recursion_level really is 0)
>
> -perform the full rsync (from this level)
>
> What this does is help break up the job into smaller pieces. Otherwise
> rsync can consume hundreds of megabyte of memory attempting to perform a
> single operation. In this scenario you'll see one rsync process for each
> level of recursion.
>
> Here's and example bash script that is an attempt at this idea: (it
> supports the -n options so you can see the calls it makes)
>
> My bash scripting skills need some work, but, you get the idea. If
> someone wants to further develop this script, feel free.
>
> Cheers.
>
>
> #!/bin/bash
[snip]
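The two suggestions above (composing the --exclude pattern and sizing jobs by path count) could be sketched in bash roughly as follows. The function names depth_exclude_pattern and path_count are invented here for illustration, and the paths in the usage comment are placeholders. The path count is only a rough measure: file names containing newlines would inflate it.

```shell
#!/bin/bash
# Sketch only: build an exclude pattern that matches directories N levels
# below the transfer root, e.g. N=3 gives /*/*/*/ as in the example above.
depth_exclude_pattern() {
    local depth=$1 pattern="/" i
    for ((i = 0; i < depth; i++)); do
        pattern+="*/"
    done
    printf '%s\n' "$pattern"
}

# Count the paths under a subtree as a rough measure of job size.
# tr strips the padding some wc implementations add.
path_count() {
    find "$1" -print | wc -l | tr -d ' '
}

# Hypothetical usage (src/ and dest/ are placeholder paths):
#   rsync -r --exclude="$(depth_exclude_pattern 3)" src/ dest/
#   if [ "$(path_count src/bigdir)" -gt 100000 ]; then ...split further...; fi
```

A wrapper could call path_count on each subdirectory and recurse only into those above some threshold, rather than using a fixed recursion level everywhere.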