Sure. Here's what I sent to Lenny. I'm ccing the list in case it raises
more questions.
Here's the splitter program. Yes, it's a recursive shell script. However,
the only active processes in the process tree will be the tips of the
tree, and the bigger processes (sort, wc) will exit before the children
are called, so it won't suck up all your resources.
It takes a limit (the maximum number of items a named subdirectory may
contain) and a filename. The list file is generated by find, using
relative paths, naming the directory to split.
For instance, to split /www with a limit of 50000:
cd /
find www -print >/tmp/listfile
splitter 50000 /tmp/listfile >modulelist
++++++++++++++++++++++++++++++++++++++++++++++
#!/bin/sh
limit=$1
file=$2

splitdir(){
    dir=$1
    #depth of this directory = number of slash-separated components
    pathlength=`echo $dir | tr / ' ' | wc -w`
    #re-echo to strip the padding some wc implementations add
    pathlength=`echo $pathlength`
    searchpat="^$dir/"
    #at the top level, match every line in the file
    [ "$searchpat" = "^/" ] && searchpat='^'
    grep "$searchpat" "$file" |
    cut -d/ -f1-`expr $pathlength + 1` |
    uniq -c |
    while read dircount subdir
    do
        if [ "$dircount" -le "$limit" ]
        then
            #small enough: emit this subdirectory as one module
            echo $subdir
        else
            #too big: recurse one level deeper. The subshell with stdin
            #from /dev/null keeps the child from reading the parent
            #loop's pipe.
            (splitdir $subdir) </dev/null
        fi
    done
}

#start the recursion at the top of the tree
splitdir
++++++++++++++++++++++++++++++++++++++++++++++
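To make the recursion concrete, here's a hypothetical run on a tiny tree
with a limit of 3 (the listing and output below are made up for
illustration):
++++++++++++++++++++++++++++++++++++++++++++++
$ cat /tmp/listfile
www
www/img
www/img/a.png
www/img/b.png
www/img/c.png
www/img/d.png
www/docs
www/docs/index.html
$ splitter 3 /tmp/listfile
www/img/a.png
www/img/b.png
www/img/c.png
www/img/d.png
www/docs
++++++++++++++++++++++++++++++++++++++++++++++
www/img accounts for 5 lines of the listing (itself plus four files),
which is over the limit, so the script recurses and emits its contents
individually; www/docs accounts for only 2 lines, so it comes out whole.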
In my application, I actually run the find directly on the NAS box, and
process the file with a much faster SunOS box.
Here's the script run on the NAS box:
++++++++++++++++++++++++++++++++++++++++++++++
#!/bin/sh
#200105something tcc wrote to get list of everything we push out on
#toolservers. The output is used as input to ToolSyncMakeModules, which
#gives back directories with no more than some limit (50000 seems to work
#well) of subcomponents (files AND directories, combined) for efficient
#rsync. Some of these "modules" may in fact be files. rsync -a gets them.
#Important note!: Must rsync module to parent directory destination, not
#module/* to module name. If module is a symlink, you copy everything in
#under the link name, which is probably not what you want, and with the
#ensuredir function, the link will be created as a dir, causing the
#synced files to actually exist separately in linked locations.
#As a low-cost safety, have added lines to fix ownerships. Done locally
#on toolserver, low performance cost.
basedir=/mnt/vol1
workdir=$basedir/ToolSyncModules
chgrp=/usr/bin/chgrp
chmod=/bin/chmod
chown=/usr/sbin/chown
date=/bin/date
echo=/bin/echo
find=/usr/bin/find
mv=/bin/mv
rm=/bin/rm
listfile=$workdir/Module.List
stagelistfile=$workdir/Module.List.stage
logfile=$workdir/makefull.log
if cd $basedir 2>/dev/null
then
    {
        #for time purposes
        $date
        #$find big big1 -print > $stagelistfile
        $find big/*/* big1/*/* -print > $stagelistfile
        $rm $listfile 2>/dev/null
        $mv $stagelistfile $listfile
        $chmod ugo+r $listfile
        #just fix the ownerships, in case. These are the numeric ids of
        #user and group Tools.
        $chown -R 24 big big1 &
        $chgrp -R 70 big big1 &
        #for time purposes
        $date
    } >$logfile 2>&1
else
    $echo "Can't cd to $basedir to create ToolSync raw file list"
fi
++++++++++++++++++++++++++++++++++++++++++++++
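The "module to parent directory" warning in those comments matters when
the module list is consumed. The consumer isn't shown in this thread, but
here's a minimal sketch of what the iteration could look like, using the
-R trick mentioned in the quoted messages below; the host and base paths
are hypothetical:
++++++++++++++++++++++++++++++++++++++++++++++
#!/bin/sh
#hypothetical consumer of the module list - not one of the real scripts.
#With -R (--relative), rsync recreates each module's relative path under
#the destination base, so a module that is a file or symlink is copied
#as itself, never expanded as module/*.
modulefile=ToolSyncModuleList
destbase=toolserver1:/mnt/vol1
#paths in the module list are relative to this directory
cd /mnt/vol1 || exit 1
while read module
do
    rsync -aR "$module" "$destbase"
done < $modulefile
++++++++++++++++++++++++++++++++++++++++++++++
As noted further down, --delete and -H become less certain when the run
is split up this way, so treat those flags with care.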
Here's the script run by the SunOS box:
++++++++++++++++++++++++++++++++++++++++++++++
#!/bin/sh
#define directories used
basedir=/wan/sjt-tools-master1/sjt-tools-master1
listdir=$basedir/ToolSyncModules
moduledir=$listdir/ToolSyncModules
workdir=/tmp
tee=/bin/tee
#define scripts used - non-standard locations
splitter=$listdir/ToolSyncSplitModules
#define files used
listfile=$listdir/Module.List
logfile=$listdir/split.log
worklistfile=$workdir/ToolSyncModule.list
modulefile=$moduledir/ToolSyncModuleList
stagemodulefile=$listdir/ToolSyncModuleList
umask 022
{
    #for timetrial purposes
    date
    #put it on a fast filesystem for reading
    cp $listfile $worklistfile
    #and write it to a slow filesystem (very little writing... read
    #1.5Mlines, write 500)
    $splitter 500000 $worklistfile > $stagemodulefile
    #flash over so the file is either there or not, never partial.
    #All the data is there, and we just change the name.
    rm $modulefile 2>/dev/null
    mv $stagemodulefile $modulefile
    #clean up after ourselves
    rm $worklistfile
    #for timetrial purposes
    date
} > $logfile 2>&1
++++++++++++++++++++++++++++++++++++++++++++++
I have actually quit using rsync for the full synchronization, and have
written a pair of scripts to use find, sort, gzip, diff, and tar. The
basic idea took about 10 minutes to write, and worked, but I then
optimized it to take advantage of our specific conditions, and to add
integrity/safety measures (to prevent catastrophic deletions).
I'd run rsync for 4 days, beat the hell out of the network, and have it
die incomplete. Now I can finish in 3 hours (for a no-op... more time if
there are changes to copy over). You might want to consider a
script-based solution if you're short on cpu/ram, or are working over
NFS.
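For the curious, here's a minimal sketch of that kind of pipeline. It is
not my actual pair of scripts: the names and paths are hypothetical, it
assumes GNU tar (for -T) and ssh to the destination, and it only ships
paths that are new since the last run. A real version also has to handle
changed file contents and deletions, which is where the integrity/safety
measures come in.
++++++++++++++++++++++++++++++++++++++++++++++
#!/bin/sh
#minimal sketch only - hypothetical names, no safety measures
srcdir=/mnt/vol1
desthost=toolserver1
destdir=/mnt/vol1
prev=/var/tmp/ToolSync.prev
curr=/var/tmp/ToolSync.curr
cd $srcdir || exit 1
#sorted listing of the whole tree
find . -print | sort > $curr
if [ -f $prev ]
then
    #paths in the new listing but not the old one are new since last run
    comm -13 $prev $curr > /tmp/newpaths.$$
else
    #first run: everything is new
    cp $curr /tmp/newpaths.$$
fi
#ship the new paths through a compressed tar pipe
#(GNU tar's -T reads the names from a file)
tar cf - -T /tmp/newpaths.$$ | gzip -c |
    ssh $desthost "cd $destdir && gunzip -c | tar xpf -"
rm /tmp/newpaths.$$
mv $curr $prev
++++++++++++++++++++++++++++++++++++++++++++++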
Good luck.
Tim Conway
tim.conway@philips.com
303.682.4917
Philips Semiconductor - Longmont TC
1880 Industrial Circle, Suite D
Longmont, CO 80501
Available via SameTime Connect within Philips, n9hmg on AIM
perl -e 'print pack(nnnnnnnnnnnn,
19061,29556,8289,28271,29800,25970,8304,25970,27680,26721,25451,25970),
".\n" '
"There are some who call me.... Tim?"
Lenny Foner <foner@media.mit.edu>
05/17/2002 02:10 PM
To: Tim Conway/LMT/SC/PHILIPS@AMEC
cc: foner@media.mit.edu
Subject: Rsync dies
Classification:
Date: Fri, 17 May 2002 11:43:55 -0600
From: tim.conway@philips.com
I have some code that can be used to analyze your system before the
sync, and choose directories containing no more than a maximum number of
items below them. Iterating through the list and using -R can let you
get the whole thing run, though --delete and -H become less certain (not
dangerous, but if you don't name anything containing a deleted directory
because it didn't come up on your list, you'll never tell the destination
to delete it, and if you have two hard links to the same file, but hit
them in two separate runs, you now have two copies on disk).
Let me know if you want it. I'm sure you can figure out how to modify it
for your environment.
I'd be interested in this. Tnx.
"Jurrie Overgoor" <jurr@tref.nl>
05/21/2002 12:30 PM
Please respond to "Jurrie Overgoor"
To: Tim Conway/LMT/SC/PHILIPS@AMEC
cc:
Subject: Re: Rsync dies
Classification:
Well, I'm not the one that started the thread, but I am interested in
your code nonetheless. Could you please mail it to me? Thanks in advance,
Greetz -- Jurrie
jurr@tref.nl
----- Original Message -----
From: <tim.conway@philips.com>
To: "C.Zimmermann" <clemens@prz.tu-berlin.de>
Cc: <rsync@lists.samba.org>; <rsync-admin@lists.samba.org>
Sent: Friday, May 17, 2002 7:43 PM
Subject: Re: Rsync dies
Yeah. You'll have to find a way to break the job up into smaller pieces.
It's a pain, but I have a similar situation - 3M+ files in 130+ GB. I
can't get the whole thing in one chunk, no matter how fast a server with
however much memory, even on gigabit Ethernet (for the server). In my
case, the filesystem is on NAS, and the NAS has only 100bT simplex
(half-duplex, to some).
I have some code that can be used to analyze your system before the
sync, and choose directories containing no more than a maximum number of
items below them. Iterating through the list and using -R can let you
get the whole thing run, though --delete and -H become less certain (not
dangerous, but if you don't name anything containing a deleted directory
because it didn't come up on your list, you'll never tell the destination
to delete it, and if you have two hard links to the same file, but hit
them in two separate runs, you now have two copies on disk).
Let me know if you want it. I'm sure you can figure out how to modify it
for your environment.
"C.Zimmermann" <clemens@prz.tu-berlin.de>
Sent by: rsync-admin@lists.samba.org
05/17/2002 02:08 AM
To: <rsync@lists.samba.org>
cc: (bcc: Tim Conway/LMT/SC/PHILIPS)
Subject: Rsync dies
Classification:
I'm trying to rsync a 210 GB filesystem with approx. 1,500,000 files.
Rsync always dies after about 29 GB without any error messages.
I'm using rsync version 2.5.5, protocol version 26.
Does anyone have an idea?
Thanks, Clemens