Hello,

I've had some problems using rsync to transfer directories with more than
3 million files. Here's the error message from rsync:

<snip>
ERROR: out of memory in receive_file_entry
rsync error: error allocating core memory buffers (code 22) at util.c(116)
rsync: connection unexpectedly closed (63453387 bytes read so far)
rsync error: error in rsync protocol data stream (code 12) at io.c(342)
</snip>

I'm doing a pull on a linux system from the HP-UX system that actually
houses the data. Both are using rsync-2.6.2. The one solution I've come up
with isn't pretty, but seems to work.

Basically, I wrote a shell function that runs an rsync process for each
subdirectory if necessary. I'm using "find | wc -l" to count the number of
files in the source path and then calling the function again for each
subdirectory if the number is more than 2 million. Perhaps recursion is a
bad idea? It's the only way I could think of to catch the case where all
the files exist in a single directory several levels below the top.

Anyway, I'll take the function out of my script and paste it here, but it
may not work right taken out of context. YMMV.

<snip>
function DC {
    # "Divide and Conquer"
    # Evil hack to get around rsync's 3mil file limit.
    # Accepts two arguments as the srcpath and dstpath. Will not work
    # if srcpath is local.

    # Split "host:/path" into host and path.
    srchost=`echo $1 | awk -F: '{ print $1 }'`
    srcdir=`echo $1 | awk -F: '{ print $2 }'`

    # Count everything under the source path on the remote side.
    num_files=`ssh root@$srchost "find $srcdir | wc -l"`

    if [ $((num_files)) -gt 2000000 ]
    then
        # Too big for one rsync run: recurse into each subdirectory.
        echo "WARNING! file count greater than 2mil, recursing into subdirs."
        for file in `ssh root@$srchost "ls $srcdir"`
        do
            dstpath=`echo $2/$file`
            DC $1/$file $dstpath
        done
    else
        rsync $rsync_opts $1 $2
    fi
}
</snip>

Comments?  Better ideas?

--
James Bagley              | CDI Innovantage
jabagley@cvs.agilent.com  | Technical Computing UNIX Admin Support
DON'T PANIC               | Agilent Technologies IT
--
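For anyone who wants to try the function above, a minimal usage sketch (the
host name and paths here are made up, and rsync_opts is assumed to be set
elsewhere in the calling script):

<snip>
# Illustrative values only -- substitute your own.
rsync_opts="-a --delete"

# Pull /export/data from "somehost" into /backup/data, splitting the job
# into one rsync per subdirectory whenever a tree exceeds 2 million files.
DC somehost:/export/data /backup/data
</snip>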
On Tue, Aug 10, 2004 at 11:36:26AM -0700, James Bagley Jr wrote:
> Hello,
>
> I've had some problems using rsync to transfer directories with more than
> 3 million files. Here's the error message from rsync:
>
> ERROR: out of memory in receive_file_entry
> rsync error: error allocating core memory buffers (code 22) at util.c(116)

I've rsync'ed 7413719 files successfully (2.6.2 on RH7.2 server, Fermi
SL3.0.1 client), 680GB. When I do the top-level directory all at once,
there are several points where it locks up the server for minutes at a
time (directories with large #'s of files, it seems, and I suppose it's an
ext3 issue). The server side hit 1GB of memory near its peak, and the
client side hits 540MB of memory. Ick. At least when I upgraded to 2.6.2
it was possible to do this at all (compared with the version provided by
RedHat's RPMs).

For future sanity, I'm subdividing the top-level directory into several
discrete rsyncs on subdirectories. I like your idea in general (though I
agree it's ugly) for dynamically addressing this issue, but for now I can
afford the luxury of manually subdividing the tree.

Regards,
Dan W.

> Better ideas?

No. However, my suggestion would be to run a nightly script on the
*server* side (if you have access) which counts files and puts the tallies
in selected higher-level directories. So, e.g., /.filecount would have the
# of files in /tmp, /usr, /var, etc., and /usr/local/src/.filecount would
have the # of files in all its subdirs. This saves you from ssh'ing in and
find'ing so many times. Depending on how dynamic your disk utilization is,
you could just make this a weekly or monthly analysis.
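For what it's worth, a rough sketch of the kind of nightly tally job
described above (the directory list and the .filecount naming are only
illustrative, not an existing tool):

<snip>
#!/bin/sh
# Nightly file-count tally (sketch): write the number of entries under each
# selected tree into <dir>/.filecount so a remote client can just read
# that file instead of running "find | wc -l" over ssh every time.
for dir in /tmp /usr /var /usr/local/src
do
    [ -d "$dir" ] || continue
    find "$dir" | wc -l > "$dir/.filecount"
done
</snip>

A function like the DC one above could then "cat $srcdir/.filecount" over
ssh instead of doing the find itself.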
Hi,

On Tue, 10 Aug 2004, James Bagley Jr wrote:

> I've had some problems using rsync to transfer directories with more than
> 3 million files. Here's the error message from rsync:
>
> <snip>
> ERROR: out of memory in receive_file_entry
> rsync error: error allocating core memory buffers (code 22) at util.c(116)
> rsync: connection unexpectedly closed (63453387 bytes read so far)
> rsync error: error in rsync protocol data stream (code 12) at io.c(342)
> </snip>
>
> I'm doing a pull on a linux system from the HP-UX system that actually
> houses the data. Both are using rsync-2.6.2. The one solution I've come up
> with isn't pretty, but seems to work.

How is your RAM/swap situation on the linux side?

I am rsyncing the 4 million files of ftp.gwdg.de to a backup server each
night, one-shot, no problems. It just needs 7 or more hours...

rsync[28526] (receiver) heap statistics:
  arena:         233472   (bytes from sbrk)
  ordblks:            9   (chunks not in use)
  smblks:             3
  hblks:           1282   (chunks from mmap)
  hblkhd:     357650432   (bytes from mmap)
  allmem:     357883904   (bytes from sbrk + mmap)
  usmblks:            0
  fsmblks:           96
  uordblks:       71072   (bytes used)
  fordblks:      162400   (bytes free)
  keepcost:      135048   (bytes in releasable chunk)

Number of files: 4099094
Number of files transferred: 27160
Total file size: 1568423288614 bytes
Total transferred file size: 16916178416 bytes
Literal data: 10758972422 bytes
Matched data: 6158385953 bytes
File list size: 119674315
Total bytes written: 11990441
Total bytes read: 10885842335

wrote 11990441 bytes  read 10885842335 bytes  410271.35 bytes/sec
total size is 1568423288614  speedup is 143.92

rsync[26637] (generator) heap statistics:
  arena:         233472   (bytes from sbrk)
  ordblks:            8   (chunks not in use)
  smblks:             3
  hblks:           1285   (chunks from mmap)
  hblkhd:     357924864   (bytes from mmap)
  allmem:     358158336   (bytes from sbrk + mmap)
  usmblks:            0
  fsmblks:           96
  uordblks:       90760   (bytes used)
  fordblks:      142712   (bytes free)
  keepcost:      135048   (bytes in releasable chunk)

==== end ======== RC=0 ============ 040810.0100 040810.0822

Cheers
-e
--
Eberhard Moenkeberg (emoenke@gwdg.de, em@kki.org)
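If it isn't obvious whether the linux side is actually exhausting RAM or
swap during the pull, something along these lines (just a sketch using
standard procps tools, not part of the original post) can be left running
in another terminal while the transfer works:

<snip>
# Sample the memory use of all rsync processes, plus overall free memory,
# once a minute for the duration of the run.
while sleep 60
do
    date
    ps -C rsync -o pid,vsz,rss,args
    free -m
done
</snip>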