Hi All,
I was hoping someone could help me figure out what's going on here...
I have a server that I'm backing up a lot of files to, using rsync.
The backup server runs CentOS 5.4 Linux:
# uname -a
Linux slurp.kilokluster.ucsc.edu 2.6.18-164.15.1.el5 #1 SMP Wed Mar 17
11:30:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
and so do the servers the rsync is pulling from. Basically, this server
is rsyncing from 6 other servers that each have a read-only rsync daemon
running on them. I'm running about 80 rsyncs (to break up the file
sets), about 6 in parallel (and they rotate through the 80). I'm
transferring about 160 million files, totaling about 75TB.
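In case the mechanics matter: the rotation is nothing fancy, just a driver
roughly like this (a sketch; the script directory and names here are made up,
not my real layout):

#!/bin/bash
# Run the ~80 per-chunk rsync scripts, at most 6 at a time.
# /root/backup-scripts and the *.sh names are placeholders.
find /root/backup-scripts -name '*.sh' -print0 \
    | xargs -0 -n 1 -P 6 bash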
The backup server has 48GB RAM, and during the rsyncs about 30GB is in use
at any given time, but not more than that. The server has sixteen 3GHz
cores and doesn't even break a sweat.
All my rsyncs are called via simple bash scripts in the format:
#!/bin/bash
/usr/bin/rsync -a --delete rsync://encodek-0-4/data/genomes/[q-u]* \
    /hive/data/genomes/
/usr/bin/rsync -a --delete rsync://encodek-0-4/data/genomes/[A-Z]* \
    /hive/data/genomes/
etc...
And while many of them work, as time moves on I see some fail with this:
rsync error: timeout in data send/receive (code 30) at io.c(200)
[sender=3.0.5]
rsync: connection unexpectedly closed (86980639 bytes received so far)
[receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(600)
[receiver=3.0.5]
As you can see, both sender and receiver are version 3.0.5. I read the
docs, and they seem to indicate that there is no timeout by default, so
these error messages are confusing to me.
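For reference, the only timeout settings I'm aware of are the client-side
--timeout option (which my scripts above don't use) and the "timeout"
parameter in the daemon's rsyncd.conf. Something like this, with 600 as a
purely made-up value:

# client side (not in my scripts):
/usr/bin/rsync -a --delete --timeout=600 \
    rsync://encodek-0-4/data/genomes/ /hive/data/genomes/

# daemon side, in /etc/rsyncd.conf on the source servers:
# timeout = 600

Could the daemons be imposing a timeout like that even though I never asked
for one on my end?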
Am I trying to rsync too many files? I've restarted the rsyncs several
times, so now each run ends up counting through millions of files before it
actually gets to a point where it has new data to copy. I suspect it can
spend 5-10 hours, or more, counting files while transferring nothing. Could
that cause a timeout?
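If it would help narrow things down, I think I can reproduce that scan-only
phase with a dry run against one of the chunks, along the lines of:

/usr/bin/rsync -a --delete --dry-run --stats \
    rsync://encodek-0-4/data/genomes/[q-u]* /hive/data/genomes/

That should walk the file list without copying anything, which is roughly
what the real runs spend those hours doing.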
Anyone ever seen something like this?