Hello!
I'm using rsync to mirror a server (gandalf) to a backup server (dwalin).
rsync does exactly what I need, and I would like to thank you for your work on it.
Nevertheless, I have a problem that comes up from time to time.
Each partition is synced by an rsync call from cron.daily.
This is one of those calls:
rsync -a -v -P --delete --numeric-ids --rsh="ssh" root@gandalf:/images/ \
/backup/images &> /var/log/rsync/backup_gandalf_images
The other calls are similar and differ only in their paths. This call, however,
sometimes hangs: rsync neither terminates with an error nor finishes successfully.
The rsync process is still 'running', but no files are transferred anymore.
The last line in the logfile looks like this:
cytometer/C12883-98/fisher/C12883-98_FEU_0073_0000_063x_MASK.tif
1247804 100% 1.29MB/s 0:00:00
The file and directory where this happens differ from time to time.
Depending on the number of files that need to be transferred (the partition holds
about 300k files, about 90k of which need to be synced), I can kill and restart
rsync again and again until it finishes, but this is not nice :(.
While rsync was hanging, free showed the following:
dwalin (Debian GNU/Linux, rsync 2.5.6, protocol version 26):
             total       used       free     shared    buffers     cached
Mem:        127496     123364       4132          0      16772      44104
-/+ buffers/cache:      62488      65008
Swap:       497972     107600     390372
gandalf (SuSE GNU/Linux, rsync 2.5.6, protocol version 26):
             total       used       free     shared    buffers     cached
Mem:        256292     251052       5240          0      58360     113760
-/+ buffers/cache:      78932     177360
Swap:       995988      44560     951428
So both systems still have swap space left.
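The "-/+ buffers/cache" row in the free output above is derived from the Mem row: memory in buffers and page cache is reclaimable, so free subtracts it from "used" and adds it to "free". A small sketch of that arithmetic (the function name is illustrative, not part of any tool), checked against the dwalin and gandalf figures, which are in kB, free's default unit:

```python
def buffers_cache_row(used, free, buffers, cached):
    """Return the (used, free) pair shown on free's '-/+ buffers/cache' line.

    Buffers and page cache can be reclaimed by the kernel, so they are
    subtracted from 'used' and added to 'free'.
    """
    return used - buffers - cached, free + buffers + cached


# Mem rows from the output above (values in kB).
dwalin = buffers_cache_row(used=123364, free=4132, buffers=16772, cached=44104)
gandalf = buffers_cache_row(used=251052, free=5240, buffers=58360, cached=113760)
print(dwalin)   # (62488, 65008)
print(gandalf)  # (78932, 177360)
```

By that measure, besides free swap, dwalin still had 65008 kB and gandalf 177360 kB of reclaimable memory while rsync was stuck.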
I've read through the rsync man page and the FAQ on the net, which suggest
experimenting with options such as blocking vs. non-blocking I/O (--blocking-io)
and --bwlimit. This didn't change anything.
I also left rsync alone for a whole weekend, but no further files were processed.
Finally, I ran rsync under strace:
strace -f -ff -o strace.out rsync -a -v -P --delete --numeric-ids \
--rsh="ssh" root@gandalf:/images/ /backup/images/ \
&> /var/log/rsync/backup_gandalf_images
Together the output files are about 300MB and contain the system calls
from startup until I finally killed rsync.
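With -ff and -o strace.out, strace writes one file per traced process, named strace.out.<pid>. Printing the last non-empty line of each file shows which system call every process was sitting in when rsync was killed, which helps tell the processes apart. A minimal sketch (the helper name last_syscalls is hypothetical). Note that this only covers processes on dwalin (the local rsync and the ssh it started); the sending rsync on gandalf would need its own strace.

```python
import glob
import os


def last_syscalls(prefix="strace.out"):
    """Map each PID traced with `strace -ff -o PREFIX` to the last non-empty line of PREFIX.PID."""
    result = {}
    for path in glob.glob(prefix + ".*"):
        suffix = path[len(prefix) + 1:]
        if not suffix.isdigit() or not os.path.isfile(path):
            continue  # not one of strace's per-process output files
        last = ""
        with open(path, errors="replace") as f:
            for line in f:  # stream the file: the traces can be hundreds of MB
                if line.strip():
                    last = line.rstrip("\n")
        result[int(suffix)] = last
    return result


if __name__ == "__main__":
    for pid, line in sorted(last_syscalls().items()):
        print(pid, line)
```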
At the end of this output, the following is repeated over and over:
select(8, [7], [4], NULL, {60, 0}) = 0 (Timeout)
select(8, [7], [4], NULL, {60, 0}) = 0 (Timeout)
I have no idea what causes this timeout.
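For reference, in the strace line select(8, [7], [4], NULL, {60, 0}) = 0 the process asks the kernel to wait up to 60 seconds until fd 7 has data to read or fd 4 can accept more data (8 is the highest descriptor number plus one). A return value of 0 means neither happened before the timeout, so the process simply loops and asks again. The sketch below reproduces that state with two ordinary pipes, one nobody writes to and one nobody reads from; it is an illustration under that assumption, not rsync's code.

```python
import os
import select


def fill_pipe(write_fd):
    """Write into a pipe until its kernel buffer is full (nobody is reading)."""
    os.set_blocking(write_fd, False)  # get EAGAIN instead of blocking forever
    for chunk in (b"x" * 4096, b"x"):  # big chunks first, then top off byte-wise
        try:
            while True:
                os.write(write_fd, chunk)
        except BlockingIOError:
            pass  # buffer full for this chunk size


# Pipe we read from, but nobody ever writes to it (like fd 7 in the trace).
in_read, in_write = os.pipe()
# Pipe we write to, but nobody ever reads from it (like fd 4 in the trace).
out_read, out_write = os.pipe()
fill_pipe(out_write)

# Equivalent of select(n, [in_read], [out_write], NULL, {0.5 s}).
r1, w1, _ = select.select([in_read], [out_write], [], 0.5)
print(r1, w1)  # both lists empty: the "= 0 (Timeout)" case

# As soon as the peer writes something, the read side becomes ready.
os.write(in_write, b"data")
r2, _, _ = select.select([in_read], [], [], 0.5)
print(r2 == [in_read])
```

So during the hang the process is not busy; it is waiting on both descriptors. Since fd 4 is in the write set, it presumably has data to send that the other end is not draining, and the interesting question is why the processes at the other ends of fds 7 and 4 stopped reading and writing.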
If it would be of any value, I can offer the strace output files for download
or add any other information you may find useful.
Do you have any suggestions on how to track down the problem?
Is there any other source on the internet that I've missed and should read to
get rid of this problem?
Thanks in advance
--
Andre Bell <andre.bell@gmx.de>
PGP-Public-Key: http://www.andre-bell.de/public_key.asc
Hello!
About a month ago I came here with my problem that rsync stops processing
from time to time. Since then I have spent more time investigating the problem
and have increased rsync's verbosity.
Calling rsync with
rsync -a -vv -P --delete --numeric-ids --rsh="ssh" root@gandalf:/images/ \
/backup/images/
(or even with -vvv) always ended with rsync consuming no more CPU time and with
the following last output lines (or very similar ones):
<--output begins-->
match at 1241645 last_match=1240991 j=374 len=700 n=654
match at 1242345 last_match=1242345 j=375 len=700 n=0
match at 1243965 last_match=1243045 j=573 len=700 n=920
match at 1244665 last_match=1244665 j=1 len=700 n=0
match at 1246665 last_match=1245365 j=1437 len=700 n=1300
match at 1247400 last_match=1247365 j=1782 len=404 n=35
1247804 100% 440.23kB/s 0:00:02
done hash search
sending file_sum
got file_sum
renaming .C8760-02_FEU_0378_0000_063x_MASK.tif.vsuS6T to fisher/C8760-02_FEU_0378_0000_063x_MASK.tif
set modtime of fisher/C8760-02_FEU_0378_0000_063x_MASK.tif to (1065079671) Thu Oct 2 09:27:51 2003
redoing fisher/C8760-02_FEU_0378_0000_063x_MASK.tif(1850)
<--output ends-->
The last file is always a different one. As in my first mail, the rsync process
did not run out of memory, and neither did the disk or anything else.
Increasing the verbosity to -vvvv makes rsync repeat lines like these:
potential match at 708123 target=514 1281 sum=bfb2ff16
potential match at 708123 target=515 1629 sum=bfb2ff16
potential match at 708124 target=1672 235 sum=c185ff17
potential match at 708124 target=1673 364 sum=c185ff17
potential match at 708124 target=1674 1360 sum=c185ff17
and then stop after some of them (no further output is produced; the lines
above are the tail of such a logfile).
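The "potential match" lines come from rsync's block matching: it slides a window of one block length over the file, keeps a cheap rolling checksum of that window, and whenever the weak checksum equals that of one of the other side's blocks it logs a potential match and verifies it with a stronger checksum (MD4 in rsync 2.5.x); only a verified hit produces a "match at" line. The sketch below shows the rolling weak checksum as described in the rsync technical report (function names are illustrative). rsync's own C code differs in details, for example it treats bytes as signed chars, so this will not reproduce the exact sum= values in the log; the packing with s2 in the high 16 bits is an assumption about how that field is laid out.

```python
def weak_checksum(block):
    """Weak checksum (s1, s2) of a block: s1 sums the bytes, s2 sums the running s1, mod 2**16."""
    s1 = s2 = 0
    for byte in block:
        s1 = (s1 + byte) & 0xFFFF
        s2 = (s2 + s1) & 0xFFFF
    return s1, s2


def roll(s1, s2, out_byte, in_byte, block_len):
    """Slide the window one byte to the right in O(1): drop out_byte, append in_byte."""
    s1 = (s1 - out_byte + in_byte) & 0xFFFF
    s2 = (s2 - block_len * out_byte + s1) & 0xFFFF
    return s1, s2


def as_sum(s1, s2):
    """Pack into one 32-bit value with s2 in the high half (assumed log layout)."""
    return (s2 << 16) | s1


if __name__ == "__main__":
    data = bytes((i * 7 + 3) % 256 for i in range(64))
    block_len = 16
    s1, s2 = weak_checksum(data[:block_len])
    for k in range(1, len(data) - block_len + 1):
        s1, s2 = roll(s1, s2, data[k - 1], data[k + block_len - 1], block_len)
        # Rolling must agree with recomputing the window from scratch.
        assert (s1, s2) == weak_checksum(data[k:k + block_len])
    print(f"sum={as_sum(s1, s2):08x}")
```

The point of roll() is that moving the window by one byte costs constant time instead of re-summing the whole block, which is what makes it affordable to test every byte offset.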
I tried everything I found in the man page and the FAQ about rsync hangs,
but nothing helped.
Does anyone have an idea how I can find out the reason for this problem?
Thanks in advance
--
Andre Alexander Bell <andre.bell@gmx.de>
PGP-Public-Key: http://www.andre-bell.de/public_key.asc
On Fri, Aug 29, 2003 at 02:56:47PM +0200, Andre Alexander Bell wrote:
> Within this output the following is repeated again and again at the end:
>
> select(8, [7], [4], NULL, {60, 0}) = 0 (Timeout)
> select(8, [7], [4], NULL, {60, 0}) = 0 (Timeout)
>
> I actually do have no idea what causes this Timeout.

The timeout is caused by the fact that select was called with a timeout argument. That process is in a loop waiting for data from the pipe. You need to look at the process on the other end of the pipe.
--
________________________________________________________________
    J.W. Schultz            Pegasystems Technologies
    email address:          jw@pegasys.ws

            Remember Cernan and Schmitt
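One way to look at the process on the other end on Linux (both machines here run Linux) is through /proc: /proc/<pid>/fd/<n> is a symlink whose target for a pipe reads like pipe:[inode], and both ends of a pipe share that inode, so scanning all processes for the same target finds the peer. A hedged sketch with hypothetical function names; it needs root to see other users' processes, only sees the local machine, and will not pair up the two ends of a socketpair (target socket:[...]), because those have distinct inodes.

```python
import os
import sys


def fd_target(pid, fd):
    """Return what /proc/PID/fd/FD points to, e.g. 'pipe:[123456]'."""
    return os.readlink(f"/proc/{pid}/fd/{fd}")


def holders_of(target):
    """List (pid, fd) pairs of all visible processes with a descriptor pointing at TARGET."""
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = f"/proc/{entry}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited meanwhile, or no permission
            continue
        for fd in fds:
            try:
                if os.readlink(f"{fd_dir}/{fd}") == target:
                    found.append((int(entry), int(fd)))
            except OSError:
                continue  # descriptor closed while we were looking
    return found


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: <script> PID FD, e.g. the stuck rsync's PID and 7 or 4 from the select() line.
    target = fd_target(int(sys.argv[1]), int(sys.argv[2]))
    for other_pid, other_fd in holders_of(target):
        try:
            with open(f"/proc/{other_pid}/cmdline", "rb") as f:
                cmd = f.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            cmd = "?"
        print(target, other_pid, other_fd, cmd)
```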