Hello!

I'm using rsync to mirror a server (gandalf) to a backup server (dwalin).
rsync fits my needs well, and I would like to thank you for your work on it.
Nevertheless, I have a problem that comes up from time to time.

Each partition is synced by an rsync call from cron.daily. This is one of
these calls:

    rsync -a -v -P --delete --numeric-ids --rsh="ssh" root@gandalf:/images/ \
        /backup/images &> /var/log/rsync/backup_gandalf_images

The other calls are similar and differ only in their paths. This last call
sometimes hangs: rsync neither exits with an error nor finishes successfully.
The rsync process is still running, but no files are transferred anymore.
The last lines in the log file look like this:

    cytometer/C12883-98/fisher/C12883-98_FEU_0073_0000_063x_MASK.tif
         1247804 100%    1.29MB/s    0:00:00

The file and directory at which this happens differ from run to run.
Depending on the number of files that need to be transferred (about 300k
files on that partition, about 90k of them to be synced), I can kill and
restart rsync again and again until it finishes, but this is not nice :(.

While rsync was hung, I got this output from free:

dwalin (Debian GNU/Linux, rsync 2.5.6 pv 26):

                 total       used       free     shared    buffers     cached
    Mem:        127496     123364       4132          0      16772      44104
    -/+ buffers/cache:      62488      65008
    Swap:       497972     107600     390372

gandalf (SuSE GNU/Linux, rsync 2.5.6 pv 26):

                 total       used       free     shared    buffers     cached
    Mem:        256292     251052       5240          0      58360     113760
    -/+ buffers/cache:      78932     177360
    Swap:       995988      44560     951428

So both systems still have swap space left.

I've read through the rsync man pages and the FAQ on the net. They suggest
experimenting with options such as blocking versus non-blocking I/O and
bwlimit. This didn't change anything. I also left rsync alone for a whole
weekend, but no further files were processed.

Finally I ran rsync under strace:

    strace -f -ff -o strace.out rsync -a -v -P --delete --numeric-ids \
        --rsh="ssh" root@gandalf:/images/ /backup/images/ \
        &> /var/log/rsync/backup_gandalf_images

The output files total about 300MB and contain the system calls from startup
until I finally killed rsync. At the end, the following is repeated again and
again:

    select(8, [7], [4], NULL, {60, 0}) = 0 (Timeout)
    select(8, [7], [4], NULL, {60, 0}) = 0 (Timeout)

I have no idea what causes this timeout. If it would be of any value, I can
offer the strace output files for download or add any other information you
may find useful.

Do you have any suggestion for how I can find the problem? Is there any other
internet source I've missed and should read to get rid of this problem?

Thanks in advance

-- 
Andre Bell <andre.bell@gmx.de>
PGP-Public-Key: http://www.andre-bell.de/public_key.asc
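A note on the free output quoted above: the "-/+ buffers/cache" row can be
reproduced from the "Mem:" row by treating buffers and page cache as
reclaimable. The short Python sketch below redoes that arithmetic with the
numbers from the post. The helper name adjust_for_cache is made up for this
illustration; it is not part of free or rsync.

```python
# Values in kB, copied from the "Mem:" rows of the free output in the post.
hosts = {
    "dwalin": {"total": 127496, "used": 123364, "free": 4132,
               "buffers": 16772, "cached": 44104},
    "gandalf": {"total": 256292, "used": 251052, "free": 5240,
                "buffers": 58360, "cached": 113760},
}


def adjust_for_cache(mem):
    """Return (used, free) with buffers and page cache counted as free.

    Hypothetical helper for illustration only.
    """
    reclaimable = mem["buffers"] + mem["cached"]
    return mem["used"] - reclaimable, mem["free"] + reclaimable


for name, mem in hosts.items():
    used, free = adjust_for_cache(mem)
    print(f"{name}: used={used} kB, free={free} kB of {mem['total']} kB")
```

Both results agree with the "-/+ buffers/cache" rows in the post, so roughly
half of the RAM on dwalin and more than two thirds on gandalf was still
reclaimable while rsync was hung.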
Hello!

About a month ago I posted here about my problem that rsync stops processing
from time to time. Since then I have spent more time investigating it and
have increased rsync's verbosity. Calling rsync with

    rsync -a -vv -P --delete --numeric-ids --rsh="ssh" root@gandalf:/images/ \
        /backup/images/

or even -vvv always ended with rsync consuming no more CPU time and with the
following last output lines (or something quite similar):

<--output begins-->
match at 1241645 last_match=1240991 j=374 len=700 n=654
match at 1242345 last_match=1242345 j=375 len=700 n=0
match at 1243965 last_match=1243045 j=573 len=700 n=920
match at 1244665 last_match=1244665 j=1 len=700 n=0
match at 1246665 last_match=1245365 j=1437 len=700 n=1300
match at 1247400 last_match=1247365 j=1782 len=404 n=35
     1247804 100%  440.23kB/s    0:00:02
done hash search
sending file_sum
got file_sum
renaming .C8760-02_FEU_0378_0000_063x_MASK.tif.vsuS6T to fisher/C8760-02_FEU_0378_0000_063x_MASK.tif
set modtime of fisher/C8760-02_FEU_0378_0000_063x_MASK.tif to (1065079671) Thu Oct  2 09:27:51 2003
redoing fisher/C8760-02_FEU_0378_0000_063x_MASK.tif(1850)
<--output ends-->

The last file is always a different one. The rsync process did not run out of
memory, nor did the disk fill up or anything else, as I posted in my first
mail.

Increasing the verbosity to -vvvv causes rsync to repeat something like

    potential match at 708123 target=514 1281 sum=bfb2ff16
    potential match at 708123 target=515 1629 sum=bfb2ff16
    potential match at 708124 target=1672 235 sum=c185ff17
    potential match at 708124 target=1673 364 sum=c185ff17
    potential match at 708124 target=1674 1360 sum=c185ff17

and then stop after some of these lines. (This is the tail of such a log
file; no further output is produced.)

I tried everything I found in the man page and the FAQ about rsync hangs,
but it did not help. Does anyone have an idea what I can do to find the
cause of this problem?

Thanks in advance

-- 
Andre Alexander Bell <andre.bell@gmx.de>
PGP-Public-Key: http://www.andre-bell.de/public_key.asc
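A note on the "match at" lines quoted above: their numbers are internally
consistent in a way that helps to locate the hang. In every line, n equals
the offset minus last_match, and each last_match equals the previous offset
plus len. The Python sketch below checks this and reports where the last
matched block ends. Reading n as the gap between two matched blocks is an
interpretation of these numbers, not something stated in the thread, and
check_matches is a made-up helper name.

```python
import re

# Debug lines copied from the -vv output in the post.
LOG = """\
match at 1241645 last_match=1240991 j=374 len=700 n=654
match at 1242345 last_match=1242345 j=375 len=700 n=0
match at 1243965 last_match=1243045 j=573 len=700 n=920
match at 1244665 last_match=1244665 j=1 len=700 n=0
match at 1246665 last_match=1245365 j=1437 len=700 n=1300
match at 1247400 last_match=1247365 j=1782 len=404 n=35
"""

PATTERN = re.compile(
    r"match at (\d+) last_match=(\d+) j=(\d+) len=(\d+) n=(\d+)"
)


def check_matches(log):
    """Verify the arithmetic of each line; return the end of the last block."""
    end = None
    for line in log.splitlines():
        m = PATTERN.fullmatch(line.strip())
        if m is None:
            continue  # ignore anything that is not a "match at" line
        offset, last_match, _block, length, n = map(int, m.groups())
        if n != offset - last_match:
            raise ValueError(f"unexpected n in: {line}")
        if end is not None and last_match != end:
            raise ValueError(f"gap in match chain at: {line}")
        end = offset + length
    return end


print(check_matches(LOG))
```

The value printed is 1247804, the same number as the byte count in the
progress line. Together with the "renaming" and "set modtime" lines, this
suggests that the file itself was processed completely and that the stall
comes afterwards, which fits the observation that the last file is a
different one each time.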
On Fri, Aug 29, 2003 at 02:56:47PM +0200, Andre Alexander Bell wrote:
> [...]
> Finally I ran rsync under strace:
>
>     strace -f -ff -o strace.out rsync -a -v -P --delete --numeric-ids \
>         --rsh="ssh" root@gandalf:/images/ /backup/images/ \
>         &> /var/log/rsync/backup_gandalf_images
>
> The output files total about 300MB and contain the system calls from
> startup until I finally killed rsync. At the end, the following is
> repeated again and again:
>
>     select(8, [7], [4], NULL, {60, 0}) = 0 (Timeout)
>     select(8, [7], [4], NULL, {60, 0}) = 0 (Timeout)
>
> I have no idea what causes this timeout.

The timeout is caused by the fact that select was called with a timeout
argument. That process is in a loop waiting for data from the pipe. You
need to look at the process on the other end of the pipe.

> [...]

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt
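To illustrate the reply: a select call that is given a timeout returns with
nothing ready when the peer sends no data, and strace shows that as
"= 0 (Timeout)". The Python sketch below reproduces this with a local pipe
on a POSIX system. It only models the read side, and wait_for_data is a
made-up helper, not rsync code.

```python
import os
import select


def wait_for_data(read_fd, timeout_seconds):
    """Return True if read_fd became readable before the timeout expired."""
    readable, _, _ = select.select([read_fd], [], [], timeout_seconds)
    return bool(readable)


read_end, write_end = os.pipe()

# Nothing has been written yet, so select gives up after the timeout.
# This is the situation strace reports as "= 0 (Timeout)".
print(wait_for_data(read_end, 0.1))

# Once the other end writes, the same call returns immediately.
os.write(write_end, b"x")
print(wait_for_data(read_end, 0.1))

os.close(read_end)
os.close(write_end)
```

The first call reports that no data arrived, the second that data is ready.
A timeout by itself is therefore not an error: it only says that the other
end of the pipe stayed silent for the whole interval (60 seconds in the
strace output), which is why the process on that other end is the one to
examine.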