Have you tried running rsync 2.5.5 on the local end? We have seen
cases where VPNs have been at fault. Rsync is very hard on TCP
implementations. Also, verbose mode has been known to cause problems,
especially in older versions, so you might try running without it.
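
As a rough sketch of that last suggestion only: the snippet below rebuilds the command from the quoted message with -v dropped and every other option left as it was. The paths and host name are the placeholders from the original post, and the RSYNC variable is just an illustrative name. It prints the command rather than running it, so it can be checked first.

```shell
# Sketch: the quoted job's options with -v removed.
# /somepath/rsync, /some/path/ and user@somehost.faraway are the
# placeholders used in the original message, not real locations.
RSYNC=/somepath/rsync

# To confirm which version each end runs, "$RSYNC" --version can be used.

set -- -z --exclude=.snapshot --exclude=lost+found \
    --archive --delete --force \
    --rsync-path=/usr/local/bin/rsync \
    /some/path/ user@somehost.faraway:/another/path/

# Print the command for review; remove "echo" to actually run it.
echo "$RSYNC" "$@"
```

If the transfer completes this way, that points at verbose output rather than the VPN.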
- Dave Dykstra
On Tue, Jun 18, 2002 at 10:42:48AM -0600, Melvin, Lee wrote:
> I've got an rsync job which is consistently failing, but I've been
> unable to diagnose the problem. FAQ/Google/docs/etc. checked and
> no luck.
>
> Basically, it looks like the rsync process invoked on the far end
> is exiting, and then the local process waits until the timeout and
> exits.
>
> Both systems are Sun boxes, Ultra 10 or better with 256+ MB of memory.
> Rsync version is 2.5.0 on the local end and 2.5.5 on the remote end.
> Network pipe between the two is 768KB VPN WAN. On the local side,
> here's what I see:
>
> Begin job 02-tomove-hpx at Tue Jun 18 10:13:36 2002
> Executing /somepath/rsync -z -v --exclude=.snapshot
> --exclude=lost+found --archive --delete --force
> --rsync-path=/usr/local/bin/rsync /some/path/
> user@somehost.faraway:/another/path/
> building file list ... done
>
> On the remote end, looking with truss -vpoll -p:
>
> lstat64("toolbox/shaperouter.mgc_shaperouter.attr", 0xFFBEFAE0) = 0
> lstat64("toolbox/shaperouter/shaperouter.qual", 0xFFBEFAE0) = 0
> lstat64("toolbox/spicenet2G6", 0xFFBEFAE0) = 0
> lstat64("toolbox/spicenet2G6", 0xFFBEF1D8) = 0
> lstat64("toolbox/spicenet2G6.SpiceNet2G6.attr", 0xFFBEFAE0) = 0
> lstat64("toolbox/spicenet2G6/spicenet2G6.qual", 0xFFBEFAE0) = 0
> lstat64("toolbox/srp", 0xFFBEFAE0) = 0
> lstat64("toolbox/srp", 0xFFBEF1D8) = 0
> lstat64("toolbox/srp.mgc_srp_tool.attr", 0xFFBEFAE0) = 0
> lstat64("toolbox/srp/srp.qual", 0xFFBEFAE0) = 0
> lstat64("toolbox/test_fablink", 0xFFBEFAE0) = 0
> lstat64("toolbox/test_fablink", 0xFFBEF1D8) = 0
> lstat64("toolbox/test_fablink.mgc_test_fablink.attr", 0xFFBEFAE0) = 0
> lstat64("toolbox/test_fablink/test_fablink.qual", 0xFFBEFAE0) = 0
> lstat64("toolbox/test_layout", 0xFFBEFAE0) = 0
> lstat64("toolbox/test_layout", 0xFFBEF1D8) = 0
> lstat64("toolbox/test_layout.mgc_test_layout.attr", 0xFFBEFAE0) = 0
> lstat64("toolbox/test_layout/test_layout.qual", 0xFFBEFAE0) = 0
> lstat64("toolbox/to_layout", 0xFFBEFAE0) = 0
> lstat64("toolbox/to_layout", 0xFFBEF1D8) = 0
> lstat64("toolbox/to_layout.to_layout_tvpt.attr", 0xFFBEFAE0) = 0
> lstat64("toolbox/to_layout/to_layout.qual", 0xFFBEFAE0) = 0
> lstat64("toolbox/vnet", 0xFFBEFAE0) = 0
> lstat64("toolbox/vnet", 0xFFBEF1D8) = 0
> lstat64("toolbox/vnet.VNet.attr", 0xFFBEFAE0) = 0
> lstat64("toolbox/vnet/vnet.qual", 0xFFBEFAE0) = 0
> poll(0xFFBEE7E0, 2, 60000) = 1
> fd=1 ev=POLLOUT rev=POLLOUT
> fd=8 ev=POLLRDNORM rev=0
> write(1, "04\0\007FFFFFFFF", 8) = 8
> poll(0xFFBEF4D0, 2, 60000) = 1
> fd=6 ev=POLLRDNORM rev=POLLRDNORM
> fd=8 ev=POLLRDNORM rev=0
> read(6, "FFFFFFFF", 4) = 4
> poll(0xFFBEE850, 2, 60000) = 1
> fd=1 ev=POLLOUT rev=POLLOUT
> fd=8 ev=POLLRDNORM rev=0
> write(1, "04\0\007FFFFFFFF", 8) = 8
> poll(0xFFBEF540, 2, 60000) = 1
> fd=6 ev=POLLRDNORM rev=POLLRDNORM
> fd=8 ev=POLLRDNORM rev=0
> read(6, "01\0\0\0", 4) = 4
> close(6) = 0
> poll(0xFFBEE938, 2, 60000) = 1
> fd=1 ev=POLLOUT rev=POLLOUT
> fd=8 ev=POLLRDNORM rev=0
> write(1, "04\0\007FFFFFFFF", 8) = 8
> kill(18231, SIGUSR2) = 0
> waitid(P_PID, 18231, 0xFFBEFB08, WEXITED|WTRAPPED|WNOHANG) = 0
> Received signal #18, SIGCLD, in poll() [caught]
> siginfo: SIGCLD CLD_EXITED pid=18231 status=0x0000
> poll(0xFFBEFAE8, 0, 20) Err#4 EINTR
> waitid(P_ALL, 0, 0xFFBEF620, WEXITED|WTRAPPED|WNOHANG) = 0
> waitid(P_ALL, 0, 0xFFBEF620, WEXITED|WTRAPPED|WNOHANG) Err#10 ECHILD
> setcontext(0xFFBEF7D0)
> poll(0xFFBEFAE8, 0, 16) = 0
> waitid(P_PID, 18231, 0xFFBEFB08, WEXITED|WTRAPPED|WNOHANG) Err#10 ECHILD
> sigaction(SIGUSR1, 0xFFBEFB48, 0xFFBEFBC8) = 0
> sigaction(SIGUSR2, 0xFFBEFB48, 0xFFBEFBC8) = 0
> llseek(0, 0, SEEK_CUR) Err#9 EBADF
> _exit(0)
> bash-2.03$
>
> The destination directory has free space. I have a job between the same
> hosts (different paths) that executes successfully just before this job.
> This job fails consistently, but not always after the same file lstat.
> I have tried disabling -z, using --bwlimit, disabling -v, using -vvvvv,
> all to no avail. Also tried changing the local end of the rsync to a
> different system. I still need to try moving the far end, but I do get
> a similar problem on a completely different rsync to a different host
> (same source).
>
> I can provide additional details if needed. Any help greatly
> appreciated.
>
> - Lee
> lee_melvin@mentor.com