Greetings and salutations, rsync users. I have a problem, and I'm hoping
that someone out there can lend a hand.
I've been trying to transfer large amounts of data (lots of data, lots of
files) via rsync over an encrypted TCP tunnel, but I seem to be continually
getting hangs in the transfers -- things will go along for a bit, and then
just come to a screeching halt. There doesn't immediately appear to be
any particular set of common circumstances that trigger this event.
I've tried doing these rsyncs over stunnel4, stunnel3, and ssh-forwarded
ports (unfortunately, a direct-via-ssh copy doesn't suit my needs, even
though that seems to work fine, and always has).
I've tested using several combinations of FreeBSD (a fairly old build, 3.x
I think), Solaris 2.6, Solaris 9, and Solaris 10 (b69), with identical results
on each. The transfer starts over the tunnel, and after some arbitrary period
of time, it just stops.
This is all with the current release (2.6.3) of rsync.
Much Google searching has turned up a couple of reports of this type of
problem, but not much in the way of fixes -- most suggest using
--blocking-io on the client end, but that does not appear to fix the
problem.
I found a patch a year or two back on the rsync mailing list that claimed
to fix a 'hanging problem', but the changes from that patch appear to be
integrated into the current rsync release already (so, evidently, they
don't fix this hanging problem).
I don't see any obviously matching bugs in the rsync bugzilla database.
Stack traces of a hung transfer:
Client:
#0  0x280bf6c8 in select () from /usr/lib/libc.so.3
#1  0x8058124 in writefd_unbuffered (fd=3,
    buf=0x92b8000 "...", len=32768) at io.c:865
#2  0x8058447 in writefd (fd=3,
    buf=0x92b8000 "...", len=32768) at io.c:981
#3  0x8058551 in write_buf (f=3,
    buf=0x92b8000 "...", len=32768) at io.c:1045
#4  0x8058ff8 in simple_send_token (f=3, token=-2, buf=0x807d0c0,
    offset=3833856, n=32768) at token.c:104
#5  0x80598c0 in send_token (f=3, token=-2, buf=0x807d0c0, offset=3833856,
    n=32768, toklen=0) at token.c:472
#6  0x805108c in matched (f=3, s=0x807b020, buf=0x807d0c0, offset=3866624,
    i=-2) at match.c:114
#7  0x805196e in match_sums (f=3, s=0x807b020, buf=0x807d0c0, len=12530436)
    at match.c:353
#8  0x804d0fd in send_files (flist=0x807b120, f_out=3, f_in=3) at sender.c:240
#9  0x80500d0 in client_run (f_in=3, f_out=3, pid=-1, argc=1, argv=0x807d000)
    at main.c:688
#10 0x805dc9c in start_socket_client (host=0x807d048 "box.mfnx.net",
    path=0x807d063 "box.mfnx.net/data/", argc=1, argv=0x807d000)
    at clientserver.c:98
#11 0x80504ec in start_client (argc=2, argv=0x807d000) at main.c:861
#12 0x8050a8d in main (argc=2, argv=0x807d000) at main.c:1142
#13 0x804a069 in _start ()
Server parent process:
#0  0xff19db44 in _poll () from /usr/lib/libc.so.1
#1  0xff15236c in _select () from /usr/lib/libc.so.1
#2  0x00025ee8 in writefd_unbuffered ()
#3  0x00026f4c in io_multiplex_write ()
#4  0x0001ce6c in rwrite ()
#5  0x0001cfcc in rprintf ()
#6  0x00012ae0 in recv_generator ()
#7  0x000139fc in generate_files ()
#8  0x00019528 in do_recv ()
#9  0x000197f4 in start_server ()
#10 0x0002fec4 in start_daemon ()
#11 0x00030088 in daemon_main ()
#12 0x0001a52c in main ()
Server child process:
#0  0xff19db44 in _poll () from /usr/lib/libc.so.1
#1  0xff15236c in _select () from /usr/lib/libc.so.1
#2  0x000251e0 in read_timeout ()
#3  0x00025cc4 in readfd_unbuffered ()
#4  0x0002651c in read_buf ()
#5  0x00028058 in recv_token ()
#6  0x0001418c in receive_data ()
#7  0x00014c08 in recv_files ()
#8  0x0001948c in do_recv ()
#9  0x000197f4 in start_server ()
#10 0x0002fec4 in start_daemon ()
#11 0x00030088 in daemon_main ()
#12 0x0001a52c in main ()
[Sorry about the lack of full debugging data on the latter two; that can be
corrected if needed.]
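For anyone reading the client trace: frames #0 and #1 are a write loop
sitting in select(), waiting for the socket to become writable. Here is a
minimal Python sketch of that symptom in isolation, just to make it
concrete. It is not rsync's code and not a diagnosis of the tunnel; the
function names (fill_until_stalled, demo) and the local socketpair are my
own assumptions for illustration. One end writes 32768-byte chunks (the
same len as in the trace) while the other end never reads, until select()
stops reporting the socket as writable.

```python
import select
import socket


def fill_until_stalled(sock, chunk=b"x" * 32768, timeout=0.2):
    """Write chunks until select() no longer reports sock as writable.

    Returns the number of bytes the kernel accepted before the stall.
    """
    sock.setblocking(False)
    sent = 0
    while True:
        # Same wait the client trace is parked in: is the fd writable yet?
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            return sent  # nothing is draining the far end; we are stuck
        try:
            sent += sock.send(chunk)
        except BlockingIOError:
            return sent  # buffer filled between select() and send()


def demo():
    """Hypothetical demo: stall a writer, then drain the reader to unstick it."""
    writer, reader = socket.socketpair()
    try:
        sent = fill_until_stalled(writer)
        # Still not writable: this is the "hung" state.
        stalled = not select.select([], [writer], [], 0.2)[1]

        # Drain everything currently queued on the reading side.
        reader.setblocking(False)
        drained = 0
        try:
            while True:
                data = reader.recv(65536)
                if not data:
                    break
                drained += len(data)
        except BlockingIOError:
            pass  # queue is empty

        # Once the peer has read, the writer should become writable again.
        resumed = bool(select.select([], [writer], [], 1.0)[1])
        return sent, stalled, drained, resumed
    finally:
        writer.close()
        reader.close()
```

Calling demo() returns a (sent, stalled, drained, resumed) tuple. I'd
expect stalled and resumed to both be true, i.e. the writer only gets
going again once the far end drains. That would be consistent with the
client trace if whatever sits behind fd 3 has stopped draining, but I
can't tell that from the traces alone.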
I'm really in a bind here. Can anyone offer fixes or workarounds?
Thanks!
-jay