samba-bugs at samba.org
2015-Mar-16  16:14 UTC
[Bug 11166] New: running with -vvv causes a hang
https://bugzilla.samba.org/show_bug.cgi?id=11166
            Bug ID: 11166
           Summary: running with -vvv causes a hang
           Product: rsync
           Version: 3.1.1
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P5
         Component: core
          Assignee: wayned at samba.org
          Reporter: pskocik at gmail.com
        QA Contact: rsync-qa at samba.org
I don't know what might be causing this, but when I run rsync with -vvv (my
other options were `-aH --delete -F -Pi --dry-run -vvv`), the process hangs
somewhere in the middle of my Wine virtual c_drive (I'm running this to
mirror my /home).
It works fine without -vvv, or even with -vv.
-- 
You are receiving this mail because:
You are the QA Contact for the bug.
https://bugzilla.samba.org/show_bug.cgi?id=11166
--- Comment #1 from pjump <pskocik at gmail.com> ---
Basically it just freezes. No IO, no CPU usage.
https://bugzilla.samba.org/show_bug.cgi?id=11166
--- Comment #2 from devurandom at gmx.net ---
There are a lot of bug reports related to rsync hanging mysteriously, some of
which may be duplicates of each other:
https://bugzilla.samba.org/show_bug.cgi?id=1442
https://bugzilla.samba.org/show_bug.cgi?id=2957
https://bugzilla.samba.org/show_bug.cgi?id=9164
https://bugzilla.samba.org/show_bug.cgi?id=10035
https://bugzilla.samba.org/show_bug.cgi?id=10092
https://bugzilla.samba.org/show_bug.cgi?id=10518
https://bugzilla.samba.org/show_bug.cgi?id=10950
https://bugzilla.samba.org/show_bug.cgi?id=11166
https://bugzilla.samba.org/show_bug.cgi?id=12732
https://bugzilla.samba.org/show_bug.cgi?id=13109
https://bugzilla.samba.org/show_bug.cgi?id=11166
--- Comment #3 from Mark Vitale <mvitale at sinenomine.net> ---
While debugging a different rsync hang, I have also seen this behavior with
-vvv, and used git bisect to find when it was introduced:
d8587b4690b1987c02c71c136720f366abf250e6 is the first bad commit       
introduced at 3.1.0pre1
commit d8587b4690b1987c02c71c136720f366abf250e6
Author: Wayne Davison <wayned at samba.org>
Date:   Tue Sep 15 16:12:24 2009 -0700
    Change the msg pipe to use a real multiplexed IO mode
    for the data that goes from the receiver to the generator.
I hope that helps.
Regards,
--
Mark Vitale
mvitale at sinenomine.net
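For readers unfamiliar with the term in the commit message: a multiplexed stream lets one pipe carry both file data and log messages by prefixing every chunk with a small header naming its type and length. The Python sketch below is illustrative only. It assumes a 4-byte little-endian header whose top byte is a tag (MPLEX_BASE + message code, with MPLEX_BASE taken as 7) and whose low 24 bits hold the payload length; those constants and the helper names mplex_frame/mplex_unframe are assumptions, not rsync's code.

```python
import struct

# Assumptions for illustration: tag = MPLEX_BASE + code, MPLEX_BASE = 7,
# and a 24-bit length field, packed into one little-endian 32-bit word.
MPLEX_BASE = 7
MAX_LEN = (1 << 24) - 1

def mplex_frame(code, payload):
    """Prefix payload with a 4-byte header: (tag << 24) | length."""
    if len(payload) > MAX_LEN:
        raise ValueError("payload too large for a single frame")
    header = ((MPLEX_BASE + code) << 24) | len(payload)
    return struct.pack("<I", header) + payload

def mplex_unframe(stream):
    """Split a byte string of back-to-back frames into (code, payload) pairs."""
    frames = []
    pos = 0
    while pos < len(stream):
        (header,) = struct.unpack_from("<I", stream, pos)
        tag, length = header >> 24, header & MAX_LEN
        pos += 4
        if pos + length > len(stream):
            raise ValueError("truncated frame")
        frames.append((tag - MPLEX_BASE, stream[pos:pos + length]))
        pos += length
    return frames
```

For example, mplex_unframe(mplex_frame(0, b"data") + mplex_frame(2, b"msg\n")) returns [(0, b'data'), (2, b'msg\n')]. Because every -vvv log line becomes one of these frames on the same channel as the data, a flood of messages competes for the same buffer space.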
https://bugzilla.samba.org/show_bug.cgi?id=11166
--- Comment #4 from Michal Ruprich <mruprich at redhat.com> ---
When running on localhost with just the -avvv options, rsync spawns three
processes, each stuck in select() inside the new perform_io() function:
# ps -aux | grep rsync
root 98764 10.0  0.1  13108  3060 pts/2  S+ 09:37   0:00 /usr/bin/rsync -avvv src/ dst/
root 98765  0.0  0.1  12816  1872 pts/2  S+ 09:37   0:00 /usr/bin/rsync -avvv src/ dst/
root 98766 10.6  0.1  13076  2432 pts/2  S+ 09:37   0:00 /usr/bin/rsync -avvv src/ dst/
# pstack 98764
#0  0x00007ff114d4a6bb in select () from /lib64/libc.so.6
#1  0x000055cd6fd48a8f in perform_io ()
#2  0x000055cd6fd4b76d in write_buf ()
#3  0x000055cd6fd50022 in send_token ()
#4  0x000055cd6fd3d12e in matched.isra ()
#5  0x000055cd6fd3d3c2 in match_sums ()
#6  0x000055cd6fd32dd9 in send_files ()
#7  0x000055cd6fd3c512 in client_run ()
#8  0x000055cd6fd1d9f1 in main ()
# pstack 98765
#0  0x00007ff114d4a6bb in select () from /lib64/libc.so.6
#1  0x000055cd6fd48a8f in perform_io ()
#2  0x000055cd6fd49d8f in send_msg ()
#3  0x000055cd6fd3f5fa in rwrite ()
#4  0x000055cd6fd4a982 in read_a_msg ()
#5  0x000055cd6fd49046 in perform_io ()
#6  0x000055cd6fd4a1a9 in io_flush ()
#7  0x000055cd6fd2f1d7 in generate_files ()
#8  0x000055cd6fd3b941 in do_recv ()
#9  0x000055cd6fd3c0d4 in start_server ()
#10 0x000055cd6fd3c20b in child_main ()
#11 0x000055cd6fd5c088 in local_child ()
#12 0x000055cd6fd1d99a in main ()
# pstack 98766
#0  0x00007ff114d4a6bb in select () from /lib64/libc.so.6
#1  0x000055cd6fd48a8f in perform_io ()
#2  0x000055cd6fd49d8f in send_msg ()
#3  0x000055cd6fd3f2c0 in rwrite ()
#4  0x000055cd6fd3fa07 in rprintf ()
#5  0x000055cd6fd28e11 in finish_transfer ()
#6  0x000055cd6fd314fa in recv_files ()
#7  0x000055cd6fd3b852 in do_recv ()
#8  0x000055cd6fd3c0d4 in start_server ()
#9  0x000055cd6fd3c20b in child_main ()
#10 0x000055cd6fd5c088 in local_child ()
#11 0x000055cd6fd1d99a in main ()
Are these most likely all waiting on the same buffers?
# strace -p 98764
strace: Process 98764 attached
select(5, [], [4], [], {tv_sec=8, tv_usec=659438}) = 0 (Timeout)
select(5, [], [4], [], {tv_sec=60, tv_usec=0}
# strace -p 98765
strace: Process 98765 attached
select(2, [], [1], [], {tv_sec=40, tv_usec=875659}) = 0 (Timeout)
select(2, [], [1], [], {tv_sec=60, tv_usec=0}
# strace -p 98766
strace: Process 98766 attached
select(5, [], [4], [], {tv_sec=55, tv_usec=452214}) = 0 (Timeout)
select(5, [], [4], [], {tv_sec=60, tv_usec=0}
So all the processes are waiting to write something? My guess is that the
client process and one of the server processes are waiting to write to the
iobuf on fd 4, and the second server process is waiting to write to stdout
(assuming fd 1 is stdout).
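The strace lines above are select() calls with an empty read set and a single descriptor in the write set: each process is waiting for room to write, and re-arms a 60-second timeout when none appears. A minimal Python sketch of one side of that situation, assuming Linux and a plain pipe whose reader never drains it (fill_pipe_until_full and wait_writable are hypothetical helper names):

```python
import fcntl
import os
import select

def fill_pipe_until_full(wfd):
    """Switch wfd to non-blocking mode and write until the pipe buffer is full."""
    flags = fcntl.fcntl(wfd, fcntl.F_GETFL)
    fcntl.fcntl(wfd, fcntl.F_SETFL, flags | os.O_NONBLOCK)
    chunk = b"x" * 4096
    total = 0
    while True:
        try:
            total += os.write(wfd, chunk)
        except BlockingIOError:  # EAGAIN: no room left in the pipe
            return total

def wait_writable(wfd, timeout):
    """Same shape as the traced call: select(nfds, [], [wfd], [], timeout)."""
    _, writable, _ = select.select([], [wfd], [], timeout)
    return bool(writable)
```

After fill_pipe_until_full(w), wait_writable(w, 0.2) times out and returns False, and it returns True again only once something reads from the other end. If all three rsync processes sit in this state at once with nothing in their read sets, no one drains anything, so every select() just times out and retries.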
https://bugzilla.samba.org/show_bug.cgi?id=11166
--- Comment #5 from Michal Ruprich <mruprich at redhat.com> ---
If anyone from rsync could help, that would be awesome. I am trying to figure
this out, but it is hard to tell what happened and who is actually waiting on
whom.
https://bugzilla.samba.org/show_bug.cgi?id=11166
Wayne Davison <wayne at opencoder.net> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
--- Comment #6 from Wayne Davison <wayne at opencoder.net> ---
The hang isn't very easy to fix because of the current way the 3 processes
communicate when performing a push operation (which is what happens by default
in a local transfer) combined with the huge slew of messages that get generated
by -vvv. The messages have to go "around the horn" from the receiver to the
generator to the sender to be output, and that forwarding can hang up in
certain full-buffer situations. I have some ideas I'm looking into for
improving this, but it will take a while.
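To make the "around the horn" cycle concrete, here is a deliberately simplified Python model, not rsync's actual logic. The sender feeds file data to the receiver, the receiver sends log messages to the generator, and the generator forwards them to the sender. Each channel holds at most `capacity` items, and each process is modeled as a blocking writer that will not read new input until its pending output has been written. The `amplification` parameter (messages per file, standing in for -vvv) and the name simulate_ring are assumptions for illustration.

```python
from collections import deque

def simulate_ring(capacity, amplification, items):
    """Toy model of sender -> receiver -> generator -> sender; returns "done" or "deadlock"."""
    s2r, r2g, g2s = deque(), deque(), deque()  # bounded channels between processes
    s_pending, r_pending, g_pending = deque(), deque(), deque()  # output not yet written
    remaining = items

    def try_write(pending, chan):
        # A blocking writer only makes progress if the channel has room.
        if len(chan) < capacity:
            chan.append(pending.popleft())
            return True
        return False

    while True:
        progress = False
        # Sender: finish a pending write, else print forwarded messages, else send data.
        if s_pending:
            progress |= try_write(s_pending, s2r)
        elif g2s:
            g2s.popleft()
            progress = True
        elif remaining:
            s_pending.append("data")
            remaining -= 1
            progress = True
        # Receiver: each data item it reads produces `amplification` log messages.
        if r_pending:
            progress |= try_write(r_pending, r2g)
        elif s2r:
            s2r.popleft()
            r_pending.extend(["msg"] * amplification)
            progress = True
        # Generator: forward each message from the receiver on to the sender.
        if g_pending:
            progress |= try_write(g_pending, g2s)
        elif r2g:
            g_pending.append(r2g.popleft())
            progress = True
        if not (remaining or s2r or r2g or g2s or s_pending or r_pending or g_pending):
            return "done"
        if not progress:
            return "deadlock"
```

With these rules, simulate_ring(1, 3, 10) ends in "deadlock" with all three processes stuck writing into full channels, the same shape as the three pstack traces in comment #4, while simulate_ring(64, 3, 10) returns "done" because no channel ever fills.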
In the meantime you have several options for working around the issue:
Use some --info=FOO and/or --debug=FOO options instead of -vvv to limit the
number of messages that rsync is generating to just those that you really need
to see. (Specify a FOO of "help" to see a list of choices.)
Change a local copy from a push to a pull using the support/lsh script that
comes with rsync. e.g., instead of running "rsync -aivvv src/ dest/"
you'd run:
    rsync -aivvve lsh localhost:$PWD/src/ dest/
Finally, use the --msgs2stderr option (which can be specified as {-M,}--msgs2stderr
to affect both sides of a non-local transfer) so that the messages are not in the
protocol stream. If you want to map them back from stderr to stdout, you could
specify "2>&1" on the command line. This choice has a small chance of causing some
weirdness in the message stream, though, since all 3 processes are outputting
debug messages to stderr at the same time (though rsync changes stderr to be
line buffered to try to minimize that).
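Why line buffering helps: POSIX specifies that a single write() of at most PIPE_BUF bytes to a pipe is not interleaved with data from other writers, so if each process emits one whole line per write, lines from different processes can alternate but not splice into each other. A small Python sketch, assuming a Unix system with fork(); write_lines_concurrently is a hypothetical helper that calls os.write once per line to mimic a line-buffered stream:

```python
import os

def write_lines_concurrently(n_procs, n_lines):
    """Fork n_procs writers sharing one pipe; return every line the parent reads."""
    r, w = os.pipe()
    pids = []
    for p in range(n_procs):
        pid = os.fork()
        if pid == 0:  # child: write whole lines, one write() call per line
            os.close(r)
            for i in range(n_lines):
                os.write(w, f"proc{p} line{i}\n".encode())
            os._exit(0)
        pids.append(pid)
    os.close(w)  # parent only reads; EOF arrives once every child has exited
    data = b""
    while True:
        chunk = os.read(r, 65536)
        if not chunk:
            break
        data += chunk
    os.close(r)
    for pid in pids:
        os.waitpid(pid, 0)
    return data.splitlines()
```

Every line returned by write_lines_concurrently(3, 200) is intact, even though the three writers' lines arrive in an unpredictable order. The guarantee covers only pipes and FIFOs, and only writes of at most PIPE_BUF bytes; a very long line, or a message written in several pieces, can still be split, which fits the "small chance" caveat above.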
https://bugzilla.samba.org/show_bug.cgi?id=11166
Wayne Davison <wayne at opencoder.net> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED
--- Comment #7 from Wayne Davison <wayne at opencoder.net> ---
I figured out a simple fix that allows the receiver to expand its message buf
size when things clog up and committed that fix to git for 3.2.0.  I still have
a dream of fixing this in a better way in the future, but this will work OK for
now.
I still suggest that you avoid -vvv when possible, but it will at least work
when you do use it (assuming you don't run out of memory). In my local testing,
the msg buffer expanded from 16KB to 512KB (in several steps) and then stayed
steady at that size, even when I increased the number of files copied.
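A hypothetical Python sketch of the kind of fix described above: instead of blocking when its message buffer is full, the writer grows the buffer (doubling here, starting from the 16KB mentioned above) up to a caller-chosen limit. The class name, the doubling policy and the limit are assumptions for illustration; the actual change in rsync 3.2.0 may grow its buffer differently.

```python
class GrowableMsgBuf:
    """Hypothetical message buffer that grows instead of blocking when full."""

    def __init__(self, initial=16 * 1024, limit=64 * 1024 * 1024):
        self.capacity = initial
        self.limit = limit  # stands in for "assuming you don't run out of memory"
        self.data = bytearray()
        self.growths = []  # each new capacity, recorded for inspection

    def append(self, msg):
        # Double the capacity until the message fits, or give up at the limit.
        while len(self.data) + len(msg) > self.capacity:
            if self.capacity * 2 > self.limit:
                raise MemoryError("message buffer limit reached")
            self.capacity *= 2
            self.growths.append(self.capacity)
        self.data += msg

    def drain(self, n):
        # The consumer frees space; the capacity itself is never reduced.
        out = bytes(self.data[:n])
        del self.data[:n]
        return out
```

Appending one hundred 1000-byte messages grows the capacity from 16384 to 32768, 65536 and finally 131072 bytes, and draining does not shrink it again, much like the buffer settling at 512KB in the testing described above. Because the writer never blocks on a full buffer, that link of the three-process cycle can no longer stall; the cost is memory.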
Also, my latest git version of "lsh" (& its bash "lsh.sh" alternative) in the
support dir now accepts a hostname of "lh" for localhost with an implied
--no-cd option so that the "remote" side doesn't default to the user's home
dir. This makes it easier to type a local-copy command that will avoid the
extra msg memory allocations:
    rsync -aivvve lsh lh:src/ dest/
https://bugzilla.samba.org/show_bug.cgi?id=11166
--- Comment #8 from Michal Ruprich <mruprich at redhat.com> ---
This is awesome, thanks. No hang on a reproducer with circa 9k files (6.6G in
total).