samba-bugs at samba.org
2015-Mar-16 16:14 UTC
[Bug 11166] New: running with -vvv causes a hang
https://bugzilla.samba.org/show_bug.cgi?id=11166 Bug ID: 11166 Summary: running with -vvv causes a hang Product: rsync Version: 3.1.1 Hardware: All OS: All Status: NEW Severity: normal Priority: P5 Component: core Assignee: wayned at samba.org Reporter: pskocik at gmail.com QA Contact: rsync-qa at samba.org I don't know what might be causing this but when I run rsync with -vvv (my other options were `-aH --delete -F -Pi --dry-run -vvv` , the process hangs somewhere in the middle of my wine's virtual c_drive (I'm running this to mirror my /home). It works fine without the -vvv or even with -vv. -- You are receiving this mail because: You are the QA Contact for the bug.
https://bugzilla.samba.org/show_bug.cgi?id=11166 --- Comment #1 from pjump <pskocik at gmail.com> --- Basically it just freezes. No IO, no CPU usage. -- You are receiving this mail because: You are the QA Contact for the bug.
https://bugzilla.samba.org/show_bug.cgi?id=11166 --- Comment #2 from devurandom at gmx.net --- There are a lot of bugreports related to rsync hanging mysteriously, some of which may be duplicates of each other: https://bugzilla.samba.org/show_bug.cgi?id=1442 https://bugzilla.samba.org/show_bug.cgi?id=2957 https://bugzilla.samba.org/show_bug.cgi?id=9164 https://bugzilla.samba.org/show_bug.cgi?id=10035 https://bugzilla.samba.org/show_bug.cgi?id=10092 https://bugzilla.samba.org/show_bug.cgi?id=10518 https://bugzilla.samba.org/show_bug.cgi?id=10950 https://bugzilla.samba.org/show_bug.cgi?id=11166 https://bugzilla.samba.org/show_bug.cgi?id=12732 https://bugzilla.samba.org/show_bug.cgi?id=13109 -- You are receiving this mail because: You are the QA Contact for the bug.
https://bugzilla.samba.org/show_bug.cgi?id=11166 --- Comment #3 from Mark Vitale <mvitale at sinenomine.net> --- While debugging a different rsync hang, I have also seen this behavior with -vvv, and used git bisect to find when it was introduced: d8587b4690b1987c02c71c136720f366abf250e6 is the first bad commit introduced at 3.1.0pre1 commit d8587b4690b1987c02c71c136720f366abf250e6 Author: Wayne Davison <wayned at samba.org> Date: Tue Sep 15 16:12:24 2009 -0700 Change the msg pipe to use a real multiplexed IO mode for the data that goes from the receiver to the generator. I hope that helps. Regards, -- Mark Vitale mvitale at sinenomine.net -- You are receiving this mail because: You are the QA Contact for the bug.
https://bugzilla.samba.org/show_bug.cgi?id=11166 --- Comment #4 from Michal Ruprich <mruprich at redhat.com> --- When running on localhost with just -avvv options, rsync spawns three processes each stuck in the select function in the new perform_io function: # ps -aux | grep rsync root 98764 10.0 0.1 13108 3060 pts/2 S+ 09:37 0:00 /usr/bin/rsync -avvv src/ dst/ root 98765 0.0 0.1 12816 1872 pts/2 S+ 09:37 0:00 /usr/bin/rsync -avvv src/ dst/ root 98766 10.6 0.1 13076 2432 pts/2 S+ 09:37 0:00 /usr/bin/rsync -avvv src/ dst/ # pstack 98764 #0 0x00007ff114d4a6bb in select () from /lib64/libc.so.6 #1 0x000055cd6fd48a8f in perform_io () #2 0x000055cd6fd4b76d in write_buf () #3 0x000055cd6fd50022 in send_token () #4 0x000055cd6fd3d12e in matched.isra () #5 0x000055cd6fd3d3c2 in match_sums () #6 0x000055cd6fd32dd9 in send_files () #7 0x000055cd6fd3c512 in client_run () #8 0x000055cd6fd1d9f1 in main () # pstack 98765 #0 0x00007ff114d4a6bb in select () from /lib64/libc.so.6 #1 0x000055cd6fd48a8f in perform_io () #2 0x000055cd6fd49d8f in send_msg () #3 0x000055cd6fd3f5fa in rwrite () #4 0x000055cd6fd4a982 in read_a_msg () #5 0x000055cd6fd49046 in perform_io () #6 0x000055cd6fd4a1a9 in io_flush () #7 0x000055cd6fd2f1d7 in generate_files () #8 0x000055cd6fd3b941 in do_recv () #9 0x000055cd6fd3c0d4 in start_server () #10 0x000055cd6fd3c20b in child_main () #11 0x000055cd6fd5c088 in local_child () #12 0x000055cd6fd1d99a in main () # pstack 98766 #0 0x00007ff114d4a6bb in select () from /lib64/libc.so.6 #1 0x000055cd6fd48a8f in perform_io () #2 0x000055cd6fd49d8f in send_msg () #3 0x000055cd6fd3f2c0 in rwrite () #4 0x000055cd6fd3fa07 in rprintf () #5 0x000055cd6fd28e11 in finish_transfer () #6 0x000055cd6fd314fa in recv_files () #7 0x000055cd6fd3b852 in do_recv () #8 0x000055cd6fd3c0d4 in start_server () #9 0x000055cd6fd3c20b in child_main () #10 0x000055cd6fd5c088 in local_child () #11 0x000055cd6fd1d99a in main () These are most likely waiting for the same buffers? # strace -p 98764 strace: Process 98764 attached select(5, [], [4], [], {tv_sec=8, tv_usec=659438}) = 0 (Timeout) select(5, [], [4], [], {tv_sec=60, tv_usec=0} # strace -p 98765 strace: Process 98765 attached select(2, [], [1], [], {tv_sec=40, tv_usec=875659}) = 0 (Timeout) select(2, [], [1], [], {tv_sec=60, tv_usec=0} # strace -p 98766 strace: Process 98766 attached select(5, [], [4], [], {tv_sec=55, tv_usec=452214}) = 0 (Timeout) select(5, [], [4], [], {tv_sec=60, tv_usec=0} So all the processes are waiting to write something? I am guessing that the client process and one of the server processes are waiting to write to the iobuf(fd=4) and the second server process is writing to stdout(assuming fd=1 is stdout)? -- You are receiving this mail because: You are the QA Contact for the bug.
https://bugzilla.samba.org/show_bug.cgi?id=11166 --- Comment #5 from Michal Ruprich <mruprich at redhat.com> --- If anyone from rsync could help, that would be awesome. I am trying to figure this out but it is hard to tell what happened and who is actually waiting on who. -- You are receiving this mail because: You are the QA Contact for the bug.
https://bugzilla.samba.org/show_bug.cgi?id=11166 Wayne Davison <wayne at opencoder.net> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED --- Comment #6 from Wayne Davison <wayne at opencoder.net> --- The hang isn't very easy to fix because of the current way the 3 processes communicate when performing a push operation (which is what happens by default in a local transfer) combined with the huge slew of messages that get generated by -vvv. The messages have to go "around the horn" from the receiver to the generator to the sender to be output, and that forwarding can hang up in certain full-buffer situations. I have some ideas I'm looking into for improving this, but it will take a while. In the meantime you have several options for working around the issue: Use some --info=FOO and/or --debug=FOO options instead of -vvv to limit the number of messages that rsync is generating to just those that you really need to see. (Specify a FOO of "help" to see a list of choices.) Change a local copy from a push to a pull using the support/lsh script that comes with rsync. e.g., instead of running "rsync -aivvv src/ dest/" you'd run: rsync -aivvve lsh localhost:$PWD/src/ dest/ Finally, use the --msgs2stderr (which can be specified as {-M,}--msgs2stderr to affect both sides of a non-local transfer) so that the messages are not in the protocol stream. If you want to map them back from stderr to stdout, you could specify "2>&1" on the command-line. This choice has small chance to cause some weirdness in the message stream, though, since all 3 processes are outputting debug messages at the same time to stderr (though rsync changes stderr to be line buffered to try to minimize that). -- You are receiving this mail because: You are the QA Contact for the bug.
https://bugzilla.samba.org/show_bug.cgi?id=11166 Wayne Davison <wayne at opencoder.net> changed: What |Removed |Added ---------------------------------------------------------------------------- Resolution|--- |FIXED Status|ASSIGNED |RESOLVED --- Comment #7 from Wayne Davison <wayne at opencoder.net> --- I figured out a simple fix that allows the receiver to expand its message buf size when things clog up and committed that fix to git for 3.2.0. I still have a dream of fixing this in a better way in the future, but this will work OK for now. I still suggest that you avoid -vvv when possible, but it will at least work when you do it (assuming you don't overflow memory). In my one local testing, the msg buffer expanded from 16KB to 512KB (in several steps) and then stayed steady at that size, even when I increased the number of files copied. Also, my latest git version of "lsh" (& its bash "lsh.sh" alternative) in the support dir now accepts a hostname of "lh" for localhost with an implied --no-cd option so that the "remote" side doesn't default to the user's home dir. This makes it easier to type a local-copy command that will avoid the extra msg memory allocations: rsync -aivvve lsh lh:src/ dest/ -- You are receiving this mail because: You are the QA Contact for the bug.
https://bugzilla.samba.org/show_bug.cgi?id=11166 --- Comment #8 from Michal Ruprich <mruprich at redhat.com> --- This is awesome thanks. No hang on a reproducer with circa 9k of files(6.6G of files). -- You are receiving this mail because: You are the QA Contact for the bug.
Maybe Matching Threads
- [Bug 8666] New: --debug=all9 fail
- [Bug 13109] New: rsync hangs during transfer of many small files
- [Bug 10372] New: rsync 3.10 error in protocol data stream while rsync 3.0.9 runs through
- error allocating core memory buffers (code 22) at util2.c(106) [sender=3.1.2]
- [Bug 11166] swfdec crashes firefox if it cannot access the sound device