https://bugzilla.samba.org/show_bug.cgi?id=10518
Summary: rsync hangs (100% cpu)
Product: rsync
Version: 3.1.1
Platform: All
OS/Version: Linux
Status: NEW
Severity: critical
Priority: P5
Component: core
AssignedTo: wayned at samba.org
ReportedBy: syzop at vulnscan.org
QAContact: rsync-qa at samba.org
The rsync process hangs; it has been running for over 24 hrs now. strace
reveals no system calls, and the process uses 100% CPU. Important: even if the
other side's rsync disconnects or is killed, the process stays running on this
side, so it really seems stuck somewhere.

I was previously using 3.0.9-4 on Debian (still in use on the other side), but
upgraded this side to 3.1.1beta1 (compiled from source) to see if the bug
would persist. Unfortunately it does. I compiled with debugging symbols
enabled.

I also have a core file available (generated by gcore) which I can e-mail on
request.
root at st1:~# lsof -n -p 4942
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
rsync 4942 root cwd DIR 9,1 4096 2 /
rsync 4942 root rtd DIR 9,1 4096 2 /
rsync 4942 root txt REG 9,1 1066198 146645 /usr/bin/rsync
rsync 4942 root mem REG 9,1 18672 262147 /lib/x86_64-linux-gnu/libattr.so.1.1.0
rsync 4942 root mem REG 9,1 1599504 267006 /lib/x86_64-linux-gnu/libc-2.13.so
rsync 4942 root mem REG 9,1 35320 262162 /lib/x86_64-linux-gnu/libacl.so.1.1.0
rsync 4942 root mem REG 9,1 136936 267028 /lib/x86_64-linux-gnu/ld-2.13.so
rsync 4942 root mem REG 9,1 1534672 137602 /usr/lib/locale/locale-archive
rsync 4942 root 0r FIFO 0,8 0t0 13331051 pipe
rsync 4942 root 1w FIFO 0,8 0t0 13331052 pipe
rsync 4942 root 2w FIFO 0,8 0t0 13331053 pipe
rsync 4942 root 3r REG 147,1 214748364800 43712526 /data/vm/green/data-ll.img
root at st1:~# ls -al /data/vm/green/data.img
-rw-r--r-- 1 libvirt-qemu libvirt-qemu 375809638400 Mar 26 21:01 /data/vm/green/data.img
root at st1:~# ps auxw|grep rsync
root 4942 97.3 0.5 120800 83448 ? Rs Mar25 1379:18 rsync --server --sender -vlHogDtpAXrze.iLsf --numeric-ids . /
root 8755 0.0 0.0 7836 872 pts/1 S+ 21:02 0:00 grep rsync
root at st1:~# strace -p 4942
Process 4942 attached - interrupt to quit
^CProcess 4942 detached
root at st1:~# strace -p 4942
Process 4942 attached - interrupt to quit
^CProcess 4942 detached
root at st1:~# gdb /usr/bin/rsync 4942
--snip--
0x00007f7a75e48914 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007f7a75e48914 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x000000000042649f in hash_search (f=1, s=0x192d220, buf=0x192d1e0,
len=214748364800) at match.c:236
#2 0x0000000000426cf4 in match_sums (f=1, s=0x192d220, buf=0x192d1e0,
len=214748364800) at match.c:398
#3 0x000000000041a175 in send_files (f_in=0, f_out=1) at sender.c:391
#4 0x0000000000423620 in do_server_sender (f_in=0, f_out=1, argc=1,
argv=0x18da238) at main.c:822
#5 0x0000000000423f1f in start_server (f_in=0, f_out=1, argc=2,
argv=0x18da230) at main.c:1089
#6 0x0000000000425282 in main (argc=2, argv=0x18da230) at main.c:1630
(gdb) quit
-- a number of seconds later --
root at st1:~# gdb /usr/bin/rsync 4942
--snip--
0x0000000000426972 in hash_search (f=1, s=0x192d220, buf=0x192d1e0,
len=214748364800) at match.c:310
310 } while ((i = s->sums[i].chain) >= 0);
(gdb) bt
#0 0x0000000000426972 in hash_search (f=1, s=0x192d220, buf=0x192d1e0,
len=214748364800) at match.c:310
#1 0x0000000000426cf4 in match_sums (f=1, s=0x192d220, buf=0x192d1e0,
len=214748364800) at match.c:398
#2 0x000000000041a175 in send_files (f_in=0, f_out=1) at sender.c:391
#3 0x0000000000423620 in do_server_sender (f_in=0, f_out=1, argc=1,
argv=0x18da238) at main.c:822
#4 0x0000000000423f1f in start_server (f_in=0, f_out=1, argc=2,
argv=0x18da230) at main.c:1089
#5 0x0000000000425282 in main (argc=2, argv=0x18da230) at main.c:1630
-- and again some time later --
root at st1:~# gdb /usr/bin/rsync 4942
--snip--
0x000000000042648c in hash_search (f=1, s=0x192d220, buf=0x192d1e0,
len=214748364800) at match.c:236
236                     if (memcmp(sum2,s->sums[i].sum2,s->s2length) != 0) {
(gdb) bt
#0 0x000000000042648c in hash_search (f=1, s=0x192d220, buf=0x192d1e0,
len=214748364800) at match.c:236
#1 0x0000000000426cf4 in match_sums (f=1, s=0x192d220, buf=0x192d1e0,
len=214748364800) at match.c:398
#2 0x000000000041a175 in send_files (f_in=0, f_out=1) at sender.c:391
#3 0x0000000000423620 in do_server_sender (f_in=0, f_out=1, argc=1,
argv=0x18da238) at main.c:822
#4 0x0000000000423f1f in start_server (f_in=0, f_out=1, argc=2,
argv=0x18da230) at main.c:1089
#5 0x0000000000425282 in main (argc=2, argv=0x18da230) at main.c:1630
(gdb) bt full
#0 0x000000000042648c in hash_search (f=1, s=0x192d220, buf=0x192d1e0,
len=214748364800) at match.c:236
l = 131072
done_csum2 = 1
hash_entry = 0
i = 93079
prev = 0x7f7a70dd1fb8
offset = 4668161391
aligned_offset = 0
end = 214748233729
k = 131072
want_i = 1638400
aligned_i = 0
backup = 146704
sum2 = "\021\306D\256\071\233\273a$\335\000\063\371<", <incomplete sequence \333>
s1 = 4294836224
s2 = 317063168
sum = 0
more = 1
map = 0x256570f
"\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377"...
#1 0x0000000000426cf4 in match_sums (f=1, s=0x192d220, buf=0x192d1e0,
len=214748364800) at match.c:398
No locals.
#2 0x000000000041a175 in send_files (f_in=0, f_out=1) at sender.c:391
fd = 3
s = 0x192d220
mbuf = 0x192d1e0
st = {st_dev = 37633, st_ino = 43712526, st_nlink = 1, st_mode = 33188,
st_uid = 105, st_gid = 109, __pad0 = 0, st_rdev = 0, st_size = 214748364800,
st_blksize = 4096, st_blocks = 293027752,
st_atim = {tv_sec = 1395779701, tv_nsec = 412191522}, st_mtim = {tv_sec
= 1395789399, tv_nsec = 347783447}, st_ctim = {tv_sec = 1395789399,
tv_nsec = 347783447}, __unused = {0, 0, 0}}
fname = "data/vm/green/data-ll.img\000m.img", '\000' <repeats 1521 times>,
"@)\277\363\377\177\000\000\060)\277\363\377\177\000\000P)\277\363\377\177\000\000@)\277\363\377\177\000\000\r\000\000\000\000\000\000\000\260*\277\363\377\177\000\000P)\277\363\377\177\000\000\th\325uz\177\000\000\005",
'\000' <repeats 15 times>, "p)\277\363\377\177\000\000\th\325uz\177\000\000\023\217G",
'\000' <repeats 29 times>, "\023\217G", '\000' <repeats 29 times>, "\340)\277\363",
'\000' <repeats 36 times>,
"\000\000\000\000\000\000\000\377\377\377\377\377\377\377\377\000\000\000\000\000\000\000\000\377\377\377\377\377\377\377\377\000\000\000\000\000\000\000\000Z\221G\000\001",
'\000' <repeats 11 times>, "\377\377\377\377\377\377\377\377", '\000' <repeats 12 times>,
"\001", '\000' <repeats 11 times>, "\377"...
xname = '\000' <repeats 2296 times>,
"\n\001+vz\177\000\000\000\000\000\000\000\000\000\000\020\026\277\363\377\177\000\000\000\000\000\000\000\000\000\000\020\026\277\363\377\177\000\000\210\245Kvz\177\000\000\005\000\000\000\000\000\000\000\355T\r\257\000\000\000\000\023\b+vz\177",
'\000' <repeats 18 times>,
"-\000\000\000\000\000\000\000S5\274\002\000\000\000\000\220R\321uz\177\000\000\200\027\277\363\377\177\000\000`l\321uz\177\000\000\000\000\000\000\000\000\000\000
\235\321uz\177", '\000' <repeats 18 times>,
"??Kvz\177\000\000\250YLvz\177\000\000\022\034@\000\000\000\000\000\000\070\322uz\177\000\000\270\023@\000\000\000\000\000\000\000\000\000\001\000\000\000\b\002\000\000\001",
'\000' <repeats 11 times>, " \205Lvz\177\000\000\320"...
path = 0x18fab50 "/"
slash = 0x4699dd "/"
fnamecmp_type = 128 '\200'
iflags = 32872
xlen = -1
file = 0x7f7a75b07418
phase = 0
max_phase = 2
itemizing = 0
log_code = FINFO
f_xfer = 1
save_io_error = 0
ndx = 428
j = 0
#3 0x0000000000423620 in do_server_sender (f_in=0, f_out=1, argc=1,
argv=0x18da238) at main.c:822
flist = 0x18fa9b0
dir = 0x7ffff3bf4e65 "."
#4 0x0000000000423f1f in start_server (f_in=0, f_out=1, argc=2,
argv=0x18da230) at main.c:1089
No locals.
#5 0x0000000000425282 in main (argc=2, argv=0x18da230) at main.c:1630
ret = 0
orig_argc = 7
orig_argv = 0x7ffff3bf3fc8
sigmask = {__val = {84483, 0 <repeats 15 times>}}
--snip--
Detaching from program: /usr/bin/rsync, process 4942
root at st1:~# gcore 4942
hash_search (f=1, s=0x192d220, buf=0x192d1e0, len=214748364800) at match.c:238
238 continue;
Saved corefile core.4942
root at st1:~# ls -al core*
-rw-r--r-- 1 root root 115099184 Mar 26 21:04 core.4942
root at st1:~# ls -alh core*
-rw-r--r-- 1 root root 110M Mar 26 21:04 core.4942
https://bugzilla.samba.org/show_bug.cgi?id=10518

--- Comment #1 from Bram Matthys <syzop at vulnscan.org> 2014-03-26 20:18:05 UTC ---
Just realized I ls'd the wrong file. Not that it matters much, so just FYI:

# ls -al /data/vm/green/data-ll.img
-rw-r--r-- 1 libvirt-qemu libvirt-qemu 214748364800 Mar 26 21:18 /data/vm/green/data-ll.img
https://bugzilla.samba.org/show_bug.cgi?id=10518

--- Comment #2 from Bram Matthys <syzop at vulnscan.org> 2014-03-26 20:24:14 UTC ---
And obviously I meant I'm using 3.1.1pre1, not beta1... Anyway, if you need
the (gzip'ed) core file, just let me know. I don't want to attach it here due
to it likely containing sensitive info.
https://bugzilla.samba.org/show_bug.cgi?id=10518

--- Comment #3 from roland <devzero at web.de> 2014-03-27 22:29:42 UTC ---
Large file involved? Have a look at this:
http://www.anchor.com.au/blog/2013/08/out-tridging-tridge/
https://bugzilla.samba.org/show_bug.cgi?id=10518

--- Comment #4 from Bram Matthys <syzop at vulnscan.org> 2014-03-28 12:18:55 UTC ---
Thanks for the suggestion. That post says the patch was added to rsync in
August 2013, so it should be included in the 3.1.1pre1 from the end of
January. Looking through the git changelog since 3.1.1pre1 I see very few
fixes other than installation-related ones: bug 10450 (which seems to be just
a progress-indicator fix) and some slashes/relative-dir fixes. I'll try the
latest git nonetheless. Note, by the way, that I am not using --inplace.

I see I didn't mention the exact rsync command line I used (only part of it
is visible in 'ps'). It's:

rsync -avzHAX --numeric-ids --delete-during --exclude-from=/backup/scripts/exclude/XXX root at A.B.C.D:/ tmp/XXX/fs/
https://bugzilla.samba.org/show_bug.cgi?id=10518

Wayne Davison <wayned at samba.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Status    |NEW                         |ASSIGNED
--- Comment #5 from Wayne Davison <wayned at samba.org> 2014-03-28 16:57:07 UTC ---
The latest git will have the same issue. It looks like your HUGE file has
created a very inefficient hash table of chunks, which the sender is spending
a lot of CPU time searching through.

The only fix for this is either to make a really large hash more efficient
(i.e. drop duplicates if they get out of hand) or to break the file up into
more manageable chunks.

I've been thinking about the latter possibility, and I think it would be
pretty easy to change the sender to stop reading checksums at a certain point
in a file and just start generating content, at which point it would clear
the hash table and go 'round again. The hardest bit is that the code order
doesn't support such looping, so it might take a bit of work to get that
going. It shouldn't require any protocol change, though, as the receiver
would not notice that the sender had decided to chunk-ify the file, and the
generator wouldn't need to send anything different.

Rsync has never been very good at really big files, but a change such as the
above should help.
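To illustrate the failure mode Wayne describes, here is a minimal C sketch.
This is a simplified model, not rsync's actual match.c, though the sum1/chain
fields mirror its sum_buf struct, and the 1638400 block count matches the
want_i value in the backtrace above. When a file is mostly NUL blocks, nearly
every block has the same weak checksum, so the blocks pile into one hash
bucket and each lookup walks a chain proportional to the total block count.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define TABLE_SIZE (1 << 16)            /* bucket count for the weak-checksum hash */

struct sum_buf {
    uint32_t sum1;                      /* weak rolling checksum of one block */
    int32_t chain;                      /* next block in the same bucket, or -1 */
};

int main(void)
{
    int32_t num_blocks = 1638400;       /* 200 GiB file / 128 KiB blocks */
    struct sum_buf *sums = calloc(num_blocks, sizeof *sums);
    int32_t *table = malloc(TABLE_SIZE * sizeof *table);
    int32_t i, len;

    for (i = 0; i < TABLE_SIZE; i++)
        table[i] = -1;

    /* Assume ~90% of the image is unallocated: those blocks are all zeros,
     * share one weak checksum (0 here for simplicity), and therefore all
     * land in the same bucket. */
    for (i = 0; i < num_blocks; i++) {
        uint32_t h;
        sums[i].sum1 = (i % 10 == 0) ? (uint32_t)rand() : 0;
        h = sums[i].sum1 & (TABLE_SIZE - 1);
        sums[i].chain = table[h];       /* prepend to the bucket's chain */
        table[h] = i;
    }

    /* This is the chain the sender walks for windows that hash to the
     * all-zero bucket -- compare the do/while over s->sums[i].chain at
     * match.c:310 in the backtrace. */
    len = 0;
    for (i = table[0]; i >= 0; i = sums[i].chain)
        len++;
    printf("chain length in the all-zero bucket: %d\n", (int)len); /* ~1.47M */

    free(sums);
    free(table);
    return 0;
}

With roughly 1.5 million candidates to consider on every lookup that lands in
that bucket, this is where the days of 100% CPU go.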
https://bugzilla.samba.org/show_bug.cgi?id=10518

--- Comment #6 from Bram Matthys <syzop at vulnscan.org> 2014-03-28 17:25:22 UTC ---
Thanks for your explanation. The file in question is an (ext4) image file
used by a VM. Since the file is fresh and not all of it is in use, it likely
still contains a few tens of gigabytes of NUL bytes; I guess that would
explain the unbalanced hash table.

Your post inspired me to create a workaround: I'll mount the image file,
create a file in it with random data until the disk is full, then rm the file
and unmount. Since the (unallocated) space is then no longer all binary
zeros, I hope this will temporarily solve the problem. As long as I don't do
too many disk write tests with dd if=/dev/zero ... ;)

I'm a big fan of rsync and use it all the time on numerous servers; I wasn't
aware of this shortcoming until now. In my view rsync is THE tool to use for,
well, almost any synchronization, and it only becomes (or should become) more
useful as files get bigger. Any change you can make to eliminate this problem
(or make it far less of one) would be appreciated.
https://bugzilla.samba.org/show_bug.cgi?id=10518

--- Comment #7 from Bram Matthys <syzop at vulnscan.org> 2014-03-29 20:10:00 UTC ---
Ok, that random data trick didn't help, or not enough. I'll be using the -W
(whole-file) option for now to work around the problem.
https://bugzilla.samba.org/show_bug.cgi?id=10518

--- Comment #8 from Eric Chaput <samba at ftd.idms.com> 2014-04-06 13:47:15 UTC ---
I struggled with the same problem trying to rsync a particularly large VM
file. Even though your random data trick didn't help, it got me thinking
about the issue: couldn't we just increase the block size of the
delta-transfer algorithm until it chunks the file into few enough pieces? It
turns out that you cannot increase it beyond 128k for protocol version 30,
which rsync 3 uses. Thankfully you can specify which protocol version to use.
I tried 1MB: same problem, the CPU hangs on the sender at some point. Then I
tried 10MB, and that worked for me (a 40GB VM img which probably contains 50%
NULs). Here are the command parameters I used:

--block-size=10485760 --protocol=29

I'm using --inplace too, in case that matters, and stock rsync 3.0.7 from
Debian squeeze on both ends.
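For reference, the arithmetic behind that workaround, as a back-of-the-envelope
C sketch (assumed values: 128 KiB as protocol 30's block-size cap, and
214748364800 bytes as the file length shown in the backtraces above):

#include <stdio.h>

int main(void)
{
    long long file_len  = 214748364800LL;      /* the 200 GiB image in this report */
    long long cap_p30   = 128LL * 1024;        /* protocol-30 block-size cap */
    long long big_block = 10LL * 1024 * 1024;  /* --block-size=10485760 via --protocol=29 */

    /* The sender hashes one checksum pair per block, so fewer, larger
     * blocks mean a smaller hash table and shorter collision chains. */
    printf("chunks at 128 KiB blocks: %lld\n", file_len / cap_p30);   /* 1638400 */
    printf("chunks at 10 MiB blocks:  %lld\n", file_len / big_block); /* 20480 */
    return 0;
}

The trade-off is coarser granularity: any change inside a 10 MiB block causes
the whole block to be retransmitted.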
https://bugzilla.samba.org/show_bug.cgi?id=10518

--- Comment #9 from devurandom at gmx.net ---
There are a lot of bug reports related to rsync hanging mysteriously, some of
which may be duplicates of each other:

https://bugzilla.samba.org/show_bug.cgi?id=1442
https://bugzilla.samba.org/show_bug.cgi?id=2957
https://bugzilla.samba.org/show_bug.cgi?id=9164
https://bugzilla.samba.org/show_bug.cgi?id=10035
https://bugzilla.samba.org/show_bug.cgi?id=10092
https://bugzilla.samba.org/show_bug.cgi?id=10518
https://bugzilla.samba.org/show_bug.cgi?id=10950
https://bugzilla.samba.org/show_bug.cgi?id=11166
https://bugzilla.samba.org/show_bug.cgi?id=12732
https://bugzilla.samba.org/show_bug.cgi?id=13109
https://bugzilla.samba.org/show_bug.cgi?id=10518

--- Comment #10 from roland <devzero at web.de> ---
Apparently, what https://www.anchor.com.au/blog/2013/08/out-tridging-tridge/
tries to explain and fix
(https://git.samba.org/?p=rsync.git;a=commit;h=de94193353864221280be9fbb6193d92eb133000)
does not fix the issue satisfactorily. I get pathologically slow transfers
when re-syncing a 500GB vhd image to a linux/zfs remote system. It's much
faster to scp or rsync a fresh copy than to sync the existing file. But that
is no option, as I would lose snapshot efficiency on the zfs side when the
file is rewritten as a whole, and I don't want to spend the network bandwidth
on it either.

https://lists.samba.org/archive/rsync/2019-July/031941.html
https://bugzilla.samba.org/show_bug.cgi?id=10518

--- Comment #11 from Paul Slootman <paul at debian.org> ---
roland: note that unless you explicitly use --inplace, rsync will create a
new file and rename it at the end, so that's going to mess up your zfs
snapshot efficiency anyway.
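A sketch of the write-then-rename pattern Paul refers to (a simplification;
rsync's actual receiver logic is more involved): the target ends up as a
brand-new inode, which is what defeats block-level snapshot sharing on zfs
unless --inplace is used.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Write data to a temp file in the target's directory, then rename it
 * over the target. The rename is atomic, but the result is a new inode:
 * none of the old file's blocks are reused. */
static int atomic_replace(const char *target, const char *data, size_t len)
{
    char tmp[4096];
    int fd;

    snprintf(tmp, sizeof tmp, "%s.XXXXXX", target);
    fd = mkstemp(tmp);
    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t)len || close(fd) != 0) {
        unlink(tmp);
        return -1;
    }
    return rename(tmp, target);
}

int main(void)
{
    const char msg[] = "hello\n";
    return atomic_replace("out.txt", msg, sizeof msg - 1) == 0 ? 0 : 1;
}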