John A Hawkinson
2017-Apr-10 23:37 UTC
error allocating core memory buffers (code 22) at util2.c(106) [sender=3.1.2]
Hi: I'm in the middle of recovering from a tactical error copying around a Mac OS X 10.10.5 Time Machine backup (turns out Apple's instructions aren't great...), and I had rsync running for the past 6 hours repairing permissions/ACLs on 1.5 TB of data (not copying the data), and then it just died in the middle with:

.L....og.... 2015-03-11-094807/platinum-bar2/usr/local/mysql -> mysql-5.6.21-osx 10.8-x86_64
ERROR: out of memory in expand_item_list [sender]
rsync error: error allocating core memory buffers (code 22) at util2.c(106) [sender=3.1.2]
rsync: [sender] write error: Broken pipe (32)

It was invoked as:

rsync -iaHJAX platinum-barzoom/Backups.backupdb/pb3/ platinum-barratry/x/Backups.backupdb/pb3/

I suspect the situation will be different the next time around, and I'm also not really inclined to try to wait another 7 hours for it to fail. Process limits were:

bash-3.2# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 256
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 709
virtual memory          (kbytes, -v) unlimited

Based on "df -i", there are about 200 million inodes in the scope of the copy:

bash-3.2# df -i
Filesystem    512-blocks       Used  Available Capacity      iused     ifree %iused  Mounted on
/dev/disk2s2  3906353072 3879969656   26383416     100%  484996205   3297927    99%  /Volumes/platinum-barzoom
/dev/disk1s2  9766869344 3327435312 6439434032      35%  207964705 402464627    34%  /Volumes/platinum-barratry

Because this is a Time Machine backup, and there were 66 snapshots of a 1 TB disk consuming about 1.5 TB, there were a *lot* of hard links. Many of them are of directories rather than individual files, so it's a little challenging to estimate the ratio of files to links.

Are there any useful tips here? Is it worth filing a bug report on this thin record?
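To get at least a rough handle on how many hard-linked paths rsync -H has to track, something like this nftw(3) sketch could count them. The names here (count_hardlinked, tally) are mine, not rsync's, and a real census of ~200 million inodes would also want to dedupe on (st_dev, st_ino):

```c
#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* Rough census of hard-linked regular files under a root.
 * count_hardlinked() is a hypothetical helper, not part of rsync:
 * it counts paths whose link count exceeds 1, i.e. the entries
 * rsync -H must remember in its hard-link bookkeeping. */
static long hardlinked_paths;

static int tally(const char *path, const struct stat *st, int flag,
                 struct FTW *ftw)
{
    (void)path; (void)ftw;
    if (flag == FTW_F && st->st_nlink > 1)
        hardlinked_paths++;
    return 0;               /* keep walking even past unreadable entries */
}

long count_hardlinked(const char *root)
{
    hardlinked_paths = 0;
    /* FTW_PHYS: don't follow symlinks, matching rsync -a semantics */
    if (nftw(root, tally, 64, FTW_PHYS) != 0)
        return -1;
    return hardlinked_paths;
}
```

A true files-to-links ratio would additionally need a set of distinct (st_dev, st_ino) pairs, and at this scale that set itself is a lot of memory, which is presumably right where rsync -H falls over.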
I guess I can turn on core dumps and increase (unlimit completely) the stack size... Although it doesn't seem to have segfaulted, so I'm not sure having core dumps enabled would have helped?

This was rsync 3.1.2 as installed via Homebrew, which appears to have applied three local patches:
https://github.com/Homebrew/homebrew-core/blob/68fe052398ea0939315ad87b0a08313c9d9a95af/Formula/rsync.rb

  apply "patches/fileflags.diff",
        "patches/crtimes.diff",
        "patches/hfs-compression.diff"

p.s.: If I had to start over, I would have spent less time by just deleting the data and recopying it, rather than trying to fix up the metadata and dealing with magic Apple stuff like the inability to modify symlinks inside a top-level Backups.backupdb directory of a Time Machine HFS volume. (But you can move the top-level directory into another directory, then modify symlinks inside, and then move it back.) This has been an "interesting" experience.

Thanks.

--jhawk at mit.edu
  John Hawkinson
John Hawkinson
2017-Apr-11 12:33 UTC
error allocating core memory buffers (code 22) at util2.c(106) [sender=3.1.2]
> I guess I can turn on core dumps and increase (unlimit completely) the
> stack size...
>
> Although it doesn't seem to have segfaulted, so I'm not sure having
> core dumps enabled would have helped?

Indeed. I reran it on the 50 remaining individual directories; it seems to have made it through 10 before failing again overnight:

66596 20:38:03 rsync -vP -iaHJAX `cat /tmp/unback` /Volumes/platinum-barratry/x/Backups.backupdb/pb3/
...
rsync: unpack_smb_acl: sys_acl_get_info(): Undefined error: 0 (0)
rsync: unpack_smb_acl: sys_acl_get_info(): Undefined error: 0 (0)
...
.f....og.a.. 2015-06-19-072520/platinum-bar2/Users/jhawk/.emacs.d/auto-save-list/.saves-1541-platinum-bar2.local~
ERROR: out of memory in expand_item_list [sender]
rsync error: error allocating core memory buffers (code 22) at util2.c(106) [sender=3.1.2]
bash-3.2#

No segfault, no core dump, no syslogs. Do I need to run this under lldb and set a breakpoint in expand_item_list()? Quick inspection suggests running with -vvvv might give some useful output:

1680         if (DEBUG_GTE(FLIST, 3)) {
1681                 rprintf(FINFO, "[%s] expand %s to %s bytes, did%s move\n",
1682                         who_am_i(), desc, big_num(new_size * item_size),
1683                         new_ptr == lp->items ? " not" : "");
1684         }
1685         if (!new_ptr)
1686                 out_of_memory("expand_item_list");

Although I imagine that output might be voluminous [but maybe not]? Again, I don't have time to build test cases and reproduce this carefully; every run is painful and long and slow. But I'd like to do the responsible thing if someone can tell me what that is.

> p.s.: If I had to start over, I would have spent less time just deleting
> the data and recopying it, rather than trying to fixup the metadata and

Indeed, it's looking like fixing the metadata with rsync is an order of magnitude slower, even as far as I've gotten. So maybe it's time to find another method. I don't think fts(3) is optimized any better for large hard-link farms, so I think maybe I need a homegrown solution? Ughhhh.
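For context on what's failing: expand_item_list() grows rsync's dynamic lists via realloc and bails through out_of_memory() when realloc returns NULL. This is a simplified sketch of the pattern (my names and growth policy, not rsync's exact code; the real thing is in util2.c):

```c
#include <stdlib.h>

/* Simplified sketch of rsync's item_list growth. The sender's file
 * list lives in lists like this, and one failed realloc is what
 * triggers out_of_memory("expand_item_list"). */
typedef struct {
    void *items;
    size_t count;      /* items in use */
    size_t malloced;   /* items allocated */
} item_list;

/* Ensure room for incr more items of item_size bytes each;
 * returns a pointer to the first free slot, or NULL on exhaustion. */
void *expand_item_list_sketch(item_list *lp, size_t item_size, size_t incr)
{
    if (lp->count + incr > lp->malloced) {
        size_t new_size = lp->malloced ? lp->malloced * 2 : 16;
        while (new_size < lp->count + incr)
            new_size *= 2;
        void *new_ptr = realloc(lp->items, new_size * item_size);
        if (!new_ptr)
            return NULL;   /* rsync would call out_of_memory() here */
        lp->items = new_ptr;
        lp->malloced = new_size;
    }
    return (char *)lp->items + lp->count * item_size;
}
```

With ~200 million inodes in scope plus -H's hard-link bookkeeping, these lists get enormous on the sender, and geometric growth also means a transient need for both the old and new allocations at each growth step.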
--jhawk at mit.edu
  John Hawkinson
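If it comes to a homegrown solution, something like this fts(3) sketch is the shape I have in mind: visit each inode once, so metadata fixups aren't repeated for every extra hard link. Everything here (walk_once, the naive linear seen-set) is illustrative only; at ~200 million inodes the seen-set would need to be a real hash table keyed on (st_dev, st_ino):

```c
#include <fts.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Illustrative seen-set: a linear scan is O(n^2) overall and only
 * tolerable for a demo; a real tool needs a hash table. */
struct inode_key { dev_t dev; ino_t ino; };
static struct inode_key *seen;
static size_t nseen, seen_cap;

static int seen_before(dev_t dev, ino_t ino)
{
    for (size_t i = 0; i < nseen; i++)
        if (seen[i].dev == dev && seen[i].ino == ino)
            return 1;
    if (nseen == seen_cap) {
        seen_cap = seen_cap ? seen_cap * 2 : 1024;
        seen = realloc(seen, seen_cap * sizeof *seen);
        if (!seen) { perror("realloc"); exit(1); }
    }
    seen[nseen].dev = dev;
    seen[nseen].ino = ino;
    nseen++;
    return 0;
}

/* walk_once (hypothetical): visit every regular file under root, but
 * process each hard-linked inode only the first time it is seen.
 * Returns the number of files that would get metadata fixups. */
long walk_once(const char *root)
{
    char *paths[] = { (char *)root, NULL };
    /* FTS_PHYSICAL: don't follow symlinks; FTS_XDEV: stay on one volume */
    FTS *fts = fts_open(paths, FTS_PHYSICAL | FTS_XDEV, NULL);
    if (!fts)
        return -1;
    long fixed = 0;
    FTSENT *e;
    while ((e = fts_read(fts)) != NULL) {
        if (e->fts_info != FTS_F)
            continue;
        if (e->fts_statp->st_nlink > 1 &&
            seen_before(e->fts_statp->st_dev, e->fts_statp->st_ino))
            continue;       /* already handled via another link */
        /* ...chmod()/chown()/ACL fixup on e->fts_accpath would go here... */
        fixed++;
    }
    fts_close(fts);
    return fixed;
}
```

Unlike rsync, which wants the whole file list in memory, this only has to remember the hard-linked inodes it has already touched.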
John Hawkinson
2017-Apr-11 15:29 UTC
error allocating core memory buffers (code 22) at util2.c(106) [sender=3.1.2]
Sorry to keep replying to myself:

> Because this is a Time Machine backup, and there were 66 snapshots of a
> 1 TB disk consuming about 1.5 TB, there were a *lot* of hard links. Many
> of directories rather than individual files, so it's a little

Err, whoops? No, I was tired and confused. They are not hard links to directories; that would screw up the universe. Still, lots of hard-linked files.

> Do I need to run this under lldb and set a breakpoint in expand_item_list()?
> Quick inspection suggests running with -vvvv might give some useful output:

Err... the result of this was that it processed a few files for a minute or so and then hung in select(), consuming no CPU, with no disk activity. Unfortunately my clang/lldb workflow was apparently broken and I didn't have functional debugging symbols (...), and I also lost the stack trace I thought I had (inadequate scrollback), so I'm not sure what was going on. But at first blush, it appeared that adding -vvvv made things hang forever. Removing it and rerunning, it's now happily trucking along and has been for the past hour, actually doing work.

--jhawk at mit.edu
  John Hawkinson
John Hawkinson
2017-Apr-11 17:09 UTC
error allocating core memory buffers (code 22) at util2.c(106) [sender=3.1.2]
> But at first blush, it appeared that adding -vvvv made things hang
> forever.

Yes. Confirmed against the git HEAD (9e7b8ab7cf66ecd152002926a7da61d8ad862522). Running:

rsync -nvvvv -iaHJAX $d $b

does some initial work and then gets to:

bash-3.2# lldb -p 6458
(lldb) process attach --pid 6458
Process 6458 stopped
* thread #1: tid = 0x1d5ff0, 0x00007fff8c2d83fa libsystem_kernel.dylib`__select + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x00007fff8c2d83fa libsystem_kernel.dylib`__select + 10
libsystem_kernel.dylib`__select:
->  0x7fff8c2d83fa <+10>: jae    0x7fff8c2d8404            ; <+20>
    0x7fff8c2d83fc <+12>: movq   %rax, %rdi
    0x7fff8c2d83ff <+15>: jmp    0x7fff8c2d3c78            ; cerror
    0x7fff8c2d8404 <+20>: retq
Executable module set to "/usr/local/bin/rsync".
Architecture set to: x86_64h-apple-macosx.
(lldb) bt
* thread #1: tid = 0x1d5ff0, 0x00007fff8c2d83fa libsystem_kernel.dylib`__select + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff8c2d83fa libsystem_kernel.dylib`__select + 10
    frame #1: 0x000000010dd6aedc rsync`perform_io(needed=47, flags=4) + 2924 at io.c:742
    frame #2: 0x000000010dd6a0bd rsync`send_msg(code=MSG_INFO, buf="[receiver] receiving flist for dir 1319\n", len=40, convert=0) + 349 at io.c:958
    frame #3: 0x000000010dd5c765 rsync`rwrite(code=FINFO, buf="[receiver] receiving flist for dir 1319\n", len=40, is_utf8=0) + 469 at log.c:279
    frame #4: 0x000000010dd5d3f3 rsync`rprintf(code=FINFO, format="[%s] receiving flist for dir %d\n") + 627 at log.c:435
    frame #5: 0x000000010dd394a6 rsync`read_ndx_and_attrs(f_in=0, f_out=4, iflag_ptr=0x00007fff51ed0f64, type_ptr="\x80", buf="", len_ptr=0x00007fff51ed0f60) + 646 at rsync.c:362
    frame #6: 0x000000010dd4483b rsync`recv_files(f_in=0, f_out=4, local_name=0x0000000000000000) + 379 at receiver.c:542
    frame #7: 0x000000010dd566c3 rsync`do_recv(f_in=0, f_out=4, local_name=0x0000000000000000) + 899 at main.c:909
    frame #8: 0x000000010dd54ac7 rsync`do_server_recv(f_in=0, f_out=1, argc=1, argv=0x00007fff51ed2328) + 1207 at main.c:1078
    frame #9: 0x000000010dd5431e rsync`start_server(f_in=0, f_out=1, argc=2, argv=0x00007fff51ed2320) + 286 at main.c:1112
    frame #10: 0x000000010dd541ee rsync`child_main(argc=2, argv=0x00007fff51ed2320) + 46 at main.c:1085
    frame #11: 0x000000010dd852ad rsync`local_child(argc=2, argv=0x00007fff51ed2320, f_in=0x00007fff51ed4328, f_out=0x00007fff51ed4324, child_main=(rsync`child_main at main.c:1084)) + 701 at pipe.c:167
    frame #12: 0x000000010dd58e42 rsync`do_cmd(cmd=0x0000000000000000, machine=0x0000000000000000, user=0x0000000000000000, remote_argv=0x00007fee40c04a50, remote_argc=0, f_in_p=0x00007fff51ed4328, f_out_p=0x00007fff51ed4324) + 2210 at main.c:543
    frame #13: 0x000000010dd57fd2 rsync`start_client(argc=1, argv=0x00007fee40c04a40) + 2402 at main.c:1414
    frame #14: 0x000000010dd5747b rsync`main(argc=2, argv=0x00007fee40c04a40) + 2555 at main.c:1652
    frame #15: 0x00007fff96cad5c9 libdyld.dylib`start + 1
    frame #16: 0x00007fff96cad5c9 libdyld.dylib`start + 1
(lldb) up
frame #1: 0x000000010dd6aedc rsync`perform_io(needed=47, flags=4) + 2924 at io.c:742
   739          tv.tv_sec = select_timeout;
   740          tv.tv_usec = 0;
   741
-> 742          cnt = select(max_fd + 1, &r_fds, &w_fds, &e_fds, &tv);
   743
   744          if (cnt <= 0) {
   745                  if (cnt < 0 && errno == EBADF) {
(lldb) fin

And then it never returns from that stack frame; it seems to hang out in select() forever (unless interrupted by a breakpoint or whatnot).

Bugzilla account request submitted.

--jhawk at mit.edu
  John Hawkinson
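My guess at the mechanism, which is an assumption from reading the backtrace and not confirmed: with -vvvv both sides generate enough message traffic that each process ends up blocked waiting to write to a pipe the other side isn't draining. The write-side stall is easy to demonstrate in isolation; this toy (entirely my code, not rsync's) fills a pipe that has no reader, using O_NONBLOCK so it returns instead of wedging the way rsync does:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Keep writing to a pipe nobody reads until the kernel buffer fills.
 * A blocking writer would hang exactly where the failed write happens;
 * O_NONBLOCK makes the stall visible as EAGAIN instead.
 * Returns 0 once the pipe is confirmed full. */
int fill_pipe(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    if (fcntl(fds[1], F_SETFL, O_NONBLOCK) != 0)
        return -1;
    char chunk[4096] = {0};
    long written = 0;
    for (;;) {
        ssize_t n = write(fds[1], chunk, sizeof chunk);
        if (n < 0)
            break;          /* pipe full: a blocking rsync would stall here */
        written += n;
    }
    int full = (errno == EAGAIN || errno == EWOULDBLOCK);
    printf("pipe filled after %ld bytes\n", written);
    close(fds[0]);
    close(fds[1]);
    return full ? 0 : -1;
}
```

If each side is blocked writing debug messages while neither reads, that would fit the symmetric select() wait with no CPU and no disk activity.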