thr3ads.net - R devel - [Rd] load/unload segfault puzzle [Jun 2013]

If this information is useful, please help other people find it:
Share via:

Ben Bolker

2013-Jun-11 02:03 UTC

[Rd] load/unload segfault puzzle

Dear r-devel readers,

  I have a pretty deep problem with package loading and unloading in
the development version of the lme4 package
<https://github.com/lme4/lme4>; it's not boiled down to a properly
minimal example yet (this has been difficult), but I am posting anyway
in the hopes that someone has ideas about how to proceed farther,
since I'm nearly stumped. Apologies in advance for the long post.

   EXECUTIVE SUMMARY: after one cycle of loading, testing (e.g. by
running example(lmer)) and unloading lme4, then loading and unloading
the nlme package, re-loading and exercising lme4 becomes very
unstable, leading eventually to a segmentation fault.  More detail is
available at <https://github.com/lme4/lme4/issues/35> .  Because it's
a segmentation fault, exactly _where_ the crash happens varies a bit
according to platform and precise incantation, but it seems I can
always get a segfault eventually.

  We had previously forgotten to write a .onUnload() function to call
library.dynam.unload() [it's not entirely clear to me why R handles
dynamic loading automatically when there is a useDynLib() in the
NAMESPACE, but not unloading: does anyone know if this is
intentional?]; before we had the appropriate unloading hook, instead
of segfaults we got errors about objects from the DLL not being found.

  The package has many complexities that might make it an especially
tough case: it uses reference classes as well as making use of the
Rcpp and RcppEigen packages ...

  I have tried running under valgrind and have seen what appear to be
memory leaks but no memory access errors. The beginning and end of the
valgrind output are appended below -- I can send the full output if
anyone wants to see it ...

  I plan to continue to try to isolate the problem by (1) minimizing
the amount of package-exercising code run between loading and
unloading; (2) trying to create a minimal version of the package. In
the meantime if anyone can think of any shortcuts or clever
diagnostics I would love to hear about it ...

  thanks
    Ben Bolker


--24640-- Valgrind options:
--24640--    --suppressions=/usr/lib/valgrind/debian-libc6-dbg.supp
--24640--    -v
--24640--    --leak-check=full

==24640== LEAK SUMMARY:
==24640==    definitely lost: 80 bytes in 2 blocks
==24640==    indirectly lost: 240 bytes in 20 blocks
==24640==      possibly lost: 56,014,306 bytes in 32,200 blocks
==24640==    still reachable: 11,032,257 bytes in 993 blocks
==24640==         suppressed: 0 bytes in 0 blocks
==24640== Reachable blocks (those to which a pointer was found) are
not shown.
==24640== To see them, rerun with: --leak-check=full --show-reachable=yes
==24640===24640== ERROR SUMMARY: 2493 errors from 2493 contexts (suppressed:
151 from 11)
--24640--
--24640-- used_suppression:    151 dl-hack3-cond-1
==24640===24640== ERROR SUMMARY: 2493 errors from 2493 contexts (suppressed:
151 from 11)

Ben Bolker

2013-Jun-12 17:47 UTC

head link

[Rd] load/unload segfault puzzle

Ben Bolker <bbolker <at> gmail.com> writes:
> 
> 
>   Dear r-devel readers,
> 
>   I have a pretty deep problem with package loading and unloading in
> the development version of the lme4 package
> <https://github.com/lme4/lme4>; it's not boiled down to a
properly
> minimal example yet (this has been difficult), but I am posting anyway
> in the hopes that someone has ideas about how to proceed farther,
> since I'm nearly stumped. Apologies in advance for the long post.
> 
>    EXECUTIVE SUMMARY: after one cycle of loading, testing (e.g. by
> running example(lmer)) and unloading lme4, then loading and unloading
> the nlme package, re-loading and exercising lme4 becomes very
> unstable, leading eventually to a segmentation fault.  More detail is
> available at <https://github.com/lme4/lme4/issues/35> .  Because
it's
> a segmentation fault, exactly _where_ the crash happens varies a bit
> according to platform and precise incantation, but it seems I can
> always get a segfault eventually.
> 
 [snip]

  UPDATE: after some useful advice off-list, I tried with a
fully valgrind-instrumented version of R.  No suspicious memory
accesses occurred until the very end, right before the crash:

Attempt #1
loading lme4 
loaded DLLs: Rcpp RcppEigen minqa lme4 
detaching lme4 


==18150== Jump to the invalid address stated on the next line
==18150==    at 0x9E46D00: ???
==18150==    by 0x410AB8A: RunFinalizers (memory.c:1357)
==18150==    by 0x410D314: R_gc_internal (memory.c:2709)
==18150==    by 0x410E73C: Rf_allocVector (memory.c:2421)
==18150==    by 0x4153D30: ReadItem (serialize.c:1685)
==18150==    by 0x4152D6D: ReadBC1 (serialize.c:1825)
==18150==    by 0x4153983: ReadItem (serialize.c:1851)
==18150==    by 0x41531C0: ReadItem (serialize.c:1601)
==18150==    by 0x41531A1: ReadItem (serialize.c:1599)
==18150==    by 0x41531C0: ReadItem (serialize.c:1601)
==18150==    by 0x41531C0: ReadItem (serialize.c:1601)
==18150==    by 0x41531C0: ReadItem (serialize.c:1601)
==18150==  Address 0x9e46d00 is not stack'd, malloc'd or (recently)
free'd
==18150== 

  Does this suggest anything to anyone, or is it just the
final symptom of a mysterious problem that occurred earlier ... ?

Ben Bolker

2013-Jun-13 01:53 UTC

head link

[Rd] load/unload segfault puzzle

On 13-06-12 07:09 PM, Duncan Murdoch wrote:> On 13-06-12 1:47 PM, Ben Bolker wrote:
>> Ben Bolker <bbolker <at> gmail.com> writes:
>>
>>>
>>>
>>>    Dear r-devel readers,
>>>
>>>    I have a pretty deep problem with package loading and unloading
in
>>> the development version of the lme4 package
>>> <https://github.com/lme4/lme4>; it's not boiled down to a
properly
>>> minimal example yet (this has been difficult), but I am posting
anyway
>>> in the hopes that someone has ideas about how to proceed farther,
>>> since I'm nearly stumped. Apologies in advance for the long
post.
>>>
>>>     EXECUTIVE SUMMARY: after one cycle of loading, testing (e.g. by
>>> running example(lmer)) and unloading lme4, then loading and
unloading
>>> the nlme package, re-loading and exercising lme4 becomes very
>>> unstable, leading eventually to a segmentation fault.  More detail
is
>>> available at <https://github.com/lme4/lme4/issues/35> . 
Because it's
>>> a segmentation fault, exactly _where_ the crash happens varies a
bit
>>> according to platform and precise incantation, but it seems I can
>>> always get a segfault eventually.
>>>
>>
>>   [snip]
>>
>>    UPDATE: after some useful advice off-list, I tried with a
>> fully valgrind-instrumented version of R.  No suspicious memory
>> accesses occurred until the very end, right before the crash:
>>
>> Attempt #1
>> loading lme4
>> loaded DLLs: Rcpp RcppEigen minqa lme4
>> detaching lme4
>>
>>
>> ==18150== Jump to the invalid address stated on the next line
>> ==18150==    at 0x9E46D00: ???
>> ==18150==    by 0x410AB8A: RunFinalizers (memory.c:1357)
>> ==18150==    by 0x410D314: R_gc_internal (memory.c:2709)
>> ==18150==    by 0x410E73C: Rf_allocVector (memory.c:2421)
>> ==18150==    by 0x4153D30: ReadItem (serialize.c:1685)
>> ==18150==    by 0x4152D6D: ReadBC1 (serialize.c:1825)
>> ==18150==    by 0x4153983: ReadItem (serialize.c:1851)
>> ==18150==    by 0x41531C0: ReadItem (serialize.c:1601)
>> ==18150==    by 0x41531A1: ReadItem (serialize.c:1599)
>> ==18150==    by 0x41531C0: ReadItem (serialize.c:1601)
>> ==18150==    by 0x41531C0: ReadItem (serialize.c:1601)
>> ==18150==    by 0x41531C0: ReadItem (serialize.c:1601)
>> ==18150==  Address 0x9e46d00 is not stack'd, malloc'd or
(recently)
>> free'd
>> ==18150=>>
>>    Does this suggest anything to anyone, or is it just the
>> final symptom of a mysterious problem that occurred earlier ... ?
> 
> One other suggestion: run under gctorture().  This will make it very
> slow, but might trigger the fault earlier.
> 
> Duncan Murdoch
> 
  Yes, thanks -- Bill Dunlap already suggested this.  Your and Bill's
warning about how slow gctorture makes things is correct -- I gave up
after running for 3.5 hours when it had gotten only partway through
loading the Matrix package; I will have to find a machine with a decent
cooling system (i.e. not my laptop) where I can replicate the error.
  I've just re-run the regular valgrind, with a fresh build right after
an SVN update.  I got exactly the same results as above.  We're
certainly *not* calling reg.finalizer() anywhere in our package, and I
don't think Rcpp or RcppEigen or minqa do ... there looks to be some
kind of default finalization done on the reference class objects (based
on running 'strings' on the object files ...
  I tried gdb'ing in and setting a breakpoint at memory.c:1357, but this
breakpoint gets hit a lot, and I'm sort of stabbing in the dark at this
point.

   Ben Bolker

Rory Winston

2013-Jun-13 10:55 UTC

head link

[Rd] load/unload segfault puzzle

Ben

Have you compiled R form source yourself? If so, I would be tempted to mark up
memory. c with some debug log statements - especially around line 1357, and
possibly inside the finalizers function as it attempts to run the C
finalizers....not pretty I know, but may be the quickest approach to quickly
identify whats failing...
Cheers
-- Rory

> 
>  Yes, thanks -- Bill Dunlap already suggested this.  Your and Bill's
> warning about how slow gctorture makes things is correct -- I gave up
> after running for 3.5 hours when it had gotten only partway through
> loading the Matrix package; I will have to find a machine with a decent
> cooling system (i.e. not my laptop) where I can replicate the error.
>  I've just re-run the regular valgrind, with a fresh build right after
> an SVN update.  I got exactly the same results as above.  We're
> certainly *not* calling reg.finalizer() anywhere in our package, and I
> don't think Rcpp or RcppEigen or minqa do ... there looks to be some
> kind of default finalization done on the reference class objects (based
> on running 'strings' on the object files ...
>  I tried gdb'ing in and setting a breakpoint at memory.c:1357, but this
> breakpoint gets hit a lot, and I'm sort of stabbing in the dark at this
> point.
> 
>   Ben Bolker
> 
> 
> 
> ------------------------------
> 
> _______________________________________________
> R-devel at r-project.org mailing list  DIGESTED
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> 
> End of R-devel Digest, Vol 124, Issue 12
> ****************************************

Apparently Analagous Threads

Search for more reasonably related threads

R devel - Jun 2013 - load/unload segfault puzzle

[Rd] load/unload segfault puzzle

[Rd] load/unload segfault puzzle

[Rd] load/unload segfault puzzle

[Rd] load/unload segfault puzzle

Apparently Analagous Threads