On 12/08/2018 6:32 PM, Rolf Turner wrote:
>
> On 12/08/18 17:42, Eric Berger wrote:
>
>> Hi Rolf,
>> When faced with such a situation I take the following approach which
>> often helps.
>> Use the same setup that caused the seg fault (you need a reproducible
>> problem.)
>> Start your R session using valgrind. e.g. in linux I do:
>>
>> $ valgrind R
>>
>> Assuming that a seg fault still occurs then valgrind should provide info
>> as to where.
>>
>> HTH
>
> Well, it probably *would* help if I weren't such a thicko.
>
> The story so far: I have managed to install valgrind (downloaded a
> tarball and installed from source). Seemed to go OK, but:
>
> * when I type "valgrind" I get "command not found"
> * however valgrind is in /usr/local/bin (I did "configure" with
>   prefix="/usr/local" so this is as it should be)
> * /usr/local/bin/valgrind is executable
> * /usr/local/bin is in my path
>
> So how in god's name can the command not be found? And why do these
> things always happen to *me*???
>
> I can work around this problem by giving the full path name, however.
>
> So I did:
>
> /usr/local/bin/valgrind R

I believe on your system R is a script, so you can't run valgrind this
way. It's just debugging bash, not R. You need to use

R -d valgrind

(though with your weird path problems, you might need a fully qualified
/usr/local/bin/valgrind there).

You run gdb the same way:

R -d gdb

and then give the command "r" to gdb to start R. It will give a report
when you get the segfault. I don't know which report will be more
informative.

Duncan Murdoch

>
> and got a lot of (mysterious to me) output:
>
>> /usr/local/bin/valgrind R
>> ==18051== Memcheck, a memory error detector
>> ==18051== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
>> ==18051== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
>> ==18051== Command: /usr/local/bin/R
>> ==18051==
>> ==18051== Invalid free() / delete / delete[] / realloc()
>> ==18051==    at 0x4C2ECF0: free (vg_replace_malloc.c:530)
>> ==18051==    by 0x45E280: ??? (in /bin/bash)
>> ==18051==    by 0x45E42F: run_unwind_frame (in /bin/bash)
>> ==18051==    by 0x47B714: parse_and_execute (in /bin/bash)
>> ==18051==    by 0x47B102: ??? (in /bin/bash)
>> ==18051==    by 0x47B35C: source_file (in /bin/bash)
>> ==18051==    by 0x4849C7: source_builtin (in /bin/bash)
>> ==18051==    by 0x43378D: ??? (in /bin/bash)
>> ==18051==    by 0x43592C: ??? (in /bin/bash)
>> ==18051==    by 0x4369C7: execute_command_internal (in /bin/bash)
>> ==18051==    by 0x43851D: execute_command (in /bin/bash)
>> ==18051==    by 0x42139D: reader_loop (in /bin/bash)
>> ==18051==  Address 0x4241008 is in the brk data segment 0x4228000-0x4246fff
>> ==18051==
>> ==18051== Invalid free() / delete / delete[] / realloc()
>> ==18051==    at 0x4C2ECF0: free (vg_replace_malloc.c:530)
>> ==18051==    by 0x45E280: ??? (in /bin/bash)
>> ==18051==    by 0x45E42F: run_unwind_frame (in /bin/bash)
>> ==18051==    by 0x4849D3: source_builtin (in /bin/bash)
>> ==18051==    by 0x43378D: ??? (in /bin/bash)
>> ==18051==    by 0x43592C: ??? (in /bin/bash)
>> ==18051==    by 0x4369C7: execute_command_internal (in /bin/bash)
>> ==18051==    by 0x43851D: execute_command (in /bin/bash)
>> ==18051==    by 0x42139D: reader_loop (in /bin/bash)
>> ==18051==    by 0x41FDB0: main (in /bin/bash)
>> ==18051==  Address 0x4240708 is in the brk data segment 0x4228000-0x4246fff
>> ==18051==
>>
>
> Not at all clear to me what to make of this. Does it indicate problems
> or memory leaks in my installation of R? Anyhow, things then proceed in
> an expected manner:
>
>> R version 3.5.1 (2018-07-02) -- "Feather Spray"
>> Copyright (C) 2018 The R Foundation for Statistical Computing
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> R is free software and comes with ABSOLUTELY NO WARRANTY.
>> You are welcome to redistribute it under certain conditions.
>> Type 'license()' or 'licence()' for distribution details.
>>
>> Natural language support but running in an English locale
>>
>> R is a collaborative project with many contributors.
>> Type 'contributors()' for more information and
>> 'citation()' on how to cite R or R packages in publications.
>>
>> Type 'demo()' for some demos, 'help()' for on-line help, or
>> 'help.start()' for an HTML browser interface to help.
>> Type 'q()' to quit R.
>>
>> Loading required package: misc
>> [Previously saved workspace restored]
>
> I then loaded the problematic package and issued the problematic command:
>
>>> library(hmm.discnp)
>> hmm.discnp 2.0-9
>>
>> This package has changed SUBSTANTIALLY from its
>> previous release. Read the documentation
>> carefully. Note in particular that the meaning of
>> the argument "nsim" of the function rhmm() has
>> changed, and a new argument "ylengths" now plays
>> essentially the role previously played by
>> "nsim".
>>
>>> xxx <- get.hgl(p3,2,yyy)
>>
>> *** caught segfault ***
>> address (nil), cause 'unknown'
>> Segmentation fault (core dumped)
>
> Nothing informative. Is there something else I should be doing?
>
> Sorry for being a nuisance, but I am at a loss.
>
> cheers,
>
> Rolf
>
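
Duncan's point above, that "valgrind R" ends up debugging bash when R is
a wrapper script, can be checked before choosing how to launch the
debugger. What follows is only a rough sketch under stated assumptions:
the helper name is_script is hypothetical (it is not part of R or
valgrind), and it merely tests whether the file found on the PATH begins
with "#!". The "R -d valgrind" and "R -d gdb" invocations are the ones
given in the message.

```shell
#!/bin/sh
# Hypothetical helper (not part of R or valgrind): succeeds if the file
# begins with "#!", i.e. it is a script handed to an interpreter.
is_script() {
    [ "$(head -c 2 "$1" 2>/dev/null)" = "#!" ]
}

# Locate R on the PATH; the result is empty if R is not installed.
r_path=$(command -v R 2>/dev/null || true)

if [ -z "$r_path" ]; then
    echo "R not found on PATH"
elif is_script "$r_path"; then
    # Running valgrind on the wrapper would instrument the shell, not R.
    echo "$r_path is a script: use 'R -d valgrind' or 'R -d gdb' instead"
else
    echo "$r_path does not start with '#!'"
fi
```

If the second branch is reported, the debugger has to be passed through
R's own -d option, as described in the message, rather than wrapped
around the R command.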
On 13/08/18 12:03, Duncan Murdoch wrote:

<SNIP>

>> So I did:
>>
>> /usr/local/bin/valgrind R
>
> I believe on your system R is a script, so you can't run valgrind this
> way. It's just debugging bash, not R. You need to use
>
> R -d valgrind
>
> (though with your weird path problems, you might need a fully qualified
> /usr/local/bin/valgrind there).
>
> You run gdb the same way:
>
> R -d gdb
>
> and then give the command "r" to gdb to start R. It will give a report
> when you get the segfault. I don't know which report will be more
> informative.

<SNIP>

Thanks Duncan. I did as you said with valgrind and got output that is
probably more relevant. However it is still opaque to me. I have no
idea how to use it to track down the error that I am making in the code.

> xxx <- get.hgl(p3,2,yyy)
> ==20088== Invalid read of size 8
> ==20088==    at 0x5116CD: Rf_allocVector3 (memory.c:2539)
> ==20088==    by 0x4B40FF: Rf_allocVector (Rinlinedfuns.h:577)
> ==20088==    by 0x4B40FF: do_missing (envir.c:2265)
> ==20088==    by 0x4CA383: bcEval (eval.c:6801)
> ==20088==    by 0x4D99EF: Rf_eval (eval.c:624)
> ==20088==    by 0x4DB172: R_execClosure (eval.c:1773)
> ==20088==    by 0x4D0E6E: bcEval (eval.c:6749)
> ==20088==    by 0x4D99EF: Rf_eval (eval.c:624)
> ==20088==    by 0x4DB172: R_execClosure (eval.c:1773)
> ==20088==    by 0x4D0E6E: bcEval (eval.c:6749)
> ==20088==    by 0x4D99EF: Rf_eval (eval.c:624)
> ==20088==    by 0x4DB172: R_execClosure (eval.c:1773)
> ==20088==    by 0x4D99A1: Rf_eval (eval.c:747)
> ==20088==  Address 0x3fca86ccfb7de9cc is not stack'd, malloc'd or (recently) free'd
> ==20088==
>
> *** caught segfault ***
> address (nil), cause 'unknown'
> ==20088== Invalid read of size 8
> ==20088==    at 0x511B23: Rf_allocVector3 (memory.c:2691)
> ==20088==    by 0x49137A: Rf_allocVector (Rinlinedfuns.h:577)
> ==20088==    by 0x49137A: deparse1WithCutoff (deparse.c:268)
> ==20088==    by 0x492EAF: Rf_deparse1m (deparse.c:197)
> ==20088==    by 0x4BA99C: R_GetTraceback (errors.c:1409)
> ==20088==    by 0x5053CE: sigactionSegv (main.c:592)
> ==20088==    by 0x6CA738F: ??? (in /lib/x86_64-linux-gnu/libpthread-2.23.so)
> ==20088==    by 0x5116CC: Rf_allocVector3 (memory.c:2539)
> ==20088==  Address 0x3fca86ccfb7de9cc is not stack'd, malloc'd or (recently) free'd
> ==20088==
> ==20088==
> ==20088== Process terminating with default action of signal 11 (SIGSEGV)
> ==20088==  General Protection Fault
> ==20088==    at 0x511B23: Rf_allocVector3 (memory.c:2691)
> ==20088==    by 0x49137A: Rf_allocVector (Rinlinedfuns.h:577)
> ==20088==    by 0x49137A: deparse1WithCutoff (deparse.c:268)
> ==20088==    by 0x492EAF: Rf_deparse1m (deparse.c:197)
> ==20088==    by 0x4BA99C: R_GetTraceback (errors.c:1409)
> ==20088==    by 0x5053CE: sigactionSegv (main.c:592)
> ==20088==    by 0x6CA738F: ??? (in /lib/x86_64-linux-gnu/libpthread-2.23.so)
> ==20088==    by 0x5116CC: Rf_allocVector3 (memory.c:2539)
> ==20088==
> ==20088== HEAP SUMMARY:
> ==20088==     in use at exit: 210,111,063 bytes in 57,981 blocks
> ==20088==   total heap usage: 106,693 allocs, 48,712 frees, 349,208,345 bytes allocated
> ==20088==
> ==20088== LEAK SUMMARY:
> ==20088==    definitely lost: 0 bytes in 0 blocks
> ==20088==    indirectly lost: 0 bytes in 0 blocks
> ==20088==      possibly lost: 0 bytes in 0 blocks
> ==20088==    still reachable: 210,111,063 bytes in 57,981 blocks
> ==20088==                       of which reachable via heuristic:
> ==20088==                         newarray           : 4,264 bytes in 1 blocks
> ==20088==         suppressed: 0 bytes in 0 blocks
> ==20088== Rerun with --leak-check=full to see details of leaked memory
> ==20088==
> ==20088== For counts of detected and suppressed errors, rerun with: -v
> ==20088== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
> Segmentation fault (core dumped)

Doesn't mean a thing to me, I'm afraid. Does it mean anything to you?

I have not (yet) "rerun with: -v". I suspect that this would not help.

I guess I'll try to get gdb going next, and see if that provides more
lucid output.

cheers,

Rolf

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

OK everybody! You can relax. :-) I managed to spot the loony. After
mucking around with valgrind, and before trying gdb, I had one more look
at my code and *finally* saw the stupid thing that I had been doing.

In the call to .Fortran() I had a line

    nphi=as.integer(nphi),

but "nphi" was nowhere defined (!!!) in the R code. The name "nphi"
appeared as an argument in the Fortran subroutine in question, but was
nowhere actually *used*!!!

It seems that passing a non-existent value as an argument to a Fortran
subroutine can *sometimes* confuse it. Understandably.

I think that this "nphi" was a left-over from an earlier version of the
code. I must have changed the code so that nphi was no longer needed,
but then forgot to remove it from some places. Psigh! I hate myself
sometimes.

Anyhow, thanks to all those who took the time and made the effort to try
to help me.

cheers,

Rolf

On Mon, Aug 13, 2018 at 3:51 AM Rolf Turner <r.turner at auckland.ac.nz> wrote:
>
>
> OK everybody! You can relax. :-) I managed to spot the loony. After
> mucking around with valgrind, and before trying gdb, I had one more look
> at my code and *finally* saw the stupid thing that I had been doing.
>
> In the call to .Fortran() I had a line
>
>     nphi=as.integer(nphi),
>
> but "nphi" was nowhere defined (!!!) in the R code. The name "nphi"
> appeared as an argument in the Fortran subroutine in question, but was
> nowhere actually *used*!!!

Didn't R CMD check pick this up, that is, didn't it report that 'nphi'
is a "global" variable?

/Henrik

>
> It seems that passing a non-existent value as an argument to a Fortran
> subroutine can *sometimes* confuse it. Understandably.
>
> I think that this "nphi" was a left-over from an earlier version of the
> code. I must have changed the code so that nphi was no longer needed,
> but then forgot to remove it from some places. Psigh! I hate myself
> sometimes.
>
> Anyhow, thanks to all those who took the time and made the effort to try
> to help me.
>
> cheers,
>
> Rolf