) On Wed, Apr 5, 2017 at 2:59 AM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:> > >>>>> Winston Chang <winstonchang1 at gmail.com> > >>>>> on Tue, 4 Apr 2017 15:29:40 -0500 writes: > > > I've done some more investigation into the problem, and it is very > > difficult to pin down. What it looks like is happening is roughly like this: > > - `p` is an environment and `p$e` is also an environment. > > - There is a loop. In each iteration, it looks for one item in `p$e`, saves > > it in a variable `x`, then removes that item from `p$e`. Then it invokes > > `x()`. The loop runs again, until there are no more items in `p$e`. > > > The problem is that `ls(p$e)` sometimes returns the wrong values -- it > > returns the values that it had in previous iterations of the loop. The > > behavior is very touchy. Almost any change to the code will slightly change > > the behavior; sometimes the `ls()` returns values from a different > > iteration of the loop, and sometimes the problem doesn't happen at all. > > > I've put a Dockerfile and instructions for reproducing the problem here: > > https://gist.github.com/wch/2596a1c9f1bcdee91bb210c782141c88 > > > I think that I've gotten about as far with this as I can, though I'd be > > happy to provide more information if anyone wants to take look at the > > problem. > > Dear Winston, > > While I agree this may very well be a bug in R(-devel), and hence > also R in 3.4.0 alpha and hence quite important to be dealt with, > > your code still involves 3 non-trivial packages (DBI, R6, > testthat) some of which have their own C code and notably load > a couple of other package's namespaces. > We've always made a point > https://www.r-project.org/bugs.html > that bugs in R should be reproducible without extra > packages... and I think it would definitely help to pinpoint the > issue to be seen outside of your extra packages' world. > > Or have you been aware of that and are just asking for help > finding a bug in one of the extra packages involved, a bug that might only be triggered by recent changes in R ? > > OTOH, what you describe above (p ; p$e ; p$e$x ...) > should be reproducible in pure "base" R code, right? > > I'm sorry not to be of more help > MartinOf the four packages that are loaded when running the tests (pool, DBI, R6, testthat, magrittr, crayon), only testthat contains compiled code, and it is pretty minimal. The only compiled code in testthat that should be executed is a function that finds a label -- but that happens only after an error occurs. This is the sessionInfo(): R Under development (unstable) (2017-03-23 r72389) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Debian GNU/Linux 9 (stretch) Matrix products: default BLAS: /usr/local/lib/R/lib/libRblas.so LAPACK: /usr/local/lib/R/lib/libRlapack.so locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] pool_0.1.0 DBI_0.6-1 testthat_1.0.2 loaded via a namespace (and not attached): [1] compiler_3.5.0 magrittr_1.5 R6_2.2.0 crayon_1.3.2 I have spent days trying to make a minimal example, and that is the best that I have been able to do so far. I would not involve all this complexity if I could avoid it. The problem is that the behavior is extremely sensitive to any changes. Seemingly-innocuous things like removing other tests, or adding a cat() statement can make the error disappear, or in some cases it makes ls() return values from a different (previous) iteration of the loop. In my testing, I have also seen a case where calls to `cat( ... , file=stderr())` result in output that has the wrong order. I don't know if that's due to the cat() calls being executed in the wrong order, or if it's simply being printed or buffered in the wrong order. This is the code in question that cat()s to stderr: https://github.com/rstudio/pool/blob/0724ad9/R/scheduler.R#L74-L90 while (TRUE) { tasks <- sort(ls(private$scheduledTasks)) if (length(tasks) == 0) break t <- tasks[[1]] s <- stderr() cat(tasks, "--1--\n", file = s) cat(ls(private$scheduledTasks), "--2--\n", file = s) cat(t, "--3--\n", file = s) task <- private$scheduledTasks[[t]] rm(list = t, envir = private$scheduledTasks) task() } Without going into too much detail, it should print lines of text that end with --1--, --2--, --3--, and repeat. Here's what it prints instead when running the tests: 20170405-182549.466875-18 20170405-182559.456628-17 20170405-182729.456318-16 --1-- 20170405-182549.466875-18 20170405-182559.456628-17 20170405-182729.456318-16 --1-- 20170405-182549.466875-18 20170405-182559.456628-17 20170405-182729.456318-16 --2-- 20170405-182549.466875-18 --3-- 20170405-182559.456628-17 20170405-182729.456318-16 --1-- 20170405-182559.456628-17 20170405-182729.456318-16 --2-- 20170405-182559.456628-17 --3-- 20170405-182729.456318-16 --1-- 20170405-182729.456318-16 --2-- 20170405-182729.456318-16 --3-- --2-- 20170405-182549.466875-18 --3-- 1. Error: pool scheduler: schedules things in the right order (@test-scheduling.R#13) could not find function "task" 1: naiveScheduler$protect({ scheduleTask(1e+05, function() { results <<- c(results, 3L) }) scheduleTask(10000, function() { results <<- c(results, 2L) }) scheduleTask(10, function() { results <<- c(results, 1L) }) }) at testthat/test-scheduling.R:13 2: private$refCount$release() at testthat/test-scheduling.R:13 3: private$callback() It's almost as though, in the middle of the first iteration of the while loop, R jumps to the next iteration of the loop, runs the loop a couple of times to completion, and then returns to the first iteration of the loop at the place that it left. This can be reproduced by following the instructions in this gist: https://gist.github.com/wch/2596a1c9f1bcdee91bb210c782141c88 Almost any change to the code will make the error disappear, or change to a different one. With regard to reproducing it in "base" R: I made a simple example using just R (no other packages) that does something similar to what happens in the tests, but even when I run it for 100,000 iterations, the error doesn't occur. I think there's a good chance that this is due to a bug in R. I have been trying to track down the cause of the problem but haven't been able find it. -Winston
Winston, I had a similar experience to you tracking down an insanely difficult bug in my R code that "disappeared" whenever slight changes were made to the script (e.g. like adding cat() statements). In my case, it coincided with my over-eager compilation of R and its library stack, as I was also experimenting with a cutting edge version of the gcc compiler as well as what I thought were innocuous performance enhancing CFLAGS like -O3/-Ofast -march=native. After downgrading gcc and recompiling everything (R and BLAS) without the extra flags, the problem went away. Not saying this is your problem, just sharing my similar experience. <TANGENT> And for anyone interested, I did extensive benchmarking on the effects of the added CFLAGS and cutting edge gcc compilers, and I never saw any significant performance enhancement, and frequently saw a big performance penalty with extra flags, particularly as matrix algebra problems sometimes slowed down enormously when using a custom BLAS (ATLAS) compiled with anything fancy. Though nowadays, the out-of-the-box R BLAS seems much faster than it used to be, so I don't even bother fiddling with a custom BLAS. </TANGENT> --Robert -----Original Message----- From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Winston Chang Sent: Wednesday, April 05, 2017 2:41 PM To: Martin Maechler <maechler at stat.math.ethz.ch> Cc: R Development <R-devel at r-project.org> Subject: Re: [Rd] Very hard to reproduce bug (?) in R-devel ) On Wed, Apr 5, 2017 at 2:59 AM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:> > >>>>> Winston Chang <winstonchang1 at gmail.com> > >>>>> on Tue, 4 Apr 2017 15:29:40 -0500 writes: > > > I've done some more investigation into the problem, and it is very > > difficult to pin down. What it looks like is happening is roughly like this: > > - `p` is an environment and `p$e` is also an environment. > > - There is a loop. In each iteration, it looks for one item in `p$e`, saves > > it in a variable `x`, then removes that item from `p$e`. Then it invokes > > `x()`. The loop runs again, until there are no more items in `p$e`. > > > The problem is that `ls(p$e)` sometimes returns the wrong values -- it > > returns the values that it had in previous iterations of the loop. The > > behavior is very touchy. Almost any change to the code will slightly change > > the behavior; sometimes the `ls()` returns values from a different > > iteration of the loop, and sometimes the problem doesn't happen at all. > > > I've put a Dockerfile and instructions for reproducing the problem here: > > https://gist.github.com/wch/2596a1c9f1bcdee91bb210c782141c88 > > > I think that I've gotten about as far with this as I can, though I'd be > > happy to provide more information if anyone wants to take look at the > > problem. > > Dear Winston, > > While I agree this may very well be a bug in R(-devel), and hence > also R in 3.4.0 alpha and hence quite important to be dealt with, > > your code still involves 3 non-trivial packages (DBI, R6, > testthat) some of which have their own C code and notably load > a couple of other package's namespaces. > We've always made a point > https://www.r-project.org/bugs.html > that bugs in R should be reproducible without extra > packages... and I think it would definitely help to pinpoint the > issue to be seen outside of your extra packages' world. > > Or have you been aware of that and are just asking for help > finding a bug in one of the extra packages involved, a bug that might only be triggered by recent changes in R ? > > OTOH, what you describe above (p ; p$e ; p$e$x ...) > should be reproducible in pure "base" R code, right? > > I'm sorry not to be of more help > MartinOf the four packages that are loaded when running the tests (pool, DBI, R6, testthat, magrittr, crayon), only testthat contains compiled code, and it is pretty minimal. The only compiled code in testthat that should be executed is a function that finds a label -- but that happens only after an error occurs. This is the sessionInfo(): R Under development (unstable) (2017-03-23 r72389) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Debian GNU/Linux 9 (stretch) Matrix products: default BLAS: /usr/local/lib/R/lib/libRblas.so LAPACK: /usr/local/lib/R/lib/libRlapack.so locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] pool_0.1.0 DBI_0.6-1 testthat_1.0.2 loaded via a namespace (and not attached): [1] compiler_3.5.0 magrittr_1.5 R6_2.2.0 crayon_1.3.2 I have spent days trying to make a minimal example, and that is the best that I have been able to do so far. I would not involve all this complexity if I could avoid it. The problem is that the behavior is extremely sensitive to any changes. Seemingly-innocuous things like removing other tests, or adding a cat() statement can make the error disappear, or in some cases it makes ls() return values from a different (previous) iteration of the loop. In my testing, I have also seen a case where calls to `cat( ... , file=stderr())` result in output that has the wrong order. I don't know if that's due to the cat() calls being executed in the wrong order, or if it's simply being printed or buffered in the wrong order. This is the code in question that cat()s to stderr: https://github.com/rstudio/pool/blob/0724ad9/R/scheduler.R#L74-L90 while (TRUE) { tasks <- sort(ls(private$scheduledTasks)) if (length(tasks) == 0) break t <- tasks[[1]] s <- stderr() cat(tasks, "--1--\n", file = s) cat(ls(private$scheduledTasks), "--2--\n", file = s) cat(t, "--3--\n", file = s) task <- private$scheduledTasks[[t]] rm(list = t, envir = private$scheduledTasks) task() } Without going into too much detail, it should print lines of text that end with --1--, --2--, --3--, and repeat. Here's what it prints instead when running the tests: 20170405-182549.466875-18 20170405-182559.456628-17 20170405-182729.456318-16 --1-- 20170405-182549.466875-18 20170405-182559.456628-17 20170405-182729.456318-16 --1-- 20170405-182549.466875-18 20170405-182559.456628-17 20170405-182729.456318-16 --2-- 20170405-182549.466875-18 --3-- 20170405-182559.456628-17 20170405-182729.456318-16 --1-- 20170405-182559.456628-17 20170405-182729.456318-16 --2-- 20170405-182559.456628-17 --3-- 20170405-182729.456318-16 --1-- 20170405-182729.456318-16 --2-- 20170405-182729.456318-16 --3-- --2-- 20170405-182549.466875-18 --3-- 1. Error: pool scheduler: schedules things in the right order (@test-scheduling.R#13) could not find function "task" 1: naiveScheduler$protect({ scheduleTask(1e+05, function() { results <<- c(results, 3L) }) scheduleTask(10000, function() { results <<- c(results, 2L) }) scheduleTask(10, function() { results <<- c(results, 1L) }) }) at testthat/test-scheduling.R:13 2: private$refCount$release() at testthat/test-scheduling.R:13 3: private$callback() It's almost as though, in the middle of the first iteration of the while loop, R jumps to the next iteration of the loop, runs the loop a couple of times to completion, and then returns to the first iteration of the loop at the place that it left. This can be reproduced by following the instructions in this gist: https://gist.github.com/wch/2596a1c9f1bcdee91bb210c782141c88 Almost any change to the code will make the error disappear, or change to a different one. With regard to reproducing it in "base" R: I made a simple example using just R (no other packages) that does something similar to what happens in the tests, but even when I run it for 100,000 iterations, the error doesn't occur. I think there's a good chance that this is due to a bug in R. I have been trying to track down the cause of the problem but haven't been able find it. -Winston ______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
On Wed, Apr 5, 2017 at 2:24 PM, Robert McGehee <rmcgehee at walleyetrading.net> wrote:> Winston, > I had a similar experience to you tracking down an insanely difficult bug > in my R code that "disappeared" whenever slight changes were made to the > script (e.g. like adding cat() statements). In my case, it coincided with > my over-eager compilation of R and its library stack, as I was also > experimenting with a cutting edge version of the gcc compiler as well as > what I thought were innocuous performance enhancing CFLAGS like -O3/-Ofast > -march=native. After downgrading gcc and recompiling everything (R and > BLAS) without the extra flags, the problem went away. Not saying this is > your problem, just sharing my similar experience. > >Thanks Robert. I'm glad that I'm not the only one who's encountered an issue like this. "Insanely difficult" is an apt description. :) I've been using the rocker/r-devel for testing. It compiles R with the following CFLAGS: -g -O2 -fdebug-prefix-map=/build/r-base-3.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g It looks like it gets those settings from running R CMD config CFLAGS with the already-installed version of R (3.3.3) which comes from a .deb package. https://github.com/rocker-org/rocker/blob/master/r-devel/Dockerfile#L76 I've also compiled R (again, in Docker) and tested with that, and gotten the same results. It basically uses just `./configure --without-recommended-packages` and then `make`. [[alternative HTML version deleted]]
I just tried recompiling R with no -O flag, and I still get the same error. Here are the CFLAGS (the RD program runs R-devel instead of R 3.3.3): # RD CMD config CFLAGS -g -fdebug-prefix-map=/build/r-base-3.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -Winston On Wed, Apr 5, 2017 at 2:24 PM, Robert McGehee <rmcgehee at walleyetrading.net> wrote:> Winston, > I had a similar experience to you tracking down an insanely difficult bug > in my R code that "disappeared" whenever slight changes were made to the > script (e.g. like adding cat() statements). In my case, it coincided with > my over-eager compilation of R and its library stack, as I was also > experimenting with a cutting edge version of the gcc compiler as well as > what I thought were innocuous performance enhancing CFLAGS like -O3/-Ofast > -march=native. After downgrading gcc and recompiling everything (R and > BLAS) without the extra flags, the problem went away. Not saying this is > your problem, just sharing my similar experience. > > <TANGENT> And for anyone interested, I did extensive benchmarking on the > effects of the added CFLAGS and cutting edge gcc compilers, and I never saw > any significant performance enhancement, and frequently saw a big > performance penalty with extra flags, particularly as matrix algebra > problems sometimes slowed down enormously when using a custom BLAS (ATLAS) > compiled with anything fancy. Though nowadays, the out-of-the-box R BLAS > seems much faster than it used to be, so I don't even bother fiddling with > a custom BLAS. </TANGENT> > > --Robert > > > -----Original Message----- > From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Winston > Chang > Sent: Wednesday, April 05, 2017 2:41 PM > To: Martin Maechler <maechler at stat.math.ethz.ch> > Cc: R Development <R-devel at r-project.org> > Subject: Re: [Rd] Very hard to reproduce bug (?) in R-devel > > ) > > On Wed, Apr 5, 2017 at 2:59 AM, Martin Maechler > <maechler at stat.math.ethz.ch> wrote: > > > > >>>>> Winston Chang <winstonchang1 at gmail.com> > > >>>>> on Tue, 4 Apr 2017 15:29:40 -0500 writes: > > > > > I've done some more investigation into the problem, and it is very > > > difficult to pin down. What it looks like is happening is roughly > like this: > > > - `p` is an environment and `p$e` is also an environment. > > > - There is a loop. In each iteration, it looks for one item in > `p$e`, saves > > > it in a variable `x`, then removes that item from `p$e`. Then it > invokes > > > `x()`. The loop runs again, until there are no more items in `p$e`. > > > > > The problem is that `ls(p$e)` sometimes returns the wrong values > -- it > > > returns the values that it had in previous iterations of the loop. > The > > > behavior is very touchy. Almost any change to the code will > slightly change > > > the behavior; sometimes the `ls()` returns values from a different > > > iteration of the loop, and sometimes the problem doesn't happen at > all. > > > > > I've put a Dockerfile and instructions for reproducing the > problem here: > > > https://gist.github.com/wch/2596a1c9f1bcdee91bb210c782141c88 > > > > > I think that I've gotten about as far with this as I can, though > I'd be > > > happy to provide more information if anyone wants to take look at > the > > > problem. > > > > Dear Winston, > > > > While I agree this may very well be a bug in R(-devel), and hence > > also R in 3.4.0 alpha and hence quite important to be dealt with, > > > > your code still involves 3 non-trivial packages (DBI, R6, > > testthat) some of which have their own C code and notably load > > a couple of other package's namespaces. > > We've always made a point > > https://www.r-project.org/bugs.html > > that bugs in R should be reproducible without extra > > packages... and I think it would definitely help to pinpoint the > > issue to be seen outside of your extra packages' world. > > > > Or have you been aware of that and are just asking for help > > finding a bug in one of the extra packages involved, a bug that might > only be triggered by recent changes in R ? > > > > OTOH, what you describe above (p ; p$e ; p$e$x ...) > > should be reproducible in pure "base" R code, right? > > > > I'm sorry not to be of more help > > Martin > > > Of the four packages that are loaded when running the tests (pool, > DBI, R6, testthat, magrittr, crayon), only testthat contains compiled > code, and it is pretty minimal. The only compiled code in testthat > that should be executed is a function that finds a label -- but that > happens only after an error occurs. > > This is the sessionInfo(): > R Under development (unstable) (2017-03-23 r72389) > Platform: x86_64-pc-linux-gnu (64-bit) > Running under: Debian GNU/Linux 9 (stretch) > > Matrix products: default > BLAS: /usr/local/lib/R/lib/libRblas.so > LAPACK: /usr/local/lib/R/lib/libRlapack.so > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] pool_0.1.0 DBI_0.6-1 testthat_1.0.2 > > loaded via a namespace (and not attached): > [1] compiler_3.5.0 magrittr_1.5 R6_2.2.0 crayon_1.3.2 > > > I have spent days trying to make a minimal example, and that is the > best that I have been able to do so far. I would not involve all this > complexity if I could avoid it. The problem is that the behavior is > extremely sensitive to any changes. Seemingly-innocuous things like > removing other tests, or adding a cat() statement can make the error > disappear, or in some cases it makes ls() return values from a > different (previous) iteration of the loop. > > > In my testing, I have also seen a case where calls to `cat( ... , > file=stderr())` result in output that has the wrong order. I don't > know if that's due to the cat() calls being executed in the wrong > order, or if it's simply being printed or buffered in the wrong order. > > This is the code in question that cat()s to stderr: > https://github.com/rstudio/pool/blob/0724ad9/R/scheduler.R#L74-L90 > while (TRUE) { > tasks <- sort(ls(private$scheduledTasks)) > if (length(tasks) == 0) break > t <- tasks[[1]] > > s <- stderr() > cat(tasks, "--1--\n", file = s) > cat(ls(private$scheduledTasks), "--2--\n", file = s) > cat(t, "--3--\n", file = s) > > task <- private$scheduledTasks[[t]] > rm(list = t, envir = private$scheduledTasks) > > task() > } > > Without going into too much detail, it should print lines of text that > end with --1--, --2--, --3--, and repeat. Here's what it prints > instead when running the tests: > > 20170405-182549.466875-18 20170405-182559.456628-17 > 20170405-182729.456318-16 --1-- > 20170405-182549.466875-18 20170405-182559.456628-17 > 20170405-182729.456318-16 --1-- > 20170405-182549.466875-18 20170405-182559.456628-17 > 20170405-182729.456318-16 --2-- > 20170405-182549.466875-18 --3-- > 20170405-182559.456628-17 20170405-182729.456318-16 --1-- > 20170405-182559.456628-17 20170405-182729.456318-16 --2-- > 20170405-182559.456628-17 --3-- > 20170405-182729.456318-16 --1-- > 20170405-182729.456318-16 --2-- > 20170405-182729.456318-16 --3-- > --2-- > 20170405-182549.466875-18 --3-- > 1. Error: pool scheduler: schedules things in the right order > (@test-scheduling.R#13) > could not find function "task" > 1: naiveScheduler$protect({ > scheduleTask(1e+05, function() { > results <<- c(results, 3L) > }) > scheduleTask(10000, function() { > results <<- c(results, 2L) > }) > scheduleTask(10, function() { > results <<- c(results, 1L) > }) > }) at testthat/test-scheduling.R:13 > 2: private$refCount$release() at testthat/test-scheduling.R:13 > 3: private$callback() > > It's almost as though, in the middle of the first iteration of the > while loop, R jumps to the next iteration of the loop, runs the loop a > couple of times to completion, and then returns to the first iteration > of the loop at the place that it left. > > This can be reproduced by following the instructions in this gist: > https://gist.github.com/wch/2596a1c9f1bcdee91bb210c782141c88 > > Almost any change to the code will make the error disappear, or change > to a different one. > > > With regard to reproducing it in "base" R: I made a simple example > using just R (no other packages) that does something similar to what > happens in the tests, but even when I run it for 100,000 iterations, > the error doesn't occur. > > I think there's a good chance that this is due to a bug in R. I have > been trying to track down the cause of the problem but haven't been > able find it. > > -Winston > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >[[alternative HTML version deleted]]
> On 05 Apr 2017, at 20:40 , Winston Chang <winstonchang1 at gmail.com> wrote: > > I think there's a good chance that this is due to a bug in R. I have > been trying to track down the cause of the problem but haven't been > able find it. > > -WinstonApologies in advance if this is just stating the obvious, but let me try and put some general ideas on the table. - is anything non-deterministic involved? (Doesn't sound so, but...) - could it be something with the bytecompiler? - can you get something (_anything_) to trigger the bug (in any variant) when running R under gdb? I'm thinking gctorture() etc. - it is odd that you cannot immediately get the same behaviour with R -d gdb or valgrind. Are you sure you are actually running the same script in the same way? - if you can get a hold of something inside gdb, then there should be some potential for backtracking using hardware watchpoints and such. As in: This memory location doesn't contain the value I expected; what changed it? -pd -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
On 05.04.2017 23:54, peter dalgaard wrote:> >> On 05 Apr 2017, at 20:40 , Winston Chang <winstonchang1 at gmail.com> wrote: >> >> I think there's a good chance that this is due to a bug in R. I have >> been trying to track down the cause of the problem but haven't been >> able find it. >> >> -Winston > > Apologies in advance if this is just stating the obvious, but let me try and put some general ideas on the table. > > - is anything non-deterministic involved? (Doesn't sound so, but...) > - could it be something with the bytecompiler?Also my suspicion, can you try without having JIT enabled? Best, Uwe Ligges> - can you get something (_anything_) to trigger the bug (in any variant) when running R under gdb? I'm thinking gctorture() etc. > - it is odd that you cannot immediately get the same behaviour with R -d gdb or valgrind. Are you sure you are actually running the same script in the same way? > - if you can get a hold of something inside gdb, then there should be some potential for backtracking using hardware watchpoints and such. As in: This memory location doesn't contain the value I expected; what changed it? > > -pd > >
> > Apologies in advance if this is just stating the obvious, but let me try > and put some general ideas on the table.These are great ideas, thanks.> - is anything non-deterministic involved? (Doesn't sound so, but...) >There was an environment where items were added, and the names of the items had timestamps. However, I just modified that code to use deterministic names and the error still happened. - could it be something with the bytecompiler?>I've tried two things. The first was to install the pool package with --no-byte-compile. In this case the error still happens. The second was to compile R with `./configure --enable-byte-compiled-packages=no`. When I do this, the error does NOT happen. I've tried varying the pool code in a few different ways to try to provoke the error, but I have not been able to get it to happen. So it is possible that the compiled base R packages play some part here.> - can you get something (_anything_) to trigger the bug (in any variant) > when running R under gdb? I'm thinking gctorture() etc. >Some variations of the code will error without gdb, but will not error with gdb. I twiddled with the code a bit and now the current version of the code (f97cfdf) will error under gdb. I've also run it with gctorture(T) and have not seen this error with that enabled, but I haven't tested it extensively in this mode. (In a previous email I mentioned that with gctorture on, I got three different errors in the tests. I later found that these errors were due to tests having incorrect assumptions. For example, one test called gc() and expected a warning, but it incorrectly assumed that a GC event would not have occurred slightly earlier.)> - it is odd that you cannot immediately get the same behaviour with R -d > gdb or valgrind. Are you sure you are actually running the same script in > the same way? >Some versions of the code, but not all, will give the same error under gdb and valgrind. See above.> - if you can get a hold of something inside gdb, then there should be some > potential for backtracking using hardware watchpoints and such. As in: This > memory location doesn't contain the value I expected; what changed it? >I probably don't know enough about R internals or gdb to be useful here. But if someone wants to try it out, reproducing it as simple as copying and pasting the first two blocks of code from the README here (assuming you have Docker installed): https://gist.github.com/wch/2596a1c9f1bcdee91bb210c782141c88 It will build a Docker image with the appropriate software installed, and then run the tests. The README also shows how to run it with gdb. -Winston [[alternative HTML version deleted]]