thr3ads.net - R devel - [Rd] Very hard to reproduce bug (?) in R-devel [Apr 2017]

Winston Chang

2017-Apr-05 18:40 UTC

[Rd] Very hard to reproduce bug (?) in R-devel

On Wed, Apr 5, 2017 at 2:59 AM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
>
> >>>>> Winston Chang <winstonchang1 at gmail.com>
> >>>>>     on Tue, 4 Apr 2017 15:29:40 -0500 writes:
>
>     > I've done some more investigation into the problem, and it is very
>     > difficult to pin down. What it looks like is happening is roughly
>     > like this:
>     > - `p` is an environment and `p$e` is also an environment.
>     > - There is a loop. In each iteration, it looks for one item in `p$e`,
>     > saves it in a variable `x`, then removes that item from `p$e`. Then it
>     > invokes `x()`. The loop runs again, until there are no more items in
>     > `p$e`.
>
>     > The problem is that `ls(p$e)` sometimes returns the wrong values -- it
>     > returns the values that it had in previous iterations of the loop. The
>     > behavior is very touchy. Almost any change to the code will slightly
>     > change the behavior; sometimes the `ls()` returns values from a different
>     > iteration of the loop, and sometimes the problem doesn't happen at all.
>
>     > I've put a Dockerfile and instructions for reproducing the problem here:
>     > https://gist.github.com/wch/2596a1c9f1bcdee91bb210c782141c88
>
>     > I think that I've gotten about as far with this as I can, though I'd be
>     > happy to provide more information if anyone wants to take a look at the
>     > problem.
>
> Dear Winston,
>
> While I agree this may very well be a bug in R(-devel), and hence
> also R in 3.4.0 alpha and hence quite important to be dealt with,
>
> your code still involves 3 non-trivial  packages (DBI, R6,
> testthat) some of which have their own C code and notably load
> a couple of other package's namespaces.
> We've always made a point
>       https://www.r-project.org/bugs.html
> that bugs in R should be reproducible without extra
> packages... and I think it would definitely help to pinpoint the
> issue to be seen outside of your extra packages' world.
>
> Or have you been aware of that and are just asking for help
> finding a bug in one of the extra packages involved, a bug that might only
> be triggered by recent changes in R ?
>
> OTOH, what you describe above  (p ; p$e ; p$e$x ...)
> should be reproducible in pure "base" R code, right?
>
> I'm sorry not to be of more help
> Martin

Of the six packages that are loaded when running the tests (pool,
DBI, R6, testthat, magrittr, crayon), only testthat contains compiled
code, and it is pretty minimal. The only compiled code in testthat
that should be executed is a function that finds a label -- but that
happens only after an error occurs.

This is the sessionInfo():
R Under development (unstable) (2017-03-23 r72389)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /usr/local/lib/R/lib/libRblas.so
LAPACK: /usr/local/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] pool_0.1.0     DBI_0.6-1      testthat_1.0.2

loaded via a namespace (and not attached):
[1] compiler_3.5.0 magrittr_1.5   R6_2.2.0       crayon_1.3.2


I have spent days trying to make a minimal example, and that is the
best that I have been able to do so far. I would not involve all this
complexity if I could avoid it. The problem is that the behavior is
extremely sensitive to any changes. Seemingly-innocuous things like
removing other tests, or adding a cat() statement can make the error
disappear, or in some cases it makes ls() return values from a
different (previous) iteration of the loop.


In my testing, I have also seen a case where calls to `cat( ... ,
file=stderr())` result in output that has the wrong order. I don't
know if that's due to the cat() calls being executed in the wrong
order, or if it's simply being printed or buffered in the wrong order.

This is the code in question that cat()s to stderr:
https://github.com/rstudio/pool/blob/0724ad9/R/scheduler.R#L74-L90
      while (TRUE) {
        tasks <- sort(ls(private$scheduledTasks))
        if (length(tasks) == 0) break
        t <- tasks[[1]]

        s <- stderr()
        cat(tasks, "--1--\n", file = s)
        cat(ls(private$scheduledTasks), "--2--\n", file = s)
        cat(t, "--3--\n", file = s)

        task <- private$scheduledTasks[[t]]
        rm(list = t, envir = private$scheduledTasks)

        task()
      }

Without going into too much detail, it should print lines of text that
end with --1--, --2--, --3--, and repeat. Here's what it prints
instead when running the tests:

20170405-182549.466875-18 20170405-182559.456628-17 20170405-182729.456318-16 --1--
20170405-182549.466875-18 20170405-182559.456628-17 20170405-182729.456318-16 --1--
20170405-182549.466875-18 20170405-182559.456628-17 20170405-182729.456318-16 --2--
20170405-182549.466875-18 --3--
20170405-182559.456628-17 20170405-182729.456318-16 --1--
20170405-182559.456628-17 20170405-182729.456318-16 --2--
20170405-182559.456628-17 --3--
20170405-182729.456318-16 --1--
20170405-182729.456318-16 --2--
20170405-182729.456318-16 --3--
 --2--
20170405-182549.466875-18 --3--
1. Error: pool scheduler: schedules things in the right order (@test-scheduling.R#13)
could not find function "task"
1: naiveScheduler$protect({
       scheduleTask(1e+05, function() {
           results <<- c(results, 3L)
       })
       scheduleTask(10000, function() {
           results <<- c(results, 2L)
       })
       scheduleTask(10, function() {
           results <<- c(results, 1L)
       })
   }) at testthat/test-scheduling.R:13
2: private$refCount$release() at testthat/test-scheduling.R:13
3: private$callback()

It's almost as though, in the middle of the first iteration of the
while loop, R jumps to the next iteration of the loop, runs the loop a
couple of times to completion, and then returns to the first iteration
of the loop at the place that it left.
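
A captured log like the one above can also be checked mechanically. The
following is only a sketch, not part of the pool package: the function name
`markers_in_order` and the abbreviated task names "a", "b", "c" are
hypothetical. It keeps the lines that end in one of the three markers and
tests whether they cycle 1, 2, 3.

```r
# Sketch: check that the trailing markers in a captured log cycle 1, 2, 3.
# `markers_in_order` is a hypothetical helper, not part of pool or base R.
markers_in_order <- function(lines) {
  # Keep only the trailing "--N--" marker of each line that has one.
  m <- regmatches(lines, regexpr("--[123]--$", lines))
  ids <- as.integer(gsub("-", "", m))
  identical(ids, rep_len(1:3, length(ids)))
}

# Abbreviated, made-up task names; only the marker sequence matters here.
good_log <- c("a b c --1--", "a b c --2--", "a --3--",
              "b c --1--", "b c --2--", "b --3--")
bad_log  <- c("a b c --1--", "a b c --1--", "a b c --2--", "a --3--")

markers_in_order(good_log)
markers_in_order(bad_log)
```

A check like this only shows that the markers arrived out of order. It cannot
tell whether the cat() calls ran in the wrong order or were merely buffered in
the wrong order.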

This can be reproduced by following the instructions in this gist:
  https://gist.github.com/wch/2596a1c9f1bcdee91bb210c782141c88

Almost any change to the code will make the error disappear, or change
to a different one.


With regard to reproducing it in "base" R: I made a simple example
using just R (no other packages) that does something similar to what
happens in the tests, but even when I run it for 100,000 iterations,
the error doesn't occur.
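
For readers who want to try something similar, here is a minimal base-R
sketch of that kind of loop. It is not the script I actually ran; the names
`make_queue` and `drain` and the "task-NNN" naming scheme are made up for
illustration. On a correctly working R, the consistency checks should never
fail.

```r
# Sketch of the p / p$e pattern in base R only (hypothetical names throughout).
make_queue <- function(n) {
  p <- new.env()
  p$e <- new.env()          # holds the scheduled zero-argument functions
  p$log <- integer(0)       # records the order in which tasks actually ran
  for (i in seq_len(n)) {
    local({
      id <- i               # capture the current value of i for this task
      assign(sprintf("task-%03d", id),
             function() assign("log", c(p$log, id), envir = p),
             envir = p$e)
    })
  }
  p
}

drain <- function(p) {
  seen <- character(0)
  while (TRUE) {
    tasks <- sort(ls(p$e))
    if (length(tasks) == 0) break
    name <- tasks[[1]]
    # A second listing must agree with the snapshot taken just above.
    stopifnot(identical(sort(ls(p$e)), tasks))
    x <- p$e[[name]]
    rm(list = name, envir = p$e)
    x()
    seen <- c(seen, name)
  }
  seen
}

# Repeat many times; any stale ls() result would trip a stopifnot().
for (iteration in seq_len(1000)) {
  p <- make_queue(3)
  stopifnot(identical(drain(p), sprintf("task-%03d", 1:3)))
  stopifnot(identical(p$log, 1:3))
}
```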

I think there's a good chance that this is due to a bug in R. I have
been trying to track down the cause of the problem but haven't been
able to find it.

-Winston

Robert McGehee

2017-Apr-05 19:24 UTC

Winston,
I had an experience similar to yours, tracking down an insanely difficult bug in
my R code that "disappeared" whenever slight changes were made to the script
(e.g. adding cat() statements). In my case, it coincided with my
over-eager compilation of R and its library stack, as I was also experimenting
with a cutting edge version of the gcc compiler as well as what I thought were
innocuous performance enhancing CFLAGS like -O3/-Ofast -march=native. After
downgrading gcc and recompiling everything (R and BLAS) without the extra flags,
the problem went away. Not saying this is your problem, just sharing my similar
experience.

<TANGENT> And for anyone interested, I did extensive benchmarking on the
effects of the added CFLAGS and cutting edge gcc compilers, and I never saw any
significant performance enhancement, and frequently saw a big performance
penalty with extra flags, particularly as matrix algebra problems sometimes
slowed down enormously when using a custom BLAS (ATLAS) compiled with anything
fancy. Though nowadays, the out-of-the-box R BLAS seems much faster than it used
to be, so I don't even bother fiddling with a custom BLAS. </TANGENT>

--Robert

Winston Chang

2017-Apr-05 20:46 UTC

On Wed, Apr 5, 2017 at 2:24 PM, Robert McGehee
<rmcgehee at walleyetrading.net> wrote:
> Winston,
> I had a similar experience to you tracking down an insanely difficult bug
> in my R code that "disappeared" whenever slight changes were made to the
> script (e.g. like adding cat() statements). In my case, it coincided with
> my over-eager compilation of R and its library stack, as I was also
> experimenting with a cutting edge version of the gcc compiler as well as
> what I thought were innocuous performance enhancing CFLAGS like -O3/-Ofast
> -march=native. After downgrading gcc and recompiling everything (R and
> BLAS) without the extra flags, the problem went away. Not saying this is
> your problem, just sharing my similar experience.

Thanks Robert. I'm glad that I'm not the only one who's encountered an
issue like this. "Insanely difficult" is an apt description. :)

I've been using the rocker/r-devel Docker image for testing. It compiles R
with the following CFLAGS:
  -g -O2 -fdebug-prefix-map=/build/r-base-3.3.3=. -fstack-protector-strong
  -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g
It looks like it gets those settings from running R CMD config CFLAGS with
the already-installed version of R (3.3.3), which comes from a .deb package.
  https://github.com/rocker-org/rocker/blob/master/r-devel/Dockerfile#L76

I've also compiled R myself (again, in Docker) and tested with that, and
gotten the same results. That build basically uses just
`./configure --without-recommended-packages` and then `make`.

Winston Chang

2017-Apr-05 21:30 UTC

I just tried recompiling R with no -O flag, and I still get the same error.
Here are the CFLAGS (the RD program runs R-devel instead of R 3.3.3):

# RD CMD config CFLAGS
-g -fdebug-prefix-map=/build/r-base-3.3.3=. -fstack-protector-strong
-Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g

-Winston


peter dalgaard

2017-Apr-05 21:54 UTC

> On 05 Apr 2017, at 20:40 , Winston Chang <winstonchang1 at gmail.com> wrote:
>
> I think there's a good chance that this is due to a bug in R. I have
> been trying to track down the cause of the problem but haven't been
> able find it.
>
> -Winston

Apologies in advance if this is just stating the obvious, but let me try and put
some general ideas on the table.

- is anything non-deterministic involved? (Doesn't sound so, but...)
- could it be something with the bytecompiler?
- can you get something (_anything_) to trigger the bug (in any variant) when
running R under gdb? I'm thinking gctorture() etc.
- it is odd that you cannot immediately get the same behaviour with R -d gdb or
valgrind. Are you sure you are actually running the same script in the same way?
- if you can get a hold of something inside gdb, then there should be some
potential for backtracking using hardware watchpoints and such. As in: This
memory location doesn't contain the value I expected; what changed it?

-pd


-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
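
Two of the suggestions above (ruling out the byte compiler's JIT and stressing
the garbage collector) can be tried from within R itself. This is a minimal
sketch under stated assumptions: `with_diagnostics` is a hypothetical wrapper
name and the workload is a trivial placeholder. `compiler::enableJIT()` and
`gctorture()` are standard R functions, and each returns the previous setting
so that it can be restored.

```r
library(compiler)

# Hypothetical wrapper: run `f` with the JIT off and GC torture on, then
# restore both settings even if `f` fails.
with_diagnostics <- function(f) {
  old_jit <- enableJIT(0)                 # 0 disables JIT; returns old level
  on.exit(enableJIT(old_jit), add = TRUE)
  old_gc <- gctorture(TRUE)               # run a GC on (nearly) every allocation
  on.exit(gctorture(old_gc), add = TRUE)
  f()
}

jit_before <- enableJIT(-1)               # -1 only queries the current level
result <- with_diagnostics(function() sum(1:10))
jit_after <- enableJIT(-1)
```

Note that gctorture(TRUE) slows R down enormously, so the wrapped code should
be as small as possible. Disabling the JIT this way does not affect packages
that were already byte-compiled at install time.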

Uwe Ligges

2017-Apr-05 22:36 UTC

On 05.04.2017 23:54, peter dalgaard wrote:
>
>> On 05 Apr 2017, at 20:40 , Winston Chang <winstonchang1 at gmail.com> wrote:
>>
>> I think there's a good chance that this is due to a bug in R. I have
>> been trying to track down the cause of the problem but haven't been
>> able find it.
>>
>> -Winston
>
> Apologies in advance if this is just stating the obvious, but let me try
> and put some general ideas on the table.
>
> - is anything non-deterministic involved? (Doesn't sound so, but...)
> - could it be something with the bytecompiler?

Also my suspicion, can you try without having JIT enabled?

Best,
Uwe Ligges


> - can you get something (_anything_) to trigger the bug (in any variant)
> when running R under gdb? I'm thinking gctorture() etc.
> - it is odd that you cannot immediately get the same behaviour with R -d
> gdb or valgrind. Are you sure you are actually running the same script in the
> same way?
> - if you can get a hold of something inside gdb, then there should be some
> potential for backtracking using hardware watchpoints and such. As in: This
> memory location doesn't contain the value I expected; what changed it?
>
> -pd

Winston Chang

2017-Apr-06 00:34 UTC

>
> Apologies in advance if this is just stating the obvious, but let me try
> and put some general ideas  on the table.

These are great ideas, thanks.


> - is anything non-deterministic involved? (Doesn't sound so, but...)
>
There was an environment where items were added, and the names of the items
had timestamps. However, I just modified that code to use deterministic
names and the error still happened.
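
To illustrate what "deterministic names" could look like, here is a small
sketch. It is not the actual change made in pool; `make_namer` and the
"task-NNNNNN" format are hypothetical. A zero-padded counter yields names
that are identical on every run and that still sort in creation order.

```r
# Hypothetical name generator: a closure around a counter.
make_namer <- function(prefix = "task") {
  counter <- 0L
  function() {
    counter <<- counter + 1L
    # Zero padding keeps lexicographic order equal to creation order.
    sprintf("%s-%06d", prefix, counter)
  }
}

next_name <- make_namer()
ids <- vapply(1:3, function(i) next_name(), character(1))
ids
```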


> - could it be something with the bytecompiler?
>
I've tried two things. The first was to install the pool package with
--no-byte-compile. In this case the error still happens.

The second was to compile R with
`./configure --enable-byte-compiled-packages=no`.
When I do this, the error does NOT happen. I've tried varying the pool code
in a few different ways to try to provoke the error, but I have not been
able to get it to happen. So it is possible that the compiled base R
packages play some part here.


> - can you get something (_anything_) to trigger the bug (in any variant)
> when running R under gdb? I'm thinking gctorture() etc.
>
Some variations of the code will error without gdb, but will not error with
gdb. I twiddled with the code a bit and now the current version of the code
(f97cfdf) will error under gdb.

I've also run it with gctorture(T) and have not seen this error with that
enabled, but I haven't tested it extensively in this mode. (In a previous
email I mentioned that with gctorture on, I got three different errors in
the tests. I later found that these errors were due to tests having
incorrect assumptions. For example, one test called gc() and expected a
warning, but it incorrectly assumed that a GC event would not have occurred
slightly earlier.)


> - it is odd that you cannot immediately get the same behaviour with R -d
> gdb or valgrind. Are you sure you are actually running the same script in
> the same way?
>
Some versions of the code, but not all, will give the same error under gdb
and valgrind. See above.


> - if you can get a hold of something inside gdb, then there should be some
> potential for backtracking using hardware watchpoints and such. As in: This
> memory location doesn't contain the value I expected; what changed it?
>
I probably don't know enough about R internals or gdb to be useful here.
But if someone wants to try it out, reproducing it is as simple as copying and
pasting the first two blocks of code from the README here (assuming you
have Docker installed):
  https://gist.github.com/wch/2596a1c9f1bcdee91bb210c782141c88
It will build a Docker image with the appropriate software installed, and
then run the tests. The README also shows how to run it with gdb.

-Winston
