)
On Wed, Apr 5, 2017 at 2:59 AM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:>
> >>>>> Winston Chang <winstonchang1 at gmail.com>
> >>>>> on Tue, 4 Apr 2017 15:29:40 -0500 writes:
>
> > I've done some more investigation into the problem, and it is
very
> > difficult to pin down. What it looks like is happening is roughly
like this:
> > - `p` is an environment and `p$e` is also an environment.
> > - There is a loop. In each iteration, it looks for one item in
`p$e`, saves
> > it in a variable `x`, then removes that item from `p$e`. Then it
invokes
> > `x()`. The loop runs again, until there are no more items in
`p$e`.
>
> > The problem is that `ls(p$e)` sometimes returns the wrong values
-- it
> > returns the values that it had in previous iterations of the loop.
The
> > behavior is very touchy. Almost any change to the code will
slightly change
> > the behavior; sometimes the `ls()` returns values from a different
> > iteration of the loop, and sometimes the problem doesn't
happen at all.
>
> > I've put a Dockerfile and instructions for reproducing the
problem here:
> > https://gist.github.com/wch/2596a1c9f1bcdee91bb210c782141c88
>
> > I think that I've gotten about as far with this as I can,
though I'd be
> > happy to provide more information if anyone wants to take look at
the
> > problem.
>
> Dear Winston,
>
> While I agree this may very well be a bug in R(-devel), and hence
> also R in 3.4.0 alpha and hence quite important to be dealt with,
>
> your code still involves 3 non-trivial packages (DBI, R6,
> testthat) some of which have their own C code and notably load
> a couple of other package's namespaces.
> We've always made a point
> https://www.r-project.org/bugs.html
> that bugs in R should be reproducible without extra
> packages... and I think it would definitely help to pinpoint the
> issue to be seen outside of your extra packages' world.
>
> Or have you been aware of that and are just asking for help
> finding a bug in one of the extra packages involved, a bug that might only
be triggered by recent changes in R ?
>
> OTOH, what you describe above (p ; p$e ; p$e$x ...)
> should be reproducible in pure "base" R code, right?
>
> I'm sorry not to be of more help
> Martin
Of the four packages that are loaded when running the tests (pool,
DBI, R6, testthat, magrittr, crayon), only testthat contains compiled
code, and it is pretty minimal. The only compiled code in testthat
that should be executed is a function that finds a label -- but that
happens only after an error occurs.
This is the sessionInfo():
R Under development (unstable) (2017-03-23 r72389)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)
Matrix products: default
BLAS: /usr/local/lib/R/lib/libRblas.so
LAPACK: /usr/local/lib/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] pool_0.1.0 DBI_0.6-1 testthat_1.0.2
loaded via a namespace (and not attached):
[1] compiler_3.5.0 magrittr_1.5 R6_2.2.0 crayon_1.3.2
I have spent days trying to make a minimal example, and that is the
best that I have been able to do so far. I would not involve all this
complexity if I could avoid it. The problem is that the behavior is
extremely sensitive to any changes. Seemingly-innocuous things like
removing other tests, or adding a cat() statement can make the error
disappear, or in some cases it makes ls() return values from a
different (previous) iteration of the loop.
In my testing, I have also seen a case where calls to `cat( ... ,
file=stderr())` result in output that has the wrong order. I don't
know if that's due to the cat() calls being executed in the wrong
order, or if it's simply being printed or buffered in the wrong order.
This is the code in question that cat()s to stderr:
https://github.com/rstudio/pool/blob/0724ad9/R/scheduler.R#L74-L90
while (TRUE) {
tasks <- sort(ls(private$scheduledTasks))
if (length(tasks) == 0) break
t <- tasks[[1]]
s <- stderr()
cat(tasks, "--1--\n", file = s)
cat(ls(private$scheduledTasks), "--2--\n", file = s)
cat(t, "--3--\n", file = s)
task <- private$scheduledTasks[[t]]
rm(list = t, envir = private$scheduledTasks)
task()
}
Without going into too much detail, it should print lines of text that
end with --1--, --2--, --3--, and repeat. Here's what it prints
instead when running the tests:
20170405-182549.466875-18 20170405-182559.456628-17
20170405-182729.456318-16 --1--
20170405-182549.466875-18 20170405-182559.456628-17
20170405-182729.456318-16 --1--
20170405-182549.466875-18 20170405-182559.456628-17
20170405-182729.456318-16 --2--
20170405-182549.466875-18 --3--
20170405-182559.456628-17 20170405-182729.456318-16 --1--
20170405-182559.456628-17 20170405-182729.456318-16 --2--
20170405-182559.456628-17 --3--
20170405-182729.456318-16 --1--
20170405-182729.456318-16 --2--
20170405-182729.456318-16 --3--
--2--
20170405-182549.466875-18 --3--
1. Error: pool scheduler: schedules things in the right order
(@test-scheduling.R#13)
could not find function "task"
1: naiveScheduler$protect({
scheduleTask(1e+05, function() {
results <<- c(results, 3L)
})
scheduleTask(10000, function() {
results <<- c(results, 2L)
})
scheduleTask(10, function() {
results <<- c(results, 1L)
})
}) at testthat/test-scheduling.R:13
2: private$refCount$release() at testthat/test-scheduling.R:13
3: private$callback()
It's almost as though, in the middle of the first iteration of the
while loop, R jumps to the next iteration of the loop, runs the loop a
couple of times to completion, and then returns to the first iteration
of the loop at the place that it left.
This can be reproduced by following the instructions in this gist:
https://gist.github.com/wch/2596a1c9f1bcdee91bb210c782141c88
Almost any change to the code will make the error disappear, or change
to a different one.
With regard to reproducing it in "base" R: I made a simple example
using just R (no other packages) that does something similar to what
happens in the tests, but even when I run it for 100,000 iterations,
the error doesn't occur.
I think there's a good chance that this is due to a bug in R. I have
been trying to track down the cause of the problem but haven't been
able find it.
-Winston