I think the discussion of this issue has gotten more complicated than
necessary.

First, there really is a bug. You can see this also by the fact that
delayed warning messages are wrong. For instance, in R-3.1.2:

    > lapply(c(-1,2,-1),sqrt)
    [[1]]
    [1] NaN

    [[2]]
    [1] 1.414214

    [[3]]
    [1] NaN

    Warning messages:
    1: In FUN(c(-1, 2, -1)[[3L]], ...) : NaNs produced
    2: In FUN(c(-1, 2, -1)[[3L]], ...) : NaNs produced

The first warning message should have "1L" rather than "3L". It
doesn't because lapply made a destructive change to the R expression
that was evaluated for the first element. Throughout the R
interpreter, there is a general assumption that expressions that are
or were evaluated are immutable, and lapply is not abiding by it. The
only question is whether the bugs this causes are sufficiently obscure
that it's worth keeping them for the gain in speed, but the speed cost
of fixing it is fairly small (though it's not negligible when the
function applied is something simple like sqrt).

The fix in the C code for lapply, vapply, and eapply is easy: rather
than create an R expression such as FUN(X[[1L]]) for the first
function call and then modify it in place to FUN(X[[2L]]), and so
forth, just create a new expression for each iteration. This requires
allocating a few new CONS cells on each iteration, which does have a
cost, but not a huge one. It's certainly easier and faster than
creating a new environment (and also less likely to cause
incompatibilities).

The R code for apply can be changed to use the same approach: rather
than using expressions such as FUN(X[i,]), where i is an index
variable, it can create expressions like FUN(X[1L,]), then FUN(X[2L,]),
etc. The method for this is simple:

    > a <- quote(FUN(X[i,]))     # "i" could be anything
    > b <- a; b[[c(2,3)]] <- 1L  # change "i" to 1L (creates new expr)

This has the added advantage of making error messages refer to the
actual index, not to "i", which has no meaning if you haven't looked
at the source code for apply (and which doesn't tell you which element
led to the error even if you know what "i" does).

I've implemented this in the development version of pqR, on the
development branch 31-apply-fix, at

    https://github.com/radfordneal/pqR/tree/31-apply-fix

The changes are in src/main/apply.c, src/main/envir.c, and
src/library/base/R/apply.R, plus a new test in tests/apply.R. You can
compare to branch 31 to see what's changed. (Note that rapply seems
not to have had a problem, and that the other apply functions just use
these, so they should be fixed as well.) There are also other
optimizations in pqR for these functions, but the code is still quite
similar to R-3.1.2.

   Radford Neal
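P.S. For concreteness, here is a rough R-level sketch of the idea (my
illustration only; the actual change is in the C code, and
lapply_sketch is a made-up name):

    # Build FUN(X[[1L]]), FUN(X[[2L]]), ... as brand-new expressions,
    # rather than modifying one call object in place, so that delayed
    # warnings and error messages keep the correct index.
    lapply_sketch <- function (X, FUN) {
        FUN <- match.fun(FUN)
        ans <- vector("list", length(X))
        for (i in seq_along(X)) {
            # a fresh call is allocated on every iteration
            expr <- as.call(list(quote(FUN),
                                 as.call(list(quote(`[[`), quote(X), i))))
            ans[[i]] <- eval(expr, environment())
        }
        ans
    }

With this, lapply_sketch(c(-1,2,-1), sqrt) should report its two
delayed warnings as being in FUN(X[[1L]]) and FUN(X[[3L]]), each with
the right index.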
On Sun, 1 Mar 2015, Radford Neal wrote:

> I think the discussion of this issue has gotten more complicated
> than necessary.

The discussion has gotten no more complicated than it needs to be.
There are other instances, such as Reduce, where there is a bug report
pending that amounts to the same issue. Performing surgery on
expressions and calling eval is not good practice at the R level, and
probably not a good idea at the C level either. It is worth thinking
this through carefully before adopting a solution, which is what we
will be doing.

Best,

luke

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone: 319-335-3386
Department of Statistics and        Fax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email: luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:   http://www.stat.uiowa.edu
> There are other instances, such as Reduce, where there is a bug
> report pending that amounts to the same issue. Performing surgery on
> expressions and calling eval is not good practice at the R level and
> probably not a good idea at the C level either. It is worth thinking
> this through carefully before adopting a solution, which is what we
> will be doing.

Surgery on expressions is what lapply does at the moment. My change
makes it no longer do that.

There is a general problem that lazy evaluation can have the effect of
making the internal details of how an R function like "apply" is
implemented leak into its semantics. That's what's going on with the
Reduce bug (16093) too. I think one can avoid this by defining the
following function for calling a function with evaluation of its
arguments forced (ie, with lazy evaluation disabled):

    call_forced <- function (f, ...) { list (...); f (...) }

(Of course, for speed one could make this a primitive function, which
wouldn't actually need to build a list.)

Then the critical code in Reduce could be changed from

    for (i in rev(ind)) init <- f(x[[i]], init)

to

    for (i in rev(ind)) init <- call_forced (f, x[[i]], init)

If one had a primitive (ie, fast) call_forced, a similar technique
might be better than the one I presented for fixing "apply" (cleaner,
and perhaps slightly faster). I don't see how it helps for functions
like lapply that are written in C, however (where help isn't needed,
since there's nothing wrong with the mod in my previous message).

   Radford Neal
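P.S. To make this concrete, here is a self-contained sketch. Reduce2
is my own illustrative right fold, not the base function, and the
compose example is a reconstruction of the sort of failure behind bug
16093, not necessarily the exact report:

    # compose does not force f and g, so the closures it returns hold
    # unevaluated promises.  Folded with Reduce's own loop,
    #     for (i in rev(ind)) init <- f(x[[i]], init)
    # those promises keep pointing at the loop's "i" and "init", and in
    # affected versions (e.g. R-3.1.2) the resulting function can
    # recurse infinitely when finally called.
    compose <- function (f, g) function (x) g(f(x))

    call_forced <- function (f, ...) { list (...); f (...) }

    # A fold that forces arguments before each call, so any closures
    # built by f capture values rather than the loop's variables.
    Reduce2 <- function (f, x, init) {
        for (i in rev(seq_along(x)))
            init <- call_forced(f, x[[i]], init)
        init
    }

    h <- Reduce2(compose, list(sqrt, exp), log)
    h(2)   # log(exp(sqrt(2))) = sqrt(2) = 1.414214, as intended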