Martin Maechler
2016-May-10 10:43 UTC
[Rd] recursion problem using do.call(rbind, list(..,<S4>,..))
This was originally a bug report about Matrix, https://r-forge.r-project.org/tracker/?func=detail&atid=294&aid=6325&group_id=61 but the bug is rather a "design" bug in R, or a limitation. This e-mail is a report of the status quo as I see it, and call for comments, sugguests, help/hints for workarounds, or even a suggestion for a programming task helping R core to amend the status {re-writing the methods:::cbind and ...:rbind functions}: If you read ?rbind carefully, you may have learned that rbind() and cbind() are able to deal with S4 "matrix-like" objects, via the hidden methods:::rbind / methods:::cbind functions where these recursively build on appropriate S4 methods for rbind2() / cbind2(). That is how cbind() and rbind() work nowadays for Matrix-package matrices. However, there is problem lurking from the above paragraph, and for experienced programmers / computer scientists that may even be obvious: recursion. A simple MRE (minimal reproducible example) for the problem seen with Matrix: S <- sparseMatrix(i=1:4, j=9:6, x=pi*1:4) n <- 410 # it succeeds for 407 -- sometimes even with 408 X <- replicate(n, S, simplify = FALSE) Xb <- do.call("rbind", X) ## -> Error in as(x, "CsparseMatrix") : node stack overflow But as I said, that's not really the Matrix package. A small reproducible example, not involving the Matrix package at all: MM <- setClass("mmatrix", contains="matrix") setMethod("rbind2", signature(x="mmatrix", y="mmatrix"), function(x,y) as(base::rbind(unclass(x), unclass(y)), "mmatrix")) (m5 <- MM(diag(5))) m45 <- MM(matrix(1:20, 4,5)) rbind(m5, m45) # fine ## fine with 400 : mL.4c <- replicate(400, m45, simplify=FALSE) mmm4 <- do.call(rbind, mL.4c) stopifnot(is(mmm4, "mmatrix"), dim(mmm4) == c(1600L, 5L)) ## not ok with 410 : mL.410 <- replicate(410, m45, simplify=FALSE) mmm4 <- do.call(rbind, mL.410) ## Error in getExportedValue(pkg, name) (from #2) : node stack overflow and the "node stack overflow" is not too helpful. Unfortunately, this is more than one problem, the first one being that recursive function calls nowadays often end with this "node stack overflow" error, rather than the somewhat more understandable error message, namely Error: evaluation nested too deeply: infinite recursion / options(expressions=)? And even worse, that nowadays increasing the option to a higher number N, options(expressions = N) does not help at all in this situation, once the code is byte compiled.... which of course is true for everything in base R. But that is basically only the reason for the not-so-helpful error message (and that raising 'expressions' does not help!), but the underlying problem is somewhat harder, namely the full setup the S4-serving methods:::rbind() {and ...cbind()} using recursion (in the way it does) at all. There is a simple, in my eyes ugly workaround which also will not scale well, but works fine and fast enough for the above example: ## Simple ugly workaround .. but well fast enough :> system.time({+ r <- mL.410[[1]] + for(i in seq_along(mL.410)[-1]) + r <- rbind2(r, mL.410[[i]]) + }) user system elapsed 0.083 0.000 0.083> dim(r)[1] 1640 5>This should help Ben (the OP of the Matrix bug), and may be something like that should also guide on how to re-write the methods:::rbind() / methods:::cbind() in a non-recursive fashion ? Thank you for your thoughts. Martin Maechler ETH Zurich
Hadley Wickham
2016-May-13 20:11 UTC
[Rd] recursion problem using do.call(rbind, list(..,<S4>,..))
Hi Martin, I think this is a general problem with any function that does dispatch on ... - I think for well-defined behaviour you always need to dispatch on pairs, folding/reducing (like in your code) to get a single value. The downside of this approach is obviously performance: for n inputs, you need n - 1 temporary objects, which is going to hamper performance (which is problematic for rbind since it's not uncommon to have lists of thousands of small data frames). I don't see anyway around this, except maybe you just have to have specialised variants (like dplyr::bind_rows() and data.table::rbindlist()) that don't do dispatch. I've cc'd Michael since he might have missed the original thread, and I suspect he's probably thought more about this than me. Hadley On Tue, May 10, 2016 at 5:43 AM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:> This was originally a bug report about Matrix, > https://r-forge.r-project.org/tracker/?func=detail&atid=294&aid=6325&group_id=61 > but the bug is rather a "design" bug in R, or a limitation. > > This e-mail is a report of the status quo as I see it, and > call for comments, sugguests, help/hints for workarounds, > or even a suggestion for a programming task helping R core to > amend the status {re-writing the methods:::cbind and ...:rbind functions}: > > > If you read ?rbind carefully, you may have learned that rbind() and > cbind() are able to deal with S4 "matrix-like" objects, via the > hidden methods:::rbind / methods:::cbind functions > where these recursively build on appropriate S4 methods for > rbind2() / cbind2(). > > That is how cbind() and rbind() work nowadays for Matrix-package > matrices. > > However, there is problem lurking from the above paragraph, and > for experienced programmers / computer scientists that may even > be obvious: recursion. > > A simple MRE (minimal reproducible example) for the problem seen with Matrix: > > S <- sparseMatrix(i=1:4, j=9:6, x=pi*1:4) > n <- 410 # it succeeds for 407 -- sometimes even with 408 > X <- replicate(n, S, simplify = FALSE) > Xb <- do.call("rbind", X) > ## -> Error in as(x, "CsparseMatrix") : node stack overflow > > But as I said, that's not really the Matrix package. A small > reproducible example, not involving the Matrix package at all: > > MM <- setClass("mmatrix", contains="matrix") > setMethod("rbind2", signature(x="mmatrix", y="mmatrix"), > function(x,y) as(base::rbind(unclass(x), unclass(y)), "mmatrix")) > > (m5 <- MM(diag(5))) > m45 <- MM(matrix(1:20, 4,5)) > rbind(m5, m45) # fine > > ## fine with 400 : > mL.4c <- replicate(400, m45, simplify=FALSE) > mmm4 <- do.call(rbind, mL.4c) > stopifnot(is(mmm4, "mmatrix"), dim(mmm4) == c(1600L, 5L)) > ## not ok with 410 : > mL.410 <- replicate(410, m45, simplify=FALSE) > mmm4 <- do.call(rbind, mL.410) > ## Error in getExportedValue(pkg, name) (from #2) : node stack overflow > > and the "node stack overflow" is not too helpful. > Unfortunately, this is more than one problem, the first one being > that recursive function calls nowadays often end with this > "node stack overflow" error, rather than the somewhat more understandable > error message, namely > > Error: evaluation nested too deeply: infinite recursion / options(expressions=)? > > And even worse, that nowadays increasing the option to a higher number N, > options(expressions = N) > > does not help at all in this situation, once the code is byte > compiled.... which of course is true for everything in base R. > > But that is basically only the reason for the not-so-helpful > error message (and that raising 'expressions' does not help!), > but the underlying problem is somewhat harder, namely the full > setup the S4-serving methods:::rbind() {and ...cbind()} using > recursion (in the way it does) at all. > > There is a simple, in my eyes ugly workaround which also will not scale well, > but works fine and fast enough for the above example: > > ## Simple ugly workaround .. but well fast enough : >> system.time({ > + r <- mL.410[[1]] > + for(i in seq_along(mL.410)[-1]) > + r <- rbind2(r, mL.410[[i]]) > + }) > user system elapsed > 0.083 0.000 0.083 >> dim(r) > [1] 1640 5 >> > > This should help Ben (the OP of the Matrix bug), and may be > something like that should also guide on how to re-write > the methods:::rbind() / methods:::cbind() in a non-recursive > fashion ? > > Thank you for your thoughts. > > Martin Maechler > ETH Zurich > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel-- http://hadley.nz
Maybe Matching Threads
- Proper way to define cbind, rbind for s4 classes in package
- cbind() & rbind() for S4 objects -- 'Matrix' package changes
- Proper way to define cbind, rbind for s4 classes in package
- Proper way to define cbind, rbind for s4 classes in package
- Proper way to define cbind, rbind for s4 classes in package