Yohan Chalabi
2008-Dec-03 19:28 UTC
[Rd] reduce limit number of arguments in methods:::cbind
Dear all, As far as I understand, the number of arguments in methods:::cbind is limited by the "self recursive" construction of the function which generates nested loops. A workaround could be to use the internal cbind function on blocks of non S4 objects. The limitation would then be reduced to the number of consecutive S4 objects. ##### R code ##### dfr <- data.frame(matrix(0, nrow = 1 , ncol = 1000)) dfr2 <- is.na(dfr) mlist <- rep(list(matrix(0, 2, 1)), 400) cb1 <- do.call("cbind", c(mlist, mlist)) methods:::bind_activation(TRUE) dfr2 <- is.na(dfr) # fails cb2 <- do.call("cbind", mlist) # ok cb3 <- do.call("cbind", c(mlist, mlist)) # fails # This could be avoided by first checking that the arguments has no S4 # objects. If this is the case, the function falls back to the # internal cbind function. # But this would not be very helpful if the arguments are a mixture of # S4 and non S4 objects library(Matrix) Mlist <- rep(list(Matrix(0, 2, 1)), 400) cb4 <- do.call("cbind", Mlist) # ok cb5 <- do.call("cbind", c(Mlist, Mlist)) # fails cb6 <- do.call("cbind", c(Mlist, mlist)) # fails # A workaround could be to use the internal cbind function on blocks of # non S4 objects. The limitation would be reduced to the number of # consecutive S4 objects # After modifications dfr2 <- is.na(dfr) # ok cb7 <- do.call("cbind", mlist) # ok cb8 <- do.call("cbind", c(mlist, mlist)) # ok cb9 <- do.call("cbind", c(Mlist, mlist)) # ok cb10 <- do.call("cbind", c(Mlist, Mlist)) # fails as expected ##### END ##### The code bellow gives an idea how to do it but was not fully tested! Hope it helps, Yohan Index: methods/R/cbind.R ==================================================================--- methods/R/cbind.R (revision 47045) +++ methods/R/cbind.R (working copy) @@ -39,11 +39,10 @@ ## remove trailing 'NULL's: while(na > 0 && is.null(argl[[na]])) { argl <- argl[-na]; na <- na - 1 } if(na == 0) return(NULL) - if(na == 1) { - if(isS4(..1)) - return(cbind2(..1)) - else return(.Internal(cbind(deparse.level, ...))) - } + if (!any(aS4 <- unlist(lapply(argl, isS4)))) + return(.Internal(cbind(deparse.level, ...))) + if(na == 1) + return(cbind2(..1)) ## else : na >= 2 @@ -64,6 +63,15 @@ else { ## na >= 3 arguments: -- RECURSION -- with care ## determine nrow(<result>) for e.g., cbind(diag(2), 1, 2) ## only when the last two argument have *no* dim attribute: + idx.aS4 <- 0 + while (!rev(aS4)[idx.aS4+1]) + idx.aS4 <- idx.aS4 + 1 + if (idx.aS4 > 1) { + argl0 <- argl[(na-idx.aS4+1):na] + argl1 <- do.call(cbind, c(argl0, list(deparse.level=deparse.level))) + argl2 <- c(argl[1L:(na-idx.aS4)], list(argl1)) + return(do.call(cbind, c(argl2, list(deparse.level=deparse.level)))) + } nrs <- unname(lapply(argl, nrow)) # of length na iV <- sapply(nrs, is.null)# is 'vector' fix.na <- identical(nrs[(na-1):na], list(NULL,NULL))
My 2c: The real issue for me is that this approach to handling S4 objects by altering R functions for the worse is incorrect. (by calling bind_activation) m <- matrix(1:2e6L) # 2 million obs> system.time(cbind(m,m))user system elapsed 0.027 0.017 0.044> methods:::bind_activation(TRUE)[1] FALSE # the additional overhead of cbind is now damaging to cbind S3 methods> system.time(cbind(m,m))user system elapsed 0.043 0.034 0.077 [~175% of the original time] Wouldn't a better near-term approach involve writing S3 methods to dispatch on.> methods:::bind_activation(FALSE) > library(Matrix) > M <- Matrix(1:10) > cbind(M,M)M M [1,] ? ?> cbind.dgeMatrix <- function(..., deparse.level=1) methods:::cbind(..., deparse.level=deparse.level) > cbind(M,M)10 x 2 Matrix of class "dgeMatrix" [,1] [,2] [1,] 1 1 [2,] 2 2 [3,] 3 3 [4,] 4 4 [5,] 5 5 [6,] 6 6 [7,] 7 7 [8,] 8 8 [9,] 9 9 [10,] 10 10 # this approach "does no harm" to regular S3 methods> system.time(cbind(m,m))user system elapsed 0.028 0.017 0.045 Obviously this negates part of the S4 dispatch value, but that can be had by calling cbind2 directly. Jeff -- Jeffrey Ryan jeffrey.ryan at insightalgo.com ia: insight algorithmics www.insightalgo.com
Apparently Analagous Threads
- Proper way to define cbind, rbind for s4 classes in package
- Proper way to define cbind, rbind for s4 classes in package
- Proper way to define cbind, rbind for s4 classes in package
- Proper way to define cbind, rbind for s4 classes in package
- methods cbind2 bind_activation disrupts cbind everywhere