luke-tierney at uiowa.edu
2015-May-05 13:46 UTC
[Rd] Why is the diag function so slow (for extraction)?
Looks like the c(x)[...] bit used to be as.matrix(x)[...]. Not sure why the change was made many years ago, but this was before names were handled explicitly. It would definitely be better to not force the duplicate, at least in the case where we are sure c() and [ would not dispatch. Best, luke On Mon, 4 May 2015, peter dalgaard wrote:> >> On 04 May 2015, at 19:59 , franknarf <by.hook.or at gmail.com> wrote: >> >> But I'm still wondering why diag() uses c()...? With it being so slow, I'd >> be inclined to write a qdiag() without the c() and just use that the next >> time I need matrix algebra. Any insight would be appreciated; thanks! > > Well, there are two possibilities: Either it is deliberate or it isn't. > > The latter isn't too unlikely, given that the effect is seen for large matrices. I would appear to be a matter of O(n) (picking out n items) vs. O(n^2) (copying an n x n matrix), but this might drown out in a context involving matrix multiplication and/or inversion, both of which are O(n^3). > > If it is deliberate, the question is why. There could be devils in the details; notice in particular that c() strips off non-name attributes. However, I'm not aware of a situation where such attributes could cause trouble. > > -pd > >-- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics and Fax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tierney at uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
Steve Bronder
2015-May-07 15:49 UTC
[Rd] Why is the diag function so slow (for extraction)?
Is it possible to replace c() with .subset()? Example below #### #### library(microbenchmark) diag2 <- function(x,nrow, ncol){ if (is.matrix(x)) { if (nargs() > 1L) stop("'nrow' or 'ncol' cannot be specified when 'x' is a matrix") if ((m <- min(dim(x))) == 0L) return(vector(typeof(x), 0L)) # replace this part y <- .subset(x,1 + 0L:(m - 1L) * (dim(x)[1L] + 1)) nms <- dimnames(x) if (is.list(nms) && !any(sapply(nms, is.null)) && identical((nm <- nms[[1L]][seq_len(m)]), nms[[2L]][seq_len(m)])) names(y) <- nm return(y) } if (is.array(x) && length(dim(x)) != 1L) stop("'x' is an array, but not one-dimensional.") if (missing(x)) n <- nrow else if (length(x) == 1L && nargs() == 1L) { n <- as.integer(x) x <- 1 } else n <- length(x) if (!missing(nrow)) n <- nrow if (missing(ncol)) ncol <- n } nc <- 10 set.seed(1) m <- matrix(sample(letters,nc^2,replace=TRUE), ncol = nc) runoff <- microbenchmark( diaga = diag(m), diagb = diag2(m) ) Regards, Steve Bronder Website: stevebronder.com Phone: 412-719-1282 Email: sbronder at stevebronder.com On Tue, May 5, 2015 at 9:46 AM, <luke-tierney at uiowa.edu> wrote:> Looks like the c(x)[...] bit used to be as.matrix(x)[...]. Not sure > why the change was made many years ago, but this was before names were > handled explicitly. It would definitely be better to not force the > duplicate, at least in the case where we are sure c() and [ would not > dispatch. > > Best, > > luke > > On Mon, 4 May 2015, peter dalgaard wrote: > > >> On 04 May 2015, at 19:59 , franknarf <by.hook.or at gmail.com> wrote: >>> >>> But I'm still wondering why diag() uses c()...? With it being so slow, >>> I'd >>> be inclined to write a qdiag() without the c() and just use that the next >>> time I need matrix algebra. Any insight would be appreciated; thanks! >>> >> >> Well, there are two possibilities: Either it is deliberate or it isn't. >> >> The latter isn't too unlikely, given that the effect is seen for large >> matrices. I would appear to be a matter of O(n) (picking out n items) vs. >> O(n^2) (copying an n x n matrix), but this might drown out in a context >> involving matrix multiplication and/or inversion, both of which are O(n^3). >> >> If it is deliberate, the question is why. There could be devils in the >> details; notice in particular that c() strips off non-name attributes. >> However, I'm not aware of a situation where such attributes could cause >> trouble. >> >> -pd >> >> >> > -- > Luke Tierney > Ralph E. Wareham Professor of Mathematical Sciences > University of Iowa Phone: 319-335-3386 > Department of Statistics and Fax: 319-335-3017 > Actuarial Science > 241 Schaeffer Hall email: luke-tierney at uiowa.edu > Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu > > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >[[alternative HTML version deleted]]
Steve Bronder
2015-May-11 17:10 UTC
[Rd] Why is the diag function so slow (for extraction)?
Sorry if this is a re-post, not sure if my original message got though. Is it possible to replace c() with .subset()? Or would this throw errors because of something specific in c() or as.matrix()? There's a pretty nice speed up by replacing c() with the .subset(). Ex. ---- library(microbenchmark) diag2 <- function(x,nrow, ncol){ if (is.matrix(x)) { if (nargs() > 1L) stop("'nrow' or 'ncol' cannot be specified when 'x' is a matrix") if ((m <- min(dim(x))) == 0L) return(vector(typeof(x), 0L)) # replace this part y <- .subset(x,1 + 0L:(m - 1L) * (dim(x)[1L] + 1)) nms <- dimnames(x) if (is.list(nms) && !any(sapply(nms, is.null)) && identical((nm <- nms[[1L]][seq_len(m)]), nms[[2L]][seq_len(m)])) names(y) <- nm return(y) } if (is.array(x) && length(dim(x)) != 1L) stop("'x' is an array, but not one-dimensional.") if (missing(x)) n <- nrow else if (length(x) == 1L && nargs() == 1L) { n <- as.integer(x) x <- 1 } else n <- length(x) if (!missing(nrow)) n <- nrow if (missing(ncol)) ncol <- n } # Tests nc <- 1e04 set.seed(1) m <- matrix(sample(letters,nc^2,replace=TRUE), ncol = nc) runoff <- microbenchmark( diaga <- diag(m), diagb <- diag2(m) ) runoff Unit: microseconds expr min lq mean median uq max neval diaga 429033.896 434186.694 512143.6728 503355.5865 572811.11 656035.584 100 diagb 216.112 251.445 536.8531 688.3595 706.98 2437.921 100 Regards, Steve Bronder Website: stevebronder.com Phone: 412-719-1282 Email: sbronder at stevebronder.com On Thu, May 7, 2015 at 11:49 AM, Steve Bronder <sbronder at stevebronder.com> wrote:> Is it possible to replace c() with .subset()? Example below > > #### > #### > > library(microbenchmark) > > > diag2 <- function(x,nrow, ncol){ > if (is.matrix(x)) { > if (nargs() > 1L) > stop("'nrow' or 'ncol' cannot be specified when 'x' is a matrix") > if ((m <- min(dim(x))) == 0L) > return(vector(typeof(x), 0L)) > # replace this part > y <- .subset(x,1 + 0L:(m - 1L) * (dim(x)[1L] + 1)) > nms <- dimnames(x) > if (is.list(nms) && !any(sapply(nms, is.null)) && identical((nm <- > nms[[1L]][seq_len(m)]), > > nms[[2L]][seq_len(m)])) > names(y) <- nm > return(y) > } > if (is.array(x) && length(dim(x)) != 1L) > stop("'x' is an array, but not one-dimensional.") > if (missing(x)) > n <- nrow > else if (length(x) == 1L && nargs() == 1L) { > n <- as.integer(x) > x <- 1 > } > else n <- length(x) > if (!missing(nrow)) > n <- nrow > if (missing(ncol)) > ncol <- n > } > > nc <- 10 > > set.seed(1) > m <- matrix(sample(letters,nc^2,replace=TRUE), ncol = nc) > > > runoff <- microbenchmark( > diaga = diag(m), > diagb = diag2(m) > ) > > Regards, > > Steve Bronder > Website: stevebronder.com > Phone: 412-719-1282 > Email: sbronder at stevebronder.com > > > On Tue, May 5, 2015 at 9:46 AM, <luke-tierney at uiowa.edu> wrote: > >> Looks like the c(x)[...] bit used to be as.matrix(x)[...]. Not sure >> why the change was made many years ago, but this was before names were >> handled explicitly. It would definitely be better to not force the >> duplicate, at least in the case where we are sure c() and [ would not >> dispatch. >> >> Best, >> >> luke >> >> On Mon, 4 May 2015, peter dalgaard wrote: >> >> >>> On 04 May 2015, at 19:59 , franknarf <by.hook.or at gmail.com> wrote: >>>> >>>> But I'm still wondering why diag() uses c()...? With it being so slow, >>>> I'd >>>> be inclined to write a qdiag() without the c() and just use that the >>>> next >>>> time I need matrix algebra. Any insight would be appreciated; thanks! >>>> >>> >>> Well, there are two possibilities: Either it is deliberate or it isn't. >>> >>> The latter isn't too unlikely, given that the effect is seen for large >>> matrices. I would appear to be a matter of O(n) (picking out n items) vs. >>> O(n^2) (copying an n x n matrix), but this might drown out in a context >>> involving matrix multiplication and/or inversion, both of which are O(n^3). >>> >>> If it is deliberate, the question is why. There could be devils in the >>> details; notice in particular that c() strips off non-name attributes. >>> However, I'm not aware of a situation where such attributes could cause >>> trouble. >>> >>> -pd >>> >>> >>> >> -- >> Luke Tierney >> Ralph E. Wareham Professor of Mathematical Sciences >> University of Iowa Phone: 319-335-3386 >> Department of Statistics and Fax: 319-335-3017 >> Actuarial Science >> 241 Schaeffer Hall email: luke-tierney at uiowa.edu >> Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu >> >> >> ______________________________________________ >> R-devel at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel >> > >[[alternative HTML version deleted]]
Martin Maechler
2015-May-12 12:33 UTC
[Rd] Why is the diag function so slow (for extraction)?
>>>>> Steve Bronder <sbronder at stevebronder.com> >>>>> on Thu, 7 May 2015 11:49:49 -0400 writes:> Is it possible to replace c() with .subset()? It would be possible, but I think "entirely" wrong. .subset() is documented to be an internal function not to be used "lightly" and more to the point it is documented to *NOT* dispatch at all. If you read and understood what Peter and Luke wrote, you'd not special case here: diag() should not work only for pure matrices, but for all "matrix-like" objects for which ``the usual methods'' work, such as as.vector(.), c(.) That's why there has been the c(.) in there. You can always make code faster if you write the code so it only has to work in one special case .. and work there very fast. > Example below > #### > #### > library(microbenchmark)