Henrik Bengtsson
2008-Jul-11 03:18 UTC
[Rd] Suggestion: 20% speed up of which() with two-character mod
Hi, by replacing 'll' with 'wh' in the source code for base::which() one gets ~20% speed up for *named logical vectors*. CURRENT CODE: which <- function(x, arr.ind = FALSE) { if(!is.logical(x)) stop("argument to 'which' is not logical") wh <- seq_along(x)[ll <- x & !is.na(x)] m <- length(wh) dl <- dim(x) if (is.null(dl) || !arr.ind) { names(wh) <- names(x)[ll] } ... wh; } SUGGESTED CODE: (Remove 'll' and use 'wh') which2 <- function(x, arr.ind = FALSE) { if(!is.logical(x)) stop("argument to 'which' is not logical") wh <- seq_along(x)[x & !is.na(x)] m <- length(wh) dl <- dim(x) if (is.null(dl) || !arr.ind) { names(wh) <- names(x)[wh] } ... wh; } That's all. BENCHMARKING: # To measure both in same environment which1 <- base::which; environment(which1) <- globalenv(); # Needed? N <- 1e6; set.seed(0xbeef); x <- sample(c(TRUE, FALSE), size=N, replace=TRUE); names(x) <- seq_along(x); B <- 10; t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); }); t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); }); stopifnot(identical(idxs1, idxs2)); print(t1/t2); # Fair benchmarking t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); }); t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); }); print(t1/t2); ## user system elapsed ## 1.283186 1.052632 1.250000 You get similar results if you put for loop outside the system.time() call (and sum up the timings). Cheers Henrik
Charles C. Berry
2008-Jul-11 15:57 UTC
[Rd] Suggestion: 20% speed up of which() with two-character mod
On Thu, 10 Jul 2008, Henrik Bengtsson wrote:> Hi, > > by replacing 'll' with 'wh' in the source code for base::which() one > gets ~20% speed up for *named logical vectors*.The amount of speedup depends on how sparse the TRUE values are. When the proportion of TRUEs gets small the speedup is more than twofold on my macbook. For high proportions of TRUE, the speedup is more like the 20% you cite. HTH, Chuck> > CURRENT CODE: > > which <- function(x, arr.ind = FALSE) > { > if(!is.logical(x)) > stop("argument to 'which' is not logical") > wh <- seq_along(x)[ll <- x & !is.na(x)] > m <- length(wh) > dl <- dim(x) > if (is.null(dl) || !arr.ind) { > names(wh) <- names(x)[ll] > } > ... > wh; > } > > SUGGESTED CODE: (Remove 'll' and use 'wh') > > which2 <- function(x, arr.ind = FALSE) > { > if(!is.logical(x)) > stop("argument to 'which' is not logical") > wh <- seq_along(x)[x & !is.na(x)] > m <- length(wh) > dl <- dim(x) > if (is.null(dl) || !arr.ind) { > names(wh) <- names(x)[wh] > } > ... > wh; > } > > That's all. > > BENCHMARKING: > > # To measure both in same environment > which1 <- base::which; > environment(which1) <- globalenv(); # Needed? > > N <- 1e6; > set.seed(0xbeef); > x <- sample(c(TRUE, FALSE), size=N, replace=TRUE); > names(x) <- seq_along(x); > B <- 10; > t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); }); > t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); }); > stopifnot(identical(idxs1, idxs2)); > print(t1/t2); > # Fair benchmarking > t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); }); > t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); }); > print(t1/t2); > ## user system elapsed > ## 1.283186 1.052632 1.250000 > > You get similar results if you put for loop outside the system.time() > call (and sum up the timings). > > Cheers > > Henrik > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:cberry at tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901