Hi, all! My project today is to write a speedy colSums(), which is a function available in S-Plus to add the columns of a matrix. Here are 4 ways to do it, with the time it took (elapsed, best of 3 trials) in both R and S-Plus: m <- matrix(1, 400, 40000) x1 <- apply(m, 2, sum) ## R=16.55 S=52.39 x2 <- as.vector(rep(1,nrow(m)) %*% m) ## R= 2.39 S= 8.52 x3 <- g.apply.sum(m,2,F) ## R= 5.72 S=11.25 x4 <- colSums(m,F) ## S= 0.48 Method 1 is the naive "apply", and is significantly faster in R than S-Plus. Method 2 is a matrix operation I learned from Bill Venables; also faster in R. Method 3 is my C code, given below. Method 4 is the S-Plus built-in function, the bogey to beat! My question is, can anyone suggest how to speed up my C code to get results comparable to colSums, or even to method 2? Is it stupid code, a bad compiler, non-optimal optimizer options, or ??? Here's my interface function and simple C code (compiled with gcc via R SHLIB, no Makevars file): g.apply.sum <- function(x, MARGIN=1, na.rm=T) { dx <- dim(x) if (na.rm) x[is.na(x)] <- 0 .C("applysum", as.real(x), dx, as.integer(MARGIN), z=real(dx[MARGIN]), DUP=FALSE)$z } void applysum(double *x, long *dx, long *mar, double *z) { long i, j; if (*mar==1) for(i=0; i<dx[0]; i++) for(j=0; j<dx[1]; j++) z[i] += x[i + dx[0]*j]; else for(j=0; j<dx[1]; j++) for(i=0; i<dx[0]; i++) z[j] += x[i + dx[0]*j]; } Possibly useful audit trail: agate|Rpkg> R SHLIB g.colSums/src/*.c /res/local/bin/gcc -I/res/local/lib/R/include -I/res/local/include -fPIC -g -O2 -c g.colSums/src/applyfilt.c -o g.colSums/src/applyfilt.o /res/local/bin/gcc -I/res/local/lib/R/include -I/res/local/include -fPIC -g -O2 -c g.colSums/src/applysum.c -o g.colSums/src/applysum.o /res/local/bin/gcc -shared -o g.colSums/src/applyfilt.so g.colSums/src/applyfilt.o g.colSums/src/applysum.o -L/res/local/include Version: platform = sparc-sun-solaris2.6 arch = sparc os = solaris2.6 system = sparc, solaris2.6 status = major = 1 minor = 3.1 year = 2001 month = 08 day = 31 language = R -- -- David Brahm (brahm at alum.mit.edu) -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
David Brahm <brahm at alum.mit.edu> writes:> My question is, can anyone suggest how to speed up my C code to get results > comparable to colSums, or even to method 2? Is it stupid code, a bad compiler, > non-optimal optimizer options, or ???Your code is not very efficiently written, but the compiler ought to be able to catch the common subexpressions etc. On the other hand, especially on Sparc there are some pretty tricky pipelining and parallelism issues that can be really hard to get right and can mean quite a lot. And there are some optimizations that a compiler cannot really do without making assumptions that might be wrong. (Prototypically: what if z and x have overlapping storage? *You* know they don't, but the compiler really can't be sure.)> Here's my interface function and simple C code (compiled with gcc via R > SHLIB, no Makevars file): > > g.apply.sum <- function(x, MARGIN=1, na.rm=T) { > dx <- dim(x) > if (na.rm) x[is.na(x)] <- 0 > .C("applysum", as.real(x), dx, as.integer(MARGIN), z=real(dx[MARGIN]), > DUP=FALSE)$z > } > > void applysum(double *x, long *dx, long *mar, double *z) { > long i, j; > if (*mar==1) > for(i=0; i<dx[0]; i++) for(j=0; j<dx[1]; j++) z[i] += x[i + dx[0]*j]; > else > for(j=0; j<dx[1]; j++) for(i=0; i<dx[0]; i++) z[j] += x[i + dx[0]*j]; > }Hmm. What happens if you rewrite the "else" case as something like register long i; register double s, *p; p = x; for(j=0; j<dx[1]; j++) { i = dx[0]; s = 0.; while (i--) s += *p++; z[j] = s; } -- O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907 -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
I asked how to write speedy C code to implement colSums(). My original version on a 400x40000 matrix took 5.72s. Peter Dalgaard <p.dalgaard at biostat.ku.dk> suggested some more efficient coding, which sped my example up to 3.90s. Douglas Bates <bates at stat.wisc.edu> suggested using .Call() instead of .C, and I was amazed to see the time went down to 0.69s! Doug had actually posted his code (a package called "MatUtils") to R-help on July 19, 2001. I've taken Doug's code, added names to the result, and included an na.rm flag. Unfortunately, my na.rm option makes it really slow again! (12.15s). That's no faster than pre-processing the matrix with "m[is.na(m)] <- 0". Can anyone help me understand why the ISNA conditional is taking so much time? The C code is below. Thanks! ---- begin "colSums.c" (90% due to Doug Bates) ---- #include <R.h> #include <Rdefines.h> SEXP colSums(SEXP m, SEXP NaRm) { int i, j, *mdims, n, p, narm; double *mm, sum; SEXP val, nms; if (!isMatrix(m)) error("m must be a matrix"); mdims = INTEGER(GET_DIM(m)); narm = asLogical(NaRm); n = mdims[0]; p = mdims[1]; PROTECT(val = NEW_NUMERIC(p)); PROTECT(nms = GET_COLNAMES(GET_DIMNAMES(m))); PROTECT( m = AS_NUMERIC(m)); mm = REAL(m); if (narm) for (j = 0; j < p; j++) { for (sum = 0., i = 0; i < n; i++) if (!ISNA(mm[i])) sum += mm[i]; REAL(val)[j] = sum; mm += n; } else for (j = 0; j < p; j++) { for (sum = 0., i = 0; i < n; i++) sum += mm[i]; REAL(val)[j] = sum; mm += n; } namesgets(val, nms); UNPROTECT(3); return val; } ---- end "colSums.c" ---- -- -- David Brahm (brahm at alum.mit.edu) -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._