hadley wickham
2006-Jan-23 03:42 UTC
[R] formatC slow? (or how can I make this function faster?
I'm trying to convert a matrix of capture occasions to a format that an
external program can read. The job is basically to take a row of the
matrix, like

> smp[1,]
 [1] 1 1 0 1 1 1 0 0 0 0

and convert it to the equivalent string "1101110000".

I'm having problems doing this in a speedy way. The simplest solution
(calc_history below, using apply, paste and collapse) takes about 2
seconds for a 10,000 x 10 matrix. I thought perhaps paste might be
building up the string in an inefficient manner, so I tried using matrix
multiplication and formatC (as in calc_history2). This is about 25%
faster, but still seems slow.

smp <- matrix(rbinom(100000, 1, 0.5), nrow=10000)

calc_history <- function(smp) {
  apply(smp, 1, paste, collapse="")
}

calc_history2 <- function(smp) {
  mul <- 10 ^ ((ncol(smp)-1):0)
  as.vector(formatC(smp %*% mul, format="d", width=ncol(smp), flag=0))
}

system.time(calc_history(smp))
system.time(calc_history2(smp))

Any ideas for improvement?

Thanks,

Hadley
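The arithmetic behind the matrix-multiplication approach can be sketched in C, the language used for the compiled solution in the reply. Each row is treated as the decimal digits of one integer, which is then printed zero-padded to the number of columns, so the row 1 1 0 1 1 1 0 0 0 0 becomes the number 1101110000. The helper name encode_row and its signature are made up for illustration; this is only a sketch of the idea, not a translation of formatC.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: treat nc 0/1 values as decimal digits of one
   integer and print it zero-padded to width nc. Returns the number of
   characters snprintf would write (nc when the buffer is big enough). */
int encode_row(const int *digits, int nc, char *out, size_t cap)
{
    assert(nc > 0 && nc <= 18);   /* 18 decimal digits fit in a long long */
    long long value = 0;
    for (int j = 0; j < nc; j++)
        value = value * 10 + (digits[j] != 0);  /* Horner form of sum d_j * 10^(nc-1-j) */
    /* "%0*lld" pads with leading zeros, like width=ncol(smp), flag=0 */
    return snprintf(out, cap, "%0*lld", nc, value);
}
```

The zero padding matters for rows that start with 0: without it the leading digits would be dropped. The approach is also limited to rows short enough for the integer type (18 columns in this sketch).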
Prof Brian Ripley
2006-Jan-23 07:44 UTC
[R] formatC slow? (or how can I make this function faster?
First, your timings seem slow: even my laptop takes only 0.4 secs. So the
simple solution is to use a better computer.

I would just write such things in C. The following runs in 0.01 sec on my
machine (timed by looping over it)

system.time(.Call("Cpaste", smp))

using

#include <string.h>
#include <R.h>
#include <Rinternals.h>

SEXP Cpaste(SEXP A)
{
    SEXP dims, ans;
    double *rA = REAL(A);   /* REAL() requires A to be stored as double */
    int i, j, nr, nc;
    char buf[100], one[] = "1", zero[] = "0";   /* buf limits nc to 99 */

    dims = getAttrib(A, R_DimSymbol);
    nr = INTEGER(dims)[0];
    nc = INTEGER(dims)[1];
    /* protect ans: mkChar allocates and may trigger garbage collection */
    PROTECT(ans = allocVector(STRSXP, nr));
    for(i = 0; i < nr; i++) {
        buf[0] = '\0';
        /* R matrices are column-major: element (i, j) is rA[i + nr*j] */
        for(j = 0; j < nc; j++)
            strcat(buf, rA[i + nr*j] > 0 ? one : zero);
        SET_STRING_ELT(ans, i, mkChar(buf));
    }
    UNPROTECT(1);
    return ans;
}

and perhaps that could be made more efficient by avoiding strcat, but I
would expect mkChar to be taking much of the time.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
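The remark about avoiding strcat can be illustrated with a minimal sketch. strcat rescans the buffer for its terminator on every call, whereas the digits can be written straight to their positions. To keep it compilable without the R headers, this version works on a plain double array in column-major order; the name row_to_string and its parameters are assumptions made for this example, not part of R's API, and whether it is measurably faster has not been timed here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: write row i of a column-major nr x nc matrix of
   0/1 values into out as a NUL-terminated string of '0'/'1' characters. */
void row_to_string(const double *m, size_t nr, size_t nc,
                   size_t i, char *out, size_t cap)
{
    assert(i < nr);
    assert(cap > nc);   /* room for nc digits plus the terminator */
    for (size_t j = 0; j < nc; j++)
        out[j] = m[i + nr * j] > 0 ? '1' : '0';   /* element (i, j) */
    out[nc] = '\0';
}
```

Inside a .Call routine the same inner loop could fill buf before it is handed to mkChar, with the capacity check guarding against matrices wider than the buffer.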