Hi everyone,

I've been using R for months and find it really practical and straightforward. However (the inevitable however), I am finding it very slow for one of my operations: it's basically an iteration over i and j in a pretty big table (4 * 4608). It takes 30 minutes!

Thanks

PS: if it can help, here is the source:

median1 <- matrix(nrow = 4608, ncol = 1)
median2 <- matrix(nrow = 4608, ncol = 1)
median3 <- matrix(nrow = 4608, ncol = 1)
median4 <- matrix(nrow = 4608, ncol = 1)
v <- c(18, 19, 20, 21, 23)
for (i in 0:11)
{
  for (j in 1:384)
  {
    median1[j + (i*384), ] <- puce[j + (i*384), 5]  + median(puce[v + 384*i, 2]  - puce[v + 384*i, 5])
    median2[j + (i*384), ] <- puce[j + (i*384), 19] + median(puce[v + 384*i, 16] - puce[v + 384*i, 19])
    median3[j + (i*384), ] <- puce[j + (i*384), 12] + median(puce[v + 384*i, 9]  - puce[v + 384*i, 12])
    median4[j + (i*384), ] <- puce[j + (i*384), 26] + median(puce[v + 384*i, 23] - puce[v + 384*i, 26])

    puce[, 5]  <- median1
    puce[, 19] <- median2
    puce[, 12] <- median3
    puce[, 26] <- median4
  }
}
It is well known that R is inefficient on loops. When you have to perform a "heavy" loop, it is better to call Fortran or C code (via the .Fortran() and .C() functions).

A.S.

----------------------------
Alessandro Semeria
Models and Simulations Laboratory
Montecatini Environmental Research Center (Edison Group),
Via Ciro Menotti 48, 48023 Marina di Ravenna (RA), Italy
Tel. +39 544 536811
Fax. +39 544 538663
E-mail: alessandro.semeria at cramont.it
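[For readers who haven't used that interface, here is a minimal sketch of the .C() pattern. It is an illustration, not code from the thread: the file name add_offset.c and everything in it are invented.]

/* add_offset.c -- compile with:  R CMD SHLIB add_offset.c  */
#include <R.h>

/* Add a scalar offset to each of the n elements of x, in place. */
void add_offset(double *x, int *n, double *offset)
{
    int i;
    for (i = 0; i < *n; i++)
        x[i] += *offset;
}

Then, from R:

dyn.load("add_offset.so")      # "add_offset.dll" on Windows
out <- .C("add_offset",
          x      = as.double(1:5),
          n      = as.integer(5),
          offset = as.double(2.5))
out$x                          # 3.5 4.5 5.5 6.5 7.5

Note that .C() copies its arguments and returns the (possibly modified) copies as a named list, so the result is read from out$x rather than from the original vector.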
william ritchie wrote:

> Hi everyone,
>
> I've been using R for months and find it really practical and
> straightforward. However (the inevitable however), I am finding it very
> slow for one of my operations: it's basically an iteration over i and j
> in a pretty big table (4 * 4608). It takes 30 minutes!
>
> Thanks

There was a suggestion to use C or Fortran, but in your particular case it looks like you can choose a simpler way to get more performance (even if not as much as in C) by vectorizing a bit more; see below.

> PS: if it can help, here is the source:
>
> median1 <- matrix(nrow = 4608, ncol = 1)
> median2 <- matrix(nrow = 4608, ncol = 1)
> median3 <- matrix(nrow = 4608, ncol = 1)
> median4 <- matrix(nrow = 4608, ncol = 1)
> v <- c(18, 19, 20, 21, 23)
> for (i in 0:11)
> {
>   for (j in 1:384)
>   {
>     median1[j + (i*384), ] <- puce[j + (i*384), 5]  + median(puce[v + 384*i, 2]  - puce[v + 384*i, 5])
>     median2[j + (i*384), ] <- puce[j + (i*384), 19] + median(puce[v + 384*i, 16] - puce[v + 384*i, 19])
>     median3[j + (i*384), ] <- puce[j + (i*384), 12] + median(puce[v + 384*i, 9]  - puce[v + 384*i, 12])
>     median4[j + (i*384), ] <- puce[j + (i*384), 26] + median(puce[v + 384*i, 23] - puce[v + 384*i, 26])
>
>     puce[, 5]  <- median1
>     puce[, 19] <- median2
>     puce[, 12] <- median3
>     puce[, 26] <- median4
>   }
> }

The obvious (well, I haven't tried) *first* step (I don't want to rewrite your code here!) is, e.g.,

median1 <- median2 <- median3 <- median4 <- numeric(4608)
v <- c(18, 19, 20, 21, 23)
for (i in 0:11) {
  j <- 1:384
  median1[j + (i*384)] <- puce[j + (i*384), 5]  + median(puce[v + 384*i, 2]  - puce[v + 384*i, 5])
  median2[j + (i*384)] <- puce[j + (i*384), 19] + median(puce[v + 384*i, 16] - puce[v + 384*i, 19])
  median3[j + (i*384)] <- puce[j + (i*384), 12] + median(puce[v + 384*i, 9]  - puce[v + 384*i, 12])
  median4[j + (i*384)] <- puce[j + (i*384), 26] + median(puce[v + 384*i, 23] - puce[v + 384*i, 26])
}
puce[, 5]  <- median1
puce[, 19] <- median2
puce[, 12] <- median3
puce[, 26] <- median4

Uwe Ligges
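[Going one step further than Uwe's version, here is a sketch that has not been benchmarked against the real data. It assumes puce really has 4608 rows in 12 blocks of 384, and the helper name block_offsets is invented for illustration. The observation is that each block of 384 rows just gets one constant offset added, so both loops can go.]

v <- c(18, 19, 20, 21, 23)

# One offset per block of 384 rows (12 blocks in all), computed from
# the original column values; sapply returns them as a length-12 vector.
block_offsets <- function(a, b)
  sapply(0:11, function(i) median(puce[v + 384*i, a] - puce[v + 384*i, b]))

# rep(..., each = 384) expands the 12 block offsets to one per row,
# so each update is a single vectorized addition.
puce[, 5]  <- puce[, 5]  + rep(block_offsets(2, 5),   each = 384)
puce[, 19] <- puce[, 19] + rep(block_offsets(16, 19), each = 384)
puce[, 12] <- puce[, 12] + rep(block_offsets(9, 12),  each = 384)
puce[, 26] <- puce[, 26] + rep(block_offsets(23, 26), each = 384)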
"Alessandro Semeria" alessandro.semeria at cramont.it wrote: Is well know that R is inefficent on loops. This is a dangerous half-truth. R is an interpreted language. The interpreter uses techniques similar to those used in Scheme interpreters. As interpreters go, it's pretty good. For comparison, in processing XML documents, I've had interpreted Scheme running rings around compiled Java (by doing the task a different way, of course). Also for comparison, years ago I had a Prolog program for median polish that made a published Fortran program for median polish look sick (by using a much better data structure). With Luke Tierney's byte-code compiler, I expect R loops will become close to as efficient as Python ones, and people run entire web sites with Python. It is more accurate to say that R code qua R code is not as efficient as the large body of "primitives" that operate on entire arrays. When you have to perform "heavy" loop is better to use a call to fortran or c code (.Fortran() , .C() functions) Even if the premiss were literally and exactly true, the conclusion would not follow. When you have a speed problem with R code, (1) Find out where the problem is, exactly. People's intuition about performance bottlenecks is notoriously bad. Do what the experts do: *measure*. (2) Try to restructure the code *entirely in R* to be as clear and high level as possible. If there have to be subscripts, at least let them be vector subscripts. (3) Measure again. Chances are that making the code clear and high level has fixed the performance problem. (4) If that fails, try restructuring the code a couple of ways, *entirely in R*. The two basic techniques for optimising a calculation are (a) eliminate it entirely and (b) if you can't eliminate the first evaluation of an expression, eliminate the second by saving the result. As a special case of (b), try moving things out of loops; try splitting a calculation into a part that changes a lot and a part that changes very little, and update the small-change part only when you have to. Perhaps apply the idea of program differentiation. (NOT the idea of taking a function that computes a value and automatically computing a function that computes the derivative of the first, but the idea of saying if I have z<-f(x,y) and I make a small change to x, do I have to recompute z completely or can I came a small change to z?) Try to use built in operations as much as possible on data structures that are as large as appropriate. (5) Measure again. This will probably have fixed the performance problem. (6) If all else fails, now it's time to try Fortran or C. It's too bad there isn't an existing Fortran or C module you can just call, if there had been you'd have used that before writing the original R code.