Hi everyone,
I've been using R for months and find it really
practical and straightforward.
However (the inevitable however), I am finding it very
slow for one of my operations:
it's basically an iteration over i and j in a pretty
big table (4 * 4608). It takes 30 minutes!!!!
Thanks
PS: if it can help, here is the source:
median1 <- matrix(nrow = 4608, ncol = 1)
median2 <- matrix(nrow = 4608, ncol = 1)
median3 <- matrix(nrow = 4608, ncol = 1)
median4 <- matrix(nrow = 4608, ncol = 1)
v <- c(18, 19, 20, 21, 23)
for (i in 0:11)
{
    for (j in 1:384)
    {
        median1[j + (i*384), ] <- puce[j + (i*384), 5]  + median(puce[v + 384*i, 2]  - puce[v + 384*i, 5])
        median2[j + (i*384), ] <- puce[j + (i*384), 19] + median(puce[v + 384*i, 16] - puce[v + 384*i, 19])
        median3[j + (i*384), ] <- puce[j + (i*384), 12] + median(puce[v + 384*i, 9]  - puce[v + 384*i, 12])
        median4[j + (i*384), ] <- puce[j + (i*384), 26] + median(puce[v + 384*i, 23] - puce[v + 384*i, 26])
        puce[, 5]  <- median1
        puce[, 19] <- median2
        puce[, 12] <- median3
        puce[, 26] <- median4
    }
}
It is well known that R is inefficient on loops. When you have to perform
a "heavy" loop, it is better to call Fortran or C code (via the .Fortran()
and .C() functions).

A.S.

----------------------------
Alessandro Semeria
Models and Simulations Laboratory
Montecatini Environmental Research Center (Edison Group),
Via Ciro Menotti 48, 48023 Marina di Ravenna (RA), Italy
Tel. +39 544 536811
Fax. +39 544 538663
E-mail: alessandro.semeria at cramont.it
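For reference, a minimal sketch of what such a call looks like; the file
fastloop.c, the routine block_sum, and its arguments are invented for
illustration, not something from this thread:

## Assumes fastloop.c defines
##     void block_sum(double *x, int *n, double *out)
## and has been compiled with:  R CMD SHLIB fastloop.c
dyn.load("fastloop.so")            # fastloop.dll on Windows
x <- rnorm(100)
res <- .C("block_sum",
          as.double(x),            # the data, passed as doubles
          as.integer(length(x)),   # its length
          out = double(1))         # space for the C code to fill in
res$out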
william ritchie wrote:

> Hi everyone,
>
> I've been using R for months and find it really
> practical and straightforward.
> However (the inevitable however), I am finding it very
> slow for one of my operations:
> it's basically an iteration over i and j in a pretty
> big table (4 * 4608). It takes 30 minutes!!!!
>
> Thanks

There was a suggestion to use C or Fortran, but in your particular case it
looks like you can choose a simpler way to get more performance (even if
not as much as in C) by vectorizing a bit more, see below.

> PS: if it can help, here is the source:
>
> [original double loop snipped; see above]

The obvious (well, I haven't tried) *first* step (I don't want to rewrite
your code here!) is, e.g.,

median1 <- median2 <- median3 <- median4 <- numeric(4608)
v <- c(18, 19, 20, 21, 23)
for (i in 0:11) {
    j <- 1:384
    median1[j + (i*384)] <- puce[j + (i*384), 5]  + median(puce[v + 384*i, 2]  - puce[v + 384*i, 5])
    median2[j + (i*384)] <- puce[j + (i*384), 19] + median(puce[v + 384*i, 16] - puce[v + 384*i, 19])
    median3[j + (i*384)] <- puce[j + (i*384), 12] + median(puce[v + 384*i, 9]  - puce[v + 384*i, 12])
    median4[j + (i*384)] <- puce[j + (i*384), 26] + median(puce[v + 384*i, 23] - puce[v + 384*i, 26])
}
puce[, 5]  <- median1
puce[, 19] <- median2
puce[, 12] <- median3
puce[, 26] <- median4

Uwe Ligges
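A further step along the same lines, sketched but untested: each block's
median is a single number, so it can be computed once per pair of columns
and recycled with rep(), dropping the medianN temporaries entirely. This
assumes puce is a numeric matrix with the 12-blocks-of-384-rows layout
above.

v <- c(18, 19, 20, 21, 23)
## (reference column, column to adjust), taken from the original loop
pairs <- list(c(2, 5), c(16, 19), c(9, 12), c(23, 26))
for (p in pairs) {
    ## one median per block of 384 rows
    m <- sapply(0:11, function(i)
        median(puce[v + 384*i, p[1]] - puce[v + 384*i, p[2]]))
    ## add each block's median to every row of its block
    puce[, p[2]] <- puce[, p[2]] + rep(m, each = 384)
}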
"Alessandro Semeria" alessandro.semeria at cramont.it wrote:
> It is well known that R is inefficient on loops.
This is a dangerous half-truth. R is an interpreted language.
The interpreter uses techniques similar to those used in Scheme
interpreters. As interpreters go, it's pretty good. For comparison,
in processing XML documents, I've had interpreted Scheme running rings
around compiled Java (by doing the task a different way, of course).
Also for comparison, years ago I had a Prolog program for median
polish that made a published Fortran program for median polish look
sick (by using a much better data structure). With Luke Tierney's
byte-code compiler, I expect R loops will become close to as efficient
as Python ones, and people run entire web sites with Python.
It is more accurate to say that R code qua R code is not as efficient
as the large body of "primitives" that operate on entire arrays.
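The gap is easy to see for yourself; a toy comparison (the data and
timings here are illustrative only):

## An interpreted R loop vs. a single call to a primitive.
x <- rnorm(1e6)
system.time({ s <- 0; for (xi in x) s <- s + xi })   # loop in R code
system.time(sum(x))                                  # whole-array primitive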
> When you have to perform a "heavy" loop, it is better to call
> Fortran or C code (via the .Fortran() and .C() functions).
Even if the premiss were literally and exactly true, the conclusion
would not follow. When you have a speed problem with R code,
(1) Find out where the problem is, exactly. People's intuition about
performance bottlenecks is notoriously bad. Do what the experts do:
*measure*. (A toy profiling sketch follows this list.)
(2) Try to restructure the code *entirely in R* to be as clear and high
level as possible. If there have to be subscripts, at least let them
be vector subscripts (also sketched after this list).
(3) Measure again. Chances are that making the code clear and high level
has fixed the performance problem.
(4) If that fails, try restructuring the code a couple of ways,
*entirely in R*. The two basic techniques for optimising a calculation
are (a) eliminate it entirely and (b) if you can't eliminate the first
evaluation of an expression, eliminate the second by saving the result.
As a special case of (b), try moving things out of loops; try splitting
a calculation into a part that changes a lot and a part that changes
very little, and update the small-change part only when you have to.
Perhaps apply the idea of program differentiation. (NOT the idea of
taking a function that computes a value and automatically deriving
a function that computes its derivative, but the idea of asking:
if I have z <- f(x,y) and I make a small change to x, do I have to
recompute z completely, or can I make a correspondingly small change
to z? A toy sketch of this, too, follows the list.)
Try to use built-in operations as much as possible, on data structures
that are as large as appropriate.
(5) Measure again. This will probably have fixed the performance problem.
(6) If all else fails, now it's time to try Fortran or C. It's too bad
there isn't an existing Fortran or C module you can just call; if there
had been, you'd have used it before writing the original R code.
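A toy sketch of point (1), using R's profiler; slow_step() is an invented
stand-in for whatever code you suspect:

## Measure first: profile the suspect code, then read the summary.
Rprof("profile.out")           # start the profiler
result <- slow_step()          # hypothetical stand-in for the slow code
Rprof(NULL)                    # stop the profiler
summaryRprof("profile.out")    # which functions ate the time?
## Or simply compare candidate rewrites:
system.time(slow_step())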
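A toy sketch of point (2), and of the advice about built-in operations;
all the data here is made up:

## One vector subscript replaces a whole inner loop.
x <- rnorm(4608); y <- numeric(4608); m <- 0.5; i <- 0
for (j in 1:384)                      # element by element...
    y[j + (i*384)] <- x[j + (i*384)] + m
j <- 1:384                            # ...or all at once
y[j + (i*384)] <- x[j + (i*384)] + m
## And a built-in whole-array operation replaces a loop over rows.
mat <- matrix(rnorm(4608 * 4), ncol = 4)
s <- rowSums(mat)                     # instead of summing row by row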
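And a toy sketch of the program-differentiation idea in point (4): after
a small change to x, make the corresponding small change to z instead of
recomputing it:

## Pretend the full computation of z is expensive.
x <- rnorm(1000)
z <- sum(x)
k <- 17; new <- 3.14          # a small change to one element of x
z <- z + (new - x[k])         # ...and the matching small change to z
x[k] <- new
## rather than recomputing: x[k] <- new; z <- sum(x)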