Other than re-engineering the approach, one thing that helps: don't index rows
of data frames inside loops... ever. It is actually faster to convert to a
matrix, do the operations, and then convert back to a data frame if you have
to.
As an example, I put your code in a function:
foo = function(averagedreplicates, zz){
  # NB: dim(x)[2] is the number of columns (21 here), so this loop visits
  # only rows 1..21; use dim(x)[1] to visit every row.
  iindex = 1:(dim(averagedreplicates)[2])
  for (i in iindex) {
    cat(i,'\n')
    # calculates means
    # Sample A
    averagedreplicates[i,2] <- (zz[i,2] + zz[i,3])/2
    averagedreplicates[i,3] <- (zz[i,4] + zz[i,5])/2
    averagedreplicates[i,4] <- (zz[i,6] + zz[i,7])/2
    averagedreplicates[i,5] <- (zz[i,8] + zz[i,9])/2
    averagedreplicates[i,6] <- (zz[i,10] + zz[i,11])/2
    # Sample B
    averagedreplicates[i,7] <- (zz[i,12] + zz[i,13])/2
    averagedreplicates[i,8] <- (zz[i,14] + zz[i,15])/2
    averagedreplicates[i,9] <- (zz[i,16] + zz[i,17])/2
    averagedreplicates[i,10] <- (zz[i,18] + zz[i,19])/2
    averagedreplicates[i,11] <- (zz[i,20] + zz[i,21])/2
    # Sample C
    averagedreplicates[i,12] <- (zz[i,22] + zz[i,23])/2
    averagedreplicates[i,13] <- (zz[i,24] + zz[i,25])/2
    averagedreplicates[i,14] <- (zz[i,26] + zz[i,27])/2
    averagedreplicates[i,15] <- (zz[i,28] + zz[i,29])/2
    averagedreplicates[i,16] <- (zz[i,30] + zz[i,31])/2
    # Sample D
    averagedreplicates[i,17] <- (zz[i,32] + zz[i,33])/2
    averagedreplicates[i,18] <- (zz[i,34] + zz[i,35])/2
    averagedreplicates[i,19] <- (zz[i,36] + zz[i,37])/2
    averagedreplicates[i,20] <- (zz[i,38] + zz[i,39])/2
    averagedreplicates[i,21] <- (zz[i,40] + zz[i,41])/2
  }
  return(averagedreplicates)
}
I then make matrix and data.frame versions of things similar in size to what you
are working with:
zz.as.m = matrix(runif(95000*41),95000,41)
zz.as.df = as.data.frame(zz.as.m)
ar.as.m = matrix(0,95000,21)
ar.as.df = as.data.frame(ar.as.m)
And we can time the matrix versions:
start = Sys.time()
x = foo(ar.as.m,zz.as.m)
stop = Sys.time()
stop-start # .06 seconds for me
And on the data frame versions?
#using the data frame versions
start = Sys.time()
x = foo(ar.as.df,zz.as.df)
stop = Sys.time()
stop-start # 31 seconds for me
For me, it takes roughly 500 times as long (31 s vs. 0.06 s) to do the same
work on data frames as it does on matrices.
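A minimal sketch of the convert / loop / convert-back pattern, in case it is
useful. The data here are small made-up stand-ins (1000 x 5 rather than your
95000 x 41), and the names zz.m, ar.m and ar.out are only for illustration:

```r
# Small, made-up stand-ins for zz and averagedreplicates (illustration only)
set.seed(1)
zz <- as.data.frame(matrix(runif(1000 * 5), 1000, 5))
ar <- as.data.frame(matrix(0, 1000, 3))

# Convert once, outside the loop
zz.m <- as.matrix(zz)
ar.m <- as.matrix(ar)

# Row-wise loop, but indexing matrices instead of data frames
for (i in seq_len(nrow(ar.m))) {
  ar.m[i, 2] <- (zz.m[i, 2] + zz.m[i, 3]) / 2
  ar.m[i, 3] <- (zz.m[i, 4] + zz.m[i, 5]) / 2
}

# Convert back once, only if a data frame is really needed afterwards
ar.out <- as.data.frame(ar.m)
```

The two conversions happen once each, so their cost does not grow with the
number of loop iterations.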
People say "never use loops in R", and I wish they wouldn't put it like that,
because it distracts from the facts of the matter: sometimes looping in R is
quite reasonably fast, and sometimes, as when you are indexing rows of a data
frame, it is horrible. These are the little things I learned combing through my
Master's project for speed.
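For completeness, the "re-engineering" I mentioned at the top could look
something like the sketch below, which drops the row loop altogether and
averages whole columns at once. This is my guess at a vectorised equivalent of
your script, not something I timed; the 10-row zz is made up, and the names
first, second and averaged are only for illustration:

```r
# Made-up stand-in for zz: 10 rows, same 41-column layout as in the script
set.seed(1)
zz <- matrix(runif(10 * 41), 10, 41)

# Column 1 is left alone, as in the loop; columns 2..21 receive the means
averaged <- matrix(0, nrow(zz), 21)

first  <- seq(2, 40, by = 2)   # zz columns 2, 4, ..., 40
second <- seq(3, 41, by = 2)   # zz columns 3, 5, ..., 41

# One vectorised operation replaces the 20 per-row assignments
averaged[, 2:21] <- (zz[, first] + zz[, second]) / 2
```

Column j of averaged (j = 2..21) ends up holding the mean of zz columns 2(j-1)
and 2(j-1)+1, which is the same pairing the loop uses.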
The only caveat to always doing this sort of work in matrices is that it can be
a little time consuming (developer time) to repair factors afterwards. But in
terms of run time it is absolutely essential to use the right data structure
for the job.
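To show what I mean about factors, here is a small sketch with a made-up data
frame (the columns id, x1 and x2 are hypothetical, not from your data).
Converting the whole thing with as.matrix() gives a character matrix, so one
way around it is to convert only the numeric columns and reattach the factor
afterwards:

```r
# Made-up data frame with one factor column and two numeric columns
df <- data.frame(id = factor(c("a", "b", "c")),
                 x1 = c(1, 2, 3),
                 x2 = c(3, 4, 5))

# Converting everything coerces the numbers to character strings
m.chr <- as.matrix(df)

# Convert only the numeric columns instead
num.cols <- sapply(df, is.numeric)
m <- as.matrix(df[, num.cols])

# Do the numeric work on the matrix
avg <- (m[, "x1"] + m[, "x2"]) / 2

# Rebuild a data frame, putting the untouched factor column back
out <- data.frame(id = df$id, avg = avg)
```

That reattaching step is the "repairing" cost; it is paid once per conversion
rather than once per row.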
Hope this is of assistance,
Jeremiah Rounds
> Date: Mon, 8 Jun 2009 15:45:40 +0000
> From: amitrhelp@yahoo.co.uk
> To: r-help@r-project.org
> Subject: [R] help to speed up loops in r
>
>
> Hi
> I am using a script which involves the following loop. It attempts to
> reduce a data frame (zz) of 95000 * 41 down to a data frame (averagedreplicates)
> of 95000 * 21 by averaging the replicate values as you can see in the script
> below. This script however is very slow (2 days). Any suggestions to speed it up?
>
> NB I have also tried using rowMeans rather than adding the 2 values and
> dividing by 2. (same problem)
>
>
>
>
> #SCRIPT STARTS
> for (i in 1:length(averagedreplicates[,1]))
> #for (i in 1:dim(averagedreplicates)[1])
> {
> cat(i,'\n')
>
>
> #calculates Meanss
> #Sample A
> averagedreplicates[i,2] <- (zz[i,2] + zz[i,3])/2
> averagedreplicates[i,3] <- (zz[i,4] + zz[i,5])/2
> averagedreplicates[i,4] <- (zz[i,6] + zz[i,7])/2
> averagedreplicates[i,5] <- (zz[i,8] + zz[i,9])/2
> averagedreplicates[i,6] <- (zz[i,10] + zz[i,11])/2
>
> #Sample B
> averagedreplicates[i,7] <- (zz[i,12] + zz[i,13])/2
> averagedreplicates[i,8] <- (zz[i,14] + zz[i,15])/2
> averagedreplicates[i,9] <- (zz[i,16] + zz[i,17])/2
> averagedreplicates[i,10] <- (zz[i,18] + zz[i,19])/2
> averagedreplicates[i,11] <- (zz[i,20] + zz[i,21])/2
>
> #Sample C
> averagedreplicates[i,12] <- (zz[i,22] + zz[i,23])/2
> averagedreplicates[i,13] <- (zz[i,24] + zz[i,25])/2
> averagedreplicates[i,14] <- (zz[i,26] + zz[i,27])/2
> averagedreplicates[i,15] <- (zz[i,28] + zz[i,29])/2
> averagedreplicates[i,16] <- (zz[i,30] + zz[i,31])/2
>
> #Sample D
> averagedreplicates[i,17] <- (zz[i,32] + zz[i,33])/2
> averagedreplicates[i,18] <- (zz[i,34] + zz[i,35])/2
> averagedreplicates[i,19] <- (zz[i,36] + zz[i,37])/2
> averagedreplicates[i,20] <- (zz[i,38] + zz[i,39])/2
> averagedreplicates[i,21] <- (zz[i,40] + zz[i,41])/2
> }
>
>
>