thr3ads.net - R help - [R] help to speed up loops in r [Jun 2009]


Amit Patel

2009-Jun-08 15:45 UTC

[R] help to speed up loops in r

Hi
I am using a script which involves the following loop. It attempts to reduce a
data frame (zz) of 95000 * 41 down to a data frame (averagedreplicates) of 95000
* 21 by averaging the replicate values, as you can see in the script below.
However, this script is very slow (2 days). Any suggestions on how to speed it up?

NB: I have also tried using rowMeans rather than adding the 2 values and dividing
by 2 (same problem).




#SCRIPT STARTS
for (i in 1:length(averagedreplicates[,1]))
#for (i in 1:dim(averagedreplicates)[1])
{
cat(i,'\n')


#calculates means
#Sample A
averagedreplicates[i,2] <- (zz[i,2] + zz[i,3])/2
averagedreplicates[i,3] <- (zz[i,4] + zz[i,5])/2
averagedreplicates[i,4] <- (zz[i,6] + zz[i,7])/2
averagedreplicates[i,5] <- (zz[i,8] + zz[i,9])/2
averagedreplicates[i,6] <- (zz[i,10] + zz[i,11])/2

#Sample B
averagedreplicates[i,7] <- (zz[i,12] + zz[i,13])/2
averagedreplicates[i,8] <- (zz[i,14] + zz[i,15])/2
averagedreplicates[i,9] <- (zz[i,16] + zz[i,17])/2
averagedreplicates[i,10] <- (zz[i,18] + zz[i,19])/2
averagedreplicates[i,11] <- (zz[i,20] + zz[i,21])/2

#Sample C
averagedreplicates[i,12] <- (zz[i,22] + zz[i,23])/2
averagedreplicates[i,13] <- (zz[i,24] + zz[i,25])/2
averagedreplicates[i,14] <- (zz[i,26] + zz[i,27])/2
averagedreplicates[i,15] <- (zz[i,28] + zz[i,29])/2
averagedreplicates[i,16] <- (zz[i,30] + zz[i,31])/2

#Sample D
averagedreplicates[i,17] <- (zz[i,32] + zz[i,33])/2
averagedreplicates[i,18] <- (zz[i,34] + zz[i,35])/2
averagedreplicates[i,19] <- (zz[i,36] + zz[i,37])/2
averagedreplicates[i,20] <- (zz[i,38] + zz[i,39])/2
averagedreplicates[i,21] <- (zz[i,40] + zz[i,41])/2
  }
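Since the NB above mentions rowMeans: it tends to help only when it is called once per replicate pair over all rows at once, rather than once per row. A minimal sketch, assuming (as the loop implies) that column 1 of zz is an identifier and columns 2:41 hold 20 adjacent replicate pairs; zz_demo, first, second, avg and averaged_demo are hypothetical names used only for illustration:

```r
# Hypothetical stand-in for zz: an ID column plus 20 replicate pairs (40 columns)
set.seed(1)
zz_demo <- data.frame(id = 1:5, matrix(runif(5 * 40), nrow = 5))

# Column indices of the first and second member of each replicate pair
first  <- seq(2, 40, by = 2)   # 2, 4, ..., 40
second <- first + 1            # 3, 5, ..., 41

# One rowMeans() call per pair covers every row at once (20 calls in total)
avg <- sapply(seq_along(first), function(k)
  rowMeans(zz_demo[, c(first[k], second[k])]))

# Reattach the ID column: 1 ID column + 20 averaged columns
averaged_demo <- data.frame(id = zz_demo$id, avg)
dim(averaged_demo)
```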

Jorge Ivan Velez

2009-Jun-08 16:03 UTC


[R] help to speed up loops in r

Dear Amit,
The following should get you started:

# Some data
set.seed(123)
X <- matrix(rnorm(20*10), ncol=10)
X

# Group of replicates
g <- rep(1:(ncol(X)/2), each=2)
g

# Mean of replicate variables
t(apply(X, 1, tapply, g, mean, na.rm = TRUE))

I created a grouping variable (g) and then calculated the mean by row
(apply(X, 1, ...)) for each level of g (that's why I included tapply).

I have not checked the timing, but I expect it to be faster than the script you
already have.
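Jorge's grouping can be checked against plain column arithmetic on the same X: each column of the tapply result should equal the mean of one adjacent pair of columns. This check is an illustration added here, not part of the original reply; odd, by_group and direct are hypothetical names:

```r
set.seed(123)
X <- matrix(rnorm(20 * 10), ncol = 10)
g <- rep(1:(ncol(X) / 2), each = 2)   # 1 1 2 2 3 3 4 4 5 5

# Jorge's version: for each row, tapply() averages the values within each group
by_group <- t(apply(X, 1, tapply, g, mean, na.rm = TRUE))

# Same averages from whole-column arithmetic on odd and even columns
odd    <- seq(1, ncol(X), by = 2)     # 1, 3, 5, 7, 9
direct <- (X[, odd] + X[, odd + 1]) / 2

all.equal(unname(by_group), direct)   # TRUE
```

The column version does one vectorised addition per pair instead of one tapply() call per row, which is why it usually scales better to 95000 rows.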

HTH,

Jorge




Jeremiah Rounds

2009-Jun-09 09:20 UTC


[R] help to speed up loops in r

Other than re-engineering the approach, one thing that helps is: don't index rows
of data frames inside loops... ever. It is actually faster to convert to a matrix,
do the operations, and then convert back to a data frame if you have to.
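The convert, operate, convert-back pattern can be sketched as follows. This assumes, as in the question, that column 1 is an ID and the remaining 40 columns are adjacent replicate pairs; zz_small, m, out and result_df are hypothetical names. The row loop is kept only to mirror the question's structure:

```r
# Hypothetical data frame shaped like zz: an ID column + 40 numeric columns
set.seed(42)
zz_small <- data.frame(id = 1:1000, matrix(runif(1000 * 40), nrow = 1000))

# 1. Convert the numeric part to a matrix once, outside the loop
m   <- as.matrix(zz_small[, -1])
out <- matrix(NA_real_, nrow = nrow(m), ncol = ncol(m) / 2)

# 2. Do the row-wise work on the matrix
for (i in seq_len(nrow(m))) {
  # Logical indices recycle over the 40 columns:
  # c(TRUE, FALSE) picks columns 1, 3, ..., 39; c(FALSE, TRUE) picks 2, 4, ..., 40
  out[i, ] <- (m[i, c(TRUE, FALSE)] + m[i, c(FALSE, TRUE)]) / 2
}

# 3. Convert back to a data frame only at the end
result_df <- data.frame(id = zz_small$id, out)
```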

 

As an example, I have put your code in a function:

 

foo = function(averagedreplicates, zz){
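    # NB: dim(averagedreplicates)[2] is the number of columns (21), so this
    # loop visits only rows 1 to 21; seq_len(nrow(averagedreplicates)) would
    # visit every row
    iindex = 1:(dim(averagedreplicates)[2])
    for (i in iindex) {
     cat(i,'\n')
     # calculates means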
        #Sample A
     averagedreplicates[i,2] <- (zz[i,2] + zz[i,3])/2
     averagedreplicates[i,3] <- (zz[i,4] + zz[i,5])/2
     averagedreplicates[i,4] <- (zz[i,6] + zz[i,7])/2
     averagedreplicates[i,5] <- (zz[i,8] + zz[i,9])/2
     averagedreplicates[i,6] <- (zz[i,10] + zz[i,11])/2
     #Sample B
     averagedreplicates[i,7] <- (zz[i,12] + zz[i,13])/2
     averagedreplicates[i,8] <- (zz[i,14] + zz[i,15])/2
     averagedreplicates[i,9] <- (zz[i,16] + zz[i,17])/2
     averagedreplicates[i,10] <- (zz[i,18] + zz[i,19])/2
     averagedreplicates[i,11] <- (zz[i,20] + zz[i,21])/2
     #Sample C
     averagedreplicates[i,12] <- (zz[i,22] + zz[i,23])/2
     averagedreplicates[i,13] <- (zz[i,24] + zz[i,25])/2
     averagedreplicates[i,14] <- (zz[i,26] + zz[i,27])/2
     averagedreplicates[i,15] <- (zz[i,28] + zz[i,29])/2
     averagedreplicates[i,16] <- (zz[i,30] + zz[i,31])/2
     #Sample D
     averagedreplicates[i,17] <- (zz[i,32] + zz[i,33])/2
     averagedreplicates[i,18] <- (zz[i,34] + zz[i,35])/2
     averagedreplicates[i,19] <- (zz[i,36] + zz[i,37])/2
     averagedreplicates[i,20] <- (zz[i,38] + zz[i,39])/2
     averagedreplicates[i,21] <- (zz[i,40] + zz[i,41])/2
    }
    return(averagedreplicates)
}

 

I then make matrix and data.frame versions of things similar in size to what you
are working with:

 

zz.as.m = matrix(runif(95000*41),95000,41)
zz.as.df = as.data.frame(zz.as.m)
ar.as.m = matrix(0,95000,21)
ar.as.df = as.data.frame(ar.as.m)


 

And we can time the matrix versions:

 

start = Sys.time()
x = foo(ar.as.m,zz.as.m)
stop = Sys.time()
stop-start  # .06 seconds for me

 

 

And on the data frame versions?

 

#using the data frame versions
start = Sys.time()
x = foo(ar.as.df,zz.as.df)
stop = Sys.time()
stop-start  # 31 seconds for me

 

 

For me, the same work takes 516 times as long with data frames as it does with
matrices.
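The same comparison can be sketched with base R's system.time(), looping over every row via seq_len(nrow(ar)) and checking that both data structures give the same answer. The input is a smaller hypothetical one (300 rows) so the data frame run finishes quickly; pair_means_loop and the other names are illustrative, and the timings you see will depend on your machine and R version:

```r
# Hypothetical helper: fill columns 2:21 of `ar` with the means of the adjacent
# column pairs 2:3, 4:5, ..., 40:41 of `zz`, looping over every row
pair_means_loop <- function(ar, zz) {
  first <- seq(2, 40, by = 2)
  for (i in seq_len(nrow(ar))) {
    # unlist() turns a one-row data frame slice into a plain numeric vector,
    # so the same body works for matrices and data frames
    ar[i, 2:21] <- (unlist(zz[i, first]) + unlist(zz[i, first + 1])) / 2
  }
  ar
}

set.seed(7)
n <- 300  # hypothetical, much smaller than 95000
zz_m <- cbind(seq_len(n), matrix(runif(n * 40), n, 40))
ar_m <- matrix(0, n, 21)
ar_m[, 1] <- zz_m[, 1]
zz_df <- as.data.frame(zz_m)
ar_df <- as.data.frame(ar_m)

t_m  <- system.time(res_m  <- pair_means_loop(ar_m,  zz_m))[["elapsed"]]
t_df <- system.time(res_df <- pair_means_loop(ar_df, zz_df))[["elapsed"]]
c(matrix = t_m, data.frame = t_df)  # elapsed seconds, machine dependent

# Both structures should produce the same numbers
all.equal(unname(as.matrix(res_df)), res_m)
```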

 

People say "never use loops in R", and I wish they wouldn't say it like that,
because it distracts from the facts of the matter: sometimes looping in R is
reasonably fast, and sometimes (like when you are indexing rows of a data frame)
it is horrible. These are the little things I learned combing through my
Master's project for speed.

 

The only caveat to always doing this sort of work in matrices is that it can be
a little time consuming (in developer time) to repair factors afterwards. But in
terms of run time, it is absolutely essential to use the right data structure for
the job.
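A small illustration of that caveat, on a hypothetical data frame dat: as.matrix() on a frame that contains a factor coerces the whole matrix to character, so one common workaround is to set the factor aside, do the numeric work in a matrix, and reattach it at the end:

```r
# Hypothetical data frame: a factor column next to two numeric replicates
dat <- data.frame(probe = factor(c("p1", "p2", "p3")),
                  rep1 = c(1, 2, 3), rep2 = c(3, 4, 5))

# as.matrix() on the whole frame gives a character matrix because of the factor
class(as.matrix(dat)[1, "rep1"])  # "character"

# Set the factor aside, do the numeric work in a matrix, then reattach it
m <- as.matrix(dat[, c("rep1", "rep2")])
res <- data.frame(probe = dat$probe,
                  mean_rep = (m[, "rep1"] + m[, "rep2"]) / 2)
str(res)  # probe stays a factor, mean_rep is numeric
```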

 

Hope this is of assistance,

Jeremiah Rounds

  
