Vikas N Kumar
2008-Feb-22 20:48 UTC
[R] efficient writing of calculation involving each element of 2 data frames.
Hi

I have 2 data.frames, each with the same number of rows (approximately 30000 or
more entries). They also have the same number of columns, let's say 2. One
column has the date, the other column has a double precision number. Let the
column names be V1 and V2.

Now I want to calculate the correlation of the 2 sets of data over the last
100 days, for every day available in the data.frames.

My code looks like this:

# Let df1 and df2 be the 2 data frames with the required data
## begin code snippet

my_corr <- c();
for (i_end in 100:nrow(df1)) {
    i_start <- i_end - 99;
    my_corr[i_start] <-
        cor(x = df1[i_start:i_end, "V2"], y = df2[i_start:i_end, "V2"])
}

## end of code snippet

This runs very slowly. Calculating correlations between 10 data sets leaves me
with 45 runs of this snippet, which takes more than an hour (more than 30
minutes at best) to run.

Is there an efficient way to write this piece of code so that it runs faster?

If I do something similar in Excel, it is much faster. But I have to use R,
since this is part of a bigger program.

Any help will be appreciated.

Thanks and Regards
Vikas
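For reference, a rolling-window correlation like the one above can also be
written without an explicit loop using zoo::rollapply. A minimal sketch,
assuming the zoo package is installed and df1/df2 are as described in the post:

library(zoo)

# pull the numeric columns out once, avoiding repeated data.frame indexing
x <- df1[["V2"]]
y <- df2[["V2"]]

# rolling correlation over 100-observation windows;
# by.column = FALSE hands each 100 x 2 window matrix to the function
my_corr <- rollapply(cbind(x, y), width = 100,
                     FUN = function(m) cor(m[, 1], m[, 2]),
                     by.column = FALSE)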
Uwe Ligges
2008-Feb-25 11:03 UTC
[R] efficient writing of calculation involving each element of 2 data frames.
Vikas N Kumar wrote:
> Hi
>
> I have 2 data.frames, each with the same number of rows (approximately
> 30000 or more entries). They also have the same number of columns, let's
> say 2. One column has the date, the other column has a double precision
> number. Let the column names be V1 and V2.
>
> Now I want to calculate the correlation of the 2 sets of data over the
> last 100 days, for every day available in the data.frames.
>
> My code looks like this:
> # Let df1 and df2 be the 2 data frames with the required data
> ## begin code snippet
>
> my_corr <- c();
> for (i_end in 100:nrow(df1)) {
>     i_start <- i_end - 99;
>     my_corr[i_start] <-
>         cor(x = df1[i_start:i_end, "V2"], y = df2[i_start:i_end, "V2"])
> }

I'd rather do it this way:

n <- nrow(df1) - 99
my_corr <- numeric(n)
dat1 <- df1[, "V2"]
dat2 <- df2[, "V2"]
for (i in seq(n)) {
    sq <- i:(i + 99)
    my_corr[i] <- cor(x = dat1[sq], y = dat2[sq])
}

because most of your time has been consumed by the indexing function
[.data.frame, as profiling shows. Type ?Rprof in order to learn how to do the
profiling yourself.

Uwe Ligges

> ## end of code snippet
>
> This runs very slowly. Calculating correlations between 10 data sets
> leaves me with 45 runs of this snippet, which takes more than an hour
> (more than 30 minutes at best) to run.
>
> Is there an efficient way to write this piece of code so that it runs
> faster?
>
> If I do something similar in Excel, it is much faster. But I have to use
> R, since this is part of a bigger program.
>
> Any help will be appreciated.
>
> Thanks and Regards
> Vikas
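Following up on the ?Rprof pointer, here is a minimal sketch of how the
profiling could be run on the rewritten loop (the output file name is
arbitrary):

Rprof("corr-profile.out")            # start collecting profiling samples
my_corr <- numeric(n)
for (i in seq(n)) {
    sq <- i:(i + 99)
    my_corr[i] <- cor(x = dat1[sq], y = dat2[sq])
}
Rprof(NULL)                          # stop profiling
summaryRprof("corr-profile.out")     # time spent per function (self/total)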