Hello,
I do not understand the correct way to approach the following problem
in R.
I have observations of pairs of variables, v1, o1, v2, o2, etc.,
observed every 30 seconds. What I would like to do is compute the
correlation matrix, but not for all my data, just for, say, 5-minute
or 1-hour chunks.
In SQL, what I would say is

    select id, date_trunc('hour'::text, ts) as tshour, corr(n1, o1) as corr1
    from raw30s
    where id = 1201087
      and (ts between 'Mar 1, 2007' and 'Apr 1, 2007')
    group by id, tshour
    order by id, tshour;
I've pulled data from PostgreSQL into R, and have a dataframe
containing a timestamp column, v, and o (both numeric).
I created a grouping index for every 5 minutes along these lines:
    obsfivemin <- trunc(obsts, units = "hours") +
                  (floor(obsts$min / 5) * 5 * 60)
(where obsts is the SQL timestamp converted into an R date-time
object; the $min access implies POSIXlt)
Then I tried aggregate(df,by=obsfivemin,cor), but that seemed to pass
just a single column at a time to cor, not the entire data frame. It
worked for mean and sum, but not cor.
In desperation, I tried looping over the different 5 minute levels and
computing cor, but I'm so R-clueless I couldn't even figure out how to
assign to a variable inside of that loop!
code such as
    for (f in fivemin) {
        output[f] <- cor(df[grouper == f, ])
    }
failed, as I couldn't figure out how to initialize output so that
output[f] would accept the output of cor.
Any help or steering towards the proper R-way would be appreciated.
Regards,
James Marca
--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.
Hi James,
It would really help if you gave us a sample of the data you are
working with. The following is not tested, because I don't have your
data and am too lazy to construct a similar example dataset for you,
but it might get you started.
You can try using a for loop along the lines of

    grps <- unique(obsfivemin)
    output <- data.frame(obsfivemin = grps,
                         fivemin.cor = numeric(length(grps)))
    for (f in grps) {
        sub <- df[obsfivemin == f, ]
        output$fivemin.cor[output$obsfivemin == f] <- cor(sub$v, sub$o)
    }

(Two gotchas with your original attempt: a name beginning with a digit,
like 5min.cor, is not a syntactic R name, and cor() on a whole data
frame returns a matrix, so it is simpler to store the single v-o
correlation for each group.)
Or you can try something like the following with the plyr package
(the grouping variable must be a column of the data frame):

    cor.dat <- function(d) {
        cor(d[, c("v", "o")])
    }
    library(plyr)
    df$obsfivemin <- obsfivemin
    dlply(df, .(obsfivemin), cor.dat)
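For a self-contained illustration, here is an untested sketch with invented example data (column names v and o as in your post; timestamps and values are made up). It builds the 5-minute index with cut() instead of hand-rolled truncation, and uses by() to hand each chunk of the data frame to cor():

```r
## Invented example: one hour of 30-second observations.
set.seed(1)
obsts <- seq(as.POSIXct("2007-03-01 00:00:00"), by = 30, length.out = 120)
df <- data.frame(ts = obsts, v = rnorm(120), o = rnorm(120))
df$o <- df$o + 0.5 * df$v          # induce some correlation

## cut() truncates timestamps to 5-minute bins, giving a grouping factor.
df$fivemin <- cut(df$ts, breaks = "5 mins")

## by() passes each 5-minute chunk of the data frame to the function,
## which here returns the single v-o correlation for that chunk.
cors <- by(df[, c("v", "o")], df$fivemin,
           function(chunk) cor(chunk$v, chunk$o))
```

Unlike aggregate(), by() operates on whole sub-data-frames, which is why it works with cor() where aggregate() did not.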
Good luck,
Ista
On Tue, Mar 9, 2010 at 9:36 PM, James Marca
<jmarca at translab.its.uci.edu> wrote:
> [original message quoted above; snipped]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
James, you may want to post your question to the R-SIG-Finance group
with a small example. If I understand your problem correctly, it's
like converting tick data of a financial time series into aggregates
(1-minute, hourly, daily, ... data sets). There are packages available
for this kind of task that are very fast and efficient (no looping!).

regards
Helmuth
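Helmuth presumably has packages like zoo or xts in mind. As one untested sketch (data invented for illustration), xts's endpoints() can mark 5-minute period boundaries and period.apply() can run cor() over each chunk without an explicit loop:

```r
## Untested sketch assuming the xts package is installed.
library(xts)
set.seed(2)
ts30 <- seq(as.POSIXct("2007-03-01 00:00:00"), by = 30, length.out = 240)
x <- xts(cbind(v = rnorm(240), o = rnorm(240)), order.by = ts30)

## endpoints() returns the row indices that close each 5-minute period;
## period.apply() then applies the function to each period's rows.
ep   <- endpoints(x, on = "minutes", k = 5)
cors <- period.apply(x, INDEX = ep, FUN = function(w) cor(w$v, w$o))
```

The same pattern aggregates to hourly or daily chunks by changing the `on` argument.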
The sqldf package can be used to manipulate R data frames with SQL
statements. See http://sqldf.googlecode.com

On Tue, Mar 9, 2010 at 9:36 PM, James Marca
<jmarca at translab.its.uci.edu> wrote:
> [original message quoted above; snipped]
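One wrinkle: sqldf's default SQLite backend has no corr() aggregate, so a direct translation of the PostgreSQL query won't run. An untested sketch (table and column names invented to match the thread) is to let SQL group and sum the sufficient statistics, then finish the Pearson formula in R:

```r
## Untested sketch assuming the sqldf package is installed.
library(sqldf)
set.seed(3)
df <- data.frame(grp = rep(1:4, each = 25), v = rnorm(100), o = rnorm(100))

## SQLite lacks corr(), so collect per-group sums instead.
stats <- sqldf("select grp, count(*) as n,
                       sum(v) as sv, sum(o) as so,
                       sum(v*v) as svv, sum(o*o) as soo, sum(v*o) as svo
                from df group by grp")

## Pearson correlation from the sufficient statistics:
## r = (n*Svo - Sv*So) / sqrt((n*Svv - Sv^2) * (n*Soo - So^2))
stats$corr1 <- with(stats,
    (n * svo - sv * so) / sqrt((n * svv - sv^2) * (n * soo - so^2)))
```

This reproduces what cor() would give per group, while keeping the grouping logic in SQL as in the original query.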