Sam Albers
2011-Feb-11 22:45 UTC
[R] Summarizing a response variable based on an irregular time period
Hello,
I have a question about working with dates in R. I would like to summarize a
response variable based on a designated and irregular time period. The
purpose of this is to compare the summarized values (which were sampled
daily) to another variable that was sampled less frequently. Below is a
trivial example where I would like to summarize the response variable dat$x
so that I have the average and sum for Sept 25-27 and for Sept 28-Oct 1. Can
anyone suggest an efficient way to deal with dates like this? In a previous
attempt I created another grouping variable by hand, which was extremely
tedious. For a large dataset this really isn't a good option.
Thanks in advance!
Sam
library(plyr)
dat <- data.frame(x = runif(7, 0, 125),  # one value per daily date
                  date = as.Date(c("2009-09-25","2009-09-26","2009-09-27","2009-09-28",
                                   "2009-09-29","2009-09-30","2009-10-01"),
                                 format="%Y-%m-%d"),
                  yy = rep(letters[1:2], length.out = 7),  # recycle a/b over all 7 rows
                  stringsAsFactors = TRUE)
# If I was using a regular factor, I would do something like this, and this is
# what I would be hoping for as a result (obviously switching yy for date as
# the grouping variable)
ddply(dat, c("yy"), function(df) return(c(avg=mean(df$x),
sum=sum(df$x))))
#This is the data.frame that I would like to compare to dat.
dat2 <- data.frame(y = runif(2, 0, 125),
                   date = as.Date(c("2009-09-27","2009-10-01"),
                                  format="%Y-%m-%d"))
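
For illustration, here is a minimal sketch of one way the grouping variable might be built automatically instead of by hand. It assumes that each date in dat2 marks the end of a period, so that every daily observation belongs to the first dat2 date on or after it. The column name period_end and the fixed x and y values are made up for this sketch (fixed values replace runif() so the result is easy to check), and base R is used instead of plyr so that the block runs on its own.

```r
# Self-contained sketch with fixed values (hypothetical data, not the real x and y)
dat <- data.frame(
  x = c(10, 20, 30, 40, 50, 60, 70),
  date = seq(as.Date("2009-09-25"), as.Date("2009-10-01"), by = "day")
)
dat2 <- data.frame(
  y = c(1, 2),
  date = as.Date(c("2009-09-27", "2009-10-01"))
)

# Assumption: each dat2 date closes a period.
# With left.open = TRUE, findInterval() counts the period ends strictly
# before each daily date, so adding 1 gives the index of the period that
# the date falls into. Dates after the last period end get NA.
ends <- sort(dat2$date)
idx <- findInterval(as.numeric(dat$date), as.numeric(ends), left.open = TRUE) + 1
dat$period_end <- ends[idx]

# Summarize x within each period (rows with an NA period are dropped by split)
summ <- do.call(rbind, lapply(split(dat, dat$period_end), function(df) {
  data.frame(date = df$period_end[1], avg = mean(df$x), sum = sum(df$x))
}))

# Line the summaries up with the less frequently sampled variable
merged <- merge(dat2, summ, by = "date")
print(merged)
```

Once period_end exists, it could presumably also be passed to ddply() as the grouping variable in place of yy, in the same way as the call above.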
--
*****************************************************
Sam Albers
Geography Program
University of Northern British Columbia
3333 University Way
Prince George, British Columbia
Canada, V2N 4Z9
phone: 250 960-6777
*****************************************************
