Sam Albers
2011-Feb-11 22:45 UTC
[R] Summarizing a response variable based on an irregular time period
Hello,

I have a question about working with dates in R. I would like to summarize a response variable over designated, irregular time periods. The purpose is to compare the summarized values (which were sampled daily) to another variable that was sampled less frequently.

Below is a trivial example where I would like to summarize the response variable dat$x so that I have average and sum values for Sept 25-27 and Sept 28-Oct 1. Can anyone suggest an efficient way to deal with dates like this? In an extremely tedious previous effort, I simply created another grouping variable, but I had to do this manually. For a large dataset this really isn't a good option.

Thanks in advance!

Sam

library(plyr)

# Daily data: one x value per date (7 dates, so 7 values).
dat <- data.frame(x = runif(7, 0, 125),
                  date = as.Date(c("2009-09-25", "2009-09-26", "2009-09-27",
                                   "2009-09-28", "2009-09-29", "2009-09-30",
                                   "2009-10-01"), format = "%Y-%m-%d"),
                  yy = rep(letters[1:2], length.out = 7),
                  stringsAsFactors = TRUE)

# If I were using a regular factor, I would do something like this, and this
# is the kind of result I am hoping for (obviously switching yy for date as
# the grouping variable).
ddply(dat, c("yy"), function(df) return(c(avg = mean(df$x), sum = sum(df$x))))

# This is the data.frame that I would like to compare to dat.
dat2 <- data.frame(y = runif(2, 0, 125),
                   date = as.Date(c("2009-09-27", "2009-10-01"),
                                  format = "%Y-%m-%d"))

-- 
Sam Albers
Geography Program
University of Northern British Columbia
3333 University Way
Prince George, British Columbia
Canada, V2N 4Z9
phone: 250 960-6777
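
One possible way to build the grouping variable automatically is sketched below in base R. It assumes that each date in dat2 marks the last day of a period, so that Sept 25-27 maps to 2009-09-27 and Sept 28-Oct 1 maps to 2009-10-01. The names idx, period_end, summ and merged are illustrative only, and set.seed() is added just to make the random values reproducible.

```r
# Daily data and the less frequently sampled data, as in the example above.
set.seed(1)
dat <- data.frame(
  x    = runif(7, 0, 125),
  date = seq(as.Date("2009-09-25"), as.Date("2009-10-01"), by = "day")
)
dat2 <- data.frame(
  y    = runif(2, 0, 125),
  date = as.Date(c("2009-09-27", "2009-10-01"))
)

# Assumption: dat2$date is sorted and each entry closes a period, so the
# periods are the intervals (previous end, end]. findInterval() with
# left.open = TRUE returns 0 for dates up to the first end date, 1 for dates
# up to the second, and so on; adding 1 gives the row index into dat2.
idx <- findInterval(as.numeric(dat$date), as.numeric(dat2$date),
                    left.open = TRUE) + 1

# Label each daily row with the end date of its period.
# Rows later than the last end date get NA.
dat$period_end <- dat2$date[idx]

# Summarize x within each period; the factor levels keep one slot per row of
# dat2 even if a period happens to contain no daily observations.
grp  <- factor(idx, levels = seq_along(dat2$date))
summ <- data.frame(
  date = dat2$date,
  avg  = as.numeric(tapply(dat$x, grp, mean)),
  sum  = as.numeric(tapply(dat$x, grp, sum))
)

# Put the summaries next to the less frequently sampled variable.
merged <- merge(dat2, summ, by = "date")
print(merged)
```

With plyr loaded, the period_end column could also be passed to ddply() as the grouping variable in place of yy, in the same way as the call in the example above.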