New user here. My goal is to pull daily averages from a long dataset.

I've been working with some code I got from this list:

https://stat.ethz.ch/pipermail/r-help/2009-March/191302.html

The code as I have been using it is as follows:

library(zoo)
library(chron)

LTER6 <- read.table("/Users/me/Desktop/R/data.csv", sep=",", header=TRUE, as.is=TRUE)
z <- zoo(LTER6$temp, chron(LTER6$Date, LTER6$Time))
z.day <- aggregate(z, trunc, mean)  # This last line gives me daily averages for my data

Simple and elegant, and it works. Thanks to the author, the hard part is over. But I plan to tweak it, so I have some questions about why this works.

1- My data has the date and time as a single string like this: "2006-04-09 10:20:00". But the code was set up to read the data in two columns, i.e. "2006-04-09" & "10:20:00". Is this how the chron package expects to have the data, or is there a way I can change the code to read the data as a single column? For now I am chopping up my date and time data manually before I run R.

2- I've read the help on "as.is", and I'm not sure why I need that argument in the first line of code. This is what my original data looks like (with header), if this helps answer the question:

line.site,time_local,time_utc,reef_type_code,sensor_type,sensor_depth_m,temp
06,2006-04-09 10:20:00,2006-04-09 20:20:00,BAK,sb39, 2, 29.63
06,2006-04-09 10:40:00,2006-04-09 20:40:00,BAK,sb39, 2, 29.56

3- Finally, how does the function "trunc" know to aggregate the data by day? If I wanted to do monthly averages I would need to specify that with "as.yearmon", but I don't seem to need to specify "day" anywhere in the code.

Thanks in advance. Your help has saved me multiple hours of spreadsheet time.
On Thu, Oct 27, 2011 at 4:18 PM, Vinny Moriarty <vwmoriarty at gmail.com> wrote:
> New user here. My goal is to pull daily averages from a long dataset.
>
> I've been working with some code I got from this list from
>
> https://stat.ethz.ch/pipermail/r-help/2009-March/191302.html
>
> [...]

That link is several years old. Since then the zoo package has gained additional capabilities.
Assuming the 2nd field is the desired date/time and the last field on each line is the one you want, try this read.zoo statement. See ?read.zoo and also try: vignette("zoo-read")

library(zoo)
library(chron)

# create test file
Lines <- "line.site,time_local,time_utc,reef_type_code,sensor_type,sensor_depth_m,temp
06,2006-04-09 10:20:00,2006-04-09 20:20:00,BAK,sb39, 2, 29.63
06,2006-04-09 10:40:00,2006-04-09 20:40:00,BAK,sb39, 2, 29.56"
cat(Lines, "\n", file = "data.txt")

# NULL fields are removed
temp <- read.zoo("data.txt", FUN = as.chron, header = TRUE, sep = ",",
  colClasses = c("NULL", NA, "NULL", "NULL", "NULL", "NULL", NA))

# daily
temp.day <- read.zoo("data.txt", FUN = as.Date, header = TRUE, sep = ",",
  aggregate = mean,
  colClasses = c("NULL", NA, "NULL", "NULL", "NULL", "NULL", NA))

# monthly
temp.ym <- read.zoo("data.txt", FUN = as.yearmon, header = TRUE, sep = ",",
  aggregate = mean,
  colClasses = c("NULL", NA, "NULL", "NULL", "NULL", "NULL", NA))

chron represents date/time internally as days since the Epoch + fraction of day for the time. Thus truncating to an integer removes the fractional part (i.e. the time), leaving the day. See R News 4/1. We could alternately just use the Date class in the base of R, as shown above.

If we had read in temp and wanted to aggregate it, rather than read it straight into an aggregated form, then here are some possibilities:

aggregate(temp, trunc, mean)       # daily
aggregate(temp, as.Date, mean)     # daily with Date class
aggregate(temp, as.yearmon, mean)  # monthly

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
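The point about chron's internal representation can be checked directly. The following is a minimal sketch, assuming the zoo and chron packages are installed; the third reading (29.10 on 2006-04-10) is made up for illustration and is not from the poster's data.

```r
library(zoo)
library(chron)

# Two readings from the post plus one made-up reading on the next day
tt <- chron(c("2006-04-09", "2006-04-09", "2006-04-10"),
            c("10:20:00", "10:40:00", "09:00:00"),
            format = c(dates = "y-m-d", times = "h:m:s"))
z <- zoo(c(29.63, 29.56, 29.10), tt)

# Whole days since 1970-01-01 plus a fraction of a day for the time
as.numeric(tt)

# trunc() drops the fraction, so time stamps on the same day coincide
as.numeric(trunc(tt))

# aggregate() groups by the truncated index: one mean per calendar day
daily <- aggregate(z, trunc, mean)
daily
```

Because the grouping key is just the truncated index, nothing in the call needs to mention "day"; swapping trunc for as.yearmon changes the key to the month instead.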
On 27/10/11 22:18, Vinny Moriarty wrote:
> [...]
> 1- The data I have has the date and time format as a single string like this
> "2006-04-09 10:20:00". But the code was set up to read the data in two
> columns ie- "2006-04-09" & "10:20:00". Is this how the chron package
> expects to have the data, or is there a way I can change the code to read
> the data as a single column. For now I am chopping up my date and time data
> manually before I run R.

You can split the string on the space:

strsplit("2006-04-09 10:20:00", " ")[[1]][1]
# [1] "2006-04-09"
strsplit("2006-04-09 10:20:00", " ")[[1]][2]
# [1] "10:20:00"

Then replace the zoo() line with:

# split every row of the combined column, not just the first
parts <- strsplit(LTER6$DateTime, " ")
# chron's default date format is m/d/y, so the y-m-d layout is given explicitly
z <- zoo(LTER6$temp,
         chron(sapply(parts, "[", 1), sapply(parts, "[", 2),
               format = c(dates = "y-m-d", times = "h:m:s")))

> 2- I've read the help on "as.is", and I'm not sure why I need that function
> in the first line of code. This is what my original data looks like (with
> header) if this helps answer this question
>
> line.site,time_local,time_utc,reef_type_code,sensor_type,sensor_depth_m,temp
> 06,2006-04-09 10:20:00,2006-04-09 20:20:00,BAK,sb39, 2, 29.63
> 06,2006-04-09 10:40:00,2006-04-09 20:40:00,BAK,sb39, 2, 29.56

Don't know.

> 3. Finally- how does the function "trunc" know to aggregate the data by day?
> If I wanted to do monthly averages I would need to specify with
> "as.yearmon", but I don't seem to need to specify "day" anywhere in the
> code.

An explanation (though still not a solution for the monthly aggregate): the numerical coding for a date-time is that the integer part is the number of days since a reference date and the decimal part is the time of day. So if you use trunc, two different times on the same day become identical.
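Question 2 was left open in this thread. As a minimal base-R sketch: as.is is an argument of read.table (not a function) that controls whether string columns are kept as character vectors or converted to factors. Before R 4.0.0 the conversion to factors was the default, which is presumably why the 2009 code sets as.is = TRUE, so that the date and time columns reach chron() as plain strings. The variable names below are chosen for illustration only.

```r
# Two rows in the layout shown in the original post
csv <- "line.site,time_local,time_utc,reef_type_code,sensor_type,sensor_depth_m,temp
06,2006-04-09 10:20:00,2006-04-09 20:20:00,BAK,sb39, 2, 29.63
06,2006-04-09 10:40:00,2006-04-09 20:40:00,BAK,sb39, 2, 29.56"

# Mimic the pre-4.0.0 default: string columns become factors
as_factor <- read.table(text = csv, sep = ",", header = TRUE,
                        stringsAsFactors = TRUE)

# as.is = TRUE leaves string columns as character vectors
as_char <- read.table(text = csv, sep = ",", header = TRUE, as.is = TRUE)

class(as_factor$time_local)  # "factor"
class(as_char$time_local)    # "character"
class(as_char$temp)          # "numeric": as.is does not affect numeric columns
```

On R 4.0.0 and later the default already keeps strings as character, so as.is = TRUE is harmless but usually redundant there.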