Jeff Newmiller
2021-Sep-02 19:40 UTC
[R] Calculate daily means from 5-minute interval data
Regardless of whether you use the lower-level split function, or the
higher-level aggregate function, or the tidyverse group_by function, the key is
learning how to create the column that is the same for all records corresponding
to the time interval of interest.
If you convert the sampdate to POSIXct, the tz IS important, because most of us
use local timezones that respect daylight savings time, and a naive conversion
of standard time will run into trouble if R is assuming daylight savings time
applies. The lubridate package gets around this by always assuming UTC and
giving you a function to "fix" the timezone after the conversion. I
prefer to always be specific about timezones, at least by using something
like
Sys.setenv( TZ = "Etc/GMT+8" )
which does not respect daylight savings.
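Note the POSIX sign convention here: "Etc/GMT+8" means eight hours *behind* UTC (Pacific Standard Time year-round), which trips many people up. A quick sanity check (a sketch, using a made-up timestamp):

```r
# "Etc/GMT+8" is a fixed UTC-8 offset with no daylight-saving transitions,
# despite the "+" in the name (POSIX sign convention).
t <- as.POSIXct("2021-07-01 12:00:00", tz = "Etc/GMT+8")
format(t, "%Y-%m-%d %H:%M:%S", tz = "UTC")  # "2021-07-01 20:00:00" even in July
```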
Regarding using character data for identifying the month, in order to have clean
plots of the data I prefer to use the trunc function but it returns a POSIXlt so
I convert it to POSIXct:
discharge$sampmonthbegin <- as.POSIXct( trunc( discharge$sampdate, units = "months" ) )
Then any of various ways can be used to aggregate the records by that column.
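For instance (a minimal sketch with made-up data; the column names follow the thread, and units = "months" in trunc() needs a recent R version), aggregate() can compute per-month means from that column:

```r
# Made-up example: truncate timestamps to the first of the month, then
# aggregate by that column to get one mean per month.
discharge <- data.frame(
  sampdate = as.POSIXct(c("2016-03-03 12:00", "2016-03-03 12:10",
                          "2016-04-05 08:00"), tz = "Etc/GMT+8"),
  cfs = c(149000, 150000, 152000)
)
discharge$sampmonthbegin <- as.POSIXct(trunc(discharge$sampdate,
                                             units = "months"))
monthly_means <- aggregate(cfs ~ sampmonthbegin, data = discharge, FUN = mean)
monthly_means$cfs  # 149500 for March, 152000 for April
```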
On September 2, 2021 12:10:15 PM PDT, Andrew Simmons <akwsimmo at gmail.com> wrote:
>You could use 'split' to create a list of data frames, and then apply a
>function to each to get the means and sds.
>
>
>cols <- "cfs" # add more as necessary
>S <- split(discharge[cols], format(discharge$sampdate, format = "%Y-%m"))
>means <- do.call("rbind", lapply(S, colMeans, na.rm = TRUE))
>sds <- do.call("rbind", lapply(S, function(xx) sapply(xx, sd, na.rm = TRUE)))
>
>On Thu, Sep 2, 2021 at 3:01 PM Rich Shepard <rshepard at
appl-ecosys.com>
>wrote:
>
>> On Thu, 2 Sep 2021, Rich Shepard wrote:
>>
>> > If I correctly understand the output of as.POSIXlt each date and time
>> > element is separate, so input such as 2016-03-03 12:00 would now be
>> > 2016 03 03 12 00 (I've not read how the elements are separated). (The
>> > TZ is not important because all data are either PST or PDT.)
>>
>> Using this script:
>> discharge <- read.csv('../data/water/discharge.dat', header = TRUE,
>>                       sep = ',', stringsAsFactors = FALSE)
>> discharge$sampdate <- as.POSIXlt(discharge$sampdate, tz = "",
>>                                  format = '%Y-%m-%d %H:%M',
>>                                  optional = 'logical')
>> discharge$cfs <- as.numeric(discharge$cfs, length = 6)
>>
>> I get this result:
>> > head(discharge)
>> sampdate cfs
>> 1 2016-03-03 12:00:00 149000
>> 2 2016-03-03 12:10:00 150000
>> 3 2016-03-03 12:20:00 151000
>> 4 2016-03-03 12:30:00 156000
>> 5 2016-03-03 12:40:00 154000
>> 6 2016-03-03 12:50:00 150000
>>
>> I'm completely open to suggestions on using this output to calculate
>> monthly means and sds.
>>
>> If dplyr::summarize() will do so please show me how to modify this command:
>> disc_monthly <- ( discharge
>> %>% group_by(sampdate)
>> %>% summarize(exp_value = mean(cfs, na.rm = TRUE))
>> because it produces daily means, not monthly means.
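(A sketch answering the quoted question, assuming dplyr is available: group by a derived year-month key rather than the raw timestamp, so each group spans a whole month instead of a single sample time. The data here are made up, in the thread's column layout.)

```r
# Sketch: monthly (not daily) means via a derived year-month grouping key.
library(dplyr)

discharge <- data.frame(
  sampdate = as.POSIXct(c("2016-03-03 12:00", "2016-03-03 12:10",
                          "2016-04-05 08:00"), tz = "Etc/GMT+8"),
  cfs = c(149000, 150000, 152000)
)

disc_monthly <- discharge %>%
  group_by(sampmonth = format(sampdate, "%Y-%m")) %>%
  summarize(exp_value = mean(cfs, na.rm = TRUE))
```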
>>
>> TIA,
>>
>> Rich
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
--
Sent from my phone. Please excuse my brevity.
On Thu, 2 Sep 2021, Jeff Newmiller wrote:

> Regardless of whether you use the lower-level split function, or the
> higher-level aggregate function, or the tidyverse group_by function, the
> key is learning how to create the column that is the same for all records
> corresponding to the time interval of interest.

Jeff,

I tried responding to only you but my message bounced:

<jdnewmil at dcn.davis.ca.us>: host d9300a.ess.barracudanetworks.com[209.222.82.252]
said: 550 permanent failure for one or more recipients
(jdnewmil at dcn.davis.ca.us:blocked) (in reply to end of DATA command)

My response was not pertinent to the entire list, IMO, so I sent it to your
address.

Rich
What is the best way to read (from a text file) timestamps from the fall
time change, where there are two 1:15am's? E.g., here is an extract from a
US Geological Survey web site giving data on the river through our county
on 2020-11-01, when we changed from PDT to PST,
https://nwis.waterdata.usgs.gov/wa/nwis/uv/?cb_00010=on&cb_00060=on&cb_00065=on&format=rdb&site_no=12200500&period=&begin_date=2020-11-01&end_date=2020-11-05
.
The timestamps include the date and time as well as PDT or PST.
river <-
c("datetime,tz,discharge,height,temp",
"2020-11-01 00:00,PDT,20500,16.44,9.3",
"2020-11-01 00:15,PDT,20500,16.44,9.3",
"2020-11-01 00:30,PDT,20500,16.43,9.3",
"2020-11-01 00:45,PDT,20400,16.40,9.3",
"2020-11-01 01:00,PDT,20400,16.40,9.3",
"2020-11-01 01:00,PST,20200,16.34,9.2",
"2020-11-01 01:15,PDT,20400,16.39,9.3",
"2020-11-01 01:15,PST,20200,16.34,9.2",
"2020-11-01 01:30,PDT,20300,16.37,9.2",
"2020-11-01 01:30,PST,20100,16.31,9.2",
"2020-11-01 01:45,PDT,20300,16.35,9.2",
"2020-11-01 01:45,PST,20100,16.29,9.2",
"2020-11-01 02:00,PST,20100,16.29,9.2",
"2020-11-01 02:15,PST,20000,16.27,9.1",
"2020-11-01 02:30,PST,20000,16.26,9.1"
)
d <- read.table(text=river, sep=",",header=TRUE)
The entries are obviously not in time order.
Is there a simple way to read the datetime and tz columns together? One
way is to use d$tz to construct an offset that can be read with
strptime's "%z".
> d$POSIXct <- as.POSIXct(paste(d$datetime, ifelse(d$tz == "PDT", "-0700", "-0800")),
                          format = "%Y-%m-%d %H:%M %z")
> d
datetime tz discharge height temp POSIXct
1 2020-11-01 00:00 PDT 20500 16.44 9.3 2020-11-01 00:00:00
2 2020-11-01 00:15 PDT 20500 16.44 9.3 2020-11-01 00:15:00
3 2020-11-01 00:30 PDT 20500 16.43 9.3 2020-11-01 00:30:00
4 2020-11-01 00:45 PDT 20400 16.40 9.3 2020-11-01 00:45:00
5 2020-11-01 01:00 PDT 20400 16.40 9.3 2020-11-01 01:00:00
6 2020-11-01 01:00 PST 20200 16.34 9.2 2020-11-01 01:00:00
7 2020-11-01 01:15 PDT 20400 16.39 9.3 2020-11-01 01:15:00
8 2020-11-01 01:15 PST 20200 16.34 9.2 2020-11-01 01:15:00
9 2020-11-01 01:30 PDT 20300 16.37 9.2 2020-11-01 01:30:00
10 2020-11-01 01:30 PST 20100 16.31 9.2 2020-11-01 01:30:00
11 2020-11-01 01:45 PDT 20300 16.35 9.2 2020-11-01 01:45:00
12 2020-11-01 01:45 PST 20100 16.29 9.2 2020-11-01 01:45:00
13 2020-11-01 02:00 PST 20100 16.29 9.2 2020-11-01 02:00:00
14 2020-11-01 02:15 PST 20000 16.27 9.1 2020-11-01 02:15:00
15 2020-11-01 02:30 PST    20000  16.26  9.1 2020-11-01 02:30:00
> with(d[order(d$POSIXct),], plot(temp)) # monotonic temperature
-Bill
On Thu, Sep 2, 2021 at 12:41 PM Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
> Regardless of whether you use the lower-level split function, or the
> higher-level aggregate function, or the tidyverse group_by function, the
> key is learning how to create the column that is the same for all records
> corresponding to the time interval of interest.
>
> [...]