thr3ads.net - R help - [R] Calculate daily means from 5-minute interval data [Aug 2021]

Rich Shepard

2021-Aug-29 15:08 UTC

[R] Calculate daily means from 5-minute interval data

I have a year's hydraulic data (discharge, stage height, velocity, etc.)
from a USGS monitoring gauge recording values every 5 minutes. The data
files contain 90K-93K lines and plotting all these data would produce a
solid block of color.

What I want are the daily means and standard deviation from these data.

As an occasional R user (depending on project needs) I've no idea what
packages could be applied to these data frames. There likely are multiple
paths to extracting these daily values so summary statistics can be
calculated and plotted. I'd appreciate suggestions on where to start to
learn how I can do this.

TIA,

Rich

Eric Berger

2021-Aug-29 15:57 UTC

Hi Rich,
Your request is a bit open-ended but here's a suggestion that might help
get you an answer.
Provide dummy data (e.g. 5-10 lines), say like the contents of a csv file,
and calculate by hand what you'd like to see in the plot. (And describe
what the plot would look like.)
It sounds like what you want could be done in a few lines of R code which
would work both on the dummy data and the real data.

HTH,
Eric



Jeff Newmiller

2021-Aug-29 16:23 UTC

The general idea is to create a "grouping" column with repeated values
for each day, and then to use aggregate to compute your combined results. The
dplyr package's group_by/summarise functions can also do this, and there are
also proponents of the data.table package which is high performance but tends to
depend on altering data in-place unlike most other R data handling functions.

Also pay attention to missing data... if you have any then you will need to
consider whether you want the strictness of na.rm=FALSE or permissiveness of
na.rm=TRUE for your aggregation functions.
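A minimal sketch of the grouping-column idea in base R, using made-up data. The column names (datetime, cfs, day) and the UTC time zone are assumptions for illustration, not taken from the actual gauge files.

```r
# Hypothetical 5-minute readings for three full days (3 * 288 rows).
set.seed(1)
datetime <- seq(as.POSIXct("2020-01-01 00:00", tz = "UTC"),
                by = "5 min", length.out = 3 * 288)
gauge <- data.frame(datetime = datetime, cfs = rlnorm(length(datetime)))
gauge$cfs[10] <- NA                       # one missing reading on day 1

# Grouping column: the same value repeated for every reading in a day.
gauge$day <- as.Date(gauge$datetime, tz = "UTC")

# Permissive: na.rm = TRUE is passed on to mean() and sd().
daily_mean <- aggregate(gauge["cfs"], by = list(day = gauge$day),
                        FUN = mean, na.rm = TRUE)
daily_sd   <- aggregate(gauge["cfs"], by = list(day = gauge$day),
                        FUN = sd, na.rm = TRUE)
daily <- data.frame(day      = daily_mean$day,
                    mean_cfs = daily_mean$cfs,
                    sd_cfs   = daily_sd$cfs)

# Strict: mean() defaults to na.rm = FALSE, so a day with any NA gives NA.
strict <- aggregate(gauge["cfs"], by = list(day = gauge$day), FUN = mean)
```

The data-frame form of aggregate() is used here rather than the formula form because the formula form drops rows containing NA by default (na.action = na.omit), which would hide the difference between the two na.rm settings.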


Andrew Simmons

2021-Aug-29 17:13 UTC

Hello,


I would suggest something like:


date <- seq(as.Date("2020-01-01"), as.Date("2020-12-31"), 1)
time <- sprintf("%02d:%02d", rep(0:23, each = 12), seq.int(0, 55, 5))
x <- data.frame(
    date = rep(date, each = length(time)),
    time = time
)
x$cfs <- stats::rnorm(nrow(x))


cols2aggregate <- "cfs"  # add more as necessary


# one data frame of the selected columns per day
S <- split(x[cols2aggregate], x$date)


# one row per day, one column per aggregated variable
means <- do.call("rbind", lapply(S, colMeans, na.rm = TRUE))
sds   <- do.call("rbind", lapply(S, function(xx) sapply(xx, sd, na.rm = TRUE)))
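
Since the original question also asks about plotting, here is one possible sketch of drawing daily means with a band of plus or minus one standard deviation. The data are simulated, and the names (day, cfs, daily_mean, daily_sd) and the choice of a grey band are assumptions for illustration only.

```r
# Hypothetical 30 days of 5-minute readings (288 per day).
set.seed(42)
day <- rep(seq(as.Date("2020-01-01"), by = "day", length.out = 30), each = 288)
cfs <- rnorm(length(day), mean = 100, sd = 10)

# One mean and one standard deviation per day.
daily_mean <- tapply(cfs, day, mean, na.rm = TRUE)
daily_sd   <- tapply(cfs, day, sd,   na.rm = TRUE)
d <- as.Date(names(daily_mean))

# Empty frame first, then the +/- 1 sd band, then the mean line on top.
plot(d, daily_mean, type = "n",
     ylim = range(daily_mean - daily_sd, daily_mean + daily_sd),
     xlab = "Date", ylab = "Discharge (cfs)")
polygon(c(d, rev(d)),
        c(daily_mean - daily_sd, rev(daily_mean + daily_sd)),
        col = "grey85", border = NA)
lines(d, daily_mean)
```

This reduces 8640 points to 30, which avoids the solid block of color, though a symmetric band may be a poor description of strongly skewed flow data.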


Richard O'Keefe

2021-Aug-30 02:09 UTC

Why would you need a package for this?

> samples.per.day <- 12*24

That's 12 5-minute intervals per hour and 24 hours per day.
Generate some fake data.

> x <- rnorm(samples.per.day * 365)
> length(x)
[1] 105120

Reshape the fake data into a matrix where each row represents one
24-hour period.
> m <- matrix(x, ncol=samples.per.day, byrow=TRUE)
Now we can summarise the rows any way we want.
The basic tool here is ?apply.
?rowMeans is said to be faster than using apply to calculate means,
so we'll use that.  There is no *rowSds so we have to use apply
for the standard deviation.  I use ?head because I don't want to
post tens of thousands of meaningless numbers.
> head(rowMeans(m))
[1] -0.03510177  0.11817337  0.06725203 -0.03578195 -0.02448077 -0.03033692
> head(apply(m, MARGIN=1, FUN=sd))
[1] 1.0017718 0.9922920 1.0100550 0.9956810 1.0077477 0.9833144

Now whether this is a *sensible* way to summarise your flow data is a question
that a hydrologist would be better placed to answer.  I would have started
with
> plot(density(x))
which I just did with some real river data (only a month of it, sigh).
Very long tail.
Even
> plot(density(log(r)))
shows a very long tail.  Time to plot the data against time.  Oh my!
All of the long tail came from a single event.
There's a period of low flow, then there's a big rainstorm and the
flow goes WAY up, then over about two days the flow subsides to a new
somewhat higher level.

None of this is reflected in means or standard deviations.
This is *time series* data, and time series data of a fairly special kind.

One thing that might be helpful with your data would simply
be
> image(log(m))
For my one month sample, the spike showed up very clearly that way.
Because right now, your first task is to get an idea of what the data
look like, and means-and-standard-deviations won't really do that.

Oh heck, here's another reason to go with image(log(m)).
With image(m) I just see the one big spike.
With image(log(m)), I can see that little spikes often start in the
afternoon of one day and continue into the morning of the
next.
From daily means, it looks like two unusual, but not very
unusual, days.  From the image, it's clearly ONE rainfall event
that just happens to straddle a day boundary.
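
The day-boundary effect can be reproduced with fake data. The sketch below is only an illustration: the base flow of about 50, the storm size, and its timing (15:00 on day 10, lasting 18 hours) are invented, not taken from any real gauge.

```r
samples.per.day <- 12 * 24
n.days <- 30

# Fake, strictly positive "flow" so that log() is defined.
set.seed(7)
flow <- 50 + abs(rnorm(samples.per.day * n.days, sd = 2))

# Hypothetical storm: starts 15:00 on day 10, lasts 18 hours,
# so half of it falls on day 10 and half on day 11.
start <- 9 * samples.per.day + 15 * 12 + 1
storm <- start:(start + 18 * 12 - 1)
flow[storm] <- flow[storm] + 500

# One row per day, one column per 5-minute slot.
m <- matrix(flow, ncol = samples.per.day, byrow = TRUE)

# Daily means: days 10 and 11 both look elevated.
round(rowMeans(m)[9:12])

# Image: time of day on the x axis, day number on the y axis.
image(x = (seq_len(samples.per.day) - 1) / 12,
      y = seq_len(n.days),
      z = t(log(m)),
      xlab = "Hour of day", ylab = "Day")
```

In the image the event appears as a bar on the right side of row 10 that continues on the left side of row 11, which the two daily means alone cannot show.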

This is all very basic stuff, which is really the point.  You want to use
elementary tools to look at the data before you reach for fancy ones.


Rich Shepard

2021-Sep-02 18:16 UTC

On Mon, 30 Aug 2021, Richard O'Keefe wrote:
>> x <- rnorm(samples.per.day * 365)
>> length(x)
> [1] 105120
>
> Reshape the fake data into a matrix where each row represents one
> 24-hour period.
>
>> m <- matrix(x, ncol=samples.per.day, byrow=TRUE)

Richard,

Now I understand the need to keep the date and time as a single datetime
column; separately dplyr's summarize() provides daily means (too many data
points to plot over 3-5 years). I reformatted the data to provide a
sampledatetime column and a values column.

If I correctly understand the output of as.POSIXlt each date and time
element is separate, so input such as 2016-03-03 12:00 would now be 2016 03
03 12 00 (I've not read how the elements are separated). (The TZ is not
important because all data are either PST or PDT.)
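
A short sketch of what as.POSIXlt returns may help here. It uses the timestamp from the paragraph above; the time zone name "America/Los_Angeles" is an assumption chosen to match PST/PDT.

```r
# Parse one timestamp; tz is an assumed zone name for Pacific time.
lt <- as.POSIXlt("2016-03-03 12:00", tz = "America/Los_Angeles")

# It still prints and formats as a single date-time value.
format(lt)

# Internally it is a list of named components, reached with $.
lt$year + 1900   # year is stored as an offset from 1900
lt$mon + 1       # month is stored as 0-11
lt$mday          # day of the month
lt$hour
lt$min

# POSIXct stores one number per timestamp and is the usual choice
# for a data frame column; grouping keys can be derived from it.
ct <- as.POSIXct(lt)
as.Date(ct, tz = "America/Los_Angeles")   # day key
format(ct, "%Y-%m")                       # month key
```

So the elements are not separated in the text of the value; they are separate fields of a list, and the year and month fields need the offsets shown above.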
> Now we can summarise the rows any way we want.
> The basic tool here is ?apply.
> ?rowMeans is said to be faster than using apply to calculate means,
> so we'll use that.  There is no *rowSds so we have to use apply
> for the standard deviation.  I use ?head because I don't want to
> post tens of thousands of meaningless numbers.

If I create a matrix using the above syntax the resulting rows contain all
recorded values for a specific day. What would be the syntax to collect all
values for each month?

This would result in 12 rows per year; the periods of record for the five
variables available from that gauge station vary in length.
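
Because months differ in length, a rectangular matrix with one row per month does not fit directly. One possible approach, sketched here with simulated data, is to split the values on a year-month key instead. The names sampledatetime and values follow the columns described above; the data themselves and the UTC time zone are assumptions.

```r
# Hypothetical two years of 5-minute readings.
set.seed(3)
sampledatetime <- seq(as.POSIXct("2016-01-01 00:00", tz = "UTC"),
                      as.POSIXct("2017-12-31 23:55", tz = "UTC"),
                      by = "5 min")
values <- rlnorm(length(sampledatetime))

# Grouping key: year and month, for example "2016-03".
month <- format(sampledatetime, "%Y-%m")

# A list with one numeric vector per month; lengths may differ.
by.month <- split(values, month)

# One summary row per month (12 rows per year of record).
monthly <- data.frame(
  month = names(by.month),
  n     = lengths(by.month),
  mean  = sapply(by.month, mean, na.rm = TRUE),
  sd    = sapply(by.month, sd,   na.rm = TRUE),
  row.names = NULL
)
head(monthly)
```

The same key works for periods of record of any length, since each variable can be split on its own timestamps.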

Regards,

Rich
