thr3ads.net - R help - [R] Calculate daily means from 5-minute interval data [Aug 2021]

Richard O'Keefe

2021-Aug-30 02:09 UTC

[R] Calculate daily means from 5-minute interval data

Why would you need a package for this?
> samples.per.day <- 12*24
That's 12 5-minute intervals per hour and 24 hours per day.
Generate some fake data.
> x <- rnorm(samples.per.day * 365)
> length(x)
[1] 105120

Reshape the fake data into a matrix where each row represents one
24-hour period.
> m <- matrix(x, ncol=samples.per.day, byrow=TRUE)
Now we can summarise the rows any way we want.
The basic tool here is ?apply.
?rowMeans is said to be faster than using apply to calculate means,
so we'll use that.  There is no *rowSds so we have to use apply
for the standard deviation.  I use ?head because I don't want to
post tens of thousands of meaningless numbers.
> head(rowMeans(m))
[1] -0.03510177  0.11817337  0.06725203 -0.03578195 -0.02448077 -0.03033692
> head(apply(m, MARGIN=1, FUN=sd))
[1] 1.0017718 0.9922920 1.0100550 0.9956810 1.0077477 0.9833144
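Collected into one script, the steps above might look like the following sketch. The seed and the names daily.mean and daily.sd are additions for illustration, and the approach still assumes exactly 288 readings for every day, with none missing.

```r
# Sketch only: assumes a complete, regular series of 5-minute readings.
set.seed(42)                       # illustrative seed, not from the original post
samples.per.day <- 12 * 24         # 12 readings per hour, 24 hours per day
x <- rnorm(samples.per.day * 365)  # fake data: one year of readings

# One row per day, one column per 5-minute slot.
m <- matrix(x, ncol = samples.per.day, byrow = TRUE)

daily.mean <- rowMeans(m)                     # mean of each row (day)
daily.sd   <- apply(m, MARGIN = 1, FUN = sd)  # sd of each row (day)

# Show only the first few days.
head(cbind(daily.mean, daily.sd))
```

Because byrow=TRUE fills the matrix one row at a time, row i holds readings (i-1)*288+1 through i*288 of x.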

Now whether this is a *sensible* way to summarise your flow data is a question
that a hydrologist would be better placed to answer.  I would have started
with
> plot(density(x))
which I just did with some real river data (only a month of it, sigh).
Very long tail.
Even
> plot(density(log(r)))
shows a very long tail.  Time to plot the data against time.  Oh my!
All of the long tail came from a single event.
There's a period of low flow, then there's a big rainstorm and the
flow goes WAY up, then over about two days the flow subsides to a new
somewhat higher level.
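For readers without river data to hand, here is a rough sketch of that sequence of plots on simulated flow. Every number in it (base flow, storm time, peak size, recession rate) is invented for illustration and is not taken from the real data described above.

```r
# Sketch with made-up numbers: low base flow, one storm, slow recession
# to a slightly higher level.
set.seed(1)
samples.per.day <- 12 * 24
n <- samples.per.day * 30                      # one month of 5-minute readings
t.days <- (seq_len(n) - 1) / samples.per.day   # time in days from the start

storm.start <- 12.7                            # hypothetical storm time (days)
since <- pmax(t.days - storm.start, 0)         # days elapsed since the storm began
flow <- 10 * exp(rnorm(n, sd = 0.05)) +        # noisy base flow
  ifelse(t.days >= storm.start, 2 + 200 * exp(-since / 0.5), 0)

op <- par(mfrow = c(3, 1))
plot(density(flow), main = "density(flow)")            # very long tail
plot(density(log(flow)), main = "density(log(flow))")  # still a long tail
plot(t.days, flow, type = "l", log = "y",
     xlab = "day", ylab = "flow")                      # the single event
par(op)
```

The two density plots only show that large values exist; the plot against time shows that they all belong to one event.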

None of this is reflected in means or standard deviations.
This is *time series* data, and time series data of a fairly special kind.

One thing that might be helpful with your data would simply be
> image(log(m))
For my one month sample, the spike showed up very clearly that way.
Because right now, your first task is to get an idea of what the data
look like, and means-and-standard-deviations won't really do that.

Oh heck, here's another reason to go with image(log(m)).
With image(m) I just see the one big spike.
With image(log(m)), I can see that little spikes often start in the
afternoon of one day and continue into the morning of the next.
From daily means, it looks like two unusual, but not very
unusual, days.  From the image, it's clearly ONE rainfall event
that just happens to straddle a day boundary.
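The day-boundary effect can be reproduced with a small simulation. This is only a sketch: the event timing (15:00 on day 10 to 09:00 on day 11) and its size are invented, and the axis arguments passed to image() are additions to make the plot easier to read.

```r
# Sketch with made-up numbers: one event from 15:00 on day 10 to 09:00 on day 11.
set.seed(2)
samples.per.day <- 12 * 24
n.days <- 30
flow <- 10 * exp(rnorm(samples.per.day * n.days, sd = 0.05))

start <- 9 * samples.per.day + 15 * 12 + 1   # first reading at 15:00 on day 10
len   <- 18 * 12                             # 18 hours of readings
idx   <- start:(start + len - 1)
flow[idx] <- flow[idx] + 5                   # raise the flow during the event

m <- matrix(flow, ncol = samples.per.day, byrow = TRUE)

# Daily means: days 10 and 11 both look moderately raised.
round(rowMeans(m)[8:13], 2)

# Image: x axis is day, y axis is hour of day.  The event is one streak that
# runs off the top of day 10 and continues from the bottom of day 11.
image(x = seq_len(n.days), y = (seq_len(samples.per.day) - 1) / 12,
      z = log(m), xlab = "day", ylab = "hour of day")
```

Each of the two days contains nine hours of the event, so their means rise by the same modest amount, and nothing in those two numbers says the rise is a single continuous episode.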

This is all very basic stuff, which is really the point.  You want to use
elementary tools to look at the data before you reach for fancy ones.

On Mon, 30 Aug 2021 at 03:09, Rich Shepard <rshepard at appl-ecosys.com> wrote:
>
> I have a year's hydraulic data (discharge, stage height, velocity, etc.)
> from a USGS monitoring gauge recording values every 5 minutes. The data
> files contain 90K-93K lines and plotting all these data would produce a
> solid block of color.
>
> What I want are the daily means and standard deviation from these data.
>
> As an occasional R user (depending on project needs) I've no idea what
> packages could be applied to these data frames. There likely are multiple
> paths to extracting these daily values so summary statistics can be
> calculated and plotted. I'd appreciate suggestions on where to start to
> learn how I can do this.
>
> TIA,
>
> Rich
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Jeff Newmiller

2021-Aug-30 02:47 UTC

IMO assuming periodicity is a bad practice for this. Missing timestamps happen
too, and there is no reason to build a broken analysis process.
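One way to avoid assuming periodicity is to group on the date carried by each timestamp instead of on row position. The following is a sketch in base R, not a method anyone in the thread posted; the column names datetime and discharge, the UTC time zone, and the simulated gaps are all assumptions.

```r
# Sketch only: column names `datetime` and `discharge` are hypothetical.
set.seed(3)
stamps <- seq(as.POSIXct("2021-01-01 00:00", tz = "UTC"),
              by = "5 min", length.out = 12 * 24 * 7)   # one week of readings
d <- data.frame(datetime = stamps,
                discharge = rnorm(length(stamps), mean = 10))
d <- d[-sample(nrow(d), 200), ]           # drop 200 readings at random

# Group on the calendar date of each timestamp, not on row position.
d$day <- as.Date(d$datetime, tz = "UTC")
daily <- aggregate(discharge ~ day, data = d,
                   FUN = function(v) c(mean = mean(v), sd = sd(v), n = length(v)))
daily <- do.call(data.frame, daily)       # flatten the matrix column
daily
```

Keeping the per-day count n next to the mean and standard deviation makes short days visible instead of letting them silently shift the row boundaries, which is what would happen with the matrix reshape.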

On August 29, 2021 7:09:01 PM PDT, Richard O'Keefe <raoknz at gmail.com> wrote:
>Why would you need a package for this?
-- 
Sent from my phone. Please excuse my brevity.

Rich Shepard

2021-Aug-30 12:42 UTC

On Mon, 30 Aug 2021, Richard O'Keefe wrote:
> Why would you need a package for this?
>> samples.per.day <- 12*24
>
> That's 12 5-minute intervals per hour and 24 hours per day.
> Generate some fake data.

Richard,

The problem is that there are days with fewer than 12 recorded values for
various reasons.
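A quick way to find such days is to count the readings on each date. This is a sketch under assumptions: stamps stands in for the timestamp column of the real data, the gap is simulated, and a complete day is taken to be 12 * 24 = 288 readings.

```r
# Sketch only: `stamps` stands in for the timestamp column of the real data.
stamps <- seq(as.POSIXct("2021-03-01 00:00", tz = "UTC"),
              by = "5 min", length.out = 12 * 24 * 3)   # three full days
stamps <- stamps[-(300:400)]              # simulate a gap on the second day

expected <- 12 * 24                       # readings in a complete day
n.per.day <- table(as.Date(stamps, tz = "UTC"))
n.per.day

# Dates with fewer readings than expected.
names(n.per.day)[as.vector(n.per.day) < expected]
```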

When testing algorithms I use small subsets of actual data rather than fake
data.

Thanks for your detailed procedure.

Regards,

Rich
