I think we wandered away into a package rather than base R, but the request
seems easy enough.
Just FYI, Rich, as you seem not to have incorporated the advice we gave yet
about the first argument, your use of group_by() is a tad odd.
disc %>%
  group_by(hour) %>%
  group_by(day) %>%
  group_by(year, month) %>%
  summarize(disc_by_month, vol = mean(cfs, na.rm = TRUE))
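I am not sure why you use disc at the start of the pipe and then pass
disc_by_month a second, superfluous time inside summarize(). If you read the
manual page for group_by()
https://dplyr.tidyverse.org/reference/group_by.html you may note it tends to be
called ONCE, with multiple arguments in sequence that specify which columns of
the data.frame to group by. By default, each new call to group_by() replaces
the existing grouping rather than adding to it (unless you pass .add = TRUE),
so of your three calls only the last one, group_by(year, month), has any effect.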
disc %>%
  group_by(hour, day, year, month) %>%
  summarize(vol = mean(cfs, na.rm = TRUE))
Not sure most people would group that way, as the above groups by hour first,
so the summary comes out ordered by hour, then day, then year and month.
Many might reverse that sequence.
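
For what it is worth, here is a small sketch of both points. The four-row
'disc' below is a made-up stand-in for your data (the values are invented for
illustration only), and I am assuming dplyr 1.0.0 or later so that the .groups
argument of summarize() is available. It shows that chained group_by() calls
keep only the last grouping, and that a single call with the largest unit
first gives one row per year and month.

```r
library(dplyr)

# Hypothetical stand-in for the real 'disc' data frame; values are invented.
disc <- tibble(
  year  = c(2016L, 2016L, 2016L, 2016L),
  month = c(3L, 3L, 4L, 4L),
  day   = c(3L, 3L, 1L, 1L),
  hour  = c(12L, 13L, 12L, 13L),
  cfs   = c(149000, 151000, 100000, NA)
)

# Each group_by() call replaces the previous grouping by default,
# so only the last call survives.
chained <- disc %>%
  group_by(hour) %>%
  group_by(day) %>%
  group_by(year, month)
group_vars(chained)   # "year" "month"

# One call, largest unit first, then summarize to one row per group.
by_month <- disc %>%
  group_by(year, month) %>%
  summarize(vol = mean(cfs, na.rm = TRUE), .groups = "drop")
by_month
```

Swapping in group_by(year, month, day, hour) would instead give one row per
hour of each day, ordered chronologically.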
-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Rich Shepard
Sent: Monday, September 13, 2021 6:32 PM
To: R mailing list <r-help at r-project.org>
Subject: Re: [R] tidyverse: grouped summaries (with summerize)
On Tue, 14 Sep 2021, Eric Berger wrote:
> This code is not correct:
> disc_by_month %>%
>   group_by(year, month) %>%
>   summarize(disc_by_month, vol = mean(cfs, na.rm = TRUE))
> It should be:
> disc %>% group_by(year,month) %>% summarize(vol=mean(cfs,na.rm=TRUE))
Eric/Avi:
That makes no difference:
> disc_by_month
# A tibble: 590,940 × 6
# Groups:   year, month [66]
    year month   day  hour   min    cfs
   <int> <int> <int> <int> <int>  <dbl>
 1  2016     3     3    12     0 149000
 2  2016     3     3    12    10 150000
 3  2016     3     3    12    20 151000
 4  2016     3     3    12    30 156000
 5  2016     3     3    12    40 154000
 6  2016     3     3    12    50 150000
 7  2016     3     3    13     0 153000
 8  2016     3     3    13    10 156000
 9  2016     3     3    13    20 154000
10  2016     3     3    13    30 155000
# … with 590,930 more rows
I wondered if I need to group first by hour, then day, then year-month.
This, too, produces the same output:
disc %>%
  group_by(hour) %>%
  group_by(day) %>%
  group_by(year, month) %>%
  summarize(disc_by_month, vol = mean(cfs, na.rm = TRUE))
And disc shows the read dataframe.
I don't understand why the columns are not grouping.
Thanks,
Rich