On Mon, 13 Sep 2021, Rich Shepard wrote:

> That's what I thought I did. I'll rewrite the script and work toward the
> output I need.

Still not the correct syntax. Command is now:

disc_by_month %>%
    group_by(year, month) %>%
    summarize(disc_by_month, vol = mean(cfs, na.rm = TRUE))

and results are:

> source('disc.R')
`summarise()` has grouped output by 'year', 'month'. You can override using
the `.groups` argument.
> disc_by_month
# A tibble: 590,940 × 6
# Groups:   year, month [66]
    year month   day  hour   min    cfs
   <int> <int> <int> <int> <int>  <dbl>
 1  2016     3     3    12     0 149000
 2  2016     3     3    12    10 150000
 3  2016     3     3    12    20 151000
 4  2016     3     3    12    30 156000
 5  2016     3     3    12    40 154000
 6  2016     3     3    12    50 150000
 7  2016     3     3    13     0 153000
 8  2016     3     3    13    10 156000
 9  2016     3     3    13    20 154000
10  2016     3     3    13    30 155000
# … with 590,930 more rows

The grouping is still not right. I expected to see a mean value for each
month of each year in the data set, not for each minute.

Rich
This code is not correct:

disc_by_month %>%
    group_by(year, month) %>%
    summarize(disc_by_month, vol = mean(cfs, na.rm = TRUE))

It should be:

disc %>%
    group_by(year, month) %>%
    summarize(vol = mean(cfs, na.rm = TRUE))

On Tue, Sep 14, 2021 at 12:51 AM Rich Shepard <rshepard at appl-ecosys.com> wrote:
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
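As a self-contained check of the corrected pipeline, here is a minimal sketch on a hypothetical five-row stand-in for `disc`. The values are invented; only the column names come from the thread. It assigns the result to `disc_by_month`, since that is the object Rich printed, and it adds `.groups = "drop"` (not part of the reply above) so the result comes back ungrouped and the "has grouped output" message is not printed.

```r
library(dplyr)

# Hypothetical stand-in for Rich's `disc` data: a few 10-minute readings
# spread over two months. Column names follow the thread; values are made up.
disc <- tibble(
  year  = c(2016L, 2016L, 2016L, 2016L, 2016L),
  month = c(3L, 3L, 3L, 4L, 4L),
  day   = c(3L, 3L, 4L, 1L, 1L),
  hour  = c(12L, 12L, 9L, 0L, 0L),
  min   = c(0L, 10L, 0L, 0L, 10L),
  cfs   = c(149000, 151000, NA, 120000, 124000)
)

# Group by year and month, then collapse each group to a single row.
# Note that summarize() gets only the summary expression: the pipe
# already supplies the grouped data as its first argument.
# .groups = "drop" (an assumption, not in the original reply) returns an
# ungrouped tibble instead of one still grouped by year.
disc_by_month <- disc %>%
  group_by(year, month) %>%
  summarize(vol = mean(cfs, na.rm = TRUE), .groups = "drop")

# One row per year/month combination rather than one per reading.
print(disc_by_month)
```

With real data the result should have one row per year/month pair (66 rows, judging by the `[66]` in the group header above) instead of 590,940.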
As Eric has pointed out, perhaps Rich is not thinking in terms of pipelines.

summarize() takes the data as its first argument:

summarise(.data = whatever, ...)

But in a pipeline you OMIT that first argument and let the pipe supply it
silently. What I think summarize() saw was something like:

summarize(., disc_by_month, vol = mean(cfs, na.rm = TRUE))

There is now a superfluous SECOND argument, in a position where summarize()
expected not a data.frame but a summary expression built from the columns of
the hidden data.frame-like object it was passed. You do not have a column
called disc_by_month, so summarize() presumably picked up the data frame of
that name from your workspace and did something unexpected with it.

I hope this makes sense. When you cobble a pipeline together from separate
calls, carefully make sure that every first (data) argument the pipe now
supplies has been removed.

And, just FYI, the subject line should not use a word that some see as the
seasonal counterpart of "winterize" ...

-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Rich Shepard
Sent: Monday, September 13, 2021 5:51 PM
To: r-help at r-project.org
Subject: Re: [R] tidyverse: grouped summaries (with summerize)
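To illustrate the point that the pipe silently supplies the first argument, the sketch below builds the same monthly summary twice on hypothetical toy data: once piped, and once as nested calls with the data argument written out. The two results should be the same object. The `.groups = "drop"` argument and the data values are assumptions for illustration, not taken from the thread.

```r
library(dplyr)

# Hypothetical toy data (not Rich's real discharge records).
disc <- tibble(
  year  = c(2016L, 2016L, 2017L),
  month = c(3L, 3L, 3L),
  cfs   = c(100, 200, 300)
)

# Piped form: %>% inserts the left-hand side as the first argument
# (.data), so each call lists only its remaining arguments.
piped <- disc %>%
  group_by(year, month) %>%
  summarize(vol = mean(cfs, na.rm = TRUE), .groups = "drop")

# Nested form: the same calls with the data argument written explicitly.
nested <- summarize(
  group_by(disc, year, month),
  vol = mean(cfs, na.rm = TRUE),
  .groups = "drop"
)

# Compare the two results.
all.equal(piped, nested)

# By the same rule, the call from the thread,
#   ... %>% summarize(disc_by_month, vol = mean(cfs, na.rm = TRUE))
# expands to
#   summarize(<grouped data>, disc_by_month, vol = mean(cfs, na.rm = TRUE))
# so disc_by_month becomes an extra, unwanted argument.
```

Reading a pipeline as its nested equivalent is a quick way to spot an argument that was left over from a standalone call.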