Ulrik Stervbo
2016-Jul-05 04:03 UTC
[R] dplyr : row total for all groups in dplyr summarise
That will give you the wrong result when used on summarised data.

David Winsemius <dwinsemius at comcast.net> wrote on Tue, 5 July 2016, 02:10:

> I thought there was an nrow() function?
>
> Sent from my iPhone
>
> On Jul 4, 2016, at 9:59 AM, Ulrik Stervbo <ulrik.stervbo at gmail.com> wrote:
>
>> If you want the total number of rows in the original data.frame after
>> counting the rows in each group, you can ungroup and sum the row counts,
>> like:
>>
>> library("dplyr")
>>
>> mtcars %>%
>>   group_by (am, gear) %>%
>>   summarise (n=n()) %>%
>>   mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%")) %>%
>>   ungroup() %>%
>>   mutate(row.tot = sum(n))
>>
>> HTH
>> Ulrik
>>
>> On Mon, 4 Jul 2016 at 18:23 David Winsemius <dwinsemius at comcast.net> wrote:
>>
>>> On Jul 4, 2016, at 6:56 AM, maicel at infomed.sld.cu wrote:
>>>
>>>> Hello,
>>>> How can I aggregate row total for all groups in dplyr summarise ?
>>>
>>> Row total of what? Aggregate how? What is the desired answer?
>>>
>>>> library(dplyr)
>>>> mtcars %>%
>>>>   group_by (am, gear) %>%
>>>>   summarise (n=n()) %>%
>>>>   mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%"))
>>>>
>>>> best regard
>>>> Maicel Monzon
>>>>
>>>> --
>>>> This message reached you through the e-mail service that Infomed
>>>> provides to support the missions of the National Health System. The
>>>> sender undertakes to use the service for those purposes and to comply
>>>> with the established regulations.
>>>>
>>>> Infomed: http://www.sld.cu/

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius
2016-Jul-05 05:45 UTC
[R] dplyr : row total for all groups in dplyr summarise
nrow(mtcars)

> On Jul 4, 2016, at 9:03 PM, Ulrik Stervbo <ulrik.stervbo at gmail.com> wrote:
>
> That will give you the wrong result when used on summarised data
Ulrik Stervbo
2016-Jul-05 05:50 UTC
[R] dplyr : row total for all groups in dplyr summarise
Yes, but in the sample code the data is summarised, in which case nrow() gives you 4 rows and not the correct 32.

On Tue, 5 Jul 2016, 07:48 David Winsemius <dwinsemius at comcast.net> wrote:

> nrow(mtcars)
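The disagreement above can be checked directly. What follows is a minimal sketch, assuming dplyr is installed; the object name `counts` is introduced here for illustration and does not appear in the thread. It compares nrow() on the summarised table with the sum of the per-group counts.

```r
library(dplyr)

# Per-group row counts, as in the sample code from the thread.
# ungroup() drops the grouping that summarise() leaves on `am`.
counts <- mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n()) %>%
  ungroup()

nrow(counts)    # number of (am, gear) groups in the summary
sum(counts$n)   # rows in the original data, recovered from the counts
nrow(mtcars)    # the same total, taken from the original data frame
```

On the summarised table nrow() counts groups rather than observations, which is why summing `n` is suggested when only the summary is at hand.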
G.Maubach at weinwolf.de
2016-Jul-05 09:27 UTC
[R] Reply: Re: dplyr : row total for all groups in dplyr summarise
Hi guys,

I checked out your example but I can't follow the results:

> mtcars %>%
+   group_by (am, gear) %>%
+   summarise (n=n()) %>%
+   mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%")) %>%
+   ungroup() %>%
+   mutate(row.tot = sum(n))
Source: local data frame [4 x 5]

     am  gear     n rel.freq row.tot
  (dbl) (dbl) (int)    (chr)   (int)
1     0     3    15      79%      32
2     0     4     4      21%      32
3     1     4     8      62%      32
4     1     5     5      38%      32

We have a total of 32 cases, and 15 * 100 / 32 = 46.9 %, not 79 %. The same holds for the other rows. How is 79 % calculated?

When searching the web I saw this example:

-- cut --

#-- not run --
url <- "http://www.lock5stat.com/datasets/HollywoodMovies2011.csv"
response <- GET(url)
Hollywoodmovies2011 <- content(x = GET(url), as = data.frame)
#-- end not run

Hollywoodmovies2011 %>%
  group_by(genre) %>%
  summarize(count = n()) %>%
  mutate(rf = count / sum(count))

-- cut --

which gives

Source: local data frame [9 x 3]

      Genre count           %
     (fctr) (int)       (dbl)
1    Action    32 0.235294118
2 Adventure     1 0.007352941
3 Animation    12 0.088235294
4    Comedy    27 0.198529412
5     Drama    21 0.154411765
6   Fantasy     2 0.014705882
7    Horror    17 0.125000000
8   Romance    11 0.080882353
9  Thriller    13 0.095588235

Here the % corresponds to the count divided by the sum of the counts, e.g. sum = 136 and 32 / 136 = 0.2352941.

What is the difference when counting? What do the relative counts in the first example mean?

Kind regards

Georg
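The two candidate denominators in the question can be compared with plain arithmetic. This is an editorial sketch using only base R; the vector names `n`, `am`, `within_am` and `overall` are hypothetical, and the counts are copied from the mtcars output in the message above.

```r
# Counts and am values copied from the printed mtcars result.
n  <- c(15, 4, 8, 5)
am <- c(0, 0, 1, 1)

# Share of each row within its am level (denominators 19 and 13).
within_am <- round(100 * n / ave(n, am, FUN = sum))

# Share of each row in the grand total (denominator 32).
overall <- 100 * n / sum(n)

within_am
overall
```

The within-`am` shares reproduce the 79%, 21%, 62% and 38% shown in the output, which suggests that sum(n) was evaluated per level of `am` rather than over all 32 rows.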
David Winsemius
2016-Jul-05 16:47 UTC
[R] Reply: Re: dplyr : row total for all groups in dplyr summarise
> On Jul 5, 2016, at 2:27 AM, G.Maubach at weinwolf.de wrote:
>
> How is 79 % calculated?

It is apparently the number of items in the first "group determinant", i.e. the total within each level of am:

> mtcars %>%
+   group_by (am, gear) %>%
+   summarise (n=n()) %>%
+   mutate(sum = sum(n)) %>%
+   ungroup()
Source: local data frame [4 x 4]

     am  gear     n   sum
  (dbl) (dbl) (int) (int)
1     0     3    15    19
2     0     4     4    19
3     1     4     8    13
4     1     5     5    13

> ?n
> with(mtcars,table(am,gear))
   gear
am   3  4  5
  0 15  4  0
  1  0  8  5

The documentation for the `n` function is particularly unhelpful in letting one know what to expect from it:

"Description
This function is implemented special for each data source and can only be used from within summarise, mutate and filter"

David.
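If shares of the grand total are wanted instead, one option is to remove the remaining grouping before computing them. The sketch below is an illustration, not code from the thread: it assumes a dplyr version that provides group_vars(), and the names `by_group`, `overall` and `single` are made up for the example.

```r
library(dplyr)

# After summarise(), the last grouping variable (gear) is dropped,
# so the result is still grouped by am.
by_group <- mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n())
group_vars(by_group)

# Ungrouping first makes sum(n) the grand total of 32.
overall <- by_group %>%
  ungroup() %>%
  mutate(rel.freq = n / sum(n))

# With a single grouping variable, summarise() returns an ungrouped
# table, which matches the behaviour seen in the movie example.
single <- mtcars %>%
  group_by(gear) %>%
  summarise(count = n()) %>%
  mutate(rf = count / sum(count))
```

Placing ungroup() before the mutate() step is a design choice that keeps the per-group counts while changing only the denominator; leaving it after, as in the original pipeline, gives shares within each level of `am`.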