Hi All,

How can I generate the mean by group? The sample data looks as follows:

dat<-read.table(text="Year Sex wt
2001 M 15
2001 M 14
2001 M 16
2001 F 12
2001 F 11
2001 F 13
2002 M 14
2002 M 18
2002 M 17
2002 F 11
2002 F 15
2002 F 14
2003 M 18
2003 M 13
2003 M 14
2003 F 15
2003 F 10
2003 F 11 ",header=TRUE)

The desired output is:

         M     F
2001    15    12
2002 16.33 13.33
2003    15    12

Thank you,
I would usually use 'tapply'. It splits an object into groups, applies a function to each group, and then (optionally) simplifies the result. For example:

tapply(dat$wt, dat$Year, mean) # mean by Year
tapply(dat$wt, dat$Sex , mean) # mean by Sex
tapply(dat$wt, list(dat$Year, dat$Sex), mean) # mean by Year and Sex

The documentation (?tapply) has many more details about how this works, but those are the basics. I hope this helps!

On Mon, Nov 1, 2021 at 5:09 PM Val <valkremk at gmail.com> wrote:
> [...]
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
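A minimal sketch extending the last tapply() call above to the exact layout requested, assuming the dat data frame from the question. The two-factor call already returns a Year-by-Sex matrix (Sex levels sorted, so F comes before M), so all that remains is to reorder the columns to M, F and round to two decimals. The object name means is illustrative only.

```r
# Sample data from the question
dat <- read.table(text = "Year Sex wt
2001 M 15
2001 M 14
2001 M 16
2001 F 12
2001 F 11
2001 F 13
2002 M 14
2002 M 18
2002 M 17
2002 F 11
2002 F 15
2002 F 14
2003 M 18
2003 M 13
2003 M 14
2003 F 15
2003 F 10
2003 F 11", header = TRUE)

# Two grouping variables give a Year-by-Sex matrix of means
# (rows: Year, columns: Sex levels in sorted order, i.e. F then M)
means <- tapply(dat$wt, list(dat$Year, dat$Sex), mean)

# Reorder the columns to M, F and round to two decimals
means <- round(means[, c("M", "F")], 2)
print(means)
```

Because the result is a plain matrix, a single cell can be picked out with means["2002", "M"].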
Hi Val,

I think you answered your own question:

by(dat$wt,dat[,c("Sex","Year")],mean)

Jim

On Tue, Nov 2, 2021 at 8:09 AM Val <valkremk at gmail.com> wrote:
> [...]
This is a fairly simple request and well covered by introductory reading material. A decent example was given, and I see Andrew provided a base R reply that should be sufficient. But I do not think he realized you wanted something different, so his one-way calls are not in the format you wanted (his third call, tapply(dat$wt, list(dat$Year, dat$Sex), mean), does return a Year-by-Sex table, though unrounded and with F before M):

> tapply(dat$wt, dat$Year, mean) # mean by Year
    2001     2002     2003
13.50000 14.83333 13.50000
> tapply(dat$wt, dat$Sex , mean) # mean by Sex
       F        M
12.44444 15.44444

I personally often prefer the tidyverse approach, which optionally includes pipes and allows a data frame to be grouped any way you want and then followed by further commands. It is easy to get your result this way by grouping by BOTH Year and Sex at once, which gives one row of output per combination. Note that the code below requires running a line like install.packages("tidyverse") once.

library(tidyverse)

dat <- read.table(
  text = "Year Sex wt
2001 M 15
2001 M 14
2001 M 16
2001 F 12
2001 F 11
2001 F 13
2002 M 14
2002 M 18
2002 M 17
2002 F 11
2002 F 15
2002 F 14
2003 M 18
2003 M 13
2003 M 14
2003 F 15
2003 F 10
2003 F 11 ",
  header = TRUE
)

dat %>%
  group_by(Year, Sex) %>%
  summarize( M = mean(wt, na.rm=TRUE))

The output of the above is the rows below:

> dat %>%
+   group_by(Year, Sex) %>%
+   summarize( M = mean(wt, na.rm=TRUE))
`summarise()` has grouped output by 'Year'. You can override using the `.groups` argument.
# A tibble: 6 x 3
# Groups:   Year [3]
   Year Sex       M
  <int> <chr> <dbl>
1  2001 F      12
2  2001 M      15
3  2002 F      13.3
4  2002 M      16.3
5  2003 F      12
6  2003 M      15

Note that Male and Female have their own rows. It is not that hard to switch to your format by rearranging the intermediate data set with pivot_wider() in the pipeline, asking it to make new columns from the variable Sex and populate them from the created variable M.
The new complete pipeline is now:

dat %>%
  group_by(Year, Sex) %>%
  summarize( M = mean(wt, na.rm=TRUE)) %>%
  pivot_wider(names_from = Sex, values_from = M)

The output as a tibble is:

   Year     F     M
  <int> <dbl> <dbl>
1  2001  12    15
2  2002  13.3  16.3
3  2003  12    15

Or as a data.frame, whose print method shows more digits (hence the trailing zeroes):

> dat %>%
+   group_by(Year, Sex) %>%
+   summarize( M = mean(wt, na.rm=TRUE)) %>%
+   pivot_wider(names_from = Sex, values_from = M) %>%
+   as.data.frame
`summarise()` has grouped output by 'Year'. You can override using the `.groups` argument.
  Year        F        M
1 2001 12.00000 15.00000
2 2002 13.33333 16.33333
3 2003 12.00000 15.00000

Your expected output shows 16.33 and 13.33, i.e. two digits after the decimal point. If you want a fixed number of decimal places, ask for the result to be rounded; round(1) below gives one, and round(2) would give two:

> dat %>%
+   group_by(Year, Sex) %>%
+   summarize( M = mean(wt, na.rm=TRUE)) %>%
+   pivot_wider(names_from = Sex, values_from = M) %>%
+   as.data.frame %>%
+   round(1)
`summarise()` has grouped output by 'Year'. You can override using the `.groups` argument.
  Year    F    M
1 2001 12.0 15.0
2 2002 13.3 16.3
3 2003 12.0 15.0

And, yes, any of the above can be done in various ways using plain old R, especially in recent versions that have added a somewhat different way to write pipelines (the native |> pipe).

-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Val
Sent: Monday, November 1, 2021 5:08 PM
To: r-help at R-project.org (r-help at r-project.org) <r-help at r-project.org>
Subject: [R] by group
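As a sketch of the plain-R route mentioned above, assuming R 4.1.0 or later for the native |> pipe: the formula interface of aggregate() gives the long Year/Sex table, and reshape() from the stats package spreads Sex into columns. reshape() names the new columns wt.F and wt.M (value column, a dot, then the Sex level); the object name wide is illustrative only.

```r
# Sample data from the question
dat <- read.table(text = "Year Sex wt
2001 M 15
2001 M 14
2001 M 16
2001 F 12
2001 F 11
2001 F 13
2002 M 14
2002 M 18
2002 M 17
2002 F 11
2002 F 15
2002 F 14
2003 M 18
2003 M 13
2003 M 14
2003 F 15
2003 F 10
2003 F 11", header = TRUE)

# aggregate() returns one row per Year/Sex combination (long format);
# the native pipe |> passes that data frame as reshape()'s first argument
wide <- aggregate(wt ~ Year + Sex, data = dat, FUN = mean) |>
  reshape(idvar = "Year", timevar = "Sex", direction = "wide")

# Show four significant digits, e.g. 16.33 instead of 16.33333
print(wide, digits = 4)
```

reshape() keeps one row per Year in order of first appearance, which here is already 2001, 2002, 2003.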
Hi

Although you got several answers, the simple aggregate() was not mentioned.

> res <- with(dat, aggregate(wt, list(Year=Year, Sex=Sex), mean))
> res
  Year Sex        x
1 2001   F 12.00000
2 2002   F 13.33333
3 2003   F 12.00000
4 2001   M 15.00000
5 2002   M 16.33333
6 2003   M 15.00000

You can reshape the result:

> library(reshape2)
Warning message:
package 'reshape2' was built under R version 4.0.4
> dcast(res, Year~Sex)
Using x as value column: use value.var to override.
  Year        F        M
1 2001 12.00000 15.00000
2 2002 13.33333 16.33333
3 2003 12.00000 15.00000

Cheers
Petr

> -----Original Message-----
> From: R-help <r-help-bounces at r-project.org> On Behalf Of Val
> Sent: Monday, November 1, 2021 10:08 PM
> To: r-help at R-project.org (r-help at r-project.org) <r-help at r-project.org>
> Subject: [R] by group
> [...]
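If reshape2 is not available, here is a base-R sketch of the same reshaping step, assuming res is the aggregate() result above (re-created here so the block runs on its own). xtabs() with the means on the left-hand side of the formula builds a Year-by-Sex table. It sums the values falling in each cell, which is harmless here because each Year/Sex cell holds exactly one mean. The object name tab is illustrative only.

```r
# Sample data from the question
dat <- read.table(text = "Year Sex wt
2001 M 15
2001 M 14
2001 M 16
2001 F 12
2001 F 11
2001 F 13
2002 M 14
2002 M 18
2002 M 17
2002 F 11
2002 F 15
2002 F 14
2003 M 18
2003 M 13
2003 M 14
2003 F 15
2003 F 10
2003 F 11", header = TRUE)

# Long-format means, as in the aggregate() call above (value column "x")
res <- with(dat, aggregate(wt, list(Year = Year, Sex = Sex), mean))

# xtabs() sums x within each Year/Sex cell; every cell holds exactly one
# mean here, so each cell value is that mean
tab <- xtabs(x ~ Year + Sex, data = res)
print(round(tab, 2))
```

One caveat: a Year/Sex combination absent from res would show up as 0 rather than NA, so check for missing groups before relying on this.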