thr3ads.net - R help - [R] by group [Nov 2021]


Val

2021-Nov-01 21:08 UTC

[R] by group

Hi All,

How can I generate the mean by group? The sample data look as follows:
dat<-read.table(text="Year Sex wt
2001 M 15
2001 M 14
2001 M 16
2001 F 12
2001 F 11
2001 F 13
2002 M 14
2002 M 18
2002 M 17
2002 F 11
2002 F 15
2002 F 14
2003 M 18
2003 M 13
2003 M 14
2003 F 15
2003 F 10
2003 F 11  ",header=TRUE)

The desired output is:
         M      F
2001  15      12
2002  16.33   13.33
2003  15      12

Thank you,

Andrew Simmons

2021-Nov-01 21:24 UTC

[R] by group

I would usually use 'tapply'. It splits an object into groups, applies
some function to each group, and then (optionally) simplifies the result
into a vector or array.
For example:


tapply(dat$wt, dat$Year, mean)  # mean by Year
tapply(dat$wt, dat$Sex , mean)  # mean by Sex
tapply(dat$wt, list(dat$Year, dat$Sex), mean)  # mean by Year and Sex


The documentation ?tapply has many more details about how this works, but
that's the basics at least. I hope this helps!
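
As a follow-up, here is a minimal sketch of how the two-factor call could be arranged into the layout asked for in the original post. The object names means and out are made up for illustration, and the column order (M before F) and rounding to two decimals are assumptions read off the desired output.

```r
# Same sample data as in the original post
dat <- read.table(text = "Year Sex wt
2001 M 15
2001 M 14
2001 M 16
2001 F 12
2001 F 11
2001 F 13
2002 M 14
2002 M 18
2002 M 17
2002 F 11
2002 F 15
2002 F 14
2003 M 18
2003 M 13
2003 M 14
2003 F 15
2003 F 10
2003 F 11", header = TRUE)

# Mean of wt for every Year x Sex combination; the result is a matrix
# with the years as row names and the sexes as column names
means <- tapply(dat$wt, list(dat$Year, dat$Sex), mean)

# Assumption: put M before F and round to two decimals,
# as in the desired output
out <- round(means[, c("M", "F")], 2)
print(out)
```

Because the result is an ordinary matrix, a single cell can be picked out by name, for example out["2002", "M"].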


Jim Lemon

2021-Nov-01 22:24 UTC

[R] by group

Hi Val,
I think you answered your own question:

by(dat$wt,dat[,c("Sex","Year")],mean)

Jim


Avi Gross

2021-Nov-01 22:44 UTC

[R] by group

This is a fairly simple request and well covered by introductory reading
material.

A decent example was given and I see Andrew provided a base R reply that
should be sufficient. But I do not think he realized you wanted something
different so his answer is not in the format you wanted:

> tapply(dat$wt, dat$Year, mean)  # mean by Year
    2001     2002     2003 
13.50000 14.83333 13.50000 
> tapply(dat$wt, dat$Sex , mean)  # mean by Sex
       F        M 
12.44444 15.44444 
> tapply(dat$wt, list(dat$Year, dat$Sex), mean)  # mean by Year and Sex

I personally often prefer the tidyverse approach, which optionally
includes pipes and allows a data frame to be grouped any way you want and
then followed by commands. It is easier to output your result this way by
grouping by BOTH Year and Sex at once and getting multiple lines of output.
Note the code below requires running a line like install.packages("tidyverse")
once beforehand.

library(tidyverse)
dat <- read.table(
  text = "Year Sex wt
2001 M 15
2001 M 14
2001 M 16
2001 F 12
2001 F 11
2001 F 13
2002 M 14
2002 M 18
2002 M 17
2002 F 11
2002 F 15
2002 F 14
2003 M 18
2003 M 13
2003 M 14
2003 F 15
2003 F 10
2003 F 11  ",
  header = TRUE
)

dat %>%
  group_by(Year, Sex) %>%
  summarize( M = mean(wt, na.rm=TRUE))

The output of the above is the rows below:

> dat %>%
+   group_by(Year, Sex) %>%
+   summarize( M = mean(wt, na.rm=TRUE))
`summarise()` has grouped output by 'Year'. You can override using the
`.groups` argument.
# A tibble: 6 x 3
# Groups:   Year [3]
   Year Sex       M
  <int> <chr> <dbl>
1  2001 F      12  
2  2001 M      15  
3  2002 F      13.3
4  2002 M      16.3
5  2003 F      12  
6  2003 M      15  

Note Male and Female have their own rows. It is not that hard to switch it
to your format by rearranging the intermediate data set with pivot_wider()
in the pipeline asking to make multiple new columns from variable Sex and
populating them from the created variable M. The new complete pipeline is
now:

dat %>%
  group_by(Year, Sex) %>%
  summarize( M = mean(wt, na.rm=TRUE)) %>%
  pivot_wider(names_from = Sex, values_from = M)

The output as a tibble is:

   Year     F     M
  <int> <dbl> <dbl>
1  2001  12    15  
2  2002  13.3  16.3
3  2003  12    15  

Or as a data.frame which seems to add zeroes:

> dat %>%
+   group_by(Year, Sex) %>%
+   summarize( M = mean(wt, na.rm=TRUE)) %>%
+   pivot_wider(names_from = Sex, values_from = M) %>%
+   as.data.frame
`summarise()` has grouped output by 'Year'. You can override using the
`.groups` argument.
  Year        F        M
1 2001 12.00000 15.00000
2 2002 13.33333 16.33333
3 2003 12.00000 15.00000

Your expected output is rounded, as it shows 13.33 and 16.33. If you want
a fixed number of digits after the decimal point, say one, ask for the
result to be rounded:

> dat %>%
+   group_by(Year, Sex) %>%
+   summarize( M = mean(wt, na.rm=TRUE)) %>%
+   pivot_wider(names_from = Sex, values_from = M) %>%
+   as.data.frame %>%
+   round(1)
`summarise()` has grouped output by 'Year'. You can override using the
`.groups` argument.
  Year    F    M
1 2001 12.0 15.0
2 2002 13.3 16.3
3 2003 12.0 15.0

And, yes, any of the above can be done in various ways using plain old R,
and especially in the recent versions that have added a somewhat different
way to do pipelines.
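
To illustrate that last remark, here is a rough base R sketch using the native pipe |>, which needs R 4.1.0 or later and no extra packages. The object name res and the rounding to two decimals are assumptions added for illustration.

```r
# Same sample data as in the original post
dat <- read.table(text = "Year Sex wt
2001 M 15
2001 M 14
2001 M 16
2001 F 12
2001 F 11
2001 F 13
2002 M 14
2002 M 18
2002 M 17
2002 F 11
2002 F 15
2002 F 14
2003 M 18
2003 M 13
2003 M 14
2003 F 15
2003 F 10
2003 F 11", header = TRUE)

# The native pipe passes dat as the first argument of with(), so this is
# with(dat, tapply(wt, list(Year, Sex), mean)) followed by round(..., 2)
res <- dat |>
  with(tapply(wt, list(Year, Sex), mean)) |>
  round(2)

print(res)
```

The result is a plain matrix with years as rows and F and M as columns, so no reshaping step is needed.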






PIKAL Petr

2021-Nov-02 11:23 UTC

[R] by group

Hi

Although you got several answers, simple aggregate was omitted.

> res <- with(dat, aggregate(wt, list(Year=Year, Sex=Sex), mean))
> res
  Year Sex        x
1 2001   F 12.00000
2 2002   F 13.33333
3 2003   F 12.00000
4 2001   M 15.00000
5 2002   M 16.33333
6 2003   M 15.00000

You can reshape the result:

> library(reshape2)
Warning message:
package 'reshape2' was built under R version 4.0.4
> dcast(res, Year~Sex)
Using x as value column: use value.var to override.
  Year        F        M
1 2001 12.00000 15.00000
2 2002 13.33333 16.33333
3 2003 12.00000 15.00000

Cheers
Petr
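
If reshape2 is not installed, the same wide layout can probably be reached in base R alone. The sketch below is only one possible route: it uses the formula interface of aggregate (so the value column keeps the name wt rather than x) and then xtabs to spread the means. The object names res and wide are made up for illustration.

```r
# Same sample data as in the original post
dat <- read.table(text = "Year Sex wt
2001 M 15
2001 M 14
2001 M 16
2001 F 12
2001 F 11
2001 F 13
2002 M 14
2002 M 18
2002 M 17
2002 F 11
2002 F 15
2002 F 14
2003 M 18
2003 M 13
2003 M 14
2003 F 15
2003 F 10
2003 F 11", header = TRUE)

# Long format: one row per Year x Sex combination, mean stored in wt
res <- aggregate(wt ~ Year + Sex, data = dat, FUN = mean)

# xtabs sums wt within each Year x Sex cell; since res has exactly one
# row per cell, the sums are just the means laid out as a table
wide <- xtabs(wt ~ Year + Sex, data = res)

print(wide)
```

When staying with reshape2, the message in the transcript above can be avoided by naming the value column explicitly, as in dcast(res, Year ~ Sex, value.var = "x").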
