Tony Breyal
2009-Oct-21 11:03 UTC
[R] How to average subgroups in a dataframe? (not sure how to apply aggregate(..))
Dear all,

Let's say I have the following data frame:

> set.seed(1)
> col1 <- c(rep('happy',9), rep('sad', 9))
> col2 <- rep(c(rep('alpha', 3), rep('beta', 3), rep('gamma', 3)),2)
> dates <- as.Date(rep(c('2009-10-13', '2009-10-14', '2009-10-15'),6))
> score=rnorm(18, 10, 3)
> df1<-data.frame(col1=col1, col2=col2, Date=dates, score=score)

    col1  col2       Date     score
1  happy alpha 2009-10-13  8.120639
2  happy alpha 2009-10-14 10.550930
3  happy alpha 2009-10-15  7.493114
4  happy  beta 2009-10-13 14.785842
5  happy  beta 2009-10-14 10.988523
6  happy  beta 2009-10-15  7.538595
7  happy gamma 2009-10-13 11.462287
8  happy gamma 2009-10-14 12.214974
9  happy gamma 2009-10-15 11.727344
10   sad alpha 2009-10-13  9.083835
11   sad alpha 2009-10-14 14.535344
12   sad alpha 2009-10-15 11.169530
13   sad  beta 2009-10-13  8.136278
14   sad  beta 2009-10-14  3.355900
15   sad  beta 2009-10-15 13.374793
16   sad gamma 2009-10-13  9.865199
17   sad gamma 2009-10-14  9.951429
18   sad gamma 2009-10-15 12.831509

Is it possible to get the following, whereby I am averaging the values within each group of values in col2:

    col1  col2       Date     score   Average
1  happy alpha 13/10/2009  8.120639  8.721561
2  happy alpha 14/10/2009 10.550930  8.721561
3  happy alpha 15/10/2009  7.493114  8.721561
4  happy  beta 13/10/2009 14.785842 11.104320
5  happy  beta 14/10/2009 10.988523 11.104320
6  happy  beta 15/10/2009  7.538595 11.104320
7  happy gamma 13/10/2009 11.462287 11.801535
8  happy gamma 14/10/2009 12.214974 11.801535
9  happy gamma 15/10/2009 11.727344 11.801535
10   sad alpha 13/10/2009  9.083835 11.596236
11   sad alpha 14/10/2009 14.535344 11.596236
12   sad alpha 15/10/2009 11.169530 11.596236
13   sad  beta 13/10/2009  8.136278  8.288990
14   sad  beta 14/10/2009  3.355900  8.288990
15   sad  beta 15/10/2009 13.374793  8.288990
16   sad gamma 13/10/2009  9.865199 10.882712
17   sad gamma 14/10/2009  9.951429 10.882712
18   sad gamma 15/10/2009 12.831509 10.882712

My feeling is that I should be using ?aggregate in some fashion but can't see how to do it. Or possibly there's another function I should be using?

Thanks in advance,
Tony

O/S: Windows Vista Ultimate

> sessionInfo()
R version 2.9.2 (2009-08-24)
i386-pc-mingw32

locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
Benilton Carvalho
2009-Oct-21 12:13 UTC
[R] How to average subgroups in a dataframe? (not sure how to apply aggregate(..))
aves = aggregate(df1$score, by=list(col1=df1$col1, col2=df1$col2), mean)
results = merge(df1, aves)

b

On Oct 21, 2009, at 9:03 AM, Tony Breyal wrote:

> My feeling is that I should be using ?aggregate in some fashion
> but can't see how to do it. Or possibly there's another function I
> should be using?
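
Note that aggregate() called on a bare vector names its summary column x, so the merged result above carries the group mean in a column called x rather than Average. The following is a minimal sketch of one way to get the requested column name, assuming the df1 built in the original post. The names aves and results come from the reply; the renaming step, the explicit by argument to merge(), and the final reordering are additions for illustration only.

```r
# Rebuild the example data from the original post.
set.seed(1)
col1  <- c(rep("happy", 9), rep("sad", 9))
col2  <- rep(c(rep("alpha", 3), rep("beta", 3), rep("gamma", 3)), 2)
dates <- as.Date(rep(c("2009-10-13", "2009-10-14", "2009-10-15"), 6))
score <- rnorm(18, 10, 3)
df1   <- data.frame(col1 = col1, col2 = col2, Date = dates, score = score)

# One mean per (col1, col2) combination; the summary column is named "x".
aves <- aggregate(df1$score, by = list(col1 = df1$col1, col2 = df1$col2), FUN = mean)

# Rename the summary column (an addition, not part of the reply).
names(aves)[names(aves) == "x"] <- "Average"

# Join on the shared key columns so every row picks up its group mean.
results <- merge(df1, aves, by = c("col1", "col2"))

# merge() may reorder rows; sort to get a predictable order back.
results <- results[order(results$col1, results$col2, results$Date), ]
```

The aggregate-then-merge route is useful when the table of group means (aves) is wanted on its own as well as alongside the original rows.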
Chuck Cleland
2009-Oct-21 12:16 UTC
[R] How to average subgroups in a dataframe? (not sure how to apply aggregate(..))
On 10/21/2009 7:03 AM, Tony Breyal wrote:

> My feeling is that I should be using ?aggregate in some fashion
> but can't see how to do it. Or possibly there's another function I
> should be using?

?ave

For example, try something like this:

transform(df1, Average = ave(score, col1, col2))

-- 
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
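
ave() returns a vector the same length as its first argument, with each element replaced by the summary of its group, so no merge step is needed. The following is a minimal sketch, assuming the df1 from the original post. The names with_avg, group_means, and with_med are hypothetical, and the median line is an illustrative extra showing that the summary function can be swapped; FUN has to be passed by name because ave() collects the grouping variables through its ... argument.

```r
# Rebuild the example data from the original post.
set.seed(1)
df1 <- data.frame(
  col1  = c(rep("happy", 9), rep("sad", 9)),
  col2  = rep(c(rep("alpha", 3), rep("beta", 3), rep("gamma", 3)), 2),
  Date  = as.Date(rep(c("2009-10-13", "2009-10-14", "2009-10-15"), 6)),
  score = rnorm(18, 10, 3)
)

# Group mean repeated on every row of its (col1, col2) group, in original row order.
with_avg <- transform(df1, Average = ave(score, col1, col2))

# Cross-check: the same means as a col1-by-col2 table.
group_means <- tapply(df1$score, list(df1$col1, df1$col2), mean)

# A different summary per group; FUN must be named.
with_med <- transform(df1, Median = ave(score, col1, col2, FUN = median))
```

Because ave() preserves the row order of df1, the result lines up with the layout requested in the question without any sorting.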