Hi all,

I'm trying to improve my R skills and make my programming more efficient and succinct. I can solve the following problem, but I wonder if there's a better way to do it.

I'm trying to calculate a mean by several grouping variables and then put it back into the original data set as a new variable. For example, if I were measuring weight, I might want to have each individual's weight and also the group mean by, say, race, sex, and geographic region. The following code works:

> x1<-rep(c("A","B","C"),3)
> x2<-c(rep(1,3),rep(2,3),1,2,1)
> x3<-c(1,2,3,4,5,6,2,6,4)
> x<-as.data.frame(cbind(x1,x2,x3))
> x3.mean<-rep(0,nrow(x))
> for (i in 1:nrow(x)){
+ x3.mean[i]<-mean(as.numeric(x[,3][x[,1]==x[,1][i]&x[,2]==x[,2][i]]))
+ }
> cbind(x,x3.mean)
  x1 x2 x3 x3.mean
1  A  1  1     1.5
2  B  1  2     2.0
3  C  1  3     3.5
4  A  2  4     4.0
5  B  2  5     5.5
6  C  2  6     6.0
7  A  1  2     1.5
8  B  2  6     5.5
9  C  1  4     3.5

However, I'd love to be able to do this with "apply" rather than a for-loop. Or is there a built-in function? Any suggestions?

Also, is there any way to avoid the hassle of having to convert to a data frame and then again to numeric when one variable is character?

Cheers,
Alan Cohen
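For comparison, the loop above translates almost literally into sapply(), which is the "apply" version asked about. This is only a sketch: it builds the data frame with data.frame() instead of as.data.frame(cbind(...)), so x3 stays numeric and no as.numeric() is needed, and it still compares every row against every other row, so the grouped functions suggested in the replies should scale better on large data.

```r
x1 <- rep(c("A", "B", "C"), 3)
x2 <- c(rep(1, 3), rep(2, 3), 1, 2, 1)
x3 <- c(1, 2, 3, 4, 5, 6, 2, 6, 4)
x <- data.frame(x1, x2, x3)  # each column keeps its own type; x3 stays numeric

# For each row i, average x3 over all rows that share row i's x1 and x2 values
x$x3.mean <- sapply(seq_len(nrow(x)), function(i) {
  same_group <- x$x1 == x$x1[i] & x$x2 == x$x2[i]
  mean(x$x3[same_group])
})
x
```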
Not exactly the output you asked for, but perhaps you can consider:

library(doBy)
summaryBy(x3~x2+x1,data=x,FUN=mean)

  x2 x1 x3.mean
1  1  A     1.5
2  1  B     2.0
3  1  C     3.5
4  2  A     4.0
5  2  B     5.5
6  2  C     6.0

The plyr package also provides similar functionality, as do the ?by, ?ave, and ?tapply base functions.

HTH,
baptiste

On 31 Mar 2009, at 17:09, Alan Cohen wrote:
> [...]

_____________________________
Baptiste Auguié
School of Physics
University of Exeter
Stocker Road, Exeter, Devon,
EX4 4QL, UK
Phone: +44 1392 264187
http://newton.ex.ac.uk/research/emag
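With only base functions, one way to go from a per-group table like the one above back to one value per row is tapply() followed by matrix indexing by dimnames. A sketch, assuming the data frame is built with data.frame() as elsewhere in this thread; `means` is an illustrative name.

```r
x <- data.frame(x1 = rep(c("A", "B", "C"), 3),
                x2 = c(rep(1, 3), rep(2, 3), 1, 2, 1),
                x3 = c(1, 2, 3, 4, 5, 6, 2, 6, 4))

# One mean per (x1, x2) combination: a 3 x 2 matrix whose dimnames are
# list(c("A", "B", "C"), c("1", "2"))
means <- tapply(x$x3, list(x$x1, x$x2), mean)

# Index the matrix with a two-column character matrix of (row name, column name)
# pairs; each data row thereby looks up its own group mean
x$x3.mean <- means[cbind(as.character(x$x1), as.character(x$x2))]
x
```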
On Tue, Mar 31, 2009 at 11:31 AM, baptiste auguie <ba208 at exeter.ac.uk> wrote:
> Not exactly the output you asked for, but perhaps you can consider,
> [...]
> the plyr package also provides similar functionality, as do the ?by, ?ave,
> and ?tapply base functions.

In plyr it would look like:

library(plyr)
x1 <- rep(c("A", "B", "C"), 3)
x2 <- c(rep(1, 3), rep(2, 3), 1, 2, 1)
x3 <- c(1, 2, 3, 4, 5, 6, 2, 6, 4)
df <- data.frame(x1, x2, x3)
ddply(df, .(x1, x2), transform, x3.mean = mean(x3))

Note how I created the data frame: only use cbind if you want a matrix (i.e. all the columns have the same type).

Hadley

--
http://had.co.nz/
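To make the cbind point concrete, the sketch below compares the column types produced by cbind() and by data.frame(). It also shows the usual pitfall behind the "convert again to numeric" hassle: as.numeric() on a factor returns the internal level codes, not the printed values. The object names m and f are illustrative.

```r
x1 <- rep(c("A", "B", "C"), 3)
x2 <- c(rep(1, 3), rep(2, 3), 1, 2, 1)
x3 <- c(1, 2, 3, 4, 5, 6, 2, 6, 4)

m <- cbind(x1, x2, x3)   # a matrix holds a single type, so everything becomes character
typeof(m)                # "character"

df <- data.frame(x1, x2, x3)  # each column keeps its own type
sapply(df, class)             # x2, x3: "numeric"; x1: "character" in R >= 4.0.0, "factor" before

# If numbers have ended up stored as a factor, convert via the labels,
# not via the internal integer codes:
f <- factor(c("10", "20", "10"))
as.numeric(f)                # 1 2 1   (the level codes)
as.numeric(as.character(f))  # 10 20 10
```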
A different solution (using aggregate for the table of means and merge for adding it to the data frame):

x1<-rep(c("A","B","C"),3)
x2<-c(rep(1,3),rep(2,3),1,2,1)
x3<-c(1,2,3,4,5,6,2,6,4)
x<-data.frame(x1,x2,x3) #with data.frame the x1 variable is converted directly to a factor (in R >= 4.0.0 it stays character, since stringsAsFactors now defaults to FALSE)

x3means <- aggregate(x$x3, by=list(x$x1), FUN="mean")
merge(x, x3means, by.x="x1", by.y="Group.1")

Ciao,
domenico

Alan Cohen wrote:
> [...]
Sorry, there was a mistake in the previous mail:

Domenico Vistocco wrote:
> x3means <- aggregate(x$x3, by=list(x$x1), FUN="mean")
> merge(x, x3means, by.x="x1", by.y="Group.1")

#I forgot the second variable in the by argument (for both aggregate and merge):
x3means <- aggregate(x$x3, by=list(x$x1, x$x2), FUN="mean")
merge(x, x3means, by.x=c("x1","x2"), by.y=c("Group.1", "Group.2"))
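One thing to watch with the merge() approach: by default merge() sorts the result by the key columns, so its rows no longer line up with the original data frame. The sketch below remembers the original order in a helper column (`id`, an illustrative name) and restores it afterwards. It uses the formula interface to aggregate() found in current versions of R, which names the result column after x3 instead of "x".

```r
df <- data.frame(x1 = rep(c("A", "B", "C"), 3),
                 x2 = c(rep(1, 3), rep(2, 3), 1, 2, 1),
                 x3 = c(1, 2, 3, 4, 5, 6, 2, 6, 4))
df$id <- seq_len(nrow(df))  # remember the original row order

# One row per (x1, x2) group, with the group mean in a column called x3
means <- aggregate(x3 ~ x1 + x2, data = df, FUN = mean)
names(means)[names(means) == "x3"] <- "x3.mean"

out <- merge(df, means, by = c("x1", "x2"))  # rows come back sorted by x1, x2
out <- out[order(out$id), ]                  # restore the original order
rownames(out) <- NULL
out$id <- NULL
out
```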
That is precisely the reason for the existence of the ave function. Using Wickham's example:

> x1 <- rep(c("A", "B", "C"), 3)
> x2 <- c(rep(1, 3), rep(2, 3), 1, 2, 1)
> x3 <- c(1, 2, 3, 4, 5, 6, 2, 6, 4)
> df <- data.frame(x1, x2, x3)
> df$grpx3 <- ave(df$x3, list(df$x1, df$x2))
> df
  x1 x2 x3 grpx3
1  A  1  1   1.5
2  B  1  2   2.0
3  C  1  3   3.5
4  A  2  4   4.0
5  B  2  5   5.5
6  C  2  6   6.0
7  A  1  2   1.5
8  B  2  6   5.5
9  C  1  4   3.5

Note that the default function is mean(), but other functions can be supplied through the FUN argument.

--
David Winsemius

On Mar 31, 2009, at 12:09 PM, Alan Cohen wrote:
> [...]

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
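As a sketch of the FUN argument mentioned above: ave() also accepts the grouping variables as separate arguments, and with FUN = length it gives each row the size of its group. transform() is used here only to avoid repeating df$; the column names grp_mean and grp_n are illustrative.

```r
df <- data.frame(x1 = rep(c("A", "B", "C"), 3),
                 x2 = c(rep(1, 3), rep(2, 3), 1, 2, 1),
                 x3 = c(1, 2, 3, 4, 5, 6, 2, 6, 4))

df <- transform(df,
                grp_mean = ave(x3, x1, x2),                # FUN = mean is the default
                grp_n    = ave(x3, x1, x2, FUN = length))  # rows per (x1, x2) group
df
```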