david hilton shanabrook
2010-Jan-04 03:46 UTC
[R] function in aggregate applied to specific columns only
I want to use aggregate with the mean function on specific columns only:

gender <- factor(c("m", "m", "f", "f", "m"))
student <- c(0001, 0002, 0003, 0003, 0001)
score <- c(50, 60, 70, 65, 60)
basicSub <- data.frame(student, gender, score)
basicSubMean <- aggregate(basicSub, by=list(basicSub$student), FUN=mean, na.rm=TRUE)

This doesn't work, because one cannot take the mean of a factor (gender). Is there any way of specifying which columns to use for the mean? I want to aggregate by student, obtaining mean scores, and assume that any other factors, e.g. gender, are unchanging within a specific student.

Thanks
David Winsemius
2010-Jan-04 03:58 UTC
[R] function in aggregate applied to specific columns only
On Jan 3, 2010, at 10:46 PM, david hilton shanabrook wrote:

> I want to use aggregate with the mean function on specific columns
>
> gender <- factor(c("m", "m", "f", "f", "m"))
> student <- c(0001, 0002, 0003, 0003, 0001)
> score <- c(50, 60, 70, 65, 60)
> basicSub <- data.frame(student, gender, score)
> basicSubMean <- aggregate(basicSub, by=list(basicSub$student),
> FUN=mean, na.rm=TRUE)

> basicSubMean <- aggregate(basicSub$score, by=list(basicSub$student), FUN=mean, na.rm=TRUE)
> basicSubMean
  Group.1    x
1       1 55.0
2       2 60.0
3       3 67.5

> This doesn't work, one cannot take the mean of a factor (gender).
> Is there any way of specifying which columns to use for the mean? I
> want to aggregate by student, obtaining mean scores, and assume any
> other factors are unchanging in a specific student, ie. gender.
>
> Thanks

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
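The original question also asks to carry along factors such as gender that do not change within a student. One possible way to do that in base R, sketched here under that same assumption, is to group on student and gender together, so that gender survives as a grouping column while only score is averaged. The data are the poster's; using a two-column `by` is an illustration, not something proposed in the thread.

```r
# Data from the original post
gender  <- factor(c("m", "m", "f", "f", "m"))
student <- c(1, 2, 3, 3, 1)
score   <- c(50, 60, 70, 65, 60)
basicSub <- data.frame(student, gender, score)

# Average only the score column. Grouping on student AND gender keeps
# gender in the result; because gender is assumed constant within a
# student, this creates no extra groups.
basicSubMean <- aggregate(basicSub["score"],
                          by = basicSub[c("student", "gender")],
                          FUN = mean, na.rm = TRUE)
basicSubMean
```

If gender did vary within a student, this call would silently return one row per student/gender combination, so it is worth checking that the result has one row per student.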
Dennis Murphy
2010-Jan-04 03:59 UTC
[R] function in aggregate applied to specific columns only
Hi:

Perhaps the plyr package would be useful. It contains the functions colwise(), numcolwise() and catcolwise(), which apply the same function to all columns, to the numeric columns only, or to the categorical columns only, respectively. In this case, numcolwise() is appropriate:

> str(basicSub)
'data.frame':   5 obs. of  3 variables:
 $ student: num  1 2 3 3 1
 $ gender : Factor w/ 2 levels "f","m": 2 2 1 1 2
 $ score  : num  50 60 70 65 60
> basicSub$student <- factor(basicSub$student)   # convert student to factor
> library(plyr)
# First argument is data frame, the next is the grouping variable, the
# third is the function to apply.
> ddply(basicSub, .(student), numcolwise(mean))
  student score
1       1  55.0
2       2  60.0
3       3  67.5

HTH,
Dennis

On Sun, Jan 3, 2010 at 7:46 PM, david hilton shanabrook <dhshanab@acad.umass.edu> wrote:

> I want to use aggregate with the mean function on specific columns
>
> gender <- factor(c("m", "m", "f", "f", "m"))
> student <- c(0001, 0002, 0003, 0003, 0001)
> score <- c(50, 60, 70, 65, 60)
> basicSub <- data.frame(student, gender, score)
> basicSubMean <- aggregate(basicSub, by=list(basicSub$student), FUN=mean,
> na.rm=TRUE)
>
> This doesn't work, one cannot take the mean of a factor (gender). Is there
> any way of specifying which columns to use for the mean? I want to
> aggregate by student, obtaining mean scores, and assume any other factors
> are unchanging in a specific student, ie. gender.
>
> Thanks
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
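For readers without plyr, the idea behind numcolwise() (apply the function to numeric columns only) can be approximated in base R by selecting the numeric columns before calling aggregate(). This is a rough sketch, not plyr's implementation; the helper variable `num_cols` is a name made up for this illustration.

```r
# Data from the original post
gender  <- factor(c("m", "m", "f", "f", "m"))
student <- c(1, 2, 3, 3, 1)
score   <- c(50, 60, 70, 65, 60)
basicSub <- data.frame(student, gender, score)

# Names of all numeric columns, minus the grouping column itself
num_cols <- names(basicSub)[sapply(basicSub, is.numeric)]
num_cols <- setdiff(num_cols, "student")

# Aggregate only those columns; the factor column gender is never
# passed to mean()
numMeans <- aggregate(basicSub[num_cols], by = basicSub["student"],
                      FUN = mean, na.rm = TRUE)
numMeans
```

With more numeric columns in the data frame, the same call would average all of them at once without listing each by name.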
milton ruser
2010-Jan-04 04:04 UTC
[R] function in aggregate applied to specific columns only
You want this?

> basicSubMean <- aggregate(basicSub[c("score")], by=list(basicSub$student), FUN=mean, na.rm=TRUE)
> basicSubMean
  Group.1 score
1       1  55.0
2       2  60.0
3       3  67.5

bests
milton
Gabor Grothendieck
2010-Jan-04 04:14 UTC
[R] function in aggregate applied to specific columns only
Here are 6 ways:

1. aggregate

> aggregate(basicSub["score"], basicSub["student"], mean)
  student score
1       1  55.0
2       2  60.0
3       3  67.5

2. tapply

> with(basicSub, tapply(score, student, mean))
   1    2    3
55.0 60.0 67.5

3. summaryBy in doBy package

> library(doBy)
> summaryBy(. ~ student, basicSub)
  student score.mean
1       1       55.0
2       2       60.0
3       3       67.5

4. sqldf in sqldf package. Uses SQL:

> library(sqldf)
> sqldf("select student, avg(score) from basicSub group by student")
  student avg(score)
1       1       55.0
2       2       60.0
3       3       67.5

5. summary.formula in Hmisc

> library(Hmisc)
> summary(score ~ student, basicSub)
score    N=5

+-------+-+-+-----+
|       | |N|score|
+-------+-+-+-----+
|student|1|2|55.0 |
|       |2|1|60.0 |
|       |3|2|67.5 |
+-------+-+-+-----+
|Overall| |5|61.0 |
+-------+-+-+-----+

6. plyr (see Dennis Murphy's solution in this thread)
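A further base R option, not mentioned in the thread, is the formula method of aggregate(). The sketch below assumes an R version that provides that method, and it cross-checks the result against the tapply() approach from item 2. The names `byFormula` and `byTapply` are chosen for this example only.

```r
# Data from the original post
gender  <- factor(c("m", "m", "f", "f", "m"))
student <- c(1, 2, 3, 3, 1)
score   <- c(50, 60, 70, 65, 60)
basicSub <- data.frame(student, gender, score)

# Left-hand side: the column to summarise; right-hand side: the grouping
byFormula <- aggregate(score ~ student, data = basicSub, FUN = mean)

# Same computation with tapply(), which returns a named array
byTapply <- with(basicSub, tapply(score, student, mean))

# Both approaches should agree on the per-student means
all.equal(byFormula$score, as.vector(byTapply))
```

Only the variables named in the formula are used, so the gender factor is never passed to mean().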