I have a data set that has student test scores along with several categorical variables. I would like to generate a set of summary stats (mean, variance, n) for the data grouped by school authority and by exam topic. I have tried the by() function but that seems to only be able to handle one level of grouping. In particular what I would like is something like the following:

Board   Subject  Mean  Variance  N
board1  english  70    150       600
board2  english  66    210       510
board1  science  69    180       605
board2  science  71    220       520

and so on.

I have already generated the stats that I need using "GROUP BY" in a select query in MySQL. I'm just curious now about doing the same thing in R.

Thanks in advance,
Neil
You could use aggregate:

agg.mean <- aggregate(my.data, by=list(my.data$Board, my.data$Subject), FUN=mean)

Here my.data should contain only the numeric score column(s), since mean and var are not meaningful for the categorical columns. The caveat is that you can only use one aggregator function at a time. You could rerun the same for FUN=var and FUN=length to get the additional aggregate statistics that you need and then cbind the results:

cbind(agg.mean, agg.var, agg.n)

Note that each result carries its own copy of the grouping columns, so you will want to drop the repeats after binding.

-Christos

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Neil Hepburn
Sent: Sunday, April 16, 2006 2:56 PM
To: r-help at stat.math.ethz.ch
Subject: [R] summary stats

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
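For what it's worth, the aggregate() suggestion above might be sketched as follows. This is only an illustration under assumptions: the column names Board, Subject, and Score and the simulated scores are made up here, since the original post does not show the actual data frame.

```r
# Hypothetical data: the column names and the simulated scores are assumptions.
set.seed(1)
my.data <- data.frame(
  Board   = rep(c("board1", "board2"), each = 10),
  Subject = rep(c("english", "science"), times = 10),
  Score   = round(rnorm(20, mean = 70, sd = 12))
)

# Aggregate only the numeric score column, one statistic per call.
groups   <- list(Board = my.data$Board, Subject = my.data$Subject)
agg.mean <- aggregate(my.data["Score"], by = groups, FUN = mean)
agg.var  <- aggregate(my.data["Score"], by = groups, FUN = var)
agg.n    <- aggregate(my.data["Score"], by = groups, FUN = length)

# The three results list the groups in the same order, so the statistic
# columns can be placed side by side without repeating the grouping keys.
result <- data.frame(
  agg.mean[c("Board", "Subject")],
  Mean     = agg.mean$Score,
  Variance = agg.var$Score,
  N        = agg.n$Score
)
print(result)
```

Building the final data frame by hand, rather than calling cbind() on the three results, avoids ending up with three copies of the Board and Subject columns.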
Dear Neil,

Coincidentally, more or less the same question was asked on r-help yesterday and today. You can use either the by() function or aggregate(), though you'll have to do a bit of work on the result if you want it to look just like your example.

I hope this helps,
John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------
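As a rough sketch of the by() route mentioned above: by() accepts a list of factors, so it can group on two levels at once, and the "bit of work on the result" can be done by returning a one-row data frame per group and stacking the pieces. The data frame and its column names (Board, Subject, Score) are hypothetical, since the original post does not show them.

```r
# Hypothetical data: the column names and the simulated scores are assumptions.
set.seed(1)
scores <- data.frame(
  Board   = rep(c("board1", "board2"), each = 10),
  Subject = rep(c("english", "science"), times = 10),
  Score   = round(rnorm(20, mean = 70, sd = 12))
)

# Group on both factors; each group yields a one-row data frame of statistics.
pieces <- by(scores, list(scores$Board, scores$Subject), function(d) {
  data.frame(
    Board    = d$Board[1],
    Subject  = d$Subject[1],
    Mean     = mean(d$Score),
    Variance = var(d$Score),
    N        = nrow(d)
  )
})

# Stack the per-group rows into a single table shaped like the requested output.
result.by <- do.call(rbind, pieces)
rownames(result.by) <- NULL
print(result.by)
```

Because the function passed to by() can return any number of statistics, this needs only one pass over the groups, at the cost of the extra rbind step.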