Hello!

I'm trying to run stats on two variables at a time in a big data frame. I knew how to do this in SAS many years ago, but have half-forgotten that as well!

I need, for instance, mean(value) by x-y combination:

x  y  z  value
1  1  1  10
1  1  2  20
1  2  1  30

with results:

x  y  mean(value)
1  1  15
1  2  30

Any help?

Thanks,

~Zack

--
View this message in context: http://old.nabble.com/By-processing-on-two-variables-at-once--tp26312115p26312115.html
Sent from the R help mailing list archive at Nabble.com.
Hi,

On Wed, Nov 11, 2009 at 8:51 PM, zwarren <zack.warren at yahoo.com> wrote:
> I'm trying to run stats on two variables at a time in a big data frame. I knew
> how to do this in SAS many years ago, but have half-forgotten that as well!
>
> I need, for instance, mean(value) by x-y combination:
> x  y  z  value
> 1  1  1  10
> 1  1  2  20
> 1  2  1  30
>
> with results:
> x  y  mean(value)
> 1  1  15
> 1  2  30

What happened to your "z" column?

Anyway, there are a few ways you can do this.

1. If you just want to use the standard library, try the aggregate function. Roughly:

  R> df <- data.frame(x=c(1,1,1), y=c(1,1,2), z=c(1,2,1), value=c(10,20,30))
  R> aggregate(df, by=list(df$x, df$y), mean)
    Group.1 Group.2 x y   z value
  1       1       1 1 1 1.5    15
  2       1       2 1 2 1.0    30

2. You can try using the plyr library:

  R> library(plyr)
  R> ddply(df, .(x, y), mean)
    x y   z value
  1 1 1 1.5    15
  2 1 2 1.0    30

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
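A minimal sketch of a third option, not taken from the replies in this thread: the formula interface of aggregate averages only the value column within each x-y combination, so the z column and the Group.1/Group.2 names do not appear in the result. As far as I know, recent versions of R no longer compute column means when mean() is handed a whole data frame, so the ddply call above may not reproduce the output shown there; the sketch below avoids that by summarising a single column. The name by_xy is an assumption chosen for illustration.

```r
# Same toy data as in the question.
df <- data.frame(x = c(1, 1, 1), y = c(1, 1, 2), z = c(1, 2, 1),
                 value = c(10, 20, 30))

# Formula interface: left of ~ is the column to summarise,
# right of ~ are the grouping columns.
by_xy <- aggregate(value ~ x + y, data = df, FUN = mean)
print(by_xy)
```

The result has one row per x-y combination and keeps the original column names x, y and value.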
On Nov 11, 2009, at 8:51 PM, zwarren wrote:
> I'm trying to run stats on two variables at a time in a big data frame.
> I knew how to do this in SAS many years ago, but have half-forgotten that
> as well!
>
> I need, for instance, mean(value) by x-y combination:
> x  y  z  value
> 1  1  1  10
> 1  1  2  20
> 1  2  1  30
>
> with results:
> x  y  mean(value)
> 1  1  15
> 1  2  30

It's generally a bad idea to incorporate "(" in variable names, but it's possible:

  > aggregate(dat$value, list(dat$x, dat$y), mean)
    Group.1 Group.2  x
  1       1       1 15
  2       1       2 30
  > newdat <- aggregate(dat$value, list(dat$x, dat$y), mean)
  > names(newdat) <- c("x", "y", bquote("mean(value)"))
  > newdat
    x y mean(value)
  1 1 1          15
  2 1 2          30

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
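The renaming step above can be shortened, sketched here under stated assumptions: the reply never shows how dat was created, so the definition below is assumed to match the data in the question. Passing data-frame slices (rather than bare vectors) to aggregate keeps the x and y names, which leaves only the summary column to rename.

```r
# Assumed definition of 'dat'; the reply does not show it.
dat <- data.frame(x = c(1, 1, 1), y = c(1, 1, 2), z = c(1, 2, 1),
                  value = c(10, 20, 30))

# dat["value"] and dat[c("x", "y")] are data frames, so their column
# names carry over to the result instead of Group.1 / Group.2.
newdat <- aggregate(dat["value"], by = dat[c("x", "y")], FUN = mean)

# Optional: use the label from the question. A name containing
# parentheses must then be accessed with [[ ]] or backticks.
names(newdat)[3] <- "mean(value)"
print(newdat[["mean(value)"]])
```

Keeping a plain name such as value is usually easier to work with afterwards, since newdat$value needs no quoting.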