I have been using ddply to do aggregation, and I frequently define a
single aggregation function that I use to aggregate over different
groups. For example,

require(plyr)

dat <- data.frame(x = sample(3, 100, replace = TRUE),
                  y = sample(3, 100, replace = TRUE),
                  z = rnorm(100))

f <- function(x) { data.frame(mean.z = mean(x$z), sd.z = sd(x$z)) }

ddply(dat, "x", f)
ddply(dat, "y", f)
ddply(dat, c("x", "y"), f)

I recently discovered the data.table package, which dramatically
speeds up the aggregation:

require(data.table)
dat <- data.table(dat)

dat[, list(mean.z = mean(z), sd.z = sd(z)), list(x)]
dat[, list(mean.z = mean(z), sd.z = sd(z)), list(y)]
dat[, list(mean.z = mean(z), sd.z = sd(z)), list(x,y)]

But I can't figure out how to save the aggregation function
"list(mean.z = mean(z), sd.z = sd(z))" as a variable that I can reuse,
similar to the function "f" above. Can someone please explain how to
do that?

Thanks.

- Elliot

--
Elliot Joel Bernstein, Ph.D. | Research Associate | FDO Partners, LLC
134 Mount Auburn Street | Cambridge, MA | 02138
Phone: (617) 503-4619 | Email: elliot.bernstein at fdopartners.com
On Tue, Aug 7, 2012 at 4:36 PM, Elliot Joel Bernstein
<elliot.bernstein at fdopartners.com> wrote:
> But I can't figure out how to save the aggregation function
> "list(mean.z = mean(z), sd.z = sd(z))" as a variable that I can reuse,
> similar to the function "f" above. Can someone please explain how to
> do that?

One exceptionally kludgy way:

zzz <- expression(list(mean.z = mean(z), sd.z = sd(z)))
dat[, eval(zzz), list(x,y)]

Michael

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
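For anyone who wants to wrap the stored-expression idea in something reusable, here is a minimal sketch. It assumes a reasonably recent data.table and uses quote() in place of expression(); both give an unevaluated call that eval() in j can pick up. The names agg and agg_by are hypothetical, made up for this example, and are not part of data.table.

```r
library(data.table)

set.seed(1)  # fixed seed so the example is repeatable
dat <- data.table(x = sample(3, 100, replace = TRUE),
                  y = sample(3, 100, replace = TRUE),
                  z = rnorm(100))

# Store the unevaluated j expression once (hypothetical name: agg).
agg <- quote(list(mean.z = mean(z), sd.z = sd(z)))

# Hypothetical helper: apply a stored j expression over any grouping.
# by_cols is a character vector of column names.
agg_by <- function(dt, by_cols, j_expr = agg) {
  dt[, eval(j_expr), by = by_cols]
}

res_x  <- agg_by(dat, "x")
res_y  <- agg_by(dat, "y")
res_xy <- agg_by(dat, c("x", "y"))
```

The variable holding the expression should not share a name with a column of the table (x, y or z here), otherwise j may resolve to the column instead of the stored expression.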
Hi,

Try this:

# Return the unevaluated aggregation expression so that it can be
# stored in a variable and reused outside the function.
fun1 <- function() {
  expression(list(mean.z = mean(z), sd.z = sd(z)))
}
z1 <- fun1()

dat[, eval(z1), list(x)]
dat[, eval(z1), list(y)]
dat[, eval(z1), list(x,y)]

A.K.

----- Original Message -----
From: Elliot Joel Bernstein <elliot.bernstein at fdopartners.com>
To: r-help at r-project.org
Cc:
Sent: Tuesday, August 7, 2012 5:36 PM
Subject: [R] Repeated Aggregation with data.table
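A closer analogue of the ddply function "f" may be an ordinary function that takes the column as a vector and returns a named list, since j accepts any expression that evaluates to a list. The following is only a sketch under that assumption; the data are regenerated here so that it runs on its own, and f is redefined for data.table use rather than taken from the original post.

```r
library(data.table)

set.seed(42)  # fixed seed so the example is repeatable
dat <- data.table(x = sample(3, 100, replace = TRUE),
                  y = sample(3, 100, replace = TRUE),
                  z = rnorm(100))

# Plain function of a vector; the list names become the column names.
f <- function(z) list(mean.z = mean(z), sd.z = sd(z))

by_x  <- dat[, f(z), by = x]
by_y  <- dat[, f(z), by = y]
by_xy <- dat[, f(z), by = list(x, y)]
```

Compared with the eval() approach, this names the input column at the call site, so the same f could be pointed at a different column without editing a stored expression.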