Matt Pocernich
2008-Mar-21 21:15 UTC
[R] Aggregate with functions using multiple arguments
Hello,

I would like to use aggregate with a function that requires several arguments that are columns of data.

As a simple example, suppose I have data in the following data frame and I would like to summarize the difference between columns obs and frc for each site. How would I do this? (In reality, the function is slightly more complicated, but still requires several columns of data.)

> DAT <- data.frame(site = rep(c("A","B"), 10), obs = rnorm(20), frc = rnorm(20))
> DAT
  site         obs         frc
1    A  1.27451057 -1.68995017
2    B  1.43942253  0.41672963
3    A -0.10875319  0.77108721
4    B -0.63198144 -0.21772356
5    A -0.42084163  0.50997647
....

I have tried variations on the following syntax with no success.

> F <- function(sub){ mean(sub[,"obs"] - sub[,"frc"] ) }
> aggregate(DAT[,c("obs", "frc")], by = list(DAT$site), F)

I have had partial success with the by command, but the by-class object is awkward and I would like to use the aggregate command to be consistent with other functions.

Thanks,

Matt

--
Matt Pocernich
National Center for Atmospheric Research
Research Applications Laboratory
(303) 497-8312
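For context, a small sketch of why the attempt above errors out: aggregate() on a data frame applies FUN to each column separately, so the function receives a plain numeric vector for one column and one site, never a data frame with both obs and frc. The seed and the name site_fun (the post's F, renamed so it does not mask the FALSE shorthand) are illustrative assumptions.

```r
# Illustrative data in the same layout as the post; the seed is arbitrary.
set.seed(1)
DAT <- data.frame(site = rep(c("A", "B"), 10),
                  obs  = rnorm(20),
                  frc  = rnorm(20))

# What does FUN actually receive? One column at a time, as a vector.
seen <- aggregate(DAT[, c("obs", "frc")], by = list(site = DAT$site),
                  FUN = function(v) is.data.frame(v))
print(seen)  # obs and frc are FALSE for every site

# The post's function therefore fails: a vector cannot be indexed by column name.
site_fun <- function(sub) mean(sub[, "obs"] - sub[, "frc"])
res <- try(aggregate(DAT[, c("obs", "frc")], by = list(DAT$site), site_fun),
           silent = TRUE)
print(inherits(res, "try-error"))
```

This is why a function that needs several columns at once has to be applied with something that passes whole sub-data-frames, such as by() or split().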
Is this what you want?

> DAT
  site        obs        frc
1    A  1.2745106 -1.6899502
2    B  1.4394225  0.4167296
3    A -0.1087532  0.7710872
4    B -0.6319814 -0.2177236
5    A -0.4208416  0.5099765
> (z <- by(DAT, list(DAT$site), function(x) mean(x$obs - x$frc)))
: A
[1] 0.3846007
------------------------------------------------------------------------------
: B
[1] 0.3042175
> cbind(z)
          z
A 0.3846007
B 0.3042175

On Fri, Mar 21, 2008 at 4:15 PM, Matt Pocernich <pocernic at rap.ucar.edu> wrote:
> Hello,
>
> I would like to use aggregate with a function that
> requires several arguments that are columns of data.
> [...]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
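As a possible follow-up that is not part of the thread itself, here is a minimal sketch of two ways to end up with an ordinary data frame rather than a by object. It assumes the DAT layout from the original post; the seed, the column name diff, and the helper name site_diff are illustrative.

```r
# Illustrative data in the same layout as the original post.
set.seed(1)  # arbitrary seed, only for reproducibility
DAT <- data.frame(site = rep(c("A", "B"), 10),
                  obs  = rnorm(20),
                  frc  = rnorm(20))

# Option 1: compute the per-row quantity first, then aggregate one column.
# This only works when the summary can be written as FUN(per-row value).
DAT$diff <- DAT$obs - DAT$frc
res1 <- aggregate(DAT["diff"], by = list(site = DAT$site), FUN = mean)

# Option 2: split the data frame by site so the function sees every column.
site_diff <- function(sub) mean(sub$obs - sub$frc)  # hypothetical helper
vals <- sapply(split(DAT, DAT$site), site_diff)
res2 <- data.frame(site = names(vals), diff = unname(vals))

print(res1)
print(res2)
```

Option 2 is the more general of the two, since site_diff can use any number of columns in any way, which matches the poster's remark that the real function is more complicated than a simple difference.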