Hi:

A nice package for doing this sort of thing is doBy. Let's manufacture an
example since you didn't provide one:
set.seed(126)
d <- data.frame(g = rep(letters[1:3], each = 10),
                x1 = rnorm(30),
                x2 = rnorm(30, mean = 5),
                x3 = rnorm(30, mean = 10, sd = 4))
# --Case 1: no grouping variables
# If there are no grouping variables, you can define a function to apply
# to each variable (column) with the apply() function.
f <- function(x) c(mean(x), median(x))
# Apply to all numeric variables (not column 1):
apply(d[, -1], 2, f)
             x1       x2       x3
[1,] -0.0647788 4.813318 10.21010
[2,] -0.0881492 4.916123 10.68559
# The mean of each variable is in the first row, the median in the second.
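Since apply() coerces the data frame to a matrix before doing anything, an alternative worth considering is sapply(), which walks over the columns directly. The sketch below rebuilds the same kind of example data; f_named is just an illustrative name, and naming the elements of its return value gives labelled rows in the result.

```r
# Rebuild example data of the same shape as above
set.seed(126)
d <- data.frame(g = rep(letters[1:3], each = 10),
                x1 = rnorm(30),
                x2 = rnorm(30, mean = 5),
                x3 = rnorm(30, mean = 10, sd = 4))

# Illustrative helper: returns a named vector, so the rows get labels
f_named <- function(x) c(mean = mean(x), median = median(x))

# sapply() iterates over the columns of the data frame (minus column 1)
# and binds the named vectors into a 2 x 3 matrix
res <- sapply(d[, -1], f_named)
res
```

The row names "mean" and "median" then make the comment about which row is which unnecessary.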
# --Case 2: one or more grouping variables
library(doBy)
# If you have grouping variables, you can create a function with
# names to apply to each variable groupwise. Notice that I named the
# output variables mean and median, normally a no-no, and watch what
# happens when it is used in summaryBy().
# Define the output function to apply to each variable
f2 <- function(x) c(mean = mean(x), median = median(x))
# The dot on the left-hand side of the formula in summaryBy()
# indicates that the summary function is to be applied to all numeric
# variables not on the right-hand side of the formula:
summaryBy(. ~ g, data = d, FUN = f2)
  g     x1.mean   x1.median  x2.mean x2.median   x3.mean x3.median
1 a  0.04571262 -0.06361278 4.253444  4.223015 11.259677  11.06834
2 b -0.15746011 -0.14223959 4.913657  5.116526 10.037674  11.32120
3 c -0.08258890 -0.06227865 5.272853  5.524493  9.332942  10.14600
You can use multiple grouping variables on the right-hand side of the
formula if desired. The function is applied to each LHS variable within
each subgroup. Note that summaryBy() requires a data frame as input.
The doBy package comes with a well-written vignette that covers all of
this in more detail.
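Coming back to the original question (mean of one column, median of another): a minimal base R sketch, not using doBy, is to run aggregate() once per column and merge the pieces on the grouping variable. The data and the output column names below are made up for illustration only.

```r
# Example data: one grouping variable, two numeric columns
set.seed(126)
d <- data.frame(g = rep(letters[1:3], each = 10),
                x1 = rnorm(30),
                x2 = rnorm(30, mean = 5))

# One aggregate() call per column, each with its own summary function,
# then merge the two results on the grouping variable
res <- merge(aggregate(x1 ~ g, data = d, FUN = mean),
             aggregate(x2 ~ g, data = d, FUN = median),
             by = "g")

# Illustrative output names, in the style summaryBy() uses
names(res) <- c("g", "x1.mean", "x2.median")
res
```

With many columns this gets repetitive, so it is only a reasonable option when a handful of column/function pairs are involved.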
HTH,
Dennis
On Thu, Jul 15, 2010 at 7:45 PM, Murat Tasan <mmuurr@gmail.com> wrote:
> hi all - i'm just wondering what sort of code people write to
> essentially perform an aggregate call, but with different functions
> being applied to the various columns.
>
> for example, if i have a data frame x and would like to marginalize by
> a factor f for the rows, but apply mean() to col1 and median() to
> col2.
>
> if i wanted to apply mean() to both columns, i would call:
>
> aggregate(x, list(f), mean)
>
> but to get the mean of col1 and the median of col2, i have to write
> separate tapply calls, then wrap back into a data frame:
>
> data.frame(tapply(x$col1, f, mean), tapply(x$col2, f, median))
>
> this is a somewhat inelegant solution for data frames with potentially
> many columns.
>
> what i would like is for aggregate to take a list of functions for
> columns, something like:
>
> aggregate(x, list(f), list(mean, median))
>
>
> i'm just curious how others get around this limitation in aggregate().
> do most simply make the individual tapply() calls separately, then
> possibly wrap them back up (as done in the example above), or is there
> a more elegant solution using some function of R that i might be
> unaware of?