thr3ads.net - R help - [R] aggregate(...) with multiple functions [Jul 2010]

Murat Tasan

2010-Jul-16 02:45 UTC

[R] aggregate(...) with multiple functions

hi all - i'm just wondering what sort of code people write to
essentially perform an aggregate call, but with different functions
being applied to the various columns.

for example, if i have a data frame x and would like to marginalize by
a factor f for the rows, but apply mean() to col1 and median() to
col2.

if i wanted to apply mean() to both columns, i would call:

aggregate(x, list(f), mean)

but to get the mean of col1 and the median of col2, i have to write
separate tapply calls, then wrap back into a data frame:

data.frame(tapply(x$col1, f, mean), tapply(x$col2, f, median))

this is a somewhat inelegant solution for data frames with potentially
many columns.

what i would like is for aggregate to take a list of functions for
columns, something like:

aggregate(x, list(f), list(mean, median))


i'm just curious how others get around this limitation in aggregate().
do most simply make the individual tapply() calls separately, then
possibly wrap them back up (as done in the example above), or is there
a more elegant solution using some function of R that i might be
unaware of?

Gabor Grothendieck

2010-Jul-16 03:02 UTC

On Thu, Jul 15, 2010 at 10:45 PM, Murat Tasan <mmuurr at gmail.com> wrote:
> for example, if i have a data frame x and would like to marginalize by
> a factor f for the rows, but apply mean() to col1 and median() to
> col2.

Using sqldf we can write:

> library(sqldf)
> sqldf("select Treatment, avg(conc), median(uptake) from CO2 group by Treatment")
   Treatment avg(conc) median(uptake)
1    chilled       435           19.7
2 nonchilled       435           31.3

See http://sqldf.googlecode.com for more info.
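
For comparison, the same kind of per-column summary can be sketched in base R without extra packages. This is only a minimal sketch: aggregate_each() is a hypothetical helper written for this illustration (it is not part of base R or sqldf), and it assumes the functions are passed as a list named after the columns they apply to.

```r
# Hypothetical helper (not part of base R or sqldf): summarise each named
# column of `data` with its own function, grouped by the column `by`.
aggregate_each <- function(data, by, funs) {
  groups <- factor(data[[by]])
  # One tapply() call per column, pairing each column name with its function.
  summaries <- Map(
    function(col, fun) as.vector(tapply(data[[col]], groups, fun)),
    names(funs), funs
  )
  out <- data.frame(levels(groups), summaries, stringsAsFactors = FALSE)
  names(out) <- c(by, names(funs))
  out
}

# CO2 ships with R (datasets package), as in the sqldf example.
res <- aggregate_each(CO2, "Treatment", list(conc = mean, uptake = median))
print(res)
```

Rows come back in the order of the factor levels rather than sorted alphabetically, so the row order may differ from the sqldf result.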

Dennis Murphy

2010-Jul-16 04:10 UTC

Hi:

A nice package for doing this sort of thing is doBy. Let's manufacture an
example since you didn't provide one:

set.seed(126)
d <- data.frame(g = rep(letters[1:3], each = 10),
                 x1 = rnorm(30),
                 x2 = rnorm(30, mean = 5),
                 x3 = rnorm(30, mean = 10, sd = 4))

# --Case 1: no grouping variables

# If there are no grouping variables, you can define a function to apply
# to each variable (column) with the apply() function.

f  <- function(x) c(mean(x), median(x))

# Apply to all numeric variables (not column 1):
apply(d[, -1], 2, f)
             x1       x2       x3
[1,] -0.0647788 4.813318 10.21010
[2,] -0.0881492 4.916123 10.68559


# The mean of each variable is in the first row, the median in the second.
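
A possible variant of Case 1, offered as a sketch rather than a recommendation: if the function returns a named vector, the rows of the result are labelled, and sapply() over the numeric columns avoids coercing the whole data frame to a matrix. The names f_named and num_cols are made up for this illustration.

```r
set.seed(126)
d <- data.frame(g = rep(letters[1:3], each = 10),
                x1 = rnorm(30),
                x2 = rnorm(30, mean = 5),
                x3 = rnorm(30, mean = 10, sd = 4))

# Named elements become the row names of the result.
f_named <- function(x) c(mean = mean(x), median = median(x))

# Select numeric columns by type instead of by position.
num_cols <- vapply(d, is.numeric, logical(1))
res <- sapply(d[num_cols], f_named)
print(res)
```

Here res is a matrix with one column per numeric variable and rows labelled mean and median.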


# --Case 2: one or more grouping variables

library(doBy)

# If you have grouping variables, you can create a function with
# names to apply to each variable groupwise. Notice that I named the
# output variables mean and median, normally a no-no, and watch what
# happens when it is used in summaryBy().

# Define the output function to apply to each variable
f2 <- function(x) c(mean = mean(x), median = median(x))

# The leading dot on the left hand side of the formula  in summaryBy()
# indicates that the summary function is to be applied to all variables
# not on the RHS of  the formula:

summaryBy(. ~ g, data = d, FUN = f2)
  g     x1.mean   x1.median  x2.mean x2.median   x3.mean x3.median
1 a  0.04571262 -0.06361278 4.253444  4.223015 11.259677  11.06834
2 b -0.15746011 -0.14223959 4.913657  5.116526 10.037674  11.32120
3 c -0.08258890 -0.06227865 5.272853  5.524493  9.332942  10.14600

You can use multiple grouping variables in the formula if desired. The
function is meant to be applied to each LHS variable in each subgroup.
It is required that the input object of summaryBy() be a data frame.

The doBy package comes with a well-written vignette, wherein all of this
is well described.
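
If installing doBy is not an option, a similar layout can be sketched with base R alone. The sketch below assumes the formula interface of aggregate() is available; when FUN returns a named vector, aggregate() stores the results as matrix columns, and passing the result through data.frame() should flatten them into columns named like x1.mean and x1.median. The names agg and flat are made up for this illustration.

```r
set.seed(126)
d <- data.frame(g = rep(letters[1:3], each = 10),
                x1 = rnorm(30),
                x2 = rnorm(30, mean = 5),
                x3 = rnorm(30, mean = 10, sd = 4))

f2 <- function(x) c(mean = mean(x), median = median(x))

# aggregate() stores each two-value result as a matrix column (x1, x2, x3).
agg <- aggregate(. ~ g, data = d, FUN = f2)

# data.frame() expands matrix columns into x1.mean, x1.median, and so on.
flat <- do.call(data.frame, agg)
print(names(flat))
```

Without the do.call() step the result has only four columns (g, x1, x2, x3), three of which are two-column matrices, which is easy to miss when printing.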

HTH,
Dennis


