Hi:
I'm a big fan of the reshape package, but this time I think that the doBy
and plyr packages may better suit your needs. Since you mentioned wanting
to get the min/mean/max of several variables simultaneously, I took out
line54 and added some vectors of Gaussian(0, 1) random numbers for testing:
test <- data.frame(mF[, -5], x1 = rnorm(23), x2 = rnorm(23), x3 = rnorm(23))
### doBy approach:
# Create a function for doBy to use on a specific variable:
f <- function(x) {
  c(min = min(x, na.rm = TRUE), mean = mean(x, na.rm = TRUE),
    max = max(x, na.rm = TRUE))
}
> library(doBy)
> summaryBy(x1 + x2 + x3 ~ Season, data = test, FUN = f)
Season x1.min x1.mean x1.max x2.min x2.mean x2.max
1 1 -1.108496 -0.2590727 1.692468 -0.8958644 -0.00485722 0.6525678
2 2 -1.686261 0.4655741 2.097220 -0.9484292 0.37197098 2.6325965
3 3 -1.093520 -0.2049273 0.390061 -0.6886613 0.49534667 2.4263802
x3.min x3.mean x3.max
1 -2.07369239 -0.05164301 1.6199843
2 -0.43556155 0.31221804 1.1939009
3 -0.04847558 0.15200570 0.4355102
The LHS of the formula lists the variables you want summarized and the RHS
contains the grouping variable(s). The data supplied MUST be a data frame,
and FUN is the function you want applied to each variable. In this case,
the function returns a vector of the min, mean and max of the input
variable. Notice that the names given in the function are appended to the
variable name, separated by a dot. (A nice touch by the package author...)
If you have a number of variables to summarize in this fashion, doBy is
well designed for the task: the syntax is pretty straightforward.
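For anyone without doBy installed, here is a rough base-R cross-check of the
same idea. It is only a sketch: 'toy' is made-up data standing in for the
poster's mF (which is not reproduced in full here), and aggregate() is a
swapped-in base function, not what summaryBy() does internally.

```r
# Made-up stand-in data: 3 seasons, 4 rows each (not the poster's mF)
set.seed(1)
toy <- data.frame(Season = factor(rep(1:3, each = 4)),
                  x1 = rnorm(12), x2 = rnorm(12))

# Same summary function as above
f <- function(x) {
  c(min = min(x, na.rm = TRUE), mean = mean(x, na.rm = TRUE),
    max = max(x, na.rm = TRUE))
}

# aggregate() applies f to each variable within each Season; because f
# returns a vector, each result column is a 3-column matrix
res <- aggregate(cbind(x1, x2) ~ Season, data = toy, FUN = f)

# Flatten the matrix columns into ordinary columns (x1.min, x1.mean, ...)
res <- do.call(data.frame, res)
names(res)
```

The flattening step is what produces the dotted names; summaryBy() gives
you that layout without the extra call.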
### plyr approach:
To accomplish the same task in plyr with ddply(), you've got to be a little
more clever: use numcolwise() in combination with each(). numcolwise()
applies the same function to each numeric variable in the input data frame;
each() applies every function supplied as its arguments to a single input
variable. The call below is a composition of the two functions:
> library(plyr)
> ddply(test, .(Season), numcolwise(each(min, mean, max)))
Season x1 x2 x3
1 1 -1.1084957 -0.89586438 -2.07369239
2 1 -0.2590727 -0.00485722 -0.05164301
3 1 1.6924681 0.65256782 1.61998433
4 2 -1.6862610 -0.94842919 -0.43556155
5 2 0.4655741 0.37197098 0.31221804
6 2 2.0972202 2.63259653 1.19390094
7 3 -1.0935199 -0.68866127 -0.04847558
8 3 -0.2049273 0.49534667 0.15200570
9 3 0.3900610 2.42638021 0.43551022
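To see why the composition yields one row per statistic, here is a rough
base-R sketch of the idea. each_sketch() and numcolwise_sketch() are
hypothetical stand-ins written for this note; they are not plyr functions
and make no claim to match plyr's implementation. 'toy' is made-up data.

```r
# Hypothetical stand-in for each(): bundle several functions into one
# that returns a named vector of their results
each_sketch <- function(...) {
  fns <- list(...)                      # named list of functions
  function(x) sapply(fns, function(f) f(x))
}

# Hypothetical stand-in for numcolwise(): apply f to every numeric column
numcolwise_sketch <- function(f) {
  function(df) {
    num <- df[sapply(df, is.numeric)]   # keep numeric columns only
    as.data.frame(lapply(num, f))
  }
}

# Made-up data with one non-numeric column that should be skipped
toy <- data.frame(g = c("a", "b", "c"),
                  x1 = c(1, 2, 6), x2 = c(10, 20, 30))

stats <- numcolwise_sketch(each_sketch(min = min, mean = mean, max = max))
out <- stats(toy)
out
```

Each numeric column contributes a length-3 vector (min, mean, max), so the
result has three rows per group; ddply() then stacks those blocks, one per
Season.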
To distinguish the measures in each row, create a vector of stat names
and then rearrange the order of columns to get something a little more
presentable:
> summ <- ddply(test, .(Season), numcolwise(each(min, mean, max)))
> summ$stat <- rep(c('Min', 'Mean', 'Max'), 3)   # add vector of names
> summ <- summ[, c(1, 5, 2:4)]                   # column rearrangement
> summ
Season stat x1 x2 x3
1 1 Min -1.1084957 -0.89586438 -2.07369239
2 1 Mean -0.2590727 -0.00485722 -0.05164301
3 1 Max 1.6924681 0.65256782 1.61998433
4 2 Min -1.6862610 -0.94842919 -0.43556155
5 2 Mean 0.4655741 0.37197098 0.31221804
6 2 Max 2.0972202 2.63259653 1.19390094
7 3 Min -1.0935199 -0.68866127 -0.04847558
8 3 Mean -0.2049273 0.49534667 0.15200570
9 3 Max 0.3900610 2.42638021 0.43551022
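The labelling step above hard-codes three groups and fixed column positions.
If the number of seasons or variables might change, something like the
following sketch could be used instead. It assumes the rows come back in
min, mean, max order within each group, as in the output shown; the 'summ'
built here is made-up data in the same shape, not the real result.

```r
# Made-up summary in the same shape as 'summ': three rows per Season
summ <- data.frame(Season = rep(1:3, each = 3),
                   x1 = c(-1.1, -0.3, 1.7, -1.7, 0.5, 2.1, -1.1, -0.2, 0.4))

stat_names <- c("Min", "Mean", "Max")

# Derive the number of groups from the data rather than typing it in
n_groups <- nrow(summ) / length(stat_names)
summ$stat <- rep(stat_names, times = n_groups)

# Reorder by column name so the code survives added or dropped variables
summ <- summ[, c("Season", "stat", setdiff(names(summ), c("Season", "stat")))]
summ
```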
The two functions give you two different ways to present the summaries; take
your pick.
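If you really do want three separate data frames (mean, min and max), as in
your original message, a minimal base-R sketch along these lines might do.
It uses aggregate() rather than doBy or plyr, and 'toy' is again made-up
data standing in for your mF.

```r
# Made-up stand-in data: 3 seasons, 4 rows each
set.seed(42)
toy <- data.frame(Season = factor(rep(1:3, each = 4)),
                  x1 = rnorm(12), x2 = rnorm(12))

# One summary function per desired data frame
funs <- list(mean = mean, min = min, max = max)

# Build a named list of data frames, one per statistic
by_stat <- lapply(funs, function(f) {
  aggregate(cbind(x1, x2) ~ Season, data = toy, FUN = f)
})

by_stat$max   # Season-wise maxima; by_stat$mean and by_stat$min likewise
```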
HTH,
Dennis
On Wed, Apr 21, 2010 at 10:16 AM, Ben Stewart <bpstewar@uvic.ca> wrote:
> I've got a problem with the sparseby command (reshape library), and I have
> reached the peak of my R knowledge (it isn't really that high).
>
> I have a small data frame of 23 rows and 15 columns, here is a subset, the
> first four columns are factors and the rest are numeric (only one, line54,
> is provided).
>
> bearID YEAR Season SEX line54
> 5 1900 8 3 0 16.3923519
> 11 2270 5 1 0 233.7414014
> 12 2271 5 1 0 290.8207652
> 13 2271 5 2 0 244.7820844
> 15 2291 5 1 0 0.0000000
> 16 2291 5 2 0 14.5037795
> 17 2291 6 1 0 0.0000000
> 18 2293 5 2 0 144.7440752
> 19 2293 5 3 0 0.0000000
> 20 2293 6 1 0 16.0592270
> 21 2293 6 2 0 30.1383426
> 28 2298 5 1 0 0.9741067
> 29 2298 5 2 0 9.6641018
> 30 2298 6 2 0 8.6533828
> 31 2309 5 2 0 85.9781303
> 32 2325 6 1 0 110.8892153
> 35 2331 6 1 0 26.7335562
> 44 2390 7 2 0 7.1690620
> 45 2390 8 2 0 44.1109897
> 46 2390 8 3 0 503.9074898
> 47 2390 9 2 0 8.4393660
> 54 2416 7 3 0 48.6910907
> 58 2418 8 2 0 5.7951139
>
> Sparseby works fine when I try to calculate mean
>
> > sparseby(mF[1:5], mF$Season, mean)
>
> mF$Season bearID YEAR Season SEX line54
> 1 1 NA NA NA 0 84.90228
> 2 2 NA NA NA 0 54.90713
> 3 3 NA NA NA 0 142.24773
>
> But it goes nuts when looking for max or min
>
> > sparseby(mF[5:6], mF$Season, max)
> mF$Season structure(c(2169.49621795108, 1885.22677689026, 2492.17544685464
> 1 1
> 2169.496
> 2 2
> 1885.227
> 3 3
> 2492.175
>
> Any ideas? All I want is to create three data.frames: mean, min
> and max.
>
> Thanks,
>
> Ben Stewart