saschaview at gmail.com
2011-Nov-18 13:05 UTC
[R] Apply functions along "layers" of a data matrix
Hello
How can I apply functions along "layers" of a data matrix?
Example:
daf <- data.frame(
  'id' = rep(1:5, 3),
  matrix(1:60, nrow=15, dimnames=list(NULL, paste('v', 1:4, sep=''))),
  rep = rep(1:3, each=5)
)
The data frame "daf" contains 3 repetitions/layers (rep) of 4 variables
for 5 persons (id). I want to calculate various statistics (e.g., mean,
median) *along* the repetitions. The "mean" calculation, for example,
would produce the mean of daf[1, 'v1'] *along* the 3 repetitions:
(daf[1, 'v1'] + daf[6, 'v1'] + daf[11, 'v1']) / 3
That is to say, each of the calculations would result in a data frame
with 4 variables (and the id) of the 5 persons:
  id v1 v2 v3 v4
1  1  6 21 36 51
2  2  7 22 37 52
3  3  8 23 38 53
4  4  9 24 39 54
5  5 10 25 40 55
Currently, I do this in a loop, but is there a quicker and more
resource-friendly way to achieve this?
Thanks
*S*
--
Sascha Vieweg, saschaview at gmail.com
Hi:
Here are two ways to do it; further solutions can be found in the doBy
and data.table packages, among others.
library('plyr')
ddply(daf, .(id), colwise(mean, c('v1', 'v2', 'v3', 'v4')))
aggregate(cbind(v1, v2, v3, v4) ~ id, data = daf, FUN = mean)
# Result of each:
  id v1 v2 v3 v4
1  1  6 21 36 51
2  2  7 22 37 52
3  3  8 23 38 53
4  4  9 24 39 54
5  5 10 25 40 55
Dennis
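
Since the original question asks for several statistics (mean, median, ...), the aggregate() call above can be wrapped so that the summary function is a parameter. The following is only a minimal sketch; the helper name summarise_by_id is hypothetical and not part of base R or plyr.

```r
# Rebuild the example data from the original post.
daf <- data.frame(
  id = rep(1:5, 3),
  matrix(1:60, nrow = 15, dimnames = list(NULL, paste('v', 1:4, sep = ''))),
  rep = rep(1:3, each = 5)
)

# Hypothetical helper (name is an assumption): apply any summary
# function to v1..v4 within each id, i.e. along the repetitions.
summarise_by_id <- function(data, fun) {
  aggregate(cbind(v1, v2, v3, v4) ~ id, data = data, FUN = fun)
}

res_mean   <- summarise_by_id(daf, mean)
res_median <- summarise_by_id(daf, median)
print(res_mean)
print(res_median)
```

With this particular data the mean and the median coincide, because the three values per person are evenly spaced.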
Hi,

This seems like a job for plyr!

library(plyr)
ddply(daf, .(rep), summarise, mn = mean(v1))

hope this helps,
Paul

--
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494
http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
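
Note that grouping by rep gives one value per repetition (pooled over the 5 persons), whereas the original question asks for one value per person, taken along the 3 repetitions. The difference can be sketched in base R with tapply(); this is a swapped-in illustration using the thread's example data, not the plyr code above.

```r
# Rebuild the example data from the original post.
daf <- data.frame(
  id = rep(1:5, 3),
  matrix(1:60, nrow = 15, dimnames = list(NULL, paste('v', 1:4, sep = ''))),
  rep = rep(1:3, each = 5)
)

# Grouping by rep: one mean of v1 per repetition, pooled over persons.
by_rep <- tapply(daf$v1, daf$rep, mean)

# Grouping by id: one mean of v1 per person, along the 3 repetitions.
by_id <- tapply(daf$v1, daf$id, mean)

print(by_rep)
print(by_id)
```

Only the grouping by id reproduces the v1 column of the result requested in the original post.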
I see you have gotten a couple of plyr solutions, but this is really
easy in base R:

> aggregate(daf[-c(1,6)], list(daf$id), mean)
  Group.1 v1 v2 v3 v4
1       1  6 21 36 51
2       2  7 22 37 52
3       3  8 23 38 53
4       4  9 24 39 54
5       5 10 25 40 55

You read this as "use the mean function within categories defined by
the "id" INDEX to aggregate all columns except the first and 6th".

David Winsemius, MD
West Hartford, CT
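
The "layers" wording in the question can also be taken literally: the data can be reshaped into a 3-D array (id x rep x variable) and apply() run over the id and variable dimensions. This is a minimal sketch only, and it assumes the rows are sorted by rep and then by id, with every id present in every repetition, as in the example data.

```r
# Rebuild the example data from the original post.
daf <- data.frame(
  id = rep(1:5, 3),
  matrix(1:60, nrow = 15, dimnames = list(NULL, paste('v', 1:4, sep = ''))),
  rep = rep(1:3, each = 5)
)

vars <- paste('v', 1:4, sep = '')

# Assumption: rows are ordered by rep, then id, and the design is complete,
# so the 15 x 4 matrix can be refolded into a 5 x 3 x 4 array.
arr <- array(as.matrix(daf[vars]), dim = c(5, 3, 4),
             dimnames = list(id = 1:5, rep = 1:3, var = vars))

# Collapse the rep dimension (the "layers"), keeping id and variable.
layer_means   <- apply(arr, c(1, 3), mean)
layer_medians <- apply(arr, c(1, 3), median)
print(layer_means)
```

If the rows are not in this order, sorting with order(daf$rep, daf$id) first would be needed before reshaping.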