saschaview at gmail.com
2011-Nov-18 13:05 UTC
[R] Apply functions along "layers" of a data matrix
Hello

How can I apply functions along "layers" of a data matrix?

Example:

daf <- data.frame(
  'id' = rep(1:5, 3),
  matrix(1:60, nrow=15, dimnames=list( NULL, paste('v', 1:4, sep='') )),
  rep = rep(1:3, each=5)
)

The data frame "daf" contains 3 repetitions/layers (rep) of 4 variables of 5 persons (id). For some reason, I want to calculate various statistics (e.g., mean, median) *along* the repetitions. The "mean" calculation, for example, would produce the mean of daf[1, 'v1'] *along* the 3 repetitions:

(daf[1, 'v1'] + daf[6, 'v1'] + daf[11, 'v1']) / 3

That is to say, each of the calculations would result in a data frame with 4 variables (and the id) of the 5 persons:

  id v1 v2 v3 v4
1  1  6 21 36 51
2  2  7 22 37 52
3  3  8 23 38 53
4  4  9 24 39 54
5  5 10 25 40 55

Currently, I do this in a loop, but I was wondering whether there is a quick and resource-friendly way to achieve this?

Thanks
*S*

--
Sascha Vieweg, saschaview at gmail.com
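For reference, the hand computation in the question can be checked directly. This is only a minimal sketch that rebuilds the example data and confirms which rows belong to person 1; the names rows_id1, by_hand and via_mean are illustrative and do not appear in the original post.

```r
# Rebuild the example data from the post.
daf <- data.frame(
  id = rep(1:5, 3),
  matrix(1:60, nrow = 15, dimnames = list(NULL, paste('v', 1:4, sep = ''))),
  rep = rep(1:3, each = 5)
)

# Rows holding the three repetitions of person 1.
rows_id1 <- which(daf$id == 1)

# Mean of v1 for person 1 along the repetitions: by hand, and via mean().
by_hand  <- (daf[1, 'v1'] + daf[6, 'v1'] + daf[11, 'v1']) / 3
via_mean <- mean(daf$v1[daf$id == 1])
```

Both values agree with the first cell of the expected table, so "along the repetitions" amounts to grouping the rows by id.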
Hi:

Here are two ways to do it; further solutions can be found in the doBy and data.table packages, among others.

library('plyr')
ddply(daf, .(id), colwise(mean, c('v1', 'v2', 'v3', 'v4')))

aggregate(cbind(v1, v2, v3, v4) ~ id, data = daf, FUN = mean)

# Result of each:
  id v1 v2 v3 v4
1  1  6 21 36 51
2  2  7 22 37 52
3  3  8 23 38 53
4  4  9 24 39 54
5  5 10 25 40 55

Dennis

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
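Since the question asks for several statistics, not only the mean, the aggregate() form above can be wrapped so that the summary function is a parameter. The following is a sketch under that assumption; along_reps is a hypothetical helper name, not something from the thread or from any package.

```r
# Example data from the original post.
daf <- data.frame(
  id = rep(1:5, 3),
  matrix(1:60, nrow = 15, dimnames = list(NULL, paste('v', 1:4, sep = ''))),
  rep = rep(1:3, each = 5)
)

# Hypothetical helper (not from the thread): apply `fun` to v1..v4
# within each id, i.e. along the three repetitions.
along_reps <- function(data, fun) {
  aggregate(cbind(v1, v2, v3, v4) ~ id, data = data, FUN = fun)
}

med <- along_reps(daf, median)  # per-person medians
mx  <- along_reps(daf, max)     # per-person maxima
```

Any function that reduces a numeric vector to a single value (sd, min, and so on) should be usable in place of median or max here.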
On 11/18/2011 01:05 PM, saschaview at gmail.com wrote:
> daf <- data.frame(
>   'id' = rep(1:5, 3),
>   matrix(1:60, nrow=15, dimnames=list( NULL, paste('v', 1:4, sep='') )),
>   rep = rep(1:3, each=5)
> )

Hi,

This seems like a job for plyr!

library(plyr)
ddply(daf, .(rep), summarise, mn = mean(v1))

hope this helps,
Paul

--
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494
http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
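Note that the ddply() call above groups by rep, so it returns one mean of v1 per repetition (3 rows), whereas the table in the original question is grouped by id (5 rows). The sketch below illustrates the difference in base R; aggregate() is swapped in for ddply() here only so that no package is needed. With plyr, the corresponding change would be .(id) in place of .(rep).

```r
# Example data from the original post.
daf <- data.frame(
  id = rep(1:5, 3),
  matrix(1:60, nrow = 15, dimnames = list(NULL, paste('v', 1:4, sep = ''))),
  rep = rep(1:3, each = 5)
)

# Grouping by rep: one mean of v1 per repetition, averaged over persons.
per_rep <- aggregate(v1 ~ rep, data = daf, FUN = mean)

# Grouping by id: one mean of v1 per person, averaged over repetitions.
# This is the grouping the original question asks for.
per_id <- aggregate(v1 ~ id, data = daf, FUN = mean)
```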
I see you have gotten a couple of plyr solutions, but this is really easy in base R:

> aggregate(daf[-c(1,6)], list(daf$id), mean)
  Group.1 v1 v2 v3 v4
1       1  6 21 36 51
2       2  7 22 37 52
3       3  8 23 38 53
4       4  9 24 39 54
5       5 10 25 40 55

You read this as: use the mean function, within the categories defined by "id", to aggregate all columns except the first and the sixth (id and rep).

David Winsemius, MD
West Hartford, CT
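The "layers" wording of the question also suggests a literal reading: reshape the values into a three-dimensional array and apply a function across the repetition dimension. The sketch below is one possible way to do that, not a method proposed in the thread. It assumes the data are balanced (every id appears in every rep) and sorts the rows by rep and then id before reshaping; the names vars, arr, means, medians and agg are illustrative.

```r
# Example data from the original post.
daf <- data.frame(
  id = rep(1:5, 3),
  matrix(1:60, nrow = 15, dimnames = list(NULL, paste('v', 1:4, sep = ''))),
  rep = rep(1:3, each = 5)
)

vars <- paste('v', 1:4, sep = '')

# Assumption: balanced design. Sort so that id varies fastest within rep.
daf <- daf[order(daf$rep, daf$id), ]

# Reshape to arr[id, rep, variable]; R fills arrays column-major, so the
# 15 values of each variable become a 5 x 3 (id x rep) slice.
arr <- array(
  as.matrix(daf[vars]),
  dim = c(5, 3, 4),
  dimnames = list(id = as.character(1:5), rep = as.character(1:3), var = vars)
)

# Collapse the rep dimension, keeping id (dim 1) and variable (dim 3).
means   <- apply(arr, c(1, 3), mean)
medians <- apply(arr, c(1, 3), median)

# Cross-check against the aggregate() approach.
agg <- aggregate(daf[vars], list(id = daf$id), mean)
```

The result is a 5 x 4 matrix rather than a data frame. Whether this is preferable to aggregate() depends on the data: it relies on the balanced layout, while aggregate() does not.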