Martin Batholdy
2011-Jul-11 15:55 UTC
[R] best way to aggregate / rearrange data.frame with different data types
Hi,

I have a data.frame that looks like this:

Subject <- c(rep(1,4), rep(2,4), rep(3,4))
y <- rnorm(12, 3, 2)
gender <- c(rep("w",4), rep("m",4), rep("w",4))
comment <- c(rep("comment A",4), rep("comment B",4), rep("comment C",4))

data <- data.frame(Subject,y,gender,comment)
data

   Subject           y gender   comment
1        1  2.86495339      w comment A
2        1  3.33758993      w comment A
3        1  7.00301094      w comment A
4        1  3.81585998      w comment A
5        2  2.50300460      m comment B
6        2  4.93830489      m comment B
7        2  5.08184289      m comment B
8        2  4.00552691      m comment B
9        3  3.16131181      w comment C
10       3  4.61620021      w comment C
11       3  3.68288799      w comment C
12       3 -0.05049953      w comment C

So I have multiple lines for one subject because of a repeated measurement of variable y (the rest of the variables, like gender, stay the same).

Now I would like to transform this data.frame in two ways:

1. an aggregated form, where I only have one row left for each subject; for numerical variables within the data.frame (like y) a mean should be calculated.

2. a restructured form, where I only have one row for each subject, but four different y-columns (y1, y2, y3, y4).

What is the easiest way to do this?
Are there any functions that do this kind of data-frame rearranging in one step?
David Winsemius
2011-Jul-11 16:25 UTC
[R] best way to aggregate / rearrange data.frame with different data types
On Jul 11, 2011, at 11:55 AM, Martin Batholdy wrote:

> Hi,
>
> I have a data.frame that looks like this:
>
> Subject <- c(rep(1,4), rep(2,4), rep(3,4))
> y <- rnorm(12, 3, 2)
> gender <- c(rep("w",4), rep("m",4), rep("w",4))
> comment <- c(rep("comment A",4), rep("comment B",4), rep("comment C",4))
>
> data <- data.frame(Subject,y,gender,comment)
> data
>
>    Subject           y gender   comment
> 1        1  2.86495339      w comment A
> 2        1  3.33758993      w comment A
> 3        1  7.00301094      w comment A
> 4        1  3.81585998      w comment A
> 5        2  2.50300460      m comment B
> 6        2  4.93830489      m comment B
> 7        2  5.08184289      m comment B
> 8        2  4.00552691      m comment B
> 9        3  3.16131181      w comment C
> 10       3  4.61620021      w comment C
> 11       3  3.68288799      w comment C
> 12       3 -0.05049953      w comment C
>
> So I have multiple lines for one subject because of a repeated
> measurement of variable y
> (the rest of the variables stay the same, like gender).
>
> Now I would like to transform this data.frame in two ways:
>
> 1. an aggregated form,
> where I only have one row left for each subject - for numerical
> variables within the data.frame (like y) a mean should be calculated.

?aggregate   # seems that you _should_ have already looked here.

> 2. a restructured form,
> where I only have one row for each subject, but four different
> y-columns (y1, y2, y3, y4).

You can use xtabs:

data$seqvar <- ave(data$y, data$Subject, FUN=seq)
xtabs(y ~ Subject + seqvar, data=data)

or reshape (the function):

data$seqvar <- ave(data$y, data$Subject, FUN=seq)
reshape(data, idvar=c("Subject", "gender", "comment"), timevar="seqvar", direction="wide")

or the easier to understand reshape or reshape2 packages.

--
David Winsemius, MD
West Hartford, CT
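The pointer to ?aggregate above can be spelled out for step 1. What follows is only a minimal sketch, not necessarily the call the poster had in mind: it assumes the formula interface of aggregate() and groups by every subject-level column so that gender and comment are carried into the result. The set.seed() call and the name agg are additions for illustration.

```r
# Rebuild the example data from the question
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
Subject <- c(rep(1, 4), rep(2, 4), rep(3, 4))
y       <- rnorm(12, 3, 2)
gender  <- c(rep("w", 4), rep("m", 4), rep("w", 4))
comment <- c(rep("comment A", 4), rep("comment B", 4), rep("comment C", 4))
data    <- data.frame(Subject, y, gender, comment)

# One row per subject: average y within each Subject/gender/comment group.
# gender and comment are constant within a subject, so grouping by them
# does not split any subject; it only keeps those columns in the output.
agg <- aggregate(y ~ Subject + gender + comment, data = data, FUN = mean)
agg
```

Grouping by the constant columns is one way to keep them; with many such columns, aggregating y by Subject alone and merging the result back onto the unique subject-level rows may be tidier.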
Dennis Murphy
2011-Jul-11 17:06 UTC
[R] best way to aggregate / rearrange data.frame with different data types
Hi:

Here's another approach using the plyr and reshape packages. (There are multiple ways to do this, BTW.)

# 'dat' is the data frame from the question (called 'data' there);
# y was drawn afresh, so the values differ from the original post.

## (1)
library(plyr)
ddply(dat, .(Subject, gender, comment), summarise, mean_y = mean(y))
  Subject gender   comment   mean_y
1       1      w comment A 3.881864
2       2      m comment B 2.213656
3       3      w comment C 2.568794

## (2)
library(reshape)   # provides cast()
# Add a 'time' variable holding names to associate to new columns
dat$time <- paste('y', rep(1:4, 3), sep = '')
# reshape from 'long' to 'wide' form
cast(dat, Subject + gender + comment ~ time, value = 'y')
  Subject gender   comment        y1       y2       y3       y4
1       1      w comment A 5.9299385 3.268402 2.634573 3.694540
2       2      m comment B 2.0663910 1.475625 1.960885 3.351722
3       3      w comment C 0.6656096 3.044818 4.833166 1.731582

HTH,
Dennis
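One caveat on step 2: building the time labels with rep(1:4, 3) relies on every subject having exactly four rows, stored in order. A hedged sketch of a variant that numbers the measurements within each subject instead, using only base R (ave() with seq_along, then reshape()), is shown here. The set.seed() call, the column name time, and the result name wide are assumptions for illustration, not part of the replies above.

```r
# Rebuild the example data from the question
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
data <- data.frame(
  Subject = c(rep(1, 4), rep(2, 4), rep(3, 4)),
  y       = rnorm(12, 3, 2),
  gender  = c(rep("w", 4), rep("m", 4), rep("w", 4)),
  comment = c(rep("comment A", 4), rep("comment B", 4), rep("comment C", 4))
)

# Number the measurements within each subject (1, 2, ... per group).
# seq_along() counts rows, so it also works for groups of unequal size.
data$time <- ave(seq_along(data$Subject), data$Subject, FUN = seq_along)

# Long to wide: one row per subject, columns y1, y2, y3, y4.
# sep = "" makes the new names "y1" rather than the default "y.1".
wide <- reshape(data,
                idvar     = c("Subject", "gender", "comment"),
                timevar   = "time",
                v.names   = "y",
                direction = "wide",
                sep       = "")
wide
```

If a subject had fewer measurements than the others, the missing cells in the wide result would be filled with NA rather than shifting values into the wrong column.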