thr3ads.net - R help - [R] best way to aggregate / rearrange data.frame with different data types [Jul 2011]


Martin Batholdy

2011-Jul-11 15:55 UTC

[R] best way to aggregate / rearrange data.frame with different data types

Hi,


I have a data.frame that looks like this:


Subject <- c(rep(1,4), rep(2,4), rep(3,4))
y <- rnorm(12, 3, 2)
gender <- c(rep("w",4), rep("m",4), rep("w",4))
comment <- c(rep("comment A",4), rep("comment B",4),
rep("comment C",4))

data <- data.frame(Subject,y,gender,comment)
data

   Subject           y gender   comment
1        1  2.86495339      w comment A
2        1  3.33758993      w comment A
3        1  7.00301094      w comment A
4        1  3.81585998      w comment A
5        2  2.50300460      m comment B
6        2  4.93830489      m comment B
7        2  5.08184289      m comment B
8        2  4.00552691      m comment B
9        3  3.16131181      w comment C
10       3  4.61620021      w comment C
11       3  3.68288799      w comment C
12       3 -0.05049953      w comment C



So I have multiple rows per subject because y was measured repeatedly
(the other variables, such as gender, stay the same within a subject).


Now I would like to transform this data.frame in two ways:

1. an aggregated form,
where only one row is left for each subject and every numeric variable in
the data.frame (like y) is replaced by its mean.


2. a restructured form,
where there is again one row per subject, but with four separate y columns
(y1, y2, y3, y4).


What is the easiest way to do this?
Are there any functions that do this kind of data.frame rearranging in one step?

David Winsemius

2011-Jul-11 16:25 UTC



On Jul 11, 2011, at 11:55 AM, Martin Batholdy wrote:
> Hi,
>
>
> I have a data.frame that looks like this:
>
>
> Subject <- c(rep(1,4), rep(2,4), rep(3,4))
> y <- rnorm(12, 3, 2)
> gender <- c(rep("w",4), rep("m",4), rep("w",4))
> comment <- c(rep("comment A",4), rep("comment B",4), rep("comment C",4))
>
> data <- data.frame(Subject,y,gender,comment)
> data
>
>   Subject           y gender   comment
> 1        1  2.86495339      w comment A
> 2        1  3.33758993      w comment A
> 3        1  7.00301094      w comment A
> 4        1  3.81585998      w comment A
> 5        2  2.50300460      m comment B
> 6        2  4.93830489      m comment B
> 7        2  5.08184289      m comment B
> 8        2  4.00552691      m comment B
> 9        3  3.16131181      w comment C
> 10       3  4.61620021      w comment C
> 11       3  3.68288799      w comment C
> 12       3 -0.05049953      w comment C
>
>
>
> So I have multiple lines for one subject because of a repeated  
> measurement of variable y
> (the rest of the variables stay the same, like gender).
>
>
> Now I would like to transform this data.frame in two ways:
>
> 1. a aggregated form,
> where I only have one row left for each subject - for numerical  
> variables within the data.frame (like y) a mean should be calculated.
?aggregate     # seems that you _should_ have already looked here.
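For reference, a minimal sketch of the formula interface to aggregate() on the example data. The data are rebuilt with set.seed(1), which is an addition for reproducibility, so the numbers differ from the table above. Putting the constant columns (gender, comment) on the right-hand side keeps them in the result; a `.` on the left-hand side averages every column not used for grouping, which assumes all of those columns are numeric.

```r
# Example data from the original post; set.seed() added only for reproducibility.
set.seed(1)
Subject <- c(rep(1, 4), rep(2, 4), rep(3, 4))
y <- rnorm(12, 3, 2)
gender <- c(rep("w", 4), rep("m", 4), rep("w", 4))
comment <- c(rep("comment A", 4), rep("comment B", 4), rep("comment C", 4))
data <- data.frame(Subject, y, gender, comment)

# One row per subject: group by the constant columns, take the mean of y.
agg <- aggregate(y ~ Subject + gender + comment, data = data, FUN = mean)
agg

# '.' on the left means "every column not on the right-hand side"
# (here only y); each of those columns must be numeric for mean().
agg_all <- aggregate(. ~ Subject + gender + comment, data = data, FUN = mean)
```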
>
>
> 2. a restructured form,
> where I only have one row for each subject, but four different y- 
> columns (y1, y2, y3, y4).
You can use xtabs():
  data$seqvar <- ave(data$y, data$Subject, FUN=seq_along)
  xtabs(y ~ Subject + seqvar, data=data)
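A sketch of what these two steps do, on a reproducible copy of the example data (set.seed(1) is an addition). ave(..., FUN = seq_along) numbers the measurements 1, 2, 3, 4 within each subject, and xtabs() uses that counter as the column variable. Note that xtabs() sums y within each Subject/seqvar cell, so the table reproduces the original values only because each cell holds exactly one measurement, and gender and comment are dropped. The final conversion to a data frame with y1..y4 names is an illustration added here, not part of the reply.

```r
# Example data as in the original post; set.seed() added for reproducibility.
set.seed(1)
data <- data.frame(Subject = rep(1:3, each = 4),
                   y = rnorm(12, 3, 2),
                   gender = rep(c("w", "m", "w"), each = 4),
                   comment = rep(c("comment A", "comment B", "comment C"), each = 4))

# Within-subject counter. seq_along() is safer than seq() here: for a subject
# with a single measurement, seq(y) treats y as an upper bound
# (seq(2.5) gives 1 2) instead of returning 1.
data$seqvar <- ave(data$y, data$Subject, FUN = seq_along)
data$seqvar
# [1] 1 2 3 4 1 2 3 4 1 2 3 4

# Cross-tabulate: rows = Subject, columns = seqvar, cells = sum of y.
tab <- xtabs(y ~ Subject + seqvar, data = data)

# Turn the 3 x 4 table into a data frame with columns y1..y4.
wide <- as.data.frame.matrix(tab)
names(wide) <- paste0("y", seq_len(ncol(wide)))
wide$Subject <- as.numeric(rownames(wide))
wide
```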

or ...
  # reshape (the base R function)
  data$seqvar <- ave(data$y, data$Subject, FUN=seq_along)
  # sep="" names the new columns y1..y4 instead of the default y.1..y.4
  reshape(data, idvar=c("Subject", "gender", "comment"),
          timevar="seqvar", direction="wide", sep="")

or use the reshape or reshape2 packages, which are easier to understand.

-- 
David Winsemius, MD
West Hartford, CT

Dennis Murphy

2011-Jul-11 17:06 UTC



Hi:

Here's another approach using the plyr and reshape packages. (There
are multiple ways to do this, BTW.)

## (1)
library(plyr)
# 'dat' is the data frame from the original post (called 'data' there)
dat <- data
ddply(dat, .(Subject, gender, comment), summarise, mean_y = mean(y))
  Subject gender   comment   mean_y
1       1      w comment A 3.881864
2       2      m comment B 2.213656
3       3      w comment C 2.568794

## (2)
# Add a 'time' variable holding names to associate to new columns
dat$time <- paste('y', rep(1:4, 3), sep = '')

# reshape from 'long' to 'wide' form (cast() is in the reshape package)
library(reshape)
cast(dat, Subject + gender + comment ~ time, value = 'y')
  Subject gender   comment        y1       y2       y3       y4
1       1      w comment A 5.9299385 3.268402 2.634573 3.694540
2       2      m comment B 2.0663910 1.475625 1.960885 3.351722
3       3      w comment C 0.6656096 3.044818 4.833166 1.731582
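The 'time' variable above assumes every subject has exactly four rows, stored in blocks of four (rep(1:4, 3)). If that may not hold, a sketch like the following numbers the rows within each subject with ave() and widens with base R's reshape() instead of cast(), so no add-on package is needed; a subject with fewer measurements then gets NA in the missing columns. The unbalanced data (subject 3 with only three values) and the set.seed(1) call are hypothetical, for illustration only.

```r
# Hypothetical unbalanced data: subject 3 has only three measurements.
set.seed(1)
dat <- data.frame(Subject = c(rep(1, 4), rep(2, 4), rep(3, 3)),
                  y = rnorm(11, 3, 2),
                  gender = c(rep("w", 4), rep("m", 4), rep("w", 3)),
                  comment = c(rep("comment A", 4), rep("comment B", 4),
                              rep("comment C", 3)))

# Number the measurements within each subject (1, 2, ...) instead of
# assuming exactly four per subject as rep(1:4, 3) does.
dat$time <- ave(dat$y, dat$Subject, FUN = seq_along)

# Long -> wide with base R: one y column per value of 'time';
# sep = "" gives y1..y4 instead of the default y.1..y.4.
wide <- reshape(dat, idvar = c("Subject", "gender", "comment"),
                timevar = "time", v.names = "y",
                direction = "wide", sep = "")
rownames(wide) <- NULL
wide   # subject 3 has NA in y4
```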

HTH,
Dennis

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
