Stephan Lindner
2006-Jun-20 08:42 UTC
[R] Create variables with common values for each group
Dear all,
Sorry, this is surely very basic, but I searched the internet a lot
and just couldn't find a solution.
The problem is to create new variables from a data frame which
contains both individual and group variables, such as the mean age
for a household. My data frame:
df
       hhid h.age
1  10010020    23
2  10010020    23
3  10010126    42
4  10010126    60
5  10010142    20
6  10010142    49
7  10010142    52
8  10010150    18
9  10010150    51
10 10010150    28
where hhid is the same for every member of a household and h.age is
the age of each household member.
I tried tapply, by(), and aggregate. The best I could get was:
by(df, df$hhid, function(subset) rep(mean(subset$h.age,na.rm=T),nrow(subset)))
df$hhid: 10010020
[1] 23 23
------------------------------------------------------------
df$hhid: 10010126
[1] 51 51
------------------------------------------------------------
df$hhid: 10010142
[1] 40.33333 40.33333 40.33333
------------------------------------------------------------
df$hhid: 10010150
[1] 32.33333 32.33333 32.33333
In principle, I now only need to stack the mean values, and this is
where I'm stuck. aggregate() works nicely, and I could then loop over
its result, but I was wondering whether there is a better way to do
that.
My end result should look like this (assigning mean.age to the data frame):
       hhid h.age mean.age
1  10010020    23    23.00
2  10010020    23    23.00
3  10010126    42    51.00
4  10010126    60    51.00
5  10010142    20    40.33
6  10010142    49    40.33
7  10010142    52    40.33
8  10010150    18    32.33
9  10010150    51    32.33
10 10010150    28    32.33
Cheers, and thanks a lot,
Stephan Lindner
--
-----------------------
Stephan Lindner, Dipl.Vw.
1512 Gilbert Ct., V-17
Ann Arbor, Michigan 48105
U.S.A.
Tel.: 001-734-272-2437
E-Mail: lindners at umich.edu
"The prevailing ideas of a time were always only the ideas of the
ruling class" -- Karl Marx
Chuck Cleland
2006-Jun-20 09:02 UTC
[R] Create variables with common values for each group
You could use aggregate() and then merge() the result with df.
Something like this:

> df.agg <- aggregate(df$h.age, list(hhid = df$hhid), mean)
> names(df.agg)[2] <- "mean.age"
> merge(df, df.agg)
       hhid h.age mean.age
1  10010020    23 23.00000
2  10010020    23 23.00000
3  10010126    42 51.00000
4  10010126    60 51.00000
5  10010142    20 40.33333
6  10010142    49 40.33333
7  10010142    52 40.33333
8  10010150    18 32.33333
9  10010150    51 32.33333
10 10010150    28 32.33333

--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
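For anyone who wants to run the aggregate() and merge() approach directly, here is a minimal self-contained sketch. The data are copied from the original post; the name df.merged is only an illustrative choice and does not appear in the thread.

```r
# Data copied from the original post.
df <- data.frame(
  hhid  = c(10010020, 10010020, 10010126, 10010126, 10010142,
            10010142, 10010142, 10010150, 10010150, 10010150),
  h.age = c(23, 23, 42, 60, 20, 49, 52, 18, 51, 28)
)

# One row per household; aggregate() names the summary column "x".
df.agg <- aggregate(df$h.age, list(hhid = df$hhid), mean)
names(df.agg)[2] <- "mean.age"

# merge() joins on the shared column hhid and, by default,
# returns the rows sorted by that key.
df.merged <- merge(df, df.agg, by = "hhid")
print(df.merged)
```

Because merge() sorts by the key by default, the row order of the result can differ from the input when the input is not already sorted by hhid.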
Dimitris Rizopoulos
2006-Jun-20 09:03 UTC
[R] Create variables with common values for each group
you can use something like:
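# example data: four households with 2, 2, 3 and 3 members
dat <- data.frame(hhid = rep(c(10010020, 10010126, 10010142,
    10010150), c(2, 2, 3, 3)), h.age = sample(18:50, 10, TRUE))
# repeat each household mean once per household member
# (this relies on the rows being sorted by hhid)
dat$mean.age <- rep(tapply(dat$h.age, dat$hhid, mean),
    tapply(dat$h.age, dat$hhid, length))
dat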
I hope it helps.
Best,
Dimitris
----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
http://www.student.kuleuven.be/~m0390867/dimitris.htm
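The rep()/tapply() idea above fills in the means by position, so it needs the rows grouped in sorted hhid order. As a hedged alternative that was not proposed in the thread, base R's ave() returns a vector aligned with the input rows whatever their order. The sketch below uses the data from the original post; the shuffled copy is only there to illustrate that point.

```r
# Data copied from the original post.
df <- data.frame(
  hhid  = c(10010020, 10010020, 10010126, 10010126, 10010142,
            10010142, 10010142, 10010150, 10010150, 10010150),
  h.age = c(23, 23, 42, 60, 20, 49, 52, 18, 51, 28)
)

# Shuffle the rows to show that the result does not depend on row order.
set.seed(1)
shuffled <- df[sample(nrow(df)), ]

# ave() applies FUN within each hhid group and returns one value per row,
# in the same order as the input vector.
shuffled$mean.age <- ave(shuffled$h.age, shuffled$hhid,
                         FUN = function(x) mean(x, na.rm = TRUE))
print(shuffled)
```

Note that FUN has to be passed by name, because all unnamed arguments after the first are treated as grouping variables.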
Stephan Lindner <lindners <at> umich.edu> writes:

> The problem is to create new variables from a data frame which
> contains both individual and group variables, such as mean age for an
> household. My data frame:
>
> df
>
> hhid h.age
> 1 10010020 23
> 2 10010020 23
...
> where hhid is the same number for each household, h.age the age for
> each household member.
>
> I tried tapply, by(), and aggregate. The best I could get was:
>
> by(df, df$hhid, function(subset) rep(mean(subset$h.age,na.rm=T),nrow(subset)))
>
> df$hhid: 10010020
> [1] 23 23
> ------------------------------------------------------------
> df$hhid: 10010126
> [1] 51 51

try something like

do.call("rbind",byresult)

As you did not provide a running example, the suggestion is only
approximately correct.

Dieter
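The stacking step suggested above can be sketched on the data from the original post. This sketch swaps do.call("rbind", ...) for unlist(), because rbind() recycles shorter vectors when the pieces differ in length, as the household groups do here. byresult is the name used in the reply; the sketch assumes df is already sorted by hhid, since by() returns its groups in sorted order.

```r
# Data copied from the original post (already sorted by hhid).
df <- data.frame(
  hhid  = c(10010020, 10010020, 10010126, 10010126, 10010142,
            10010142, 10010142, 10010150, 10010150, 10010150),
  h.age = c(23, 23, 42, 60, 20, 49, 52, 18, 51, 28)
)

# Per-household vectors of repeated means, as in the original post.
byresult <- by(df, df$hhid, function(sub) {
  rep(mean(sub$h.age, na.rm = TRUE), nrow(sub))
})

# Concatenate the per-group vectors into one vector of length nrow(df).
# This lines up with the rows of df only because df is sorted by hhid.
df$mean.age <- unlist(byresult, use.names = FALSE)
print(df)
```

If the rows are not sorted by household, a positional assignment like this would attach means to the wrong rows, so sort first or use a method that keeps track of the grouping.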