I am working with a longitudinal data set in the long format. This data set has three observations per grade level per year. Here are the first 10 rows of the data frame:

> tenn.dat[1:10,]
   year  schid type grade gain  se new cohort
6  2001 100005    5     4 33.1 3.5   4      3
7  2002 100005    5     4 33.9 3.9   4      2
8  2003 100005    5     4 32.3 4.2   4      1
10 2001 100005    5     5 22.9 4.0   5      4
11 2002 100005    5     5 25.0 3.4   5      3
12 2003 100005    5     5  7.8 3.8   5      2
18 2001 100010    1     4 34.4 5.9   4      3
19 2002 100010    1     4 27.8 5.6   4      2
20 2003 100010    1     4 34.6 6.8   4      1
22 2001 100010    1     5 21.1 4.8   5      4

I need to create a new column in this data frame with the mean gain for each grade by year and the sd for each grade by year.

So, I used tapply as follows:

    tapply(tenn.dat[,5],tenn.dat[,c(1,4)],mean)
    tapply(tenn.dat[,5],tenn.dat[,c(1,4)],sd)

which produces exactly the data I would like to attach in column 1 and 2 respectively.

I am having a problem connecting this back with the corresponding rows in the data frame.

If I used only one factor instead of two, I was successful connecting this with the data frame using:

    m.gain<-tapply(tenn.dat[,5],tenn.dat[,4],mean)
    tenn.dat$m.gain<-m.gain[as.character(tenn.dat$grade)]

Can anyone offer suggestions on a next step?

Thanks,
Harold
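For reference, the one-factor lookup above can probably be carried over to two factors by indexing the tapply result with a two-column character matrix, one (year, grade) pair per row. The sketch below is a minimal illustration, not a reply from the thread: it rebuilds only the ten posted rows (so some year/grade cells hold a single observation and their sd is NA), and the names idx and s.gain are made up for the example.

```r
# The ten rows shown in the post (only the columns needed here)
tenn.dat <- data.frame(
  year  = c(2001, 2002, 2003, 2001, 2002, 2003, 2001, 2002, 2003, 2001),
  schid = c(rep(100005, 6), rep(100010, 4)),
  grade = c(4, 4, 4, 5, 5, 5, 4, 4, 4, 5),
  gain  = c(33.1, 33.9, 32.3, 22.9, 25.0, 7.8, 34.4, 27.8, 34.6, 21.1)
)

# year x grade tables of group means and sds, as in the post
m.gain <- tapply(tenn.dat$gain, tenn.dat[, c("year", "grade")], mean)
s.gain <- tapply(tenn.dat$gain, tenn.dat[, c("year", "grade")], sd)

# Two-column character matrix: column 1 matches the row names (year),
# column 2 matches the column names (grade) of the tapply result
idx <- cbind(as.character(tenn.dat$year), as.character(tenn.dat$grade))

# One table cell per data-frame row, in the original row order
tenn.dat$m.gain <- m.gain[idx]
tenn.dat$s.gain <- s.gain[idx]
```

Because the lookup goes by dimnames rather than by position, the row order of tenn.dat is left unchanged.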
Have you considered "cbind" and "rbind"? If your data.frame has factors, they could present problems with "rbind". Try 'sapply(tenn.dat, class)'. If you have only class character or numeric, use "cbind" to match the columns of tenn.dat both in name and class, then use "rbind" to combine it with the original.

Hope this helps.
spencer

Doran, Harold wrote:

>I need to create a new column in this data frame with the mean gain for
>each grade by year and the sd for each grade by year.
>
>I am having a problem connecting this back with the corresponding rows
>in the data frame.
>
>______________________________________________
>R-help at stat.math.ethz.ch mailing list
>https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Doran, Harold <HDoran <at> air.org> writes:

: I need to create a new column in this data frame with the mean gain for
: each grade by year and the sd for each grade by year.
: [...]
: I am having a problem connecting this back with the corresponding rows
: in the data frame.
: [...]
: can anyone offer suggestions on a next step?

Suggest you use by, instead of tapply, like this:

    # add the group mean and sd of gain to every row of one year/grade subset
    f <- function(x) {
      x$mean.gain <- mean(x$gain)
      x$sd.gain <- sd(x$gain)
      x
    }
    # apply f to each year/grade subset, then stack the pieces back together
    res <- by(tenn.dat, list(tenn.dat$year, tenn.dat$grade), f)
    do.call("rbind", res)
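Note that stacking the by() pieces returns the rows grouped by year and grade rather than in their original order. If the original order matters, base R's ave() might be a simpler route, since it returns one group summary per input row. The sketch below is only an illustration under that assumption; it rebuilds the ten posted rows, so cells with a single observation get an sd of NA.

```r
# The ten rows shown in the original post (only the columns needed here)
tenn.dat <- data.frame(
  year  = c(2001, 2002, 2003, 2001, 2002, 2003, 2001, 2002, 2003, 2001),
  schid = c(rep(100005, 6), rep(100010, 4)),
  grade = c(4, 4, 4, 5, 5, 5, 4, 4, 4, 5),
  gain  = c(33.1, 33.9, 32.3, 22.9, 25.0, 7.8, 34.4, 27.8, 34.6, 21.1)
)

# ave() applies FUN within each year/grade combination and returns a
# vector as long as the input, aligned with the original rows.
# FUN must be passed by name, otherwise it is taken as a grouping factor.
tenn.dat$mean.gain <- ave(tenn.dat$gain, tenn.dat$year, tenn.dat$grade,
                          FUN = mean)
tenn.dat$sd.gain   <- ave(tenn.dat$gain, tenn.dat$year, tenn.dat$grade,
                          FUN = sd)

tenn.dat
```

The column names mean.gain and sd.gain follow the by() version above, so either approach should yield the same two columns.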