Hi all,

after using Amelia II to create 10 imputed data sets, I need to average them into a single data set that contains the average of each cell for the imputed variables, plus the original values for the variables that were not imputed. The data have many variables (some numeric, others factors) and more than 20000 observations. I do not know how to average them. Any help?

Below I provide a small example. Suppose Amelia provided two data sets:

    d1 <- data.frame(subject = c("Felipe", "John"), eat1 = 1:2, eat3 = 5:6, trt = c("t1", "t2"))
    d2 <- data.frame(subject = c("Felipe", "John"), eat1 = 3:4, eat3 = 6:7, trt = c("t1", "t2"))

I tried

    (d1 + d2)/2

but I lose my factors. mean() did not work either.

The result I'd like is:

      subject eat1 eat3 trt
    1  Felipe    2  5.5  t1
    2    John    3  6.5  t2

thanks,

Felipe Nunes
CAPES/Fulbright Fellow
PhD Student Political Science - UCLA
Web: felipenunes.bol.ucla.edu
Hi,

I might write a little function that does different things depending on the class of the variable. Along these lines, where i is a column index:

    function(i) {
      if (is.numeric(imputeddata[, i])) {
        ## something
      } else if (is.factor(imputeddata[, i])) {
        ## something else
      }
      ## etc.
    }

Then you can just do:

    combined <- lapply(1:ncol(imputeddata), yourfun)

Alternatively, you could consider a single imputation approach, since that is essentially what you end up doing.

Cheers,

Josh

On Thu, Jan 12, 2012 at 10:16 PM, Felipe Nunes <felipnunes at gmail.com> wrote:
> [snip]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
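A minimal sketch of the class-dependent function suggested in this reply, using the toy data from the question. The names imps and combine_col are hypothetical, and the sketch assumes that non-numeric columns were not imputed, so they are identical in every data set and can simply be taken from the first one. With Amelia, the list of imputed data sets would presumably be the imputations element of the amelia() output rather than a hand-built list.

```r
# Toy data from the original question (two imputed data sets).
d1 <- data.frame(subject = c("Felipe", "John"), eat1 = 1:2, eat3 = 5:6,
                 trt = c("t1", "t2"))
d2 <- data.frame(subject = c("Felipe", "John"), eat1 = 3:4, eat3 = 6:7,
                 trt = c("t1", "t2"))

# Hypothetical name: a list holding all imputed data sets.
imps <- list(d1, d2)

# For column i: average across imputations if numeric; otherwise keep the
# column from the first imputation (assumed identical in all of them).
combine_col <- function(i) {
  first <- imps[[1]][[i]]
  if (is.numeric(first)) {
    Reduce(`+`, lapply(imps, function(d) d[[i]])) / length(imps)
  } else {
    first
  }
}

# Start from the first data set so that column names are preserved,
# then overwrite every column with its combined version.
combined <- imps[[1]]
combined[] <- lapply(seq_along(combined), combine_col)
print(combined)
```

Because the averaging is done column by column on whole vectors, this relies on the rows being in the same order in every imputed data set.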
Here is a solution that works for your small example. It might be difficult to prepare your larger data sets to use the same method.

    db <- rbind(d1, d2)
    aggregate(subset(db, select = -c(subject, trt)), by = list(subject = db$subject), mean)

    ## or, for example,
    aggregate(subset(db, select = -c(subject, trt)), by = list(subject = db$subject, trt = db$trt), mean)

For aggregate() to work with mean, its first argument must have only numeric columns. That is what subset(db, select = -c(subject, trt)) does for you.

(d1 + d2)/2 did not work because d1 and d2 contain non-numeric columns (subject and trt), which cannot be added.

A much more cumbersome alternative is to compute the averages one at a time,

    (d1$eat1[d1$subject == 'Felipe'] + d2$eat1[d2$subject == 'Felipe'])/2

and similarly for eat3 and John. But that is of course not practical for larger data sets.

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 1/12/12 10:16 PM, "Felipe Nunes" <felipnunes at gmail.com> wrote:
> [snip]
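A rough sketch of how the aggregate() approach above might extend to a list of any number of imputed data sets. The names imps, num_cols and id_cols are hypothetical. It assumes that the non-numeric columns were not imputed and that, taken together, they identify each row uniquely; for the real data an explicit row identifier column would probably be needed.

```r
# Toy data from the original question (two imputed data sets).
d1 <- data.frame(subject = c("Felipe", "John"), eat1 = 1:2, eat3 = 5:6,
                 trt = c("t1", "t2"))
d2 <- data.frame(subject = c("Felipe", "John"), eat1 = 3:4, eat3 = 6:7,
                 trt = c("t1", "t2"))

# Hypothetical name: a list holding all imputed data sets.
imps <- list(d1, d2)

# Stack all imputations into one long data frame.
db <- do.call(rbind, imps)

# Numeric columns are averaged; all remaining columns are used as
# grouping variables (assumed identical across imputations).
num_cols <- names(db)[sapply(db, is.numeric)]
id_cols  <- setdiff(names(db), num_cols)

# One row per unique combination of the grouping columns.
avg <- aggregate(db[num_cols], by = db[id_cols], FUN = mean)
print(avg)
```

Unlike a column-by-column average, this does not depend on row order, but the result is sorted by the grouping columns and the column order changes (grouping columns come first).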