Jessica Z
2007-Aug-29 00:37 UTC
[R] extracting dataset with average imputed values from aregImpute()
Dear all:

After many trials, I am still quite lost on how to extract a dataset with average imputed values after running aregImpute().

Take the example in the aregImpute (Hmisc) documentation, which also appeared in our R archive with Prof. Frank E Harrell Jr as the author. It shows how to get a completed dataset, but for only one draw of the k multiple imputations. (By the way, I don't quite see what fit.mult.impute, mentioned below, is for, but it seems to have nothing to do with my question.)

--------
aregImpute produces a list containing the multiple imputations:

    w <- aregImpute(. . .)
    w$imputed$blood.pressure   # gets m by k matrix
    # m = number of subjects with blood pressure missing,
    # k = number of multiple imputations

To get a completed dataset (but for only one draw of the k multiple imputations) see how fit.mult.impute does it. I have just added the following example to the help file for aregImpute.

    set.seed(23)
    x <- runif(200)
    y <- x + runif(200, -.05, .05)
    y[1:20] <- NA
    d <- data.frame(x,y)
    f <- aregImpute(~ x + y, n.impute=10, match='closest', data=d)
    # Here is how to create a completed dataset for imputation
    # number 3 as fit.mult.impute would do automatically.  In this
    # degenerate case changing 3 to 1-2,4-10 will not alter the results.
    completed <- d
    imputed <- impute.transcan(f, imputation=3, data=d, list.out=TRUE,
                               pr=FALSE, check=FALSE)
    completed[names(imputed)] <- imputed
    completed   # 200 by 2 data frame
--------

However, how could one get a completed dataset for the average of the k draws of the multiple imputations? Say, after running

    w <- aregImpute(. . .)
    w$imputed$blood.pressure

we get an m by k matrix, where m = number of subjects with blood pressure missing and k = number of multiple imputations. This m by k matrix has one row for each subject (that is, each record) with missing data.
So for each row (record), I could average its k imputed values and store the result in a separate column. However, that would only give me a dataset with just the m rows (records) that have missing data.

What I really want is a COMPLETED dataset, with every non-missing value kept exactly as it was in the original dataset, and with each NA in the original dataset replaced by its average imputed value (the average of its k imputations).

Many thanks!
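
One way the completed dataset described above might be built is sketched here. This is only a minimal sketch, not a method taken from the Hmisc documentation. It reuses the x/y example from the help file and assumes that each non-NULL element of f$imputed is a numeric matrix with one row per missing observation, in the same order as the NA positions of that variable in d. The names completed_avg, imp and miss are illustrative. Averaging only makes sense for numeric variables, and a dataset filled in with averaged imputations understates the uncertainty of the imputed values, which is what fit.mult.impute is meant to account for when fitting models.

```r
library(Hmisc)

# Same toy data as in the aregImpute help-file example quoted above.
set.seed(23)
x <- runif(200)
y <- x + runif(200, -.05, .05)
y[1:20] <- NA
d <- data.frame(x, y)
f <- aregImpute(~ x + y, n.impute = 10, match = 'closest', data = d)

# Start from the original data so observed values stay untouched.
completed_avg <- d

for (v in names(f$imputed)) {
  imp <- f$imputed[[v]]        # m by k matrix, or NULL if nothing was missing
  if (is.null(imp)) next

  miss <- which(is.na(d[[v]])) # row positions of the NAs in the original data

  # Assumption: rows of imp follow the order of the missing rows in d.
  stopifnot(nrow(imp) == length(miss))

  # Replace each NA with the mean of its k imputed values.
  completed_avg[[v]][miss] <- rowMeans(imp)
}
```

Inspecting head(completed_avg, 25) should then show the first 20 values of y filled in, with all other rows identical to d.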