Jessica Z
2007-Aug-29 00:37 UTC
[R] extracting dataset with average imputed values from aregImpute()
Dear all:
After many trials, I am still quite lost on how to extract a dataset with
average imputed values after running aregImpute().
Take the example in the aregImpute (Hmisc) documentation (which also appeared in
our R archive with Prof. Frank E Harrell Jr as the author):
-------- The following shows how to get a completed dataset (but for only
one draw of the k multiple imputations). By the way, I don't quite see what
"fit.mult.impute" (mentioned below) is for, but it seems to have nothing to
do with my question:
aregImpute produces a list containing the multiple imputations:
w <- aregImpute(. . .)
w$imputed$blood.pressure # gets m by k matrix
# m = number of subjects with blood pressure missing,
# k = number of multiple imputations
To get a completed dataset (but for only one draw of the k multiple
imputations) see how fit.mult.impute does it. I have just added the
following example to the help file for aregImpute.
set.seed(23)
x <- runif(200)
y <- x + runif(200, -.05, .05)
y[1:20] <- NA
d <- data.frame(x,y)
f <- aregImpute(~ x + y, n.impute=10, match='closest',
                data=d)
# Here is how to create a completed dataset for imputation
# number 3 as fit.mult.impute would do automatically. In this
# degenerate case changing 3 to 1-2,4-10 will not alter the results.
completed <- d
imputed <- impute.transcan(f, imputation=3, data=d, list.out=TRUE,
pr=FALSE, check=FALSE)
completed[names(imputed)] <- imputed
completed # 200 by 2 data frame
------------------- However, how could one get a completed dataset that uses the
average of the k multiple imputations?
Say, after running:
w <- aregImpute(. . .)
w$imputed$blood.pressure
we get an m by k matrix, where
m = number of subjects with blood pressure missing,
k = number of multiple imputations.
This m by k matrix has one row for each subject (that is, each record) with
missing data. So for each row (record), I could average its k imputed values
and store the result in a separate column. However, this would only give me a
dataset with just the m rows (records) that have missing data.
What I really want is a COMPLETED dataset, with every non-missing value
kept exactly as it was in the original dataset, and with each
'NA' in the original dataset replaced by its average
imputed value (the average of its k imputations).
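To make the goal concrete, here is a minimal base R sketch of the kind of thing I am after. It is only an illustration under stated assumptions, not a method taken from the Hmisc documentation: the helper name fill_with_mean_imputations is hypothetical, it assumes each non-NULL element of the imputed list is an m by k matrix whose rows are in the same order as the NA rows of the matching column, and averaging only makes sense for numeric variables.

```r
# Hypothetical helper (not part of Hmisc): replace each NA in `data` with the
# mean of its k imputed values.
# Assumption: `imputed` is a named list shaped like w$imputed, where each
# non-NULL element is an m-by-k matrix whose rows follow the order of the NA
# rows of the matching column in `data`. Numeric variables only.
fill_with_mean_imputations <- function(data, imputed) {
  completed <- data
  for (v in names(imputed)) {
    imp <- imputed[[v]]
    if (is.null(imp)) next                # variable had no missing values
    miss <- which(is.na(data[[v]]))       # rows that were NA originally
    stopifnot(length(miss) == nrow(imp))  # guard the row-order assumption
    completed[[v]][miss] <- rowMeans(imp) # average across the k imputations
  }
  completed
}

# Small stand-in for w$imputed so the sketch runs without Hmisc:
toy <- data.frame(x = c(1, 2, 3, 4), y = c(NA, 2.0, NA, 4.0))
toy_imputed <- list(x = NULL, y = matrix(c(1.0, 3.0, 1.2, 3.4), nrow = 2))
toy_completed <- fill_with_mean_imputations(toy, toy_imputed)
```

With a real fit, the call would presumably look like fill_with_mean_imputations(d, w$imputed), provided the row-order assumption above holds for the object returned by aregImpute().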
Many thanks!