I tried multiple imputation with aregImpute() and fit.mult.impute() in Hmisc 3.8-3 (June 2010) and R-2.12.1.

The warning message below suggests that summary(f) of fit.mult.impute() would only use the last imputed data set. Thus, the whole imputation process is ignored.

  "Not using a Design fitting function; summary(fit) will use standard errors, t, P from last imputation only. Use vcov(fit) to get the correct covariance matrix, sqrt(diag(vcov(fit))) to get s.e."

But the standard errors in summary(f) agree with the values from sqrt(diag(vcov(f))) to the 4th decimal place. It would seem that summary(f) actually adjusts for multiple imputation?

Does summary(f) in Hmisc 3.8-3 actually adjust for MI?

If it does not adjust for MI, then how do I get the MI-adjusted coefficients and standard errors?

I can't seem to find answers in the documentation, including rereading section 8.10 of the Harrell (2001) book. Googling located a thread in R-help back in 2003, which seemed dated.

Many thanks in advance for the help,

Yuelin.
http://idecide.mskcc.org
-------------------------------
> library(Hmisc)
Loading required package: survival
Loading required package: splines
> data(kyphosis, package = "rpart")
> kp <- lapply(kyphosis, function(x)
+       { is.na(x) <- sample(1:length(x), size = 10); x })
> kp <- data.frame(kp)
> kp$kyp <- kp$Kyphosis == "present"
> set.seed(7)
> imp <- aregImpute( ~ kyp + Age + Start + Number, dat = kp, n.impute = 10,
+                      type = "pmm", match = "closest")
Iteration 13
> f <- fit.mult.impute(kyp ~ Age + Start + Number, fitter=glm, xtrans=imp,
+                 family = "binomial", data = kp)

Variance Inflation Factors Due to Imputation:

(Intercept)         Age       Start      Number
       1.06        1.28        1.17        1.12

Rate of Missing Information:

(Intercept)         Age       Start      Number
       0.06        0.22        0.14        0.10

d.f. for t-distribution for Tests of Single Coefficients:

(Intercept)         Age       Start      Number
    2533.47      193.45      435.79      830.08

The following fit components were averaged over the 10 model fits:

  fitted.values linear.predictors

Warning message:
In fit.mult.impute(kyp ~ Age + Start + Number, fitter = glm, xtrans = imp,  :
  Not using a Design fitting function; summary(fit) will use
standard errors, t, P from last imputation only.  Use vcov(fit) to get the
correct covariance matrix, sqrt(diag(vcov(fit))) to get s.e.

> f

Call:  fitter(formula = formula, family = "binomial", data = completed.data)

Coefficients:
(Intercept)          Age        Start       Number
    -3.6971       0.0118      -0.1979       0.6937

Degrees of Freedom: 80 Total (i.e. Null);  77 Residual
Null Deviance:      80.5
Residual Deviance: 58   AIC: 66
> sqrt(diag(vcov(f)))
(Intercept)         Age       Start      Number
  1.5444782   0.0063984   0.0652068   0.2454408
> -0.1979/0.0652068
[1] -3.0350
> summary(f)

Call:
fitter(formula = formula, family = "binomial", data = completed.data)

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.240  -0.618  -0.288  -0.109   2.409

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -3.6971     1.5445   -2.39   0.0167
Age           0.0118     0.0064    1.85   0.0649
Start        -0.1979     0.0652   -3.03   0.0024
Number        0.6937     0.2454    2.83   0.0047

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 80.508  on 80  degrees of freedom
Residual deviance: 57.965  on 77  degrees of freedom
AIC: 65.97

Number of Fisher Scoring iterations: 5
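The three diagnostics that fit.mult.impute() prints in the transcript (variance inflation factors, rate of missing information, d.f.) look like the usual Rubin's-rules quantities. The base-R sketch below assumes that, per coefficient, r = (1 + 1/m) B / Ubar (B the between-imputation variance, Ubar the mean within-imputation variance, m the number of imputations), VIF = 1 + r, d.f. = (m - 1)(1 + 1/r)^2 and rate of missing information = r/(1 + r). That Hmisc uses exactly these formulas is an assumption here, not something the thread states; the check is whether backing r out of the printed d.f. reproduces the other two printed rows. The helper name mi_summaries is hypothetical.

```r
# Hypothetical helper: Rubin's-rules summaries for one coefficient, given
#   r = (1 + 1/m) * B / Ubar  (relative increase in variance due to imputation)
#   m = number of imputations
mi_summaries <- function(r, m) {
  list(
    vif     = 1 + r,                    # total variance / mean within-imputation variance
    df      = (m - 1) * (1 + 1 / r)^2,  # Rubin's d.f. for a t reference distribution
    missing = r / (1 + r)               # share of total variance due to imputation
  )
}

# Printed d.f. from the transcript (the least rounded of the three rows), m = 10.
m <- 10
df_printed <- c("(Intercept)" = 2533.47, Age = 193.45,
                Start = 435.79, Number = 830.08)

# Invert df = (m - 1) * (1 + 1/r)^2 to recover r for each coefficient.
r <- 1 / (sqrt(df_printed / (m - 1)) - 1)

s <- mi_summaries(r, m)
round(s$vif, 2)      # compare with "Variance Inflation Factors Due to Imputation"
round(s$missing, 2)  # compare with "Rate of Missing Information"
```

With these inputs the rounded results are 1.06, 1.28, 1.17, 1.12 and 0.06, 0.22, 0.14, 0.10, matching the printed rows. The small-sample variant (r + 2/(df + 3))/(r + 1) would round to 0.15 for Start instead, so the printed rate appears to be the simpler r/(1 + r).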
On Thu, Mar 31, 2011 at 2:56 PM, Yuelin Li <liy12 at mskcc.org> wrote:
> I tried multiple imputation with aregImpute() and
> fit.mult.impute() in Hmisc 3.8-3 (June 2010) and R-2.12.1.
>
> The warning message below suggests that summary(f) of
> fit.mult.impute() would only use the last imputed data set.
> Thus, the whole imputation process is ignored.
>
>  "Not using a Design fitting function; summary(fit)
>   will use standard errors, t, P from last imputation only.
>   Use vcov(fit) to get the correct covariance matrix,
>   sqrt(diag(vcov(fit))) to get s.e."

Hello. I fiddled around with rms and multiple imputation when I was preparing these notes for our R summer course. I ran into the same thing you did, and my conclusion is slightly different from yours.

http://pj.freefaculty.org/guides/Rcourse/multipleImputation/multipleImputation-1-lecture.pdf

Look down to slide 80 or so, where I launch into that question. It appears to me that aregImpute will give the "right answer" for fitters from rms, but if you want to feel confident about the results for other fitters, you should use mitools or some other parameter-combining approach. My conclusion (slide 105) is:

  "Please note: the standard errors in the output based on lrm match the std.errors estimated by MItools. Thus I conclude sqrt(diag(cov(fit.mult.impute.object))) did not give correct results"

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
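Paul's suggestion to combine the per-imputation estimates yourself can be sketched in base R. The hypothetical helper pool_rubin() below applies Rubin's rules to a list of fits (one per completed data set) using only coef() and vcov(), so it should work for glm fits as well as rms fits. mitools' MIcombine() is the packaged route; this sketch is not its implementation. The demo data sets are stand-ins made by adding noise to mtcars$mpg, not real imputations, and the commented Hmisc usage assumes impute.transcan() accepts the arguments shown (check ?impute.transcan).

```r
# Hypothetical helper: pool a list of model fits with Rubin's rules.
pool_rubin <- function(fits) {
  m    <- length(fits)
  Q    <- do.call(cbind, lapply(fits, coef))    # p x m matrix of estimates
  Qbar <- rowMeans(Q)                           # pooled point estimates
  Ubar <- Reduce(`+`, lapply(fits, vcov)) / m   # mean within-imputation covariance
  B    <- cov(t(Q))                             # between-imputation covariance
  Tvar <- Ubar + (1 + 1 / m) * B                # Rubin's total covariance
  r    <- (1 + 1 / m) * diag(B) / diag(Ubar)    # relative increase in variance
  list(coef = Qbar, vcov = Tvar, se = sqrt(diag(Tvar)),
       df = (m - 1) * (1 + 1 / r)^2)
}

# Demo with stand-in "completed" data sets (noise added to mpg; not imputation).
set.seed(1)
completed <- lapply(1:5, function(i)
  transform(mtcars, mpg = mpg + rnorm(nrow(mtcars), sd = 0.5)))
fits   <- lapply(completed, function(d) lm(mpg ~ wt + hp, data = d))
pooled <- pool_rubin(fits)
cbind(estimate = pooled$coef, se = pooled$se, df = pooled$df)

# With the thread's objects, the per-imputation fits might be built like this
# (assumed impute.transcan() arguments; requires Hmisc; not run here):
# fits <- lapply(1:10, function(i) {
#   d     <- kp
#   imp_i <- impute.transcan(imp, imputation = i, data = kp,
#                            list.out = TRUE, pr = FALSE, check = FALSE)
#   d[names(imp_i)] <- imp_i
#   glm(kyp ~ Age + Start + Number, family = binomial, data = d)
# })
```

Because the pooled standard errors come from the full Ubar + (1 + 1/m)B, they can never be smaller than the average within-imputation standard errors; comparing them with sqrt(diag(vcov(f))) and with an lrm-based fit is one way to reproduce the comparison on Paul's slide 105.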
For your approach, how do you know that either summary or vcov used multiple imputation? You are using a non-rms fitting function, so be careful. Compare with using the lrm fitting function. Also replace Design with the rms package.

Please omit confidentiality notices from your e-mails.

Frank

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/fit-mult-impute-in-Hmisc-tp3419037p3741881.html
Sent from the R help mailing list archive at Nabble.com.
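Frank's question (how do you know that either summary or vcov used the imputations?) can be probed with numbers already in the transcript. Assuming the printed VIFs are diag(T)/diag(Ubar) under Rubin's rules, a standard error from a single completed data set should be roughly sqrt(diag(vcov(f))) / sqrt(VIF) if vcov(f) is imputation-adjusted. The base-R arithmetic below computes that implied within-imputation SE; the variable names are illustrative, and only the printed values come from the thread.

```r
# Values printed in the thread (m = 10 imputations).
se_vcov <- c("(Intercept)" = 1.5444782, Age = 0.0063984,
             Start = 0.0652068, Number = 0.2454408)   # sqrt(diag(vcov(f)))
se_summary <- c("(Intercept)" = 1.5445, Age = 0.0064,
                Start = 0.0652, Number = 0.2454)      # Std. Error column of summary(f)
vif <- c("(Intercept)" = 1.06, Age = 1.28,
         Start = 1.17, Number = 1.12)                 # printed VIFs (2 decimals)

# If se_vcov is imputation-adjusted, diag(T) = vif * diag(Ubar), so a typical
# single-imputation SE should be close to:
se_within <- se_vcov / sqrt(vif)

round(cbind(se_vcov, se_summary, se_within), 4)
```

For Age the implied single-imputation SE is about 0.0057, while summary(f) printed 0.0064, the same as sqrt(diag(vcov(f))). If summary(f) really used only the last imputation, one would expect a value nearer 0.0057 (single-imputation SEs vary somewhat), so the agreement suggests that either both numbers are adjusted or neither is. Refitting with lrm, or pooling the per-imputation fits by hand, would distinguish the two cases.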