Inman, Brant A. M.D.
2006-Nov-13 23:41 UTC
[R] Profile confidence intervals and LR chi-square test
System: R 2.3.1 on a Windows XP machine.

I am building a logistic regression model for a sample of 100 cases in dataframe "d", in which there are 3 binary covariates: x1, x2 and x3.

----------------

> summary(d)
 y      x1     x2     x3
 0:54   0:50   0:64   0:78
 1:46   1:50   1:36   1:22

> fit <- glm(y ~ x1 + x2 + x3, data=d, family=binomial(link=logit))

> summary(fit)

Call:
glm(formula = y ~ x1 + x2 + x3, family = binomial(link = logit),
    data = d)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.6503  -1.0220  -0.7284   0.9965   1.7069

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.3772     0.3721  -1.014   0.3107
x11          -0.8144     0.4422  -1.842   0.0655 .
x21           0.9226     0.4609   2.002   0.0453 *
x31           1.3347     0.5576   2.394   0.0167 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 137.99  on 99  degrees of freedom
Residual deviance: 120.65  on 96  degrees of freedom
AIC: 128.65

Number of Fisher Scoring iterations: 4

> exp(fit$coef)
(Intercept)         x11         x21         x31
  0.6858006   0.4429233   2.5157321   3.7989873

---------------

After reading the appropriate sections in MASS4 (7.2 and 8.4 in particular), I decided to estimate the 95% confidence intervals for the odds ratios using the profile method implemented in the "confint" function. I then used the "anova" function to perform the deviance chi-square tests for each covariate.

---------------

> ci <- confint(fit); exp(ci)
Waiting for profiling to be done...
                2.5 %    97.5 %
(Intercept) 0.3246680  1.413684
x11         0.1834819  1.048154
x21         1.0256096  6.314473
x31         1.3221533 12.129210

> anova(fit, test='Chisq')
Analysis of Deviance Table

Model: binomial, link: logit

Response: y

Terms added sequentially (first to last)

     Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                    99    137.989
x1    1    5.856        98    132.133     0.016
x2    1    5.271        97    126.862     0.022
x3    1    6.212        96    120.650     0.013

----------------

My question relates to the interpretation of the significance of variable x1. The OR for x1 is 0.443 and its profile confidence interval is 0.183-1.048. If a type I error rate of 5% is assumed, this result would tend to suggest that x1 is NOT a significant predictor of y. However, the deviance chi-square test has a P-value of 0.016, which suggests that x1 is indeed a significant predictor of y. How do I reconcile these two differing messages? I do recognize that the upper bound of the confidence interval is pretty close to 1, but I am certain that some journal reviewer will point out the apparent inconsistency.

Brant Inman
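[For comparison, a minimal sketch of the corresponding Wald-based intervals, assuming the fitted object `fit` from the message above; confint.default() uses the asymptotic standard errors rather than the profile likelihood, and so corresponds to the Wald z tests reported by summary(fit):]

# Wald (asymptotic normal) 95% intervals for the odds ratios, using the
# same fitted object `fit` as above; compare with exp(confint(fit)):
exp(confint.default(fit))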
Henric Nilsson
2006-Nov-14 03:04 UTC
[R] Profile confidence intervals and LR chi-square test
On 2006-11-14 00:41, Inman, Brant A. M.D. wrote:

> System: R 2.3.1 on Windows XP machine.

Time to upgrade!

> I am building a logistic regression model for a sample of 100 cases in
> dataframe "d", in which there are 3 binary covariates: x1, x2 and x3.

Please provide a reproducible example (as suggested by the posting guide).

[summary(), glm(), confint() and anova() output quoted from the original message snipped]

> Terms added sequentially (first to last)
              ^^^^^^^^^^^^

Hence, your use of the `anova' function doesn't return tests corresponding to the CIs computed above.

> My question relates to the interpretation of the significance of
> variable x1. The OR for x1 is 0.443 and its profile confidence interval
> is 0.183-1.048. If a type I error rate of 5% is assumed, this result
> would tend to suggest that x1 is NOT a significant predictor of y.

This is also suggested by the Wald test computed by the `summary' function.

> However, the deviance chi-square test has a P-value of 0.016, which
> suggests that x1 is indeed a significant predictor of y.

That p-value corresponds to adding x1 to a model containing only the intercept term.

> How do I reconcile these two differing messages?

Generally, in order to compute the LR test for the null hypothesis of some subset of the parameters being equal to zero, you need to explicitly fit both the restricted and the unrestricted model and compare them using the `anova' function. Also, see FAQ 7.18.

HTH,
Henric
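[A minimal sketch of the restricted-vs.-unrestricted comparison Henric describes, assuming the data frame `d` from the original post; this is the LR test that corresponds to the profile confidence interval for x1:]

# Full (unrestricted) model and the restricted model without x1:
fit.full <- glm(y ~ x1 + x2 + x3, data = d, family = binomial(link = logit))
fit.red  <- update(fit.full, . ~ . - x1)

# Likelihood-ratio (deviance chi-square) test for x1, given x2 and x3:
anova(fit.red, fit.full, test = "Chisq")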
Prof Brian Ripley
2006-Nov-14 07:55 UTC
[R] Profile confidence intervals and LR chi-square test
Your problem is the interpretation of anova(): it is a sequential test and x1 is the first term. Using dropterm() would give you the correct LR test.

However, you also have a Wald test given by the line

> x11          -0.8144     0.4422  -1.842   0.0655 .

which is not significant at the 5% level. The correct LRT would be expected to be more accurate, and your inversion of the profile likelihood is just a way to compute the LRT.

On Mon, 13 Nov 2006, Inman, Brant A. M.D. wrote:

[original message quoted in full; snipped]

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
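[A minimal sketch of the dropterm() approach mentioned above, assuming the fitted object `fit` from the original post; dropterm() lives in package MASS:]

# LR (deviance chi-square) tests for dropping each term, one at a time,
# from the full model -- the tests that match the profile CIs:
library(MASS)
dropterm(fit, test = "Chisq")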
Inman, Brant A. M.D.
2006-Nov-14 17:42 UTC
[R] Profile confidence intervals and LR chi-square test
Thank you to Prof Ripley and Henric Nilsson for their observation that I was using "anova" inappropriately for the question that I was trying to answer. Note that the Wald statistics and confidence intervals were available in the previous email, but I preferred to use the more accurate deviance statistics.

I will demonstrate my error for the benefit of those new users of R (like me) who may not have appreciated how the "anova" function works SEQUENTIALLY, and what SEQUENTIALLY actually means in this context.

Since the "anova" function is a sequential test of the current model, only the statistic for the last term in the model formula is a true deviance chi-square statistic for (full model) vs. (full model - term). For instance, using the data upon which this question was based, consider the following 2 models:

----------------------

> fit0 <- glm(y ~ 1, data=d, family=binomial(link=logit), na.action=na.omit)
> fit1 <- update(fit0, . ~ . + x1 + x2 + x3)

----------------------

Here, fit0 is the null (intercept-only) model and fit1 is the full model (without interactions, because interactions are not biologically plausible in this medical dataset). Now notice the result of the "anova" function for the full model:

----------------------

> anova(fit1, test='Chisq')
...
     Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                    99    137.989
x1    1    8.267        98    129.721     0.004
x2    1    5.639        97    124.083     0.018
x3    1    3.433        96    120.650     0.064

-----------------------

It is incorrect to interpret the deviance chi-square test presented above for x1 (P=0.004) as the deviance chi-square statistic comparing (y~x1+x2+x3) vs. (y~x2+x3); the statistic actually calculated is for (y~1) vs. (y~x1). Similarly, the deviance chi-square statistic calculated for x2 (P=0.018) is NOT for (y~x1+x2+x3) vs. (y~x1+x3) but for (y~x1) vs. (y~x1+x2). Lastly, the deviance chi-square statistic for x3 (P=0.064) is probably the most intuitive, because it is for the comparison of (y~x1+x2+x3) vs. (y~x1+x2), the result we would typically want to present for x3 in the full model.

So how do you get the correct deviance chi-square statistics for x1 and x2 in the full model? There are a couple of ways of arriving at the same answer, as I will demonstrate for the case of x1.

Option #1: Reorder the full model so that x1 is the last term in the model formula.

-----------------------

> fit2 <- glm(y ~ x2 + x3 + x1, data=d, family=binomial(link=logit), na.action=na.omit)
> anova(fit2, test='Chisq')
...
     Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                    99    137.989
x2    1    7.305        98    130.683     0.007
x3    1    3.821        97    126.862     0.051
x1    1    6.212        96    120.650     0.013

-----------------------

Notice that the deviance chi-square statistics for all of the variables have changed, despite fit2 being identical in content to fit1; only the order of the terms in the model formula has changed from fit1 to fit2. The deviance statistic for x1 is now the correct one for the full model, that is, for the comparison (y~x1+x2+x3) vs. (y~x2+x3).

Option #2: Compare the full model to a reduced model that does not include x1.

-----------------------

> fit3 <- update(fit1, . ~ . - x1)
> anova(fit1, fit3, test='Chisq')
...
Model 1: y ~ x1 + x2 + x3
Model 2: y ~ x2 + x3
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1        96    120.650
2        97    126.862 -1   -6.212     0.013

-----------------------

fit3 is the same model as fit1 except that it is missing the x1 term. Therefore, any change in the deviance chi-square statistic is due to the deletion of x1 from the full model.
Notice that the difference in residual deviances between fit3 and fit1 (126.862 - 120.650 = 6.212) is the same as the difference between the x3 and x1 rows in Option #1, i.e. the deviance attributed to x1.

Brant
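[A further option, following Prof Ripley's dropterm() suggestion, is single-term deletion with drop1() from base R; a minimal sketch, assuming fit1 as defined above:]

# Each row compares the full model fit1 to the model with that single
# term removed, reproducing Option #2 for every covariate in one call:
drop1(fit1, test = "Chisq")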