Inman, Brant A. M.D.
2006-Nov-13 23:41 UTC
[R] Profile confidence intervals and LR chi-square test
System: R 2.3.1 on a Windows XP machine.

I am building a logistic regression model for a sample of 100 cases in dataframe "d", in which there are 3 binary covariates: x1, x2 and x3.

----------------

> summary(d)
 y      x1     x2     x3
 0:54   0:50   0:64   0:78
 1:46   1:50   1:36   1:22

> fit <- glm(y ~ x1 + x2 + x3, data=d, family=binomial(link=logit))

> summary(fit)

Call:
glm(formula = y ~ x1 + x2 + x3, family = binomial(link = logit),
    data = d)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.6503  -1.0220  -0.7284   0.9965   1.7069

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.3772     0.3721  -1.014   0.3107
x11          -0.8144     0.4422  -1.842   0.0655 .
x21           0.9226     0.4609   2.002   0.0453 *
x31           1.3347     0.5576   2.394   0.0167 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 137.99  on 99  degrees of freedom
Residual deviance: 120.65  on 96  degrees of freedom
AIC: 128.65

Number of Fisher Scoring iterations: 4

> exp(fit$coef)
(Intercept)         x11         x21         x31
  0.6858006   0.4429233   2.5157321   3.7989873

---------------

After reading the appropriate sections in MASS4 (7.2 and 8.4 in particular), I decided to estimate the 95% confidence intervals for the odds ratios using the profile method implemented in the "confint" function. I then used the "anova" function to perform the deviance chi-square tests for each covariate.

---------------

> ci <- confint(fit); exp(ci)
Waiting for profiling to be done...
                2.5 %    97.5 %
(Intercept) 0.3246680  1.413684
x11         0.1834819  1.048154
x21         1.0256096  6.314473
x31         1.3221533 12.129210

> anova(fit, test='Chisq')
Analysis of Deviance Table

Model: binomial, link: logit

Response: y

Terms added sequentially (first to last)

     Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                    99    137.989
x1    1    5.856        98    132.133     0.016
x2    1    5.271        97    126.862     0.022
x3    1    6.212        96    120.650     0.013

----------------

My question relates to the interpretation of the significance of variable x1. The OR for x1 is 0.443 and its profile confidence interval is 0.183-1.048. If a type I error rate of 5% is assumed, this result would tend to suggest that x1 is NOT a significant predictor of y. However, the deviance chi-square test has a P-value of 0.016, which suggests that x1 is indeed a significant predictor of y. How do I reconcile these two differing messages? I do recognize that the upper bound of the confidence interval is pretty close to 1, but I am certain that some journal reviewer will point out the apparent inconsistency.

Brant Inman
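[For comparison, a minimal sketch of the corresponding Wald-based intervals, assuming the fitted object `fit` from the message above; confint.default() uses the asymptotic standard errors rather than the profile likelihood, and so corresponds to the Wald z tests reported by summary(fit):]

# Wald (asymptotic normal) 95% intervals for the odds ratios, using the
# same fitted object `fit` as above; compare with exp(confint(fit)):
exp(confint.default(fit))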
Henric Nilsson
2006-Nov-14 03:04 UTC
[R] Profile confidence intervals and LR chi-square test
On 2006-11-14 00:41, Inman, Brant A. M.D. wrote:

> System: R 2.3.1 on Windows XP machine.

Time to upgrade!

> I am building a logistic regression model for a sample of 100 cases in
> dataframe "d", in which there are 3 binary covariates: x1, x2 and x3.

Please provide a reproducible example (as suggested by the posting guide).

[summary(), glm(), confint() and anova() output quoted from the original message snipped]

> Terms added sequentially (first to last)
              ^^^^^^^^^^^^

Hence, your use of the `anova' function doesn't return tests corresponding to the CIs computed above.

> My question relates to the interpretation of the significance of
> variable x1. The OR for x1 is 0.443 and its profile confidence interval
> is 0.183-1.048. If a type I error rate of 5% is assumed, this result
> would tend to suggest that x1 is NOT a significant predictor of y.

This is also suggested by the Wald test computed by the `summary' function.

> However, the deviance chi-square test has a P-value of 0.016, which
> suggests that x1 is indeed a significant predictor of y.

That p-value corresponds to adding x1 to a model containing only the intercept term.

> How do I reconcile these two differing messages?

Generally, in order to compute the LR test for the null hypothesis of some subset of the parameters being equal to zero, you need to explicitly fit both the restricted and the unrestricted model and compare them using the `anova' function. Also, see FAQ 7.18.

HTH,
Henric
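[A minimal sketch of the restricted-vs.-unrestricted comparison Henric describes, assuming the data frame `d` from the original post; this is the LR test that corresponds to the profile confidence interval for x1:]

# Full (unrestricted) model and the restricted model without x1:
fit.full <- glm(y ~ x1 + x2 + x3, data = d, family = binomial(link = logit))
fit.red  <- update(fit.full, . ~ . - x1)

# Likelihood-ratio (deviance chi-square) test for x1, given x2 and x3:
anova(fit.red, fit.full, test = "Chisq")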
Prof Brian Ripley
2006-Nov-14 07:55 UTC
[R] Profile confidence intervals and LR chi-square test
Your problem is the interpretation of anova(): it is a sequential test and x1 is the first term. Using dropterm() would give you the correct LR test.

However, you also have a Wald test given by the line

> x11          -0.8144     0.4422  -1.842   0.0655 .

which is not significant at the 5% level. The correct LRT would be expected to be more accurate, and your inversion of the profile likelihood is just a way to compute the LRT.

On Mon, 13 Nov 2006, Inman, Brant A. M.D. wrote:

[original message quoted in full; snipped]

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
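[A minimal sketch of the dropterm() approach mentioned above, assuming the fitted object `fit` from the original post; dropterm() lives in package MASS:]

# LR (deviance chi-square) tests for dropping each term, one at a time,
# from the full model -- the tests that match the profile CIs:
library(MASS)
dropterm(fit, test = "Chisq")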
Inman, Brant A. M.D.
2006-Nov-14 17:42 UTC
[R] Profile confidence intervals and LR chi-square test
Thank you to Prof Ripley and Henric Nilsson for their observation that I was using "anova" inappropriately for the question that I was trying to answer. Note that the Wald statistics and confidence intervals were available in the previous email, but I preferred to use the more accurate deviance statistics.

I will demonstrate my error for the benefit of those new users of R (like me) who may not have appreciated how the "anova" function works SEQUENTIALLY, and what SEQUENTIALLY actually means in this context.

Since the "anova" function is a sequential test of the current model, only the statistic for the last term in the model formula is a true deviance chi-square statistic for (full model) vs. (full model - term). For instance, using the data upon which this question was based, consider the following 2 models:

----------------------

> fit0 <- glm(y ~ 1, data=d, family=binomial(link=logit), na.action=na.omit)
> fit1 <- update(fit0, . ~ . + x1 + x2 + x3)

----------------------

Here, fit0 is the null (intercept-only) model and fit1 is the full model (without interactions, because interactions are not biologically plausible in this medical dataset). Now notice the result of the "anova" function for the full model:

----------------------

> anova(fit1, test='Chisq')
...
     Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                    99    137.989
x1    1    8.267        98    129.721     0.004
x2    1    5.639        97    124.083     0.018
x3    1    3.433        96    120.650     0.064

-----------------------

It is incorrect to interpret the deviance chi-square test presented above for x1 (P=0.004) as the deviance chi-square statistic comparing (y~x1+x2+x3) vs. (y~x2+x3); the statistic actually calculated is for (y~1) vs. (y~x1). Similarly, the deviance chi-square statistic calculated for x2 (P=0.018) is NOT for (y~x1+x2+x3) vs. (y~x1+x3) but for (y~x1) vs. (y~x1+x2). Lastly, the deviance chi-square statistic for x3 (P=0.064) is probably the most intuitive, because it is for the comparison of (y~x1+x2+x3) vs. (y~x1+x2), the result we would typically want to present for x3 in the full model.

So how do you get the correct deviance chi-square statistics for x1 and x2 in the full model? There are a couple of ways of arriving at the same answer, as I will demonstrate for the case of x1.

Option #1: Reorder the full model so that x1 is the last term in the model formula.

-----------------------

> fit2 <- glm(y ~ x2 + x3 + x1, data=d, family=binomial(link=logit), na.action=na.omit)
> anova(fit2, test='Chisq')
...
     Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                    99    137.989
x2    1    7.305        98    130.683     0.007
x3    1    3.821        97    126.862     0.051
x1    1    6.212        96    120.650     0.013

-----------------------

Notice that the deviance chi-square statistics for all of the variables have changed, despite fit2 being identical in content to fit1; only the order of the terms in the model formula has changed from fit1 to fit2. The deviance statistic for x1 is now the correct one for the full model, that is, for the comparison (y~x1+x2+x3) vs. (y~x2+x3).

Option #2: Compare the full model to a reduced model that does not include x1.

-----------------------

> fit3 <- update(fit1, . ~ . - x1)
> anova(fit1, fit3, test='Chisq')
...
Model 1: y ~ x1 + x2 + x3
Model 2: y ~ x2 + x3
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1        96    120.650
2        97    126.862 -1   -6.212     0.013

-----------------------

fit3 is the same model as fit1 except that it is missing the x1 term. Therefore, any change in the deviance chi-square statistic is due to the deletion of x1 from the full model.
Notice that the difference in residual deviances between fit3 and fit1 (126.862 - 120.650 = 6.212) is the same as the difference between the x3 and x1 rows in Option #1, i.e. the deviance attributed to x1.

Brant
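[A further option, following Prof Ripley's dropterm() suggestion, is single-term deletion with drop1() from base R; a minimal sketch, assuming fit1 as defined above:]

# Each row compares the full model fit1 to the model with that single
# term removed, reproducing Option #2 for every covariate in one call:
drop1(fit1, test = "Chisq")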