Karl Ove Hufthammer
2015-Jul-20 12:59 UTC
[R] Difference between drop1() vs. anova() for Gaussian glm models
Dear list members,

I'm having some problems understanding why drop1() and anova() give
different results for *Gaussian* glm models. Here's a simple example:

  d = data.frame(x = 1:6,
                 group = factor(c(rep("A", 2), rep("B", 4))))
  l = glm(x ~ group, data = d)

Running the following code gives *three* different p-values. (I would
expect it to give two different p-values.)

  anova(l, test="F")      # p = 0.04179
  anova(l, test="Chisq")  # p = 0.00313
  drop1(l, test="Chisq")  # p = 0.00841

I'm used to anova() and drop1() giving identical results for the same
'test' argument. However, it looks like the first two tests above use the
F statistic as the test statistic, while the last one uses a 'scaled
deviance' statistic:

  1 - pf(8.7273, 1, 4)    # F statistic
  1 - pchisq(8.7273, 1)   # F statistic
  1 - pchisq(6.9447, 1)   # scaled deviance

I couldn't find any documentation on this difference. The help page for
drop1() does say:

  The F tests for the "glm" methods are based on analysis of
  deviance tests, so if the dispersion is estimated it is based
  on the residual deviance, unlike the F tests of anova.glm.

But here it's talking about *F* tests. And drop1() with test="F" actually
gives the *same* p-value as anova() with test="F":

  drop1(l, test="F")      # p = 0.04179

Any ideas why anova() and drop1() use different test statistics for the
same 'test' arguments? And why does the help page imply (?) that the
results should differ for F tests (while not mentioning chi-squared
tests), when here they do not (and the chi-squared tests do)?

$ sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-suse-linux-gnu (64-bit)
Running under: openSUSE 20150714 (Tumbleweed) (x86_64)

locale:
 [1] LC_CTYPE=nn_NO.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=nn_NO.UTF-8        LC_COLLATE=nn_NO.UTF-8
 [5] LC_MONETARY=nn_NO.UTF-8    LC_MESSAGES=nn_NO.UTF-8
 [7] LC_PAPER=nn_NO.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=nn_NO.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets
[6] methods   base

loaded via a namespace (and not attached):
[1] tools_3.2.1

--
Karl Ove Hufthammer
E-mail: karl at huftis.org
Jabber: huftis at jabber.no
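A sketch of where the two test statistics above come from, assuming the
same d and l as in the post (values rounded): the 8.7273 is the deviance
difference between the null and full models divided by the dispersion
estimated from the full model, and the 6.9447 is the ordinary
likelihood-ratio statistic, 2 * (difference in log-likelihoods).

  ## Re-create the data and model from the post above.
  d <- data.frame(x = 1:6,
                  group = factor(c(rep("A", 2), rep("B", 4))))
  l  <- glm(x ~ group, data = d)   # full model
  l0 <- update(l, . ~ 1)           # intercept-only (null) model

  ## Deviance difference scaled by the full-model dispersion estimate:
  ## this is the 8.7273 that both anova() tests evaluate.
  dev_diff <- deviance(l0) - deviance(l)      # 17.5 - 5.5 = 12
  disp     <- deviance(l) / df.residual(l)    # 5.5 / 4 = 1.375
  dev_diff / disp                             # 8.7273

  ## The ordinary likelihood-ratio statistic: this is the 6.9447 that
  ## drop1(l, test="Chisq") evaluates.
  lrt <- as.numeric(2 * (logLik(l) - logLik(l0)))   # 6.9447

  ## The three p-values from the post:
  1 - pf(dev_diff / disp, 1, df.residual(l))  # 0.04179  anova(l, test="F")
  1 - pchisq(dev_diff / disp, 1)              # 0.00313  anova(l, test="Chisq")
  1 - pchisq(lrt, 1)                          # 0.00841  drop1(l, test="Chisq")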
peter dalgaard
2015-Jul-20 17:03 UTC
[R] Difference between drop1() vs. anova() for Gaussian glm models
I am somewhat surprised that _anything_ sensible comes out of
anova.glm(l, test="Chisq"). I think it is mostly expected that you use F
tests for that case. What does seem to come out is the same as for
drop1(l, test="Rao"), which gives the scaled score test, which would seem
to be equivalent to scaled deviance in this case.

drop1.glm(l, test="Chisq") appears to be calculating the "real"
likelihood ratio test, evaluated in its asymptotic chi-square
distribution:

  > 2*(logLik(l) - logLik(update(l,.~1)))
  'log Lik.' 6.944717 (df=3)

(Apologies for the daft output there... Why does "-" not either subtract
the df or unclass the whole thing?)

Notice that the scaled tests basically assume that the scale is known,
even if it is estimated, so in that sense the real LRT should be
superior. However, in that case it is well known that the asymptotic
approximation can be improved by transforming the LRT to the F statistic,
whose exact distribution is known.

The remaining part of the riddle is why anova.glm doesn't do likelihood
differences in the same fashion as drop1.glm. My best guess is that it
tries to be consistent with anova.lm, and anova.lm tries not to have to
refit the sequence of submodels.

> On 20 Jul 2015, at 14:59 , Karl Ove Hufthammer <karl at huftis.org> wrote:
>
> anova(l, test="F")      # p = 0.04179
> anova(l, test="Chisq")  # p = 0.00313
> drop1(l, test="Chisq")  # p = 0.00841
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk   Priv: PDalgd at gmail.com
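As a sketch of the points in the reply above (re-creating d and l from the
first post so the snippet runs on its own): anova()'s chi-squared test
should agree with drop1()'s Rao test here, and the likelihood-ratio
statistic is a monotone transformation of the F statistic, so the F test
uses the same information but with an exact rather than an asymptotic
reference distribution.

  d <- data.frame(x = 1:6,
                  group = factor(c(rep("A", 2), rep("B", 4))))
  l  <- glm(x ~ group, data = d)
  l0 <- update(l, . ~ 1)

  ## anova()'s chi-squared test vs. drop1()'s scaled score test:
  anova(l, test = "Chisq")   # p for group: 0.00313
  drop1(l, test = "Rao")     # same p-value here, per the reply above

  ## The "real" LRT and the F statistic are monotone transformations of
  ## each other for a Gaussian model (with 1 numerator df here), so they
  ## carry the same information; the F test just uses an exact reference
  ## distribution.
  lrt   <- as.numeric(2 * (logLik(l) - logLik(l0)))          # 6.9447
  Fstat <- (deviance(l0) - deviance(l)) /
             (deviance(l) / df.residual(l))                  # 8.7273
  nobs(l) * log(1 + Fstat / df.residual(l))                  # 6.9447, equals lrt
  1 - pf(Fstat, 1, df.residual(l))                           # 0.04179 (exact)
  1 - pchisq(lrt, 1)                                         # 0.00841 (asymptotic)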