Hi, I am running a logistic regression on a simple dataset (attached) using glm:

> dat <- read.table("dat.txt", sep='\t', header=T)

If I use summary() on a logistic model:

> summary(glm(y ~ x1*x2, dat, family='binomial'))

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)    19.57    5377.01   0.004    0.997
x1            -18.59    5377.01  -0.003    0.997
x2B           -19.57    5377.01  -0.004    0.997
x1:x2B         38.15    7604.24   0.005    0.996

As you can see, the interaction term is very insignificant (p = 0.996)! But if I use anova() to compare a full vs. reduced model to evaluate the interaction term:

> anova(glm(y ~ x1+x2, dat, family='binomial'),
+       glm(y ~ x1*x2, dat, family='binomial'))

Analysis of Deviance Table

Model 1: y ~ x1 + x2
Model 2: y ~ x1 * x2
  Resid. Df Resid. Dev Df Deviance
1        22     27.067
2        21     21.209  1   5.8579

This deviance difference follows a chi-square distribution with 1 df, so the corresponding p value is:

> 1-pchisq(5.8679,1)
[1] 0.01541944

So I get very different p values on the interaction term, can someone share what's going wrong here?

Thanks!

Yi

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: dat.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20170531/7d302080/attachment.txt>
Dear Yi,

On Wed, 31 May 2017 04:53:25 +0000 (UTC)
array chip via R-help <r-help at r-project.org> wrote:

> Hi, I am running a logistic regression on a simple dataset (attached)
> using glm: [...] As you can see, the interaction term is very
> insignificant (p = 0.996)!

Well, none of the terms is significant (actually, AFAIK, the phrase "very insignificant" does not exist; and if it does, then it ought not). But look at the estimates and the standard errors too (might be easier if you had formatted your e-mail in a way that was readable)! What does an estimate for the intercept of 19.57 on the linear predictor scale mean? What is the estimated probability if you transform back?

Perhaps the following command

R> xtabs(~ x1 + x2 + y, dat)

will shed some more light on what is going on in your data set, and why the interaction term is highly significant.

> But if I use anova() to compare a full
> vs reduced model to evaluate the interaction term:
>> anova(glm(y~x1+x2,dat,family='binomial'),
>>       glm(y~x1*x2,dat,family='binomial'))

Do you know about the (optional) argument 'test="Chisq"' for the anova() command if you use it to compare models fitted by glm()? (See help(anova.glm).)

> So I get very different p values on the interaction term, can someone
> share what's going wrong here?

Data separation, a.k.a. the Hauck-Donner phenomenon, discussed in any good book on logistic regression.

Best wishes,

        Berwin

========================== Full address ===========================
A/Prof Berwin A Turlach, Director      Tel.: +61 (8) 6488 3338 (secr)
Centre for Applied Statistics                +61 (8) 6488 3383 (self)
School of Maths and Stats (M019)       FAX : +61 (8) 6488 1028
The University of Western Australia
35 Stirling Highway                e-mail: Berwin.Turlach at gmail.com
Crawley WA 6009          http://staffhome.ecm.uwa.edu.au/~00043886/
Australia                http://www.researcherid.com/rid/A-4995-2008
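[The two checks suggested above can be sketched as follows. The data frame here is invented for illustration, since the thread's dat.txt is only available as an attachment; only the variable names (x1, x2, y) match the thread.]

```r
## Invented stand-in for the thread's dat.txt (same structure, made-up values).
set.seed(1)
dat <- data.frame(
  x1 = rep(0:1, each = 12),
  x2 = rep(rep(c("A", "B"), each = 6), 2),
  y  = rbinom(24, 1, 0.5)
)

## Cross-tabulate to see whether any treatment cell contains only 0s or
## only 1s -- the hallmark of separation:
xtabs(~ x1 + x2 + y, dat)

## Likelihood-ratio test of the interaction; with test="Chisq" the anova()
## method computes the chi-square p value for you:
anova(glm(y ~ x1 + x2, dat, family = binomial),
      glm(y ~ x1 * x2, dat, family = binomial),
      test = "Chisq")
```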
On 31/05/17 16:53, array chip via R-help wrote:
> Hi, I am running a logistic regression on a simple dataset (attached) using glm:
>> dat<-read.table("dat.txt",sep='\t',header=T)
> If I use summary() on a logistic model:
>> summary(glm(y~x1*x2,dat,family='binomial'))
> Coefficients:
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept)    19.57    5377.01   0.004    0.997
> x1            -18.59    5377.01  -0.003    0.997
> x2B           -19.57    5377.01  -0.004    0.997
> x1:x2B         38.15    7604.24   0.005    0.996
> As you can see, the interaction term is very insignificant (p = 0.996)!
> But if I use anova() to compare a full vs reduced model to evaluate the interaction term:
>> anova(glm(y~x1+x2,dat,family='binomial'), glm(y~x1*x2,dat,family='binomial'))
> Analysis of Deviance Table
> Model 1: y ~ x1 + x2
> Model 2: y ~ x1 * x2
>   Resid. Df Resid. Dev Df Deviance
> 1        22     27.067
> 2        21     21.209  1   5.8579
> This follows a chi-square distribution with 1 df, so the corresponding p value is:
>> 1-pchisq(5.8679,1)
> [1] 0.01541944
> So I get very different p values on the interaction term, can someone share what's going wrong here?

(1) To re-emphasize Berwin Turlach's exhortation, ***PLEASE*** do not post in HTML; your results are nearly impossible to read.

(2) To add a tiny bit to Berwin's comments: there is something a bit weird about your data. Note that if you look at the fitted values from your fit, you get only *three* distinct values --- whereas there should be four, since there are four distinct treatment combinations: (0,A), (0,B), (1,A) and (1,B). And all possible treatment combinations do indeed occur. The deficiency occurs because the last three of your coefficients (the non-intercept terms) sum to (effectively) 0. Thus you get the same fitted value from the (1,B) combination as from the (0,A) combination.

It is not clear to me what the implications of all this are --- particularly since it's getting on for my bed time :-) --- but it seems clear that this sort of "degeneracy" is going to mess things up.
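[The fitted-values check described above can be sketched like this. The data frame is invented for illustration; in the thread, 'dat' comes from the attached dat.txt, and it is that fit which shows only three distinct fitted values.]

```r
## Invented stand-in data (same structure as the thread's dat.txt).
set.seed(42)
dat <- data.frame(x1 = rep(0:1, each = 12),
                  x2 = rep(c("A", "B"), 12),
                  y  = rbinom(24, 1, 0.5))
fit <- glm(y ~ x1 * x2, dat, family = binomial)

## Four treatment combinations should normally give four distinct fitted
## probabilities; the degenerate fit in the thread gives only three:
length(unique(round(fitted(fit), 8)))

## The degeneracy arises because the non-intercept coefficients sum to
## (effectively) zero, so the (1,B) linear predictor
## (Intercept + x1 + x2B + x1:x2B) equals that of (0,A) (Intercept alone):
sum(coef(fit)[c("x1", "x2B", "x1:x2B")])
```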
And perhaps contribute to the Hauck-Donner effect that Berwin told you about.

I would also advise you to try experimenting with simulated data. (When in doubt, simulate. Even when not in doubt, simulate. That is my guiding principle.) I.e. try to simulate y-values from a coefficient vector that does not suffer from the "degeneracy", fit a model to the simulated y-values, and compare the two tests. They still won't agree, but they will (surely?) be less wildly discrepant.

cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
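[The simulation suggested above might be sketched as follows. All coefficient values and the sample size are invented for illustration; the point is only that with a well-behaved coefficient vector and no separation, the Wald and likelihood-ratio p values for the interaction come out close to each other.]

```r
## Simulate y from moderate, non-degenerate coefficients, then compare
## the Wald test (from summary) with the likelihood-ratio test (from anova).
set.seed(123)
n   <- 200
x1  <- rep(0:1, each = n/2)
x2  <- factor(rep(c("A", "B"), n/2))
eta <- -0.5 + 1.0*x1 + 0.5*(x2 == "B") + 1.0*x1*(x2 == "B")
y   <- rbinom(n, 1, plogis(eta))
dat <- data.frame(x1, x2, y)

full    <- glm(y ~ x1 * x2, dat, family = binomial)
reduced <- glm(y ~ x1 + x2, dat, family = binomial)

## Wald p value for the interaction term:
summary(full)$coefficients["x1:x2B", "Pr(>|z|)"]

## Likelihood-ratio p value for the same term:
anova(reduced, full, test = "Chisq")[2, "Pr(>Chi)"]

## With separation, by contrast, the Wald test collapses (the standard
## errors blow up, as in the thread's output) while the LRT stays usable.
```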