thr3ads.net - R devel - [Rd] Apparant bug in binomial model in GLM (PR#13434) [Jan 2009]

If this information is useful, please help other people find it:
Share via:

soren.faurby at biology.au.dk

2009-Jan-06 11:25 UTC

[Rd] Apparant bug in binomial model in GLM (PR#13434)

Full_Name: S?ren Faurby
Version: 2.4.1 and 2.7.2
OS: 
Submission from: (NULL) (192.38.46.92)


There appear to be a bug in the estimation of significance in the binomial model
in GLM. This bug apparently appears when the correlation between two variables
is to strong.

Such as this dummy example
c(0,0,0,0,0,1,1,1,1,1)->a
a->b
m1<-glm(a~b, binomial)
summary(m1)

It is sufficient that all 1's correspond to 1's such as this example

c(0,0,0,0,0,1,1,1,1,1)->a
c(0,0,0,0,1,1,1,1,1,1)->c
m1<-glm(a~c, binomial)
summary(m1)

I hope that this message is understandable. 

Kind regards, S?ren

Prof Brian Ripley

2009-Jan-06 15:00 UTC

head link

[Rd] Apparant bug in binomial model in GLM (PR#13434)

This is a (too-little) known phenomenon: the problem is the low power of 
the Wald test in certain circumstances, and not the R implementation.

You can look it up in MASS (the book)  pp.197-9.

Can I ask how you 'knew for certain' what this should do? From the FAQ:

   But be sure you know for certain what it ought to have done. If you
   aren't familiar with the command, or don't know for certain how the
   command is supposed to work, then it might actually be working right.
   For example, people sometimes think there is a bug in R's mathematics
   because they don't understand how finite-precision arithmetic works.
   Rather than jumping to conclusions, show the problem to someone who
   knows for certain.

Who was your authority here?

On Tue, 6 Jan 2009, soren.faurby at biology.au.dk wrote:
> Full_Name: S?ren Faurby
> Version: 2.4.1 and 2.7.2
Neither of which is current.
> OS:
> Submission from: (NULL) (192.38.46.92)
>
>
> There appear to be a bug in the estimation of significance in the 
> binomial model in GLM. This bug apparently appears when the correlation 
> between two variables is to strong.
>
> Such as this dummy example
> c(0,0,0,0,0,1,1,1,1,1)->a
> a->b
> m1<-glm(a~b, binomial)
> summary(m1)
>
> It is sufficient that all 1's correspond to 1's such as this
example
>
> c(0,0,0,0,0,1,1,1,1,1)->a
> c(0,0,0,0,1,1,1,1,1,1)->c
> m1<-glm(a~c, binomial)
> summary(m1)
>
> I hope that this message is understandable.
>
> Kind regards, S?ren
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Peter Dalgaard

2009-Jan-06 15:08 UTC

head link

[Rd] Apparant bug in binomial model in GLM (PR#13434)

soren.faurby at biology.au.dk wrote:> Full_Name: S?ren Faurby
> Version: 2.4.1 and 2.7.2
> OS: 
> Submission from: (NULL) (192.38.46.92)
> 
> 
> There appear to be a bug in the estimation of significance in the binomial
model
> in GLM. This bug apparently appears when the correlation between two
variables
> is to strong.
> 
> Such as this dummy example
> c(0,0,0,0,0,1,1,1,1,1)->a
> a->b
> m1<-glm(a~b, binomial)
> summary(m1)
> 
> It is sufficient that all 1's correspond to 1's such as this
example
> 
> c(0,0,0,0,0,1,1,1,1,1)->a
> c(0,0,0,0,1,1,1,1,1,1)->c
> m1<-glm(a~c, binomial)
> summary(m1)
That's not a bug, just the way things work. When the algorithm diverges,
 as seen by the huge Std.Error, Wald tests (z) are unreliable. (Notice
that the log OR in an a vs. c table is infinite whichever way you turn
it.) The likelihood ratio test (as in drop1(m1, test="Chisq")) is
somewhat less unreliable, but in these small examples, still quite some
distance from the table based approaches of fisher.test(a,c) and
chisq.test(a,c).

> 
> I hope that this message is understandable. 
> 
> Kind regards, S?ren
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
   O__  ---- Peter Dalgaard             ?ster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907

Daniel Sabanés Bové

2009-Jan-07 12:41 UTC

head link

[Rd] Apparant bug in binomial model in GLM (PR#13434)

> There appear to be a bug in the estimation of significance in the binomial
model
> in GLM. This bug apparently appears when the correlation between two
variables
> is to strong.
>
> Such as this dummy example
> c(0,0,0,0,0,1,1,1,1,1)->a
> a->b
> m1<-glm(a~b, binomial)
> summary(m1)
>
> It is sufficient that all 1's correspond to 1's such as this
exampleIsn't this just the well-known failure of logistic regression (non-existence
of maximum likelihood estimator) for linearly separable data? (b > 0.5 =>
a == 1)> c(0,0,0,0,0,1,1,1,1,1)->a
> c(0,0,0,0,1,1,1,1,1,1)->c
> m1<-glm(a~c, binomial)
> summary(m1)This is not linearly separable, but almost, so the standard errors will be very
large.

Charles Geyer

2009-Jan-07 15:12 UTC

head link

[Rd] Apparant bug in binomial model in GLM (PR#13434)

> Date: Tue, 6 Jan 2009 15:00:19 +0000 (GMT)
> From: Prof Brian Ripley <ripley at stats.ox.ac.uk>
> Subject: Re: [Rd] Apparant bug in binomial model in GLM (PR#13434)
> To: soren.faurby at biology.au.dk
> Cc: R-bugs at r-project.org, r-devel at stat.math.ethz.ch
> Message-ID: <alpine.LFD.2.00.0901061444380.9877 at
auk.stats.ox.ac.uk>
> Content-Type: text/plain; charset="iso-8859-1";
Format="flowed"
> 
> This is a (too-little) known phenomenon: the problem is the low power of 
> the Wald test in certain circumstances, and not the R implementation.
> 
> You can look it up in MASS (the book)  pp.197-9.
Good reference, but a much better reference tells what WORKS in this
situation not what doesn't work.

    http://www.stat.umn.edu/geyer/gdor/

has a paper and technical report that give methods detecting when the MLE
for a GLM does not exist in the conventional sense and for constructing
valid hypothesis tests and confidence intervals.  The methods are implemented
using the R contributed package rcdd and detailed examples are shown in
the technical report, which is made with Sweave, hence fully reproducible
by anyone who has R.

BTW the particular example given doesn't make clear WHAT question cannot
be answered correctly.  Some questions can be without fuss, for example
> y <- c(0,0,0,0,0,1,1,1,1,1)
> x <- seq(along = y)
> out1 <- glm(y ~ x, family = binomial)Warning messages:
1: In glm.fit(x = X, y = Y, weights = weights, start = start, etastart =
etastart,  :
  algorithm did not converge
2: In glm.fit(x = X, y = Y, weights = weights, start = start, etastart =
etastart,  :
  fitted probabilities numerically 0 or 1 occurred> out0 <- glm(y ~ 1, family = binomial)
> anova(out0, out1, test = "Chisq")Analysis of Deviance Table

Model 1: y ~ 1
Model 2: y ~ x
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1         9    13.8629
2         8  7.865e-10  1  13.8629    0.0002

This P-value (P = 0.0002) is valid, because the MLE does exist for the null
hypothesis.  Hence we see that we have to use the model y ~ x for which
the MLE does not exist in the conventional sense.

To obtain valid one-sided confidence intervals for the linear predictor or
mean value parameters, follow the methods in the technical report.
> Can I ask how you 'knew for certain' what this should do? From the
FAQ:
> 
>    But be sure you know for certain what it ought to have done. If you
>    aren't familiar with the command, or don't know for certain how
the
>    command is supposed to work, then it might actually be working right.
>    For example, people sometimes think there is a bug in R's
mathematics
>    because they don't understand how finite-precision arithmetic works.
>    Rather than jumping to conclusions, show the problem to someone who
>    knows for certain.
> 
> Who was your authority here?
> 
> 
> On Tue, 6 Jan 2009, soren.faurby at biology.au.dk wrote:
> 
> > Full_Name: S?ren Faurby
> > Version: 2.4.1 and 2.7.2
> 
> Neither of which is current.
> 
> > OS:
> > Submission from: (NULL) (192.38.46.92)
> >
> >
> > There appear to be a bug in the estimation of significance in the 
> > binomial model in GLM. This bug apparently appears when the
correlation
> > between two variables is to strong.
> >
> > Such as this dummy example
> > c(0,0,0,0,0,1,1,1,1,1)->a
> > a->b
> > m1<-glm(a~b, binomial)
> > summary(m1)
> >
> > It is sufficient that all 1's correspond to 1's such as this
example
> >
> > c(0,0,0,0,0,1,1,1,1,1)->a
> > c(0,0,0,0,1,1,1,1,1,1)->c
> > m1<-glm(a~c, binomial)
> > summary(m1)
> >
> > I hope that this message is understandable.
> >
> > Kind regards, S?ren
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> 
> -- 
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> 
> ------------------------------
> 
> Message: 6
> Date: Tue, 06 Jan 2009 16:08:21 +0100
> From: Peter Dalgaard <P.Dalgaard at biostat.ku.dk>
> Subject: Re: [Rd] Apparant bug in binomial model in GLM (PR#13434)
> To: soren.faurby at biology.au.dk
> Cc: R-bugs at r-project.org, r-devel at stat.math.ethz.ch
> Message-ID: <496373E5.9000901 at biostat.ku.dk>
> Content-Type: text/plain; charset=UTF-8
> 
> soren.faurby at biology.au.dk wrote:
> > Full_Name: S?ren Faurby
> > Version: 2.4.1 and 2.7.2
> > OS: 
> > Submission from: (NULL) (192.38.46.92)
> > 
> > 
> > There appear to be a bug in the estimation of significance in the
binomial model
> > in GLM. This bug apparently appears when the correlation between two
variables
> > is to strong.
> > 
> > Such as this dummy example
> > c(0,0,0,0,0,1,1,1,1,1)->a
> > a->b
> > m1<-glm(a~b, binomial)
> > summary(m1)
> > 
> > It is sufficient that all 1's correspond to 1's such as this
example
> > 
> > c(0,0,0,0,0,1,1,1,1,1)->a
> > c(0,0,0,0,1,1,1,1,1,1)->c
> > m1<-glm(a~c, binomial)
> > summary(m1)
> 
> That's not a bug, just the way things work. When the algorithm
diverges,
>  as seen by the huge Std.Error, Wald tests (z) are unreliable. (Notice
> that the log OR in an a vs. c table is infinite whichever way you turn
> it.) The likelihood ratio test (as in drop1(m1, test="Chisq")) is
> somewhat less unreliable, but in these small examples, still quite some
> distance from the table based approaches of fisher.test(a,c) and
> chisq.test(a,c).
> 
> 
> > 
> > I hope that this message is understandable. 
> > 
> > Kind regards, S?ren
> > 
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> 
> -- 
>    O__  ---- Peter Dalgaard             ?ster Farimagsgade 5, Entr.B
>   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
-- 
Charles Geyer
Professor, School of Statistics
University of Minnesota
charlie at stat.umn.edu

Possibly Parallel Threads

Search for more possibly parallel threads

R devel - Jan 2009 - Apparant bug in binomial model in GLM (PR#13434)

[Rd] Apparant bug in binomial model in GLM (PR#13434)

[Rd] Apparant bug in binomial model in GLM (PR#13434)

[Rd] Apparant bug in binomial model in GLM (PR#13434)

[Rd] Apparant bug in binomial model in GLM (PR#13434)

[Rd] Apparant bug in binomial model in GLM (PR#13434)

Possibly Parallel Threads