Hello all,

I found a weird result of the GLM function that seems to be a bug. The code:

a=c(rep(1,8),rep(2,8))
b=c(rep(0,8),rep(3,8))
cbind(a,b)
model=glm(b~a, family=poisson)
summary(model)

generates a dataset with two groups. One group consists entirely of zeros, the other of 3's (as happened in a dataset I'm analyzing right now). Since they are count data, one should apply a Poisson distribution. A GLM with Poisson distribution gives a p-value > 0.99 and thus completely fails to detect the difference between the two groups. Why is that, and what should I do to avoid this error? A quasipoisson distribution detects the difference, but I'm not sure whether it's appropriate to use it.

Thanks a lot to everybody who answers!
Florian

Version information:
version 1.9.0 (2004-4-12)
os mingw32
arch i386
On Tue, Jan 25, 2005 at 06:22:26AM -0800, Florian Menzel wrote:
> Hello all,
> I found a weird result of the GLM function that seems
> to be a bug.
> The code:
> a=c(rep(1,8),rep(2,8))
> b=c(rep(0,8),rep(3,8))
> cbind(a,b)
> model=glm(b~a, family=poisson)
> summary(model)
> [...]
> A GLM with poisson distribution delivers
> a p value > 0.99, thus, completely fails to detect the
> difference between the two groups.
> [...]

This seems to be a good example of the Hauck-Donner effect: the likelihood ratio test gives a p-value of 8.017e-09, while the Wald p-value is 1!

--
 Göran Broström                    tel: +46 90 786 5223
 Department of Statistics          fax: +46 90 786 6614
 Umeå University                   http://www.stat.umu.se/egna/gb/
 SE-90187 Umeå, Sweden             e-mail: gb at stat.umu.se
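The contrast described in this reply can be checked directly. The following is a minimal sketch, not part of the original thread: it refits the poster's model and computes both p-values. The variable names `wald_p` and `lrt_p` are illustrative, and the likelihood ratio p-value is computed by hand from the deviances stored on the fitted object, so it does not depend on how a particular R version labels the columns of the `anova()` table.

```r
# Data from the original post: one group all zeros, the other all threes.
a <- c(rep(1, 8), rep(2, 8))
b <- c(rep(0, 8), rep(3, 8))

# glm() may warn about convergence or near-zero fitted rates here;
# the warnings are suppressed only to keep the demonstration quiet.
model <- suppressWarnings(glm(b ~ a, family = poisson))

# Wald test: estimate divided by its standard error, as shown by summary().
wald_p <- coef(summary(model))["a", "Pr(>|z|)"]

# Likelihood ratio test: drop in deviance from the null model to the
# fitted model, referred to a chi-squared distribution.
dev_drop <- model$null.deviance - model$deviance
df_drop  <- model$df.null - model$df.residual
lrt_p    <- pchisq(dev_drop, df = df_drop, lower.tail = FALSE)

print(c(wald = wald_p, lrt = lrt_p))
```

The Wald p-value should come out close to 1 and the likelihood ratio p-value should be tiny, in line with the figures quoted in the reply. `anova(model, test = "Chisq")` reports the same deviance comparison in table form.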
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Florian Menzel
> Sent: Tuesday, January 25, 2005 3:22 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] GLM function with poisson distribution
>
> Hello all,
> I found a weird result of the GLM function that seems
> to be a bug.
> The code:
> a=c(rep(1,8),rep(2,8))
> b=c(rep(0,8),rep(3,8))
> cbind(a,b)
> model=glm(b~a, family=poisson)
> summary(model)

It's because all the values of b in one group are 0. The fitted mean for the corresponding level of a is then 0, so the linear predictor (its log) is -Inf. That is where the value -49 for the intercept and the huge standard errors come from. The usual theory breaks down. Replace the 0s with 1s and you get something which is closer to what is covered by standard theory.

Bendix Carstensen

> [...]
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
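A short sketch of what this reply describes, added for illustration and not taken from the thread. It looks at the linear predictor and standard errors of the original fit, then refits with the zeros replaced by ones. The names `eta`, `se`, `b_shift` and `model_shift` are made up for this example, and the replacement changes the data, so it only illustrates the point about standard theory and is not a recommended analysis.

```r
a <- c(rep(1, 8), rep(2, 8))
b <- c(rep(0, 8), rep(3, 8))

# Warnings (if any) are suppressed only to keep the demonstration quiet.
model <- suppressWarnings(glm(b ~ a, family = poisson))

# Linear predictor (log of the fitted mean) in each group. For the all-zero
# group the fitting algorithm stops at a large negative value, because the
# true maximum likelihood value is -Inf.
eta <- tapply(predict(model, type = "link"), a, mean)

# Standard errors of the coefficients: very large in this situation.
se <- coef(summary(model))[, "Std. Error"]

# Illustration only: replace the zeros by ones, as suggested in the reply.
b_shift <- ifelse(b == 0, 1, b)
model_shift <- glm(b_shift ~ a, family = poisson)

print(eta)
print(se)
print(coef(summary(model_shift)))
```

With the modified data both group means are positive, the estimates are finite, and the slope is the log of the ratio of the two group means.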
On Tue, 25 Jan 2005, Florian Menzel wrote:
> Hello all,
> I found a weird result of the GLM function that seems
> to be a bug.

No, the problem is that you are using the Wald test when the MLE is infinite, which is always going to be unreliable. It's even worse because you are using data that couldn't really have come from a Poisson distribution (because for a=2 you have mean 3 and variance 0).

If you used anova(model) to get a likelihood ratio test, the p-value would be about 8e-09.

	-thomas

> The code:
> a=c(rep(1,8),rep(2,8))
> b=c(rep(0,8),rep(3,8))
> cbind(a,b)
> model=glm(b~a, family=poisson)
> summary(model)
> [...]

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
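The remark that these data could hardly have come from a Poisson distribution can be made concrete. This is a rough sketch added for illustration: it compares the mean and variance within each group (equal under a Poisson model) and computes how likely eight independent Poisson(3) counts are to all equal exactly 3. The names `grp_mean`, `grp_var` and `p_all_three` are made up for this example.

```r
a <- c(rep(1, 8), rep(2, 8))
b <- c(rep(0, 8), rep(3, 8))

# Under a Poisson model the variance equals the mean in each group.
grp_mean <- tapply(b, a, mean)
grp_var  <- tapply(b, a, var)
print(cbind(mean = grp_mean, variance = grp_var))

# Probability that eight independent Poisson(3) counts are all exactly 3.
p_all_three <- dpois(3, lambda = 3)^8
print(p_all_three)
```

The group with mean 3 has variance 0, and the probability of seeing eight identical threes under Poisson(3) is very small, which is the sense in which the example data do not look Poisson.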