Antonin Ferry
2006-Sep-13 09:00 UTC
[R] unexpected result in glm (family=poisson) for data with an only zero response in one factor
Dear members, here is my trouble: My data consists of counts of trapped insects in different attractive traps. I usually use GLMs with a poisson error distribution to find out the differences between my traitments (and to look at other factor effects). But for some dataset where one traitment contains only zeros, GLM with poisson family fail to find any difference between this particular traitment and anyother one (even with traitment that have trapped a lot of insects). GLMs with gaussian family does not seem to have the same problem but GLMs with binomial family does. I'm not sure if it is a statistical problem or if it comes from R... in the latter case I think some solution exists (perhaps in the options of the glm() function ?). Thank you for your help. Here I figure out an exemple to past in the console: ## START ############################################################################## # Take a data set of counts for two traitments, one containing only zeros A=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0) B=c(1,0,0,0,2,1,0,0,1,2,0,0,0,1,2,2,0,1,1,0,1,0,2,1,1,0,1,2,0,1,0,1,1,1,0,1,1,1,0,1) traitment=c(rep("A",40),rep("B",40)) response=c(A,B) mydata=data.frame(traitment ,response) # Make a GLM on this dataset , with "family=poisson" g=glm(response~traitment, data=mydata, family=poisson) anova.glm(g,test="Chisq") # There is an effect of the traitment ... summary(g) # But traitment A does not differ from traitment B ! ! ! (the pvalue is always close from 1 in such cases) # Now if you replace only one zero of the A reponse to 1, the GLM works properly: mydata[1,2]=1 g=glm(response~traitment, data=mydata, family=poisson) anova.glm(g,test="Chisq") summary(g) ##################################################################################### END ## Antonin Ferry (PhD) "Laboratoire d'Ecobiologie des Insectes Parasitoides" http://www.parasitoides.univ-rennes1.fr Universit? de Renes1, FRANCE
vito muggeo
2006-Sep-13 10:49 UTC
[R] unexpected result in glm (family=poisson) for data with an only zero response in one factor
Dear Antonin, It is a statistical problem: the well-known monotone likelihood. In this case ML estimate does not exist (or equals infinity) and Wald approximations (ob which SE are based) does not hold. However LRT is valid and provides reliable results. As far as I know, the only software dealing with monotone likelihood problems in loglinear models is LogXact by cytel corporation. best, vito Antonin Ferry wrote:> Dear members, > here is my trouble: My data consists of counts of trapped insects in different attractive traps. I usually use GLMs with a poisson error distribution to find out the differences between my traitments (and to look at other factor effects). But for some dataset where one traitment contains only zeros, GLM with poisson family fail to find any difference between this particular traitment and anyother one (even with traitment that have trapped a lot of insects). GLMs with gaussian family does not seem to have the same problem but GLMs with binomial family does. > I'm not sure if it is a statistical problem or if it comes from R... in the latter case I think some solution exists (perhaps in the options of the glm() function ?). > Thank you for your help. > > > Here I figure out an exemple to past in the console: > > ## START ############################################################################## > # Take a data set of counts for two traitments, one containing only zeros > A=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0) > B=c(1,0,0,0,2,1,0,0,1,2,0,0,0,1,2,2,0,1,1,0,1,0,2,1,1,0,1,2,0,1,0,1,1,1,0,1,1,1,0,1) > traitment=c(rep("A",40),rep("B",40)) > response=c(A,B) > mydata=data.frame(traitment ,response) > > > # Make a GLM on this dataset , with "family=poisson" > > g=glm(response~traitment, data=mydata, family=poisson) > anova.glm(g,test="Chisq") > # There is an effect of the traitment ... > > summary(g) > # But traitment A does not differ from traitment B ! ! ! (the pvalue is always close from 1 in such cases) > > # Now if you replace only one zero of the A reponse to 1, the GLM works properly: > mydata[1,2]=1 > g=glm(response~traitment, data=mydata, family=poisson) > anova.glm(g,test="Chisq") > summary(g) > ##################################################################################### END ## > > > > Antonin Ferry (PhD) > > "Laboratoire d'Ecobiologie des Insectes Parasitoides" > http://www.parasitoides.univ-rennes1.fr > Universit? de Renes1, FRANCE > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- ===================================Vito M.R. Muggeo Dip.to Sc Statist e Matem `Vianelli' Universit? di Palermo viale delle Scienze, edificio 13 90128 Palermo - ITALY tel: 091 6626240 fax: 091 485726/485612
John Maindonald
2006-Sep-13 12:02 UTC
[R] unexpected result in glm (family=poisson) for data with an only zero response in one factor
The Wald statistics that are returned as "z value" can be a very rough approximation. The standard error is radically different, on a logarithmic scale, between log(mu) = -20.30 [the best glm() managed in approximating -infinity] and log(mu) + log(a) = -0.29. It is actually worse than might appear; the SE=2457.38 is an approximation to infinity! The phenomenon is an extreme version, now with a poisson error model, of the Hauck-Donner effect (Modern Applied Statistics with S, 4th edn, pp.197-199) that occurs with binomial data. Use the result from the anova likelihood ratio test, where the approximations that are involved are usually much better behaved (but it would not hurt to do a simulation as a check.) There is an example of this phenomenon with a poisson error model in Subsection 8.4.2 (the same subsection number both for the 1st edn and the forthcoming 2nd edn) of Data Analysis & Graphics using R, CUP, 2003 and 2006. Install and attach the DAAG package and try example(moths) John Maindonald email: john.maindonald at anu.edu.au phone : +61 2 (6125)3473 fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200.> Dear members, > here is my trouble: My data consists of counts of trapped insects > in different attractive traps. I usually use GLMs with a poisson > error distribution to find out the differences between my > traitments (and to look at other factor effects). But for some > dataset where one traitment contains only zeros, GLM with poisson > family fail to find any difference between this particular > traitment and anyother one (even with traitment that have trapped a > lot of insects). GLMs with gaussian family does not seem to have > the same problem but GLMs with binomial family does. > I'm not sure if it is a statistical problem or if it comes from > R... in the latter case I think some solution exists (perhaps in > the options of the glm() function ?). > Thank you for your help. > > > Here I figure out an exemple to past in the console: > > ## START > ###################################################################### > ######## > # Take a data set of counts for two traitments, one containing only > zeros > A=c > (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 > ,0,0,0,0,0) > B=c > (1,0,0,0,2,1,0,0,1,2,0,0,0,1,2,2,0,1,1,0,1,0,2,1,1,0,1,2,0,1,0,1,1,1,0 > ,1,1,1,0,1) > traitment=c(rep("A",40),rep("B",40)) > response=c(A,B) > mydata=data.frame(traitment ,response) > > > # Make a GLM on this dataset , with "family=poisson" > > g=glm(response~traitment, data=mydata, family=poisson) > anova.glm(g,test="Chisq") > # There is an effect of the traitment ... > > summary(g) > # But traitment A does not differ from traitment B ! ! ! (the > pvalue is always close from 1 in such cases) > > # Now if you replace only one zero of the A reponse to 1, the GLM > works properly: > mydata[1,2]=1 > g=glm(response~traitment, data=mydata, family=poisson) > anova.glm(g,test="Chisq") > summary(g) > ###################################################################### > ############### END ## > > > > Antonin Ferry (PhD) > > "Laboratoire d'Ecobiologie des Insectes Parasitoides" > http://www.parasitoides.univ-rennes1.fr > Universit? de Renes1, FRANCE
John Maindonald
2006-Sep-13 12:14 UTC
[R] unexpected result in glm (family=poisson) for data with an only zero response in one factor
PS. In part, the problem is with the use of the log link, arising because the limit of log(mu), as mu goes to 0, is minus infinity. This is not an appropriate scale on which to represent a fitted value that is zero. The estimated SE for a fitted value of zero should be 0. You will get a more sensible answer if you set family=poisson (link="sqrt") g <- glm(response~traitment, data=mydata, family=poisson(link="sqrt")) > summary(g) Call: glm(formula = response ~ traitment, family = poisson(link = "sqrt"), data = mydata) Deviance Residuals: Min 1Q Median 3Q Max -1.225e+00 -2.730e-05 -2.730e-05 2.745e-01 1.193e+00 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 0.0000193 0.0790569 0.000244 1 traitmentB 0.8660061 0.1118034 7.746 9.5e-15 (Dispersion parameter for poisson family taken to be 1) Null deviance: 75.485 on 79 degrees of freedom Residual deviance: 33.896 on 78 degrees of freedom AIC: 89.579 Number of Fisher Scoring iterations: 14 John Maindonald email: john.maindonald at anu.edu.au phone : +61 2 (6125)3473 fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200.> Dear members, > here is my trouble: My data consists of counts of trapped insects > in different attractive traps. I usually use GLMs with a poisson > error distribution to find out the differences between my > traitments (and to look at other factor effects). But for some > dataset where one traitment contains only zeros, GLM with poisson > family fail to find any difference between this particular > traitment and anyother one (even with traitment that have trapped a > lot of insects). GLMs with gaussian family does not seem to have > the same problem but GLMs with binomial family does. > I'm not sure if it is a statistical problem or if it comes from > R... in the latter case I think some solution exists (perhaps in > the options of the glm() function ?). > Thank you for your help. > > > Here I figure out an exemple to past in the console: > > ## START > ###################################################################### > ######## > # Take a data set of counts for two traitments, one containing only > zeros > A=c > (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 > ,0,0,0,0,0) > B=c > (1,0,0,0,2,1,0,0,1,2,0,0,0,1,2,2,0,1,1,0,1,0,2,1,1,0,1,2,0,1,0,1,1,1,0 > ,1,1,1,0,1) > traitment=c(rep("A",40),rep("B",40)) > response=c(A,B) > mydata=data.frame(traitment ,response) > > > # Make a GLM on this dataset , with "family=poisson" > > g=glm(response~traitment, data=mydata, family=poisson) > anova.glm(g,test="Chisq") > # There is an effect of the traitment ... > > summary(g) > # But traitment A does not differ from traitment B ! ! ! (the > pvalue is always close from 1 in such cases) > > # Now if you replace only one zero of the A reponse to 1, the GLM > works properly: > mydata[1,2]=1 > g=glm(response~traitment, data=mydata, family=poisson) > anova.glm(g,test="Chisq") > summary(g) > ###################################################################### > ############### END ## > > > > Antonin Ferry (PhD) > > "Laboratoire d'Ecobiologie des Insectes Parasitoides" > http://www.parasitoides.univ-rennes1.fr > Universit? de Renes1, FRANCE