Dear all,

Last autumn there was some discussion on the list about the warning

Warning message:
fitted probabilities numerically 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,

when fitting binomial GLMs with many 0s and few 1s.

Parts of the replies:

"You should be able to tell which coefficients are infinite -- the coefficients and their standard errors will be large. When this happens the standard errors and the p-values reported by summary.glm() for those variables are useless."

"My guess is that the deviances and coefficients are entirely ok. I'd expect that problems in the general area that Thomas mentions to reveal themselves as a failure to converge."

I have this problem with my data. In a GLM, I have 269 zeroes and only 1 one:

summary(dbh)
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.1659     3.8781   0.043    0.966
dbh          -0.5872     0.5320  -1.104    0.270

> drop1(dbh, test = "Chisq")
Single term deletions

Model:
MPext ~ dbh
       Df Deviance     AIC    LRT Pr(Chi)
<none>      9.9168 13.9168
dbh     1  13.1931 15.1931 3.2763 0.07029 .

I now wonder: is the drop1() output 'reliable'?

If so, are the estimates from MASS confint() also 'reliable'? It gives the same warning.

Waiting for profiling to be done...
                2.5 %      97.5 %
(Intercept) -6.503472 -0.77470556
abund       -1.962549 -0.07496205
There were 20 warnings (use warnings() to see them)

Thanks in advance for your reply.

Sincerely,
Tord

-----------------------------------------------------------------------
Tord Snäll
Avd. f växtekologi, Evolutionsbiologiskt centrum, Uppsala universitet
Dept. of Plant Ecology, Evolutionary Biology Centre, Uppsala University
Villavägen 14
SE-752 36 Uppsala, Sweden
Tel: 018-471 28 82 (int +46 18 471 28 82) (work)
Tel: 018-25 71 33 (int +46 18 25 71 33) (home)
Fax: 018-55 34 19 (int +46 18 55 34 19) (work)
E-mail: Tord.Snall at ebc.uu.se
Check this: http://www.vaxtbio.uu.se/resfold/snall.htm!
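For readers following the numbers in the message above, here is a minimal R sketch that reproduces the arithmetic behind the quoted summary() and drop1() tables. It uses only the figures printed in the post; the variable names (dev_full, dev_reduced, and so on) are introduced here for illustration. It checks how the reported statistics relate to each other and does not settle whether the chi-square approximation can be trusted with a single success.

```r
# Numbers copied from the summary() and drop1() output quoted in the post.
dev_full    <- 9.9168    # residual deviance of MPext ~ dbh
dev_reduced <- 13.1931   # residual deviance after dropping dbh
est <- -0.5872           # estimated coefficient for dbh
se  <-  0.5320           # its reported standard error

# Likelihood-ratio statistic: difference in deviances, referred to chi-square(1).
lrt   <- dev_reduced - dev_full
p_lrt <- pchisq(lrt, df = 1, lower.tail = FALSE)

# Wald test for the same coefficient, as printed by summary().
z      <- est / se
p_wald <- 2 * pnorm(-abs(z))

# For 0/1 responses the saturated log-likelihood is 0, so
# AIC = deviance + 2 * (number of coefficients).
aic_full    <- dev_full + 2 * 2
aic_reduced <- dev_reduced + 2 * 1

print(round(c(LRT = lrt, p_LRT = p_lrt, z = z, p_Wald = p_wald), 4))
print(c(AIC_full = aic_full, AIC_reduced = aic_reduced))
```

The two p-values for dbh differ noticeably (about 0.07 from the likelihood-ratio test against about 0.27 from the Wald test), which is the kind of disagreement the first quoted reply warns about when standard errors become unreliable.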
This seems to me to be a special case of the general problem of a parameter on a boundary. Another example is the case of a variance component that is zero. For this latter problem, Pinheiro and Bates (2000) Mixed-Effects Models in S and S-PLUS (Springer, sec. 2.4.1) present simulation results showing that a 50-50 mixture of chi-square(0) and chi-square(1), for example, provides an excellent approximation to the actual sampling distribution of 2*log(likelihood ratio).

Recent discussions of this and related questions on this list and elsewhere produced the following list of articles that may be helpful:

Donald Andrews (2001) "Testing When a Parameter Is on the Boundary of the Maintained Hypothesis", Econometrica, 69: 683-734.

Donald Andrews (2000) "Inconsistency of the Bootstrap When a Parameter Is on the Boundary of the Parameter Space", Econometrica, 68: 388-405.

Donald Andrews (1999) "Estimation When a Parameter Is on a Boundary", Econometrica, 67: 1341-1383.

Rousseeuw, P. J. and Christmann, A. (2003) "Robustness against separations and outliers in logistic regression", Computational Statistics & Data Analysis, 43: 315-332.

### Unfortunately, I have not had time to review these, so I can't comment further.

Hope this helps.
Spencer Graves

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
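The 50-50 mixture of chi-square(0) and chi-square(1) mentioned in the reply above can be sketched in a few lines of R. This is only an illustration of that reference distribution for the variance-component case; pmix_chisq01 is a hypothetical helper written for this example (it is not a function from nlme or any other package), and nothing here shows that the same mixture applies to the binomial GLM in the original question.

```r
# Hypothetical helper: upper-tail probability P(T >= stat) when T follows a
# 50-50 mixture of chi-square(0) (a point mass at zero) and chi-square(1).
pmix_chisq01 <- function(stat) {
  stopifnot(all(stat >= 0))
  ifelse(stat > 0, 0.5 * pchisq(stat, df = 1, lower.tail = FALSE), 1)
}

# 5% critical value under the mixture: half the mass sits at zero, so the
# cutoff is the 90% quantile of chi-square(1), not the 95% quantile.
crit_mix   <- qchisq(0.90, df = 1)
crit_naive <- qchisq(0.95, df = 1)

# Monte Carlo check of the tail probability at the mixture cutoff.
set.seed(1)
n_sim   <- 1e5
draws   <- ifelse(runif(n_sim) < 0.5, 0, rchisq(n_sim, df = 1))
mc_tail <- mean(draws > crit_mix)

print(c(crit_mix = crit_mix, crit_naive = crit_naive))
print(c(exact = pmix_chisq01(crit_mix), simulated = mc_tail))
```

For a positive statistic the mixture p-value is exactly half the usual chi-square(1) p-value, so a test that ignores the boundary and uses chi-square(1) is conservative in that setting.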
Dear Spencer,

Thanks very much for your reply. I am a biologist, and thus not used to reading stats papers. I will, however, give it a try.

Hmm, I searched in mails with Subject 'warning', 'glm', 'fitted' without finding the answer, but perhaps it is hidden in mails with another Subject.

Additional replies from others are most welcome.

Sincerely,
Tord

At 10:54 2003-10-08 -0700, Spencer Graves wrote:
> This seems to me to be a special case of the general problem of a
> parameter on a boundary. [...]