m.crawley at imperial.ac.uk
2008-May-01 07:25 UTC
[Rd] zero variance in part of a glm (PR#11355)
In this real example (below), all four of the replicates in one treatment combination had zero failures, and this produced a very high standard error in the summary.lm. =20 Just adding one failure to one of the replicates produced a well-behaved standard error. =20 I don't know if this is a bug, but it is certainly hard for users to understand. =20 I would value your comments=20 =20 Thanks =20 Best wishes, =20 Mick =20 Prof M.J. Crawley =20 Imperial College London Silwood Park Ascot Berks SL5 7PY UK =20 Phone (0) 207 5942 216 Fax (0) 207 5942 339 =20 The data are from a bioassay in which a factorial experiment with two factors (antibio and toxin) each with three levels was replicated four times. The response is "mi" and "n-mi" =20 Note that lines 17 to 20 in the dataframe have no failures (24 dead out of 24 individuals) data<-read.table("c:\\temp\\ab1.txt",header=3DT) attach(data) names(data) =20 [1] "antibio" "toxin" "rep" "alive" "af" "dead" "mi" [8] "n"=20=20=20=20 =20 data =20 =20 =20 antibio toxin rep alive af dead mi n 1 camp control 1 24 0 0 0 24 2 camp control 2 23 0 1 1 24 3 camp control 3 23 0 1 1 24 4 camp control 4 21 0 3 3 24 5 camp Cry1Ab 1 21 4 3 7 24 6 camp Cry1Ab 2 20 3 4 7 24 7 camp Cry1Ab 3 20 4 4 8 24 8 camp Cry1Ab 4 18 7 6 13 24 9 camp Vip3A 1 7 3 17 20 24 10 camp Vip3A 2 10 6 14 20 24 11 camp Vip3A 3 11 5 13 18 24 12 camp Vip3A 4 10 2 14 16 24 13 bcock control 1 21 0 3 3 24 14 bcock control 2 24 0 0 0 24 15 bcock control 3 24 3 0 3 24 16 bcock control 4 23 1 1 2 24 17 bcock Cry1Ab 1 4 4 20 24 24 18 bcock Cry1Ab 2 4 4 20 24 24 19 bcock Cry1Ab 3 1 1 23 24 24 20 bcock Cry1Ab 4 2 2 22 24 24 21 bcock Vip3A 1 11 4 13 17 24 22 bcock Vip3A 2 15 5 9 14 24 23 bcock Vip3A 3 5 3 19 22 24 24 bcock Vip3A 4 10 6 14 20 24 25 ana control 1 23 0 1 1 24 26 ana control 2 23 0 1 1 24 27 ana control 3 22 0 2 2 24 28 ana control 4 23 0 1 1 24 29 ana Cry1Ab 1 18 1 6 7 24 30 ana Cry1Ab 2 17 4 7 11 24 31 ana Cry1Ab 3 13 4 11 15 24 32 ana Cry1Ab 4 14 3 10 13 24 33 ana Vip3A 1 21 12 3 15 24 34 ana Vip3A 2 12 6 12 18 24 35 ana Vip3A 3 9 5 15 20 24 36 ana Vip3A 4 9 1 15 16 24 =20 y<-cbind(mi,n-mi) model<-glm(y~antibio*toxin,quasibinomial) summary(model) =20 Call: glm(formula =3D y ~ antibio * toxin, family =3D quasibinomial) =20 Deviance Residuals:=20 Min 1Q Median 3Q Max=20=20 -2.0437 -0.5645 -0.1022 0.6921 1.9996=20=20 =20 Coefficients: Estimate Std. Error t value Pr(>|t|)=20=20=20=20 (Intercept) -2.901e+00 5.020e-01 -5.780 3.79e-06 *** antibiobcock 5.035e-01 6.441e-01 0.782 0.441=20=20=20=20 antibiocamp 2.100e-15 7.099e-01 2.96e-15 1.000=20=20=20=20 toxinCry1Ab 2.818e+00 5.494e-01 5.129 2.15e-05 *** toxinVip3A 3.840e+00 5.600e-01 6.857 2.29e-07 *** antibiobcock:toxinCry1Ab 2.050e+01 2.365e+03 0.009 0.993=20=20=20=20 antibiocamp:toxinCry1Ab -4.721e-01 7.795e-01 -0.606 0.550=20=20=20=20 antibiobcock:toxinVip3A -2.868e-01 7.381e-01 -0.389 0.701=20=20=20=20 antibiocamp:toxinVip3A 2.748e-01 7.975e-01 0.345 0.733=20=20=20=20 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1=20 =20 (Dispersion parameter for quasibinomial family taken to be 1.19442) =20 Null deviance: 515.115 on 35 degrees of freedom Residual deviance: 35.047 on 27 degrees of freedom AIC: NA =20 Number of Fisher Scoring iterations: 17 =20 # note the standard error of the firast interaction term =3D 2365) =20 # add a single failure to one replicate =20 y2<-y y2[17,]<-c(23,1) model2<-glm(y2~antibio*toxin,quasibinomial) summary(model2) =20 Call: glm(formula =3D y2 ~ antibio * toxin, family =3D quasibinomial) =20 Deviance Residuals:=20 Min 1Q Median 3Q Max=20=20 -2.044 -0.627 -0.221 0.709 2.000=20=20 =20 Coefficients: Estimate Std. Error t value Pr(>|t|)=20=20=20=20 (Intercept) -2.901e+00 5.251e-01 -5.526 7.44e-06 *** antibiobcock 5.035e-01 6.737e-01 0.747 0.461=20=20=20=20 antibiocamp -4.058e-18 7.426e-01 -5.47e-18 1.000=20=20=20=20 toxinCry1Ab 2.818e+00 5.747e-01 4.904 3.94e-05 *** toxinVip3A 3.840e+00 5.857e-01 6.556 4.96e-07 *** antibiobcock:toxinCry1Ab 4.134e+00 1.352e+00 3.057 0.005 **=20 antibiocamp:toxinCry1Ab -4.721e-01 8.153e-01 -0.579 0.567=20=20=20=20 antibiobcock:toxinVip3A -2.868e-01 7.720e-01 -0.372 0.713=20=20=20=20 antibiocamp:toxinVip3A 2.748e-01 8.341e-01 0.329 0.744=20=20=20=20 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1=20 =20 (Dispersion parameter for quasibinomial family taken to be 1.306703) =20 Null deviance: 506.602 on 35 degrees of freedom Residual deviance: 37.851 on 27 degrees of freedom AIC: NA =20 Number of Fisher Scoring iterations: 5 =20 # Now the standard errors are all well-behaved =20 =20 [[alternative HTML version deleted]]
NOT a bug. The estimate of the probability of mi is precisely 1.0 for one cell in your setup. The information contributed by that cell has a factor of p*(1-p) in it, which works out to zero. The bad behavior of asymptotic methods for binary regressions in such settings is well known. HTH, Chuck On Thu, 1 May 2008, m.crawley at imperial.ac.uk wrote:> In this real example (below), all four of the replicates in one > treatment combination had zero failures, and this produced a very high > standard error in the summary.lm. > =20 > Just adding one failure to one of the replicates produced a well-behaved > standard error. > =20 > I don't know if this is a bug, but it is certainly hard for users to > understand. > =20 > I would value your comments=20 > =20 > Thanks > =20 > Best wishes, > =20 > Mick > =20 > Prof M.J. Crawley > =20 > Imperial College London > Silwood Park > Ascot > Berks > SL5 7PY > UK > =20 > Phone (0) 207 5942 216 > Fax (0) 207 5942 339 > =20 > The data are from a bioassay in which a factorial experiment with two > factors (antibio and toxin) each with three levels was replicated four > times. The response is "mi" and "n-mi" > =20 > Note that lines 17 to 20 in the dataframe have no failures (24 dead out > of 24 individuals) > > data<-read.table("c:\\temp\\ab1.txt",header=3DT) > attach(data) > names(data) > =20 > [1] "antibio" "toxin" "rep" "alive" "af" "dead" "mi" > > [8] "n"=20=20=20=20 > =20 > data > =20 > =20 > =20 > antibio toxin rep alive af dead mi n > 1 camp control 1 24 0 0 0 24 > 2 camp control 2 23 0 1 1 24 > 3 camp control 3 23 0 1 1 24 > 4 camp control 4 21 0 3 3 24 > 5 camp Cry1Ab 1 21 4 3 7 24 > 6 camp Cry1Ab 2 20 3 4 7 24 > 7 camp Cry1Ab 3 20 4 4 8 24 > 8 camp Cry1Ab 4 18 7 6 13 24 > 9 camp Vip3A 1 7 3 17 20 24 > 10 camp Vip3A 2 10 6 14 20 24 > 11 camp Vip3A 3 11 5 13 18 24 > 12 camp Vip3A 4 10 2 14 16 24 > 13 bcock control 1 21 0 3 3 24 > 14 bcock control 2 24 0 0 0 24 > 15 bcock control 3 24 3 0 3 24 > 16 bcock control 4 23 1 1 2 24 > 17 bcock Cry1Ab 1 4 4 20 24 24 > 18 bcock Cry1Ab 2 4 4 20 24 24 > 19 bcock Cry1Ab 3 1 1 23 24 24 > 20 bcock Cry1Ab 4 2 2 22 24 24 > 21 bcock Vip3A 1 11 4 13 17 24 > 22 bcock Vip3A 2 15 5 9 14 24 > 23 bcock Vip3A 3 5 3 19 22 24 > 24 bcock Vip3A 4 10 6 14 20 24 > 25 ana control 1 23 0 1 1 24 > 26 ana control 2 23 0 1 1 24 > 27 ana control 3 22 0 2 2 24 > 28 ana control 4 23 0 1 1 24 > 29 ana Cry1Ab 1 18 1 6 7 24 > 30 ana Cry1Ab 2 17 4 7 11 24 > 31 ana Cry1Ab 3 13 4 11 15 24 > 32 ana Cry1Ab 4 14 3 10 13 24 > 33 ana Vip3A 1 21 12 3 15 24 > 34 ana Vip3A 2 12 6 12 18 24 > 35 ana Vip3A 3 9 5 15 20 24 > 36 ana Vip3A 4 9 1 15 16 24 > =20 > y<-cbind(mi,n-mi) > model<-glm(y~antibio*toxin,quasibinomial) > summary(model) > =20 > > Call: > glm(formula =3D y ~ antibio * toxin, family =3D quasibinomial) > =20 > Deviance Residuals:=20 > Min 1Q Median 3Q Max=20=20 > -2.0437 -0.5645 -0.1022 0.6921 1.9996=20=20 > =20 > Coefficients: > Estimate Std. Error t value Pr(>|t|)=20=20=20> =20 > (Intercept) -2.901e+00 5.020e-01 -5.780 3.79e-06 *** > antibiobcock 5.035e-01 6.441e-01 0.782 0.441=20=20=20> =20 > antibiocamp 2.100e-15 7.099e-01 2.96e-15 1.000=20=20=20> =20 > toxinCry1Ab 2.818e+00 5.494e-01 5.129 2.15e-05 *** > toxinVip3A 3.840e+00 5.600e-01 6.857 2.29e-07 *** > antibiobcock:toxinCry1Ab 2.050e+01 2.365e+03 0.009 0.993=20=20=20> =20 > antibiocamp:toxinCry1Ab -4.721e-01 7.795e-01 -0.606 0.550=20=20=20> =20 > antibiobcock:toxinVip3A -2.868e-01 7.381e-01 -0.389 0.701=20=20=20> =20 > antibiocamp:toxinVip3A 2.748e-01 7.975e-01 0.345 0.733=20=20=20> =20 > --- > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1=20 > =20 > (Dispersion parameter for quasibinomial family taken to be 1.19442) > =20 > Null deviance: 515.115 on 35 degrees of freedom > Residual deviance: 35.047 on 27 degrees of freedom > AIC: NA > =20 > Number of Fisher Scoring iterations: 17 > =20 > # note the standard error of the firast interaction term =3D 2365) > =20 > # add a single failure to one replicate > =20 > y2<-y > y2[17,]<-c(23,1) > model2<-glm(y2~antibio*toxin,quasibinomial) > summary(model2) > =20 > Call: > glm(formula =3D y2 ~ antibio * toxin, family =3D quasibinomial) > =20 > Deviance Residuals:=20 > Min 1Q Median 3Q Max=20=20 > -2.044 -0.627 -0.221 0.709 2.000=20=20 > =20 > Coefficients: > Estimate Std. Error t value Pr(>|t|)=20=20=20> =20 > (Intercept) -2.901e+00 5.251e-01 -5.526 7.44e-06 *** > antibiobcock 5.035e-01 6.737e-01 0.747 0.461=20=20=20> =20 > antibiocamp -4.058e-18 7.426e-01 -5.47e-18 1.000=20=20=20> =20 > toxinCry1Ab 2.818e+00 5.747e-01 4.904 3.94e-05 *** > toxinVip3A 3.840e+00 5.857e-01 6.556 4.96e-07 *** > antibiobcock:toxinCry1Ab 4.134e+00 1.352e+00 3.057 0.005 **=20 > antibiocamp:toxinCry1Ab -4.721e-01 8.153e-01 -0.579 0.567=20=20=20> =20 > antibiobcock:toxinVip3A -2.868e-01 7.720e-01 -0.372 0.713=20=20=20> =20 > antibiocamp:toxinVip3A 2.748e-01 8.341e-01 0.329 0.744=20=20=20> =20 > --- > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1=20 > =20 > (Dispersion parameter for quasibinomial family taken to be 1.306703) > =20 > Null deviance: 506.602 on 35 degrees of freedom > Residual deviance: 37.851 on 27 degrees of freedom > AIC: NA > =20 > Number of Fisher Scoring iterations: 5 > =20 > # Now the standard errors are all well-behaved > =20 > > =20 > > [[alternative HTML version deleted]] > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:cberry at tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
m.crawley at imperial.ac.uk wrote:> In this real example (below), all four of the replicates in one > treatment combination had zero failures, and this produced a very high > standard error in the summary.lm. >Why are you reporting trouble with your own understanding as a bug in R? I don't see anything out of the ordinary here. In the first case you have divergence, not because of a zero variance, but because you are trying to estimate an infinite value for the linear predictor (l = logit(1) for the combination where everyone dies). This will also happen if you fit an ordinary logistic regression, whenever you have 0s in the y*antibio*toxin marginal table.> =20 > Just adding one failure to one of the replicates produced a well-behaved > standard error. > =20 > I don't know if this is a bug, but it is certainly hard for users to > understand. > =20 > I would value your comments=20 > =20 > Thanks > =20 > Best wishes, > =20 > Mick > =20 > Prof M.J. Crawley > =20 > Imperial College London > Silwood Park > Ascot > Berks > SL5 7PY > UK > =20 > Phone (0) 207 5942 216 > Fax (0) 207 5942 339 > =20 > The data are from a bioassay in which a factorial experiment with two > factors (antibio and toxin) each with three levels was replicated four > times. The response is "mi" and "n-mi" > =20 > Note that lines 17 to 20 in the dataframe have no failures (24 dead out > of 24 individuals) > > data<-read.table("c:\\temp\\ab1.txt",header=3DT) > attach(data) > names(data) > =20 > [1] "antibio" "toxin" "rep" "alive" "af" "dead" "mi" > > [8] "n"=20=20=20=20 > =20 > data > =20 > =20 > =20 > antibio toxin rep alive af dead mi n > 1 camp control 1 24 0 0 0 24 > 2 camp control 2 23 0 1 1 24 > 3 camp control 3 23 0 1 1 24 > 4 camp control 4 21 0 3 3 24 > 5 camp Cry1Ab 1 21 4 3 7 24 > 6 camp Cry1Ab 2 20 3 4 7 24 > 7 camp Cry1Ab 3 20 4 4 8 24 > 8 camp Cry1Ab 4 18 7 6 13 24 > 9 camp Vip3A 1 7 3 17 20 24 > 10 camp Vip3A 2 10 6 14 20 24 > 11 camp Vip3A 3 11 5 13 18 24 > 12 camp Vip3A 4 10 2 14 16 24 > 13 bcock control 1 21 0 3 3 24 > 14 bcock control 2 24 0 0 0 24 > 15 bcock control 3 24 3 0 3 24 > 16 bcock control 4 23 1 1 2 24 > 17 bcock Cry1Ab 1 4 4 20 24 24 > 18 bcock Cry1Ab 2 4 4 20 24 24 > 19 bcock Cry1Ab 3 1 1 23 24 24 > 20 bcock Cry1Ab 4 2 2 22 24 24 > 21 bcock Vip3A 1 11 4 13 17 24 > 22 bcock Vip3A 2 15 5 9 14 24 > 23 bcock Vip3A 3 5 3 19 22 24 > 24 bcock Vip3A 4 10 6 14 20 24 > 25 ana control 1 23 0 1 1 24 > 26 ana control 2 23 0 1 1 24 > 27 ana control 3 22 0 2 2 24 > 28 ana control 4 23 0 1 1 24 > 29 ana Cry1Ab 1 18 1 6 7 24 > 30 ana Cry1Ab 2 17 4 7 11 24 > 31 ana Cry1Ab 3 13 4 11 15 24 > 32 ana Cry1Ab 4 14 3 10 13 24 > 33 ana Vip3A 1 21 12 3 15 24 > 34 ana Vip3A 2 12 6 12 18 24 > 35 ana Vip3A 3 9 5 15 20 24 > 36 ana Vip3A 4 9 1 15 16 24 > =20 > y<-cbind(mi,n-mi) > model<-glm(y~antibio*toxin,quasibinomial) > summary(model) > =20 > > Call: > glm(formula =3D y ~ antibio * toxin, family =3D quasibinomial) > =20 > Deviance Residuals:=20 > Min 1Q Median 3Q Max=20=20 > -2.0437 -0.5645 -0.1022 0.6921 1.9996=20=20 > =20 > Coefficients: > Estimate Std. Error t value Pr(>|t|)=20=20=20> =20 > (Intercept) -2.901e+00 5.020e-01 -5.780 3.79e-06 *** > antibiobcock 5.035e-01 6.441e-01 0.782 0.441=20=20=20> =20 > antibiocamp 2.100e-15 7.099e-01 2.96e-15 1.000=20=20=20> =20 > toxinCry1Ab 2.818e+00 5.494e-01 5.129 2.15e-05 *** > toxinVip3A 3.840e+00 5.600e-01 6.857 2.29e-07 *** > antibiobcock:toxinCry1Ab 2.050e+01 2.365e+03 0.009 0.993=20=20=20> =20 > antibiocamp:toxinCry1Ab -4.721e-01 7.795e-01 -0.606 0.550=20=20=20> =20 > antibiobcock:toxinVip3A -2.868e-01 7.381e-01 -0.389 0.701=20=20=20> =20 > antibiocamp:toxinVip3A 2.748e-01 7.975e-01 0.345 0.733=20=20=20> =20 > --- > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1=20 > =20 > (Dispersion parameter for quasibinomial family taken to be 1.19442) > =20 > Null deviance: 515.115 on 35 degrees of freedom > Residual deviance: 35.047 on 27 degrees of freedom > AIC: NA > =20 > Number of Fisher Scoring iterations: 17 > =20 > # note the standard error of the firast interaction term =3D 2365) > =20 > # add a single failure to one replicate > =20 > y2<-y > y2[17,]<-c(23,1) > model2<-glm(y2~antibio*toxin,quasibinomial) > summary(model2) > =20 > Call: > glm(formula =3D y2 ~ antibio * toxin, family =3D quasibinomial) > =20 > Deviance Residuals:=20 > Min 1Q Median 3Q Max=20=20 > -2.044 -0.627 -0.221 0.709 2.000=20=20 > =20 > Coefficients: > Estimate Std. Error t value Pr(>|t|)=20=20=20> =20 > (Intercept) -2.901e+00 5.251e-01 -5.526 7.44e-06 *** > antibiobcock 5.035e-01 6.737e-01 0.747 0.461=20=20=20> =20 > antibiocamp -4.058e-18 7.426e-01 -5.47e-18 1.000=20=20=20> =20 > toxinCry1Ab 2.818e+00 5.747e-01 4.904 3.94e-05 *** > toxinVip3A 3.840e+00 5.857e-01 6.556 4.96e-07 *** > antibiobcock:toxinCry1Ab 4.134e+00 1.352e+00 3.057 0.005 **=20 > antibiocamp:toxinCry1Ab -4.721e-01 8.153e-01 -0.579 0.567=20=20=20> =20 > antibiobcock:toxinVip3A -2.868e-01 7.720e-01 -0.372 0.713=20=20=20> =20 > antibiocamp:toxinVip3A 2.748e-01 8.341e-01 0.329 0.744=20=20=20> =20 > --- > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1=20 > =20 > (Dispersion parameter for quasibinomial family taken to be 1.306703) > =20 > Null deviance: 506.602 on 35 degrees of freedom > Residual deviance: 37.851 on 27 degrees of freedom > AIC: NA > =20 > Number of Fisher Scoring iterations: 5 > =20 > # Now the standard errors are all well-behaved > =20 > > =20 > > [[alternative HTML version deleted]] > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >-- O__ ---- Peter Dalgaard ?ster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907