Haibo Huang
2005-Aug-08  21:06 UTC
[R] Help with "non-integer #successes in a binomial glm"
Hi,

I had a logit regression, but don't really know how to handle the "Warning message: non-integer #successes in a binomial glm! in: eval(expr, envir, enclos)" problem. I had the same logit regression without weights and it worked out without the warning, but I figured it makes more sense to add the weights. The weights sum up to one.

Could anyone give me some hint? Thanks a lot!

FYI, I have posted both regressions (with and without weights) below.

Ed

> setwd("P:/Work in Progress/Haibo/Hans")
>
> Lease=read.csv("lease.csv", header=TRUE)
> Lease$ET <- factor(Lease$EarlyTermination)
> SICCode=factor(Lease$SIC.Code)
> Lease$TO=factor(Lease$TenantHasOption)
> Lease$LO=factor(Lease$LandlordHasOption)
> Lease$TEO=factor(Lease$TenantExercisedOption)
>
> RegA=glm(ET~1+TO,
+ family=binomial(link=logit), data=Lease)
> summary(RegA)

Call:
glm(formula = ET ~ 1 + TO, family = binomial(link = logit), data = Lease)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.5839  -0.5839  -0.5839  -0.3585   2.3565

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.68271    0.02363  -71.20   <2e-16 ***
TO1         -1.02959    0.09012  -11.43   <2e-16 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 12987  on 15809  degrees of freedom
Residual deviance: 12819  on 15808  degrees of freedom
AIC: 12823

Number of Fisher Scoring iterations: 5

> setwd("P:/Work in Progress/Haibo/Hans")
>
> Lease=read.csv("lease.csv", header=TRUE)
> Lease$ET <- factor(Lease$EarlyTermination)
> SICCode=factor(Lease$SIC.Code)
> Lease$TO=factor(Lease$TenantHasOption)
> Lease$LO=factor(Lease$LandlordHasOption)
> Lease$TEO=factor(Lease$TenantExercisedOption)
>
> RegA=glm(ET~1+TO,
+ family=binomial(link=logit), data=Lease, weights=PortionSF)
Warning message:
non-integer #successes in a binomial glm! in: eval(expr, envir, enclos)
> summary(RegA)

Call:
glm(formula = ET ~ 1 + TO, family = binomial(link = logit), data = Lease,
    weights = PortionSF)

Deviance Residuals:
      Min         1Q     Median         3Q        Max
-0.055002  -0.003434   0.000000   0.000000   0.120656

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -1.120      2.618  -0.428    0.669
TO1           -1.570      9.251  -0.170    0.865

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1.0201  on 9302  degrees of freedom
Residual deviance: 0.9787  on 9301  degrees of freedom
AIC: 4

Number of Fisher Scoring iterations: 5
Prof Brian Ripley
2005-Aug-09  06:24 UTC
[R] Help with "non-integer #successes in a binomial glm"
On Mon, 8 Aug 2005, Haibo Huang wrote:

> I had a logit regression, but don't really know how to
> handle the "Warning message: non-integer #successes in
> a binomial glm! in: eval(expr, envir, enclos)"
> problem. I had the same logit regression without
> weights and it worked out without the warning, but I
> figured it makes more sense to add the weights. The
> weights sum up to one.

Weights are case weights in a binomial GLM, that is w_i means `I have w_i of these'. Do check out the theory in MASS (the book) or Nelder & McCullagh.

There are some circumstances when fractional weights make sense (when this is doing something other than fitting a glm, e.g. part of a `mixture of experts' model) but they are unusual, hence the warning.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
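
The case-weight reading in the reply above (w_i means "I have w_i of these") can be checked on a small example. The following is only a sketch on made-up data: the data frame `d`, its columns `y`, `x`, `w`, and the helper `se` are hypothetical and have nothing to do with the lease data in the thread. It fits the same logit model three ways: with integer case weights, with each row written out w_i times, and with the weights rescaled to sum to one as in the original post.

```r
# Toy data (hypothetical): binary outcome y, one factor x, integer case weights w.
d <- data.frame(
  y = c(0, 1, 0, 1, 1, 0),
  x = factor(c(0, 0, 1, 1, 0, 1)),
  w = c(3, 1, 2, 2, 4, 1)
)

# Integer case weights: row i stands for w[i] identical observations.
fit_w <- glm(y ~ x, family = binomial, data = d, weights = w)

# The same information with every row repeated w[i] times and no weights.
d_rep <- d[rep(seq_len(nrow(d)), times = d$w), ]
fit_rep <- glm(y ~ x, family = binomial, data = d_rep)

# Weights rescaled to sum to one. Now w[i] * y[i] is fractional, so this call
# is expected to raise "non-integer #successes in a binomial glm!".
fit_frac <- glm(y ~ x, family = binomial, data = d, weights = w / sum(w))

# Helper (hypothetical name): standard errors from the fitted covariance matrix.
se <- function(fit) sqrt(diag(vcov(fit)))

# Weighted and replicated fits should agree on the coefficients.
coef_same <- isTRUE(all.equal(coef(fit_w), coef(fit_rep), tolerance = 1e-5))

# Rescaling the weights should leave the coefficients essentially unchanged ...
coef_frac <- isTRUE(all.equal(coef(fit_w), coef(fit_frac), tolerance = 1e-5))

# ... but inflate the standard errors by about sqrt(sum(w)), because glm now
# behaves as if the whole data set were a single observation's worth of data.
se_ratio <- se(fit_frac) / se(fit_w)
print(se_ratio)
print(sqrt(sum(d$w)))
```

If the sketch behaves as assumed, it suggests why the weighted fit in the original post reports standard errors of 2.6 and 9.3 where the unweighted fit reports 0.02 and 0.09: weights that sum to one tell glm that the total sample size is one.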