Hello, I have a question regarding estimation of the dispersion parameter (theta) for generalized linear models with the negative binomial error structure. As I understand, there are two main methods to fit glm's using the nb error structure in R: glm.nb() or glm() with the negative.binomial(theta) family. Both functions are implemented through the MASS library. Fitting the model using these two functions to the same data produces much different results for me in terms of estimated theta and the coefficients, and I am not sure why. the following model: mod<-glm.nb(count ~ year + season + time + depth, link="log",data=dat,control=glm.control(maxit=100,trace=T)) estimates theta as 0.0109 while the following model: mod2<-glm(count ~ year + season + time + depth, family=negative.binomial(theta=100),link="log",data=dat,control=glm.control(maxit=100,trace=T)) will not accept 0.0109 as theta and instead estimates it as 1271 (these are fisheries catch data and so are very overdispersed). Fitting a quasipoisson model also yields a large dispersion parameter (1300). The models also produce different coefficients and P-values, which is disconcerting. What am I doing wrong here? I've read through the help sections (?negative.binomial,?glm.nb, and ?glm) but did not find any answers. Any help and/or input is greatly appreciated! Daniel. -- View this message in context: http://www.nabble.com/glm.nb-versus-glm-estimation-of-theta.-tp24956438p24956438.html Sent from the R help mailing list archive at Nabble.com.
On Thu, 13 Aug 2009, hesicaia wrote:> > Hello, > > I have a question regarding estimation of the dispersion parameter (theta) > for generalized linear models with the negative binomial error structure. AsThe theta is different from the dispersion. In the usual GLM notation: E[y] = mu VAR[y] = dispersion * V(mu) The function V() depends on the family and is Poisson: V(mu) = mu NB: V(mu) = mu + 1/theta * mu^2 For both models, dispersion is known to be 1 (from the likelihood). However, a quasi-Poisson approach can be adopted where dispersion is estimated (but does not correspond to a specific likelihood). Thus, dispersion and theta are really different things although both of them can be used to capture overdispersion.> I understand, there are two main methods to fit glm's using the nb error > structure in R: glm.nb() or glm() with the negative.binomial(theta) family. > Both functions are implemented through the MASS library. Fitting the model > using these two functions to the same data produces much different results > for me in terms of estimated theta and the coefficients, and I am not sure > why....because you tell them to do different things.> the following model: > mod<-glm.nb(count ~ year + season + time + depth, > link="log",data=dat,control=glm.control(maxit=100,trace=T)) > estimates theta as 0.0109This approach estimates theta (= 0.0109) and assumes that dispersion is known to be 1. The underlying estimated variance function is: VAR[y] = mu + 91.74312 * mu^2> while the following model: > mod2<-glm(count ~ year + season + time + depth, > family=negative.binomial(theta=100),link="log",data=dat,control=glm.control(maxit=100,trace=T)) > will not accept 0.0109 as theta and instead estimates it as 1271 (these are > fisheries catch data and so are very overdispersed).This does not estimate theta at all but keeps it fixed (= 100). By default, however, dispersion will be estimated by the summary() method, presumably leading to the value of 1271 you report. The underlying variance function would then be VAR[y] = 1271 * mu + 12.71 * mu^2> Fitting a quasipoisson model also yields a large dispersion parameter > (1300). The models also produce different coefficients and P-values, which > is disconcerting.This implies yet another variance function, namely VAR[y] = 1300 * mu If you want to get essentially the same result as summary(mod) from using glm+negative.binomial you can do mod0 <- glm(count ~ year + season + time + depth, data = dat, family = negative.binomial(theta = mod$theta), control = glm.control(maxit = 100)) summary(mod0, dispersion = 1) (Note that link = "log" is not needed.)> What am I doing wrong here? I've read through the help sections > (?negative.binomial,?glm.nb, and ?glm) but did not find any answers.I guess the authors of the "MASS" package would say that the software accompanies a book which should be consulted...and they would be right. Reading the corresponding sections in MASS (the book) will surely be helpful. Consulting McCullagh & Nelder about some general GLM theory can't hurt either. Finally, some extended modeling (including excess zeros) is available in http://www.jstatsoft.org/v27/i08/ (Apologies for advertising this twice on the same day.) hth, Z> Any help and/or input is greatly appreciated! > Daniel. > -- > View this message in context: http://www.nabble.com/glm.nb-versus-glm-estimation-of-theta.-tp24956438p24956438.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > >
Bill.Venables at csiro.au
2009-Aug-14 00:36 UTC
[R] glm.nb versus glm estimation of theta.
Whoa! Just hang on a minute. theta is NOT the dispersion parameter. Under the NB model, the variance of an observation is mu+mu^2/theta, so that's how theta enters the picture. The smaller theta is the larger the variance. glm(..., family = negative.binomial(theta = <something>), ...) will NOT estimate theta. It will estimate a dispersion parameter, and if you get your <something> wrong, it could be a silly estimate. One would hope the estimate of the dispersion parameter would be close to unity. With your data try mod2 <- glm(count ~ year + season + time + depth, family = negative.binomial(theta=mod$theta),link = "log", data = dat, control = glm.control(maxit=100, trace=T)) You should get the same estimates as you got with the negative binomial model (though standard errors, &c, will differ because you have cheated on that), and your dispersion parameter should be close to (though not necessarily equal to) unity. ________________________________________ From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On Behalf Of hesicaia [dboyce at dal.ca] Sent: 14 August 2009 04:31 To: r-help at r-project.org Subject: [R] glm.nb versus glm estimation of theta. Hello, I have a question regarding estimation of the dispersion parameter (theta) for generalized linear models with the negative binomial error structure. As I understand, there are two main methods to fit glm's using the nb error structure in R: glm.nb() or glm() with the negative.binomial(theta) family. Both functions are implemented through the MASS library. Fitting the model using these two functions to the same data produces much different results for me in terms of estimated theta and the coefficients, and I am not sure why. the following model: mod<-glm.nb(count ~ year + season + time + depth, link="log",data=dat,control=glm.control(maxit=100,trace=T)) estimates theta as 0.0109 while the following model: mod2<-glm(count ~ year + season + time + depth, family=negative.binomial(theta=100),link="log",data=dat,control=glm.control(maxit=100,trace=T)) will not accept 0.0109 as theta and instead estimates it as 1271 (these are fisheries catch data and so are very overdispersed). Fitting a quasipoisson model also yields a large dispersion parameter (1300). The models also produce different coefficients and P-values, which is disconcerting. What am I doing wrong here? I've read through the help sections (?negative.binomial,?glm.nb, and ?glm) but did not find any answers. Any help and/or input is greatly appreciated! Daniel. -- View this message in context: http://www.nabble.com/glm.nb-versus-glm-estimation-of-theta.-tp24956438p24956438.html Sent from the R help mailing list archive at Nabble.com. ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.