Paul E. Johnson
2003-Dec-04 12:57 UTC
[R] Comparing Negative Binomial Regression in Stata and R. Constants differ?
I looked for examples of count data that might interest the students and found this project about dropout rates in Los Angeles High Schools. It is discussed in the UCLA stats help pages for the Stata users: http://www.ats.ucla.edu/stat/stata/library/count.htm and See: http://www.ats.ucla.edu/stat/stata/library/longutil.htm To replicate those results, I used R's excellent foreign package to bring the lahigh data in, then did poisReg1 <- glm(daysabs~gender+ mathnce+langnce,family=poisson(link=log), data=lahigh) library(MASS) negbinReg1 <- glm.nb(daysabs~gender+ mathnce+langnce,link=log, data=lahigh) The parameter estimates of the coefficients are the just about the same, except for the intercept estimates. Below I pasted in the Negative Binomial results I got from R along with the Stata results that they report. In the Stata output, they report alpha, same as 1/theta from the R glm.nb output. Except for minor differences in standard errors, only the intercept estimates markedly differ. Can anybody explain this? ----------------------------------------------------------- Stata: nbreg daysabs gender mathnce langnce Negative binomial regression Number of obs = 316 LR chi2(3) = 20.74 Prob > chi2 = 0.0001 Log likelihood = -880.87312 Pseudo R2 = 0.0116 ------------------------------------------------------------------------------ daysabs | Coef. Std. Err. z P>|z| [95% Conf. Interval] ---------+-------------------------------------------------------------------- gender | -.4311844 .1396656 -3.087 0.002 -.704924 -.1574448 mathnce | -.001601 .00485 -0.330 0.741 -.0111067 .0079048 langnce | -.0143475 .0055815 -2.571 0.010 -.0252871 -.003408 _cons | 3.147254 .3211669 9.799 0.000 2.517778 3.776729 ---------+-------------------------------------------------------------------- /lnalpha | .2533877 .0955362 .0661402 .4406351 ---------+-------------------------------------------------------------------- alpha | 1.288383 .1230871 10.467 0.000 1.068377 1.553694 ------------------------------------------------------------------------------ Likelihood ratio test of alpha=0: chi2(1) = 1334.20 Prob > chi2 = 0.0000 Here is the R glm.nb output: Deviance Residuals: Min 1Q Median 3Q Max -1.9785 -1.0627 -0.4147 0.2865 2.8193 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 2.716069 0.234174 11.598 < 2e-16 *** gendermale -0.431185 0.139516 -3.091 0.00200 ** mathnce -0.001601 0.005300 -0.302 0.76259 langnce -0.014348 0.005372 -2.671 0.00756 ** --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 (Dispersion parameter for Negative Binomial(0.7762) family taken to be 1) Null deviance: 378.43 on 315 degrees of freedom Residual deviance: 356.93 on 312 degrees of freedom AIC: 1771.7 Number of Fisher Scoring iterations: 1 Correlation of Coefficients: (Intercept) gendermale mathnce gendermale -0.40 mathnce -0.28 -0.09 langnce -0.43 0.19 -0.69 Theta: 0.7762 Std. Err.: 0.0742 2 x log-likelihood: -1761.7460 ---------------------------------------------------------- -- Paul E. Johnson email: pauljohn at ukans.edu Dept. of Political Science http://lark.cc.ukans.edu/~pauljohn University of Kansas Office: (785) 864-9086 Lawrence, Kansas 66045 FAX: (785) 864-5700
Prof Brian Ripley
2003-Dec-04 16:16 UTC
[R] Comparing Negative Binomial Regression in Stata and R. Constants differ?
On Thu, 4 Dec 2003, Paul E. Johnson wrote:> I looked for examples of count data that might interest the students and > found this project about dropout rates in Los Angeles High Schools. It > is discussed in the UCLA stats help pages for the Stata users: > http://www.ats.ucla.edu/stat/stata/library/count.htm > and > See: http://www.ats.ucla.edu/stat/stata/library/longutil.htm > > To replicate those results, I used R's excellent foreign package to > bring the lahigh data in, then did > poisReg1 <- glm(daysabs~gender+ > mathnce+langnce,family=poisson(link=log), data=lahigh) > library(MASS) > negbinReg1 <- glm.nb(daysabs~gender+ mathnce+langnce,link=log, data=lahigh) > > The parameter estimates of the coefficients are the just about the same, > except for the intercept estimates. Below I pasted in the Negative > Binomial results I got from R along with the Stata results that theyActually, from V&R's MASS package, excellent or otherwise but worthy of credit!> report. In the Stata output, they report alpha, same as 1/theta from > the R glm.nb output. Except for minor differences in standard errors, > only the intercept estimates markedly differ.What are the variable codings used? Intercepts depend on coding of factors, and that applies to any sort of regression. -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595