No modification is required. The standard way in S to handle offsets is
via the offset() function, and that works in glm.nb. The offset argument
to R's glm is unnecessary.
See ?Insurance and try

    library(MASS)
    glm.nb(Claims ~ District + Group + Age + offset(log(Holders)),
           data = Insurance)

(the Insurance data are not over-dispersed, so this fit gives some warnings).
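
As a rough check that the two ways of giving an offset agree, the sketch below fits the same Poisson model to the Insurance data twice, once with glm's offset argument and once with an offset() term in the formula. It assumes the MASS package is installed; the object names fit_arg and fit_term are only illustrative.

```r
library(MASS)  # provides the Insurance data (and glm.nb)

# Poisson fit with the offset passed through glm's offset argument
fit_arg <- glm(Claims ~ District + Group + Age, family = poisson,
               data = Insurance, offset = log(Holders))

# The same model with the offset written as a term in the formula
fit_term <- glm(Claims ~ District + Group + Age + offset(log(Holders)),
                family = poisson, data = Insurance)

# Compare the estimated coefficients of the two specifications
same_coef <- all.equal(coef(fit_arg), coef(fit_term))
print(same_coef)
```

Since glm.nb accepts a formula, the offset() form is the one that carries over to it directly.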
On Mon, 24 Mar 2003, Ross Nelson wrote:
> I would like to know if it is possible to perform negative binomial
> regression with rate data (incidence density) using the glm.nb (in
> MASS) function.
>
> I used the poisson regression glm call to assess the count of injuries
> across census tracts. The glm request was adjusted to handle the data
> as rates using the offset parameter since the population of census
> tracts can vary by a factor of three.
>
> eg. Call:
> glm(formula = inj ~ lp + rdm, family = poisson(), data = ww,
> offset = log(pop))
>
> Deviance Residuals:
>      Min        1Q    Median        3Q       Max
> -17.2779   -2.6034   -0.4519    2.0837   16.9275
>
> Coefficients:
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept) -1.11593    0.01482 -75.290  < 2e-16 ***
> lp2          0.11569    0.01477   7.835 4.70e-15 ***
> lp3          0.02374    0.01763   1.346    0.178
> lp4          0.17777    0.01922   9.248  < 2e-16 ***
> rdm2        -0.08810    0.01747  -5.044 4.57e-07 ***
> rdm3         0.08750    0.01533   5.706 1.15e-08 ***
> rdm4         0.10513    0.01518   6.925 4.35e-12 ***
> ---
> Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
>
> inj and pop are interval, while lp and rdm are categorical.
>
>
> A test of the dispersion indicates that the data is over dispersed, and
> thus that an alternative distribution should be used.
>
> I am not sure, however, if or how to modify the glm.nb to handle this
> situation.
>
> glm.nb(formula, ..., init.theta, link = log)
>
>
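
For the model quoted above, the corresponding call would presumably move log(pop) into the formula as an offset() term. The data frame ww is not available here, so the sketch below simulates a stand-in with the same column names; the sample size, the coefficients and the theta used to generate the counts are invented purely for illustration.

```r
library(MASS)  # provides glm.nb

set.seed(1)
n <- 500

# Simulated stand-in for ww (hypothetical values, same column names)
ww <- data.frame(
  lp  = factor(sample(1:4, n, replace = TRUE)),
  rdm = factor(sample(1:4, n, replace = TRUE)),
  pop = round(runif(n, 2000, 6000))
)

# Expected counts proportional to population, with one made-up lp effect
mu <- ww$pop * exp(-4 + 0.1 * (ww$lp == "2"))

# Over-dispersed counts: negative binomial with theta (size) = 2
ww$inj <- rnbinom(n, size = 2, mu = mu)

# Negative binomial regression on rates: offset goes in the formula
fit_nb <- glm.nb(inj ~ lp + rdm + offset(log(pop)), data = ww)

print(coef(fit_nb))
print(fit_nb$theta)
```

With the real ww, only the last call should be needed.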
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595