It seems that MASS suggest to judge on the basis of sum(residuals(mode,type="pearson"))/df.residual(mode). My question: Is there any rule of thumb of the cutpoiont value? The paper "On the Use of Corrections for Overdispersion" suggests overdispersion exists if the deviance is at least twice the number of degrees of freedom. Are there any further hints? Thanks. -- Ronggui Huang Department of Sociology Fudan University, Shanghai, China
Achim Zeileis
2007-Apr-10 07:11 UTC
[R] When to use quasipoisson instead of poisson family
On Tue, 10 Apr 2007, ronggui wrote:> It seems that MASS suggest to judge on the basis of > sum(residuals(mode,type="pearson"))/df.residual(mode). My question: Is > there any rule of thumb of the cutpoiont value? > > The paper "On the Use of Corrections for Overdispersion" suggests > overdispersion exists if the deviance is at least twice the number of > degrees of freedom.There are also formal tests for over-dispersion. I've implemented one for a package which is not yet on CRAN (code/docs attached), another one is implemented in odTest() in package "pscl". The latter also contains further count data regression models which can deal with both over-dispersion and excess zeros in count data. A vignette explaining the tools is about to be released. hth, Z> Are there any further hints? Thanks. > > -- > Ronggui Huang > Department of Sociology > Fudan University, Shanghai, China > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > >-------------- next part -------------- \name{dispersiontest} \alias{dispersiontest} \title{Dispersion Test} \description{ Tests the null hypothesis of equidispersion in Poisson GLMs against the alternative of overdispersion and/or underdispersion. } \usage{ dispersiontest(object, trafo = NULL, alternative = c("greater", "two.sided", "less")) } \arguments{ \item{object}{a fitted Poisson GLM of class \code{"glm"} as fitted by \code{\link{glm}} with family \code{\link{poisson}}.} \item{trafo}{a specification of the alternative (see also details), can be numeric or a (positive) function or \code{NULL} (the default).} \item{alternative}{a character string specifying the alternative hypothesis: \code{"greater"} corresponds to overdispersion, \code{"less"} to underdispersion and \code{"two.sided"} to either one.} } \details{ The standard Poisson GLM models the (conditional) mean \eqn{\mathsf{E}[y] = \mu}{E[y] = mu} which is assumed to be equal to the variance \eqn{\mathsf{VAR}[y] = \mu}{VAR[y] = mu}. \code{dispersiontest} assesses the hypothesis that this assumption holds (equidispersion) against the alternative that the variance is of the form: \deqn{\mathsf{VAR}[y] \quad = \quad \mu \; + \; \alpha \cdot \mathrm{trafo}(\mu).}{VAR[y] = mu + alpha * trafo(mu).} Overdispersion corresponds to \eqn{\alpha > 0}{alpha > 0} and underdispersion to \eqn{\alpha < 0}{alpha < 0}. The coefficient \eqn{\alpha}{alpha} can be estimated by an auxiliary OLS regression and tested with the corresponding t (or z) statistic which is asymptotically standard normal under the null hypothesis. Common specifications of the transformation function \eqn{\mathrm{trafo}}{trafo} are \eqn{\mathrm{trafo}(\mu) = \mu^2}{trafo(mu) = mu^2} or \eqn{\mathrm{trafo}(\mu) = \mu}{trafo(mu) = mu}. The former corresponds to a negative binomial (NB) model with quadratic variance function (called NB2 by Cameron and Trivedi, 2005), the latter to a NB model with linear variance function (called NB1 by Cameron and Trivedi, 2005) or quasi-Poisson model with dispersion parameter, i.e., \deqn{\mathsf{VAR}[y] \quad = \quad (1 + \alpha) \cdot \mu = \mathrm{dispersion} \cdot \mu.}{VAR[y] = (1 + alpha) * mu = dispersion * mu.} By default, for \code{trafo = NULL}, the latter dispersion formulation is used in \code{dispersiontest}. Otherwise, if \code{trafo} is specified, the test is formulated in terms of the parameter \eqn{\alpha}{alpha}. The transformation \code{trafo} can either be specified as a function or an integer corresponding to the function \code{function(x) x^trafo}, such that \code{trafo = 1} and \code{trafo = 2} yield the linear and quadratic formulations respectively. } \value{An object of class \code{"htest"}.} \references{ Cameron, A.C. and Trivedi, P.K. (1990). Regression-based Tests for Overdispersion in the Poisson Model. \emph{Journal of Econometrics}, \bold{46}, 347--364. Cameron, A.C. and Trivedi, P.K. (1998). \emph{Regression Analysis of Count Data}. Cambridge: Cambridge University Press. Cameron, A.C. and Trivedi, P.K. (2005). \emph{Microeconometrics: Methods and Applications}. Cambridge: Cambridge University Press. } \seealso{\code{\link{glm}}, \code{\link{poisson}}, \code{\link[MASS]{glm.nb}}} \examples{ data("RecreationDemand") fm <- glm(trips ~ ., data = RecreationDemand, family = poisson) ## linear specification (in terms of dispersion) dispersiontest(fm) ## linear specification (in terms of alpha) dispersiontest(fm, trafo = 1) ## quadratic specification (in terms of alpha) dispersiontest(fm, trafo = 2) dispersiontest(fm, trafo = function(x) x^2) ## further examples data("DoctorVisits") fm <- glm(visits ~ . + I(age^2), data = DoctorVisits, family = poisson) dispersiontest(fm) data("DebTrivedi") fm <- glm(ofp ~ health + age + gender + married + faminc + privins, data = DebTrivedi, family = poisson) dispersiontest(fm) } \keyword{htest} -------------- next part -------------- dispersiontest <- function(object, trafo = NULL, alternative = c("greater", "two.sided", "less")) { if(!inherits(object, "glm") || family(object)$family != "poisson") stop("only Poisson GLMs can be tested") alternative <- match.arg(alternative) otrafo <- trafo if(is.numeric(otrafo)) trafo <- function(x) x^otrafo y <- if(is.null(object$y)) model.response(model.frame(object)) else object$y yhat <- fitted(object) aux <- ((y - yhat)^2 - y)/yhat if(is.null(trafo)) { STAT <- sqrt(length(aux)) * mean(aux)/sd(aux) NVAL <- c(dispersion = 1) EST <- c(dispersion = mean(aux) + 1) } else { auxreg <- lm(aux ~ 0 + I(trafo(yhat)/yhat)) STAT <- as.vector(summary(auxreg)$coef[1,3]) NVAL <- c(alpha = 0) EST <- c(alpha = as.vector(coef(auxreg)[1])) } rval <- list(statistic = c(z = STAT), p.value = switch(alternative, "greater" = pnorm(STAT, lower.tail = FALSE), "two.sided" = pnorm(abs(STAT), lower.tail = FALSE)*2, "less" = pnorm(STAT)), estimate = EST, null.value = NVAL, alternative = alternative, method = switch(alternative, "greater" = "Overdispersion test", "two.sided" = "Dispersion test", "less" = "Underdispersion test"), data.name = deparse(substitute(object))) class(rval) <- "htest" return(rval) }