peter.fledelius@wgo.royalsun.com
2003-Jan-16 16:53 UTC
[R] Overdispersed poisson - negative observation
Dear R users I have been looking for functions that can deal with overdispersed poisson models. Some (one) of the observations are negative. According to actuarial literature (England & Verall, Stochastic Claims Reserving in General Insurance , Institute of Actiuaries 2002) this can be handled through the use of quasi likelihoods instead of normal likelihoods. The presence of negatives is not normal in a poisson model, however, we see them frequently in this type of data, and we would like to be able to fit the model anyway. At the moment R is complaining about negative values and the link function = log. My code looks like this. Do any of you know if this problem can be solved in R? Any suggestions are welcomed. Kind regards, Peter Fledelius (new R user) *********** Code ************ paym <- c(5012, 3257, 2638, 898, 1734, 2642, 1828, 599, 54, 172, 106, 4179, 1111, 5270, 3116, 1817, -103, 673, 535, 3410, 5582, 4881, 2268, 2594, 3479, 649, 603, 5655, 5900, 4211, 5500, 2159, 2658, 984, 1092, 8473, 6271, 6333, 3786, 225, 1513, 4932, 5257, 1233, 2917, 557, 3463, 6956, 1368, 1351, 5596, 6165, 3133, 2262, 2063) alpha <- factor(c(1,1,1,1,1,1,1,1,1,1, 2,2,2,2,2,2,2,2,2, 3,3,3,3,3,3,3,3, 4,4,4,4,4,4,4, 5,5,5,5,5,5, 6,6,6,6,6, 7,7,7,7, 8,8,8, 9,9, 10)) beta <- factor(c(1,2,3,4,5,6,7,8,9,10, 1,2,3,4,5,6,7,8,9, 1,2,3,4,5,6,7,8, 1,2,3,4,5,6,7, 1,2,3,4,5,6, 1,2,3,4,5, 1,2,3,4, 1,2,3, 1,2, 1)) d.AD <- data.frame(paym, alpha, beta) glm.qD93 <- glm(paym ~ alpha + beta, family=quasipoisson()) glm.qD93 ************ Code end ***************
Roughtly speaking, your code is correct, but you can not fit your data with negative value:> glm.qD93 <- glm(paym ~ alpha + beta, family=quasipoisson())Error in eval(expr, envir, enclos) : Negative values not allowed for the quasiPoisson family Anyway, in general, quasi-likelihood is a way to handle (low) overdispersion in count data: for modest or large amount of overdispersion you could use negative binomial models, namely glm.nb() in MASS or gnlr() in the Jim Lindsey packages. best, vito ----- Original Message ----- From: <peter.fledelius at wgo.royalsun.com> To: <r-help at stat.math.ethz.ch> Sent: Thursday, January 16, 2003 4:45 PM Subject: [R] Overdispersed poisson - negative observation> Dear R users > > I have been looking for functions that can deal with overdispersed poisson > models. Some (one) of the observations are negative. According toactuarial> literature (England & Verall, Stochastic Claims Reserving in General > Insurance , Institute of Actiuaries 2002) this can be handled through the > use of quasi likelihoods instead of normal likelihoods. The presence of > negatives is not normal in a poisson model, however, we see themfrequently> in this type of data, and we would like to be able to fit the modelanyway.> > At the moment R is complaining about negative values and the link function > = log. > > My code looks like this. Do any of you know if this problem can be solved > in R? Any suggestions are welcomed. > > Kind regards, > > Peter Fledelius (new R user) > > *********** Code ************ > paym <- c(5012, 3257, 2638, 898, 1734, 2642, 1828, 599, 54, 172, > 106, 4179, 1111, 5270, 3116, 1817, -103, 673, 535, > 3410, 5582, 4881, 2268, 2594, 3479, 649, 603, > 5655, 5900, 4211, 5500, 2159, 2658, 984, > 1092, 8473, 6271, 6333, 3786, 225, > 1513, 4932, 5257, 1233, 2917, > 557, 3463, 6956, 1368, > 1351, 5596, 6165, > 3133, 2262, > 2063) > alpha <- factor(c(1,1,1,1,1,1,1,1,1,1, > 2,2,2,2,2,2,2,2,2, > 3,3,3,3,3,3,3,3, > 4,4,4,4,4,4,4, > 5,5,5,5,5,5, > 6,6,6,6,6, > 7,7,7,7, > 8,8,8, > 9,9, > 10)) > beta <- factor(c(1,2,3,4,5,6,7,8,9,10, > 1,2,3,4,5,6,7,8,9, > 1,2,3,4,5,6,7,8, > 1,2,3,4,5,6,7, > 1,2,3,4,5,6, > 1,2,3,4,5, > 1,2,3,4, > 1,2,3, > 1,2, > 1)) > d.AD <- data.frame(paym, alpha, beta) > glm.qD93 <- glm(paym ~ alpha + beta, family=quasipoisson()) > glm.qD93 > ************ Code end *************** > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > http://www.stat.math.ethz.ch/mailman/listinfo/r-help
Peter You are right to suppose that negative observations should not be a problem for a quasi-Poisson fit (though negative *fitted* values would be, in a log-linear model). The problem is that the "quasipoisson" function is overly restrictive, in requiring non-negativity of the y variable. (commenting out the error trap is not enough to solve the problem). There are two places where "quasipoisson" has problems: dev.resids <- function(y, mu, wt) 2 * wt * (y * log(ifelse(y = 0, 1, y/mu)) - (y - mu)) fails for negative values of y, while initialize <- expression({ if (any(y < 0)) stop(paste("Negative values not allowed for", "the quasiPoisson family")) n <- rep(1, nobs) mustart <- y + 0.1 }) would (without the error trap) return negative initial values for means, which won't work with a log link. The first problem is the hard one: the deviance isn't defined when y is negative. But an alternative is to use the Pearson X^2 statistic, which is defined for all values of y (and makes more sense anyway for quasi-likelhood models -- it's what you use to determine the dispersion constant). The second problem is easy to fix: for example, just take as starting values the global mean (corresponding to zero-valued effects in the model). I often find that this is a good starting point anyway -- better than the default choice, which typically is not in the model space (even when there are no negative values of y). I give below an amended "quasipoisson" which seems to work, ie it fits the model by the quasi-likelihood method with variance proportional to mean. It makes only the two changes just described. A *slightly* unsatisfactory aspect is that quantities reported as "deviance" are in fact Pearson X^2 statistics. But no other choice really makes sense there. David quasipoisson <- function (link = "log") ## Amended by David Firth, 2003.01.16, at points labelled ### ## to cope with negative y values ## ## Computes Pearson X^2 rather than Poisson deviance ## ## Starting values are all equal to the global mean { linktemp <- substitute(link) if (!is.character(linktemp)) { linktemp <- deparse(linktemp) if (linktemp == "link") linktemp <- eval(link) } if (any(linktemp == c("log", "identity", "sqrt"))) stats <- make.link(linktemp) else stop(paste(linktemp, "link not available for poisson", "family; available links are", "\"identity\", \"log\" and \"sqrt\"")) variance <- function(mu) mu validmu <- function(mu) all(mu > 0) dev.resids <- function(y, mu, wt) wt*(y-mu)^2/mu ### aic <- function(y, n, mu, wt, dev) NA initialize <- expression({ n <- rep(1, nobs) mustart <- rep(mean(y), length(y)) ### }) structure(list(family = "quasipoisson", link = linktemp, linkfun = stats$linkfun, linkinv = stats$linkinv, variance = variance, dev.resids = dev.resids, aic = aic, mu.eta = stats$mu.eta, initialize = initialize, validmu = validmu, valideta = stats$valideta), class = "family") } On Thursday, Jan 16, 2003, at 15:45 Europe/London, peter.fledelius at wgo.royalsun.com wrote:> Dear R users > > I have been looking for functions that can deal with overdispersed > poisson > models. Some (one) of the observations are negative. According to > actuarial > literature (England & Verall, Stochastic Claims Reserving in General > Insurance , Institute of Actiuaries 2002) this can be handled through > the > use of quasi likelihoods instead of normal likelihoods. The presence of > negatives is not normal in a poisson model, however, we see them > frequently > in this type of data, and we would like to be able to fit the model > anyway. > > At the moment R is complaining about negative values and the link > function > = log. > > My code looks like this. Do any of you know if this problem can be > solved > in R? Any suggestions are welcomed. > > Kind regards, > > Peter Fledelius (new R user) > > *********** Code ************ > paym <- c(5012, 3257, 2638, 898, 1734, 2642, 1828, 599, 54, 172, > 106, 4179, 1111, 5270, 3116, 1817, -103, 673, 535, > 3410, 5582, 4881, 2268, 2594, 3479, 649, 603, > 5655, 5900, 4211, 5500, 2159, 2658, 984, > 1092, 8473, 6271, 6333, 3786, 225, > 1513, 4932, 5257, 1233, 2917, > 557, 3463, 6956, 1368, > 1351, 5596, 6165, > 3133, 2262, > 2063) > alpha <- factor(c(1,1,1,1,1,1,1,1,1,1, > 2,2,2,2,2,2,2,2,2, > 3,3,3,3,3,3,3,3, > 4,4,4,4,4,4,4, > 5,5,5,5,5,5, > 6,6,6,6,6, > 7,7,7,7, > 8,8,8, > 9,9, > 10)) > beta <- factor(c(1,2,3,4,5,6,7,8,9,10, > 1,2,3,4,5,6,7,8,9, > 1,2,3,4,5,6,7,8, > 1,2,3,4,5,6,7, > 1,2,3,4,5,6, > 1,2,3,4,5, > 1,2,3,4, > 1,2,3, > 1,2, > 1)) > d.AD <- data.frame(paym, alpha, beta) > glm.qD93 <- glm(paym ~ alpha + beta, family=quasipoisson()) > glm.qD93 > ************ Code end *************** > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > http://www.stat.math.ethz.ch/mailman/listinfo/r-help >
Some other techniques for overdispersed and underdispersed count data are described in: COUNT DATA REGRESSION USING SERIES EXPANSIONS: WITH APPLICATIONS by A. COLIN CAMERON and PER JOHANSSON in JOURNAL OF APPLIED ECONOMETRICS, VOL 12, 203-223 (1997). http://www.econ.uiuc.edu/~econ472/cameron97.pdf peter.fledelius at wgo.royalsun.com wrote:>Dear R users > >I have been looking for functions that can deal with overdispersed poisson >models. Some (one) of the observations are negative. According to actuarial >literature (England & Verall, Stochastic Claims Reserving in General >Insurance , Institute of Actiuaries 2002) this can be handled through the >use of quasi likelihoods instead of normal likelihoods. The presence of >negatives is not normal in a poisson model, however, we see them frequently >in this type of data, and we would like to be able to fit the model anyway. >[...] >