peter.fledelius@wgo.royalsun.com
2003-Jan-16 16:53 UTC
[R] Overdispersed poisson - negative observation
Dear R users
I have been looking for functions that can deal with overdispersed poisson
models. Some (one) of the observations are negative. According to actuarial
literature (England & Verall, Stochastic Claims Reserving in General
Insurance , Institute of Actiuaries 2002) this can be handled through the
use of quasi likelihoods instead of normal likelihoods. The presence of
negatives is not normal in a poisson model, however, we see them frequently
in this type of data, and we would like to be able to fit the model anyway.
At the moment R is complaining about negative values and the link function
= log.
My code looks like this. Do any of you know if this problem can be solved
in R? Any suggestions are welcomed.
Kind regards,
Peter Fledelius (new R user)
*********** Code ************
paym <- c(5012, 3257, 2638, 898, 1734, 2642, 1828, 599, 54, 172,
106, 4179, 1111, 5270, 3116, 1817, -103, 673, 535,
3410, 5582, 4881, 2268, 2594, 3479, 649, 603,
5655, 5900, 4211, 5500, 2159, 2658, 984,
1092, 8473, 6271, 6333, 3786, 225,
1513, 4932, 5257, 1233, 2917,
557, 3463, 6956, 1368,
1351, 5596, 6165,
3133, 2262,
2063)
alpha <- factor(c(1,1,1,1,1,1,1,1,1,1,
2,2,2,2,2,2,2,2,2,
3,3,3,3,3,3,3,3,
4,4,4,4,4,4,4,
5,5,5,5,5,5,
6,6,6,6,6,
7,7,7,7,
8,8,8,
9,9,
10))
beta <- factor(c(1,2,3,4,5,6,7,8,9,10,
1,2,3,4,5,6,7,8,9,
1,2,3,4,5,6,7,8,
1,2,3,4,5,6,7,
1,2,3,4,5,6,
1,2,3,4,5,
1,2,3,4,
1,2,3,
1,2,
1))
d.AD <- data.frame(paym, alpha, beta)
glm.qD93 <- glm(paym ~ alpha + beta, family=quasipoisson())
glm.qD93
************ Code end ***************
Roughtly speaking, your code is correct, but you can not fit your data with negative value:> glm.qD93 <- glm(paym ~ alpha + beta, family=quasipoisson())Error in eval(expr, envir, enclos) : Negative values not allowed for the quasiPoisson family Anyway, in general, quasi-likelihood is a way to handle (low) overdispersion in count data: for modest or large amount of overdispersion you could use negative binomial models, namely glm.nb() in MASS or gnlr() in the Jim Lindsey packages. best, vito ----- Original Message ----- From: <peter.fledelius at wgo.royalsun.com> To: <r-help at stat.math.ethz.ch> Sent: Thursday, January 16, 2003 4:45 PM Subject: [R] Overdispersed poisson - negative observation> Dear R users > > I have been looking for functions that can deal with overdispersed poisson > models. Some (one) of the observations are negative. According toactuarial> literature (England & Verall, Stochastic Claims Reserving in General > Insurance , Institute of Actiuaries 2002) this can be handled through the > use of quasi likelihoods instead of normal likelihoods. The presence of > negatives is not normal in a poisson model, however, we see themfrequently> in this type of data, and we would like to be able to fit the modelanyway.> > At the moment R is complaining about negative values and the link function > = log. > > My code looks like this. Do any of you know if this problem can be solved > in R? Any suggestions are welcomed. > > Kind regards, > > Peter Fledelius (new R user) > > *********** Code ************ > paym <- c(5012, 3257, 2638, 898, 1734, 2642, 1828, 599, 54, 172, > 106, 4179, 1111, 5270, 3116, 1817, -103, 673, 535, > 3410, 5582, 4881, 2268, 2594, 3479, 649, 603, > 5655, 5900, 4211, 5500, 2159, 2658, 984, > 1092, 8473, 6271, 6333, 3786, 225, > 1513, 4932, 5257, 1233, 2917, > 557, 3463, 6956, 1368, > 1351, 5596, 6165, > 3133, 2262, > 2063) > alpha <- factor(c(1,1,1,1,1,1,1,1,1,1, > 2,2,2,2,2,2,2,2,2, > 3,3,3,3,3,3,3,3, > 4,4,4,4,4,4,4, > 5,5,5,5,5,5, > 6,6,6,6,6, > 7,7,7,7, > 8,8,8, > 9,9, > 10)) > beta <- factor(c(1,2,3,4,5,6,7,8,9,10, > 1,2,3,4,5,6,7,8,9, > 1,2,3,4,5,6,7,8, > 1,2,3,4,5,6,7, > 1,2,3,4,5,6, > 1,2,3,4,5, > 1,2,3,4, > 1,2,3, > 1,2, > 1)) > d.AD <- data.frame(paym, alpha, beta) > glm.qD93 <- glm(paym ~ alpha + beta, family=quasipoisson()) > glm.qD93 > ************ Code end *************** > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > http://www.stat.math.ethz.ch/mailman/listinfo/r-help
Peter
You are right to suppose that negative observations should not be a
problem for a quasi-Poisson fit (though negative *fitted* values would
be, in a log-linear model).
The problem is that the "quasipoisson" function is overly restrictive,
in requiring non-negativity of the y variable. (commenting out the
error trap is not enough to solve the problem). There are two places
where "quasipoisson" has problems:
dev.resids <- function(y, mu, wt) 2 * wt * (y * log(ifelse(y =
0, 1, y/mu)) - (y - mu))
fails for negative values of y, while
initialize <- expression({
if (any(y < 0)) stop(paste("Negative values not allowed
for",
"the quasiPoisson family"))
n <- rep(1, nobs)
mustart <- y + 0.1
})
would (without the error trap) return negative initial values for
means, which won't work with a log link.
The first problem is the hard one: the deviance isn't defined when y is
negative. But an alternative is to use the Pearson X^2 statistic,
which is defined for all values of y (and makes more sense anyway for
quasi-likelhood models -- it's what you use to determine the dispersion
constant).
The second problem is easy to fix: for example, just take as starting
values the global mean (corresponding to zero-valued effects in the
model). I often find that this is a good starting point anyway --
better than the default choice, which typically is not in the model
space (even when there are no negative values of y).
I give below an amended "quasipoisson" which seems to work, ie it fits
the model by the quasi-likelihood method with variance proportional to
mean. It makes only the two changes just described. A *slightly*
unsatisfactory aspect is that quantities reported as "deviance" are in
fact Pearson X^2 statistics. But no other choice really makes sense
there.
David
quasipoisson <- function (link = "log")
## Amended by David Firth, 2003.01.16, at points labelled ###
## to cope with negative y values
##
## Computes Pearson X^2 rather than Poisson deviance
##
## Starting values are all equal to the global mean
{
linktemp <- substitute(link)
if (!is.character(linktemp)) {
linktemp <- deparse(linktemp)
if (linktemp == "link")
linktemp <- eval(link)
}
if (any(linktemp == c("log", "identity",
"sqrt")))
stats <- make.link(linktemp)
else stop(paste(linktemp, "link not available for poisson",
"family; available links are", "\"identity\",
\"log\" and
\"sqrt\""))
variance <- function(mu) mu
validmu <- function(mu) all(mu > 0)
dev.resids <- function(y, mu, wt) wt*(y-mu)^2/mu ###
aic <- function(y, n, mu, wt, dev) NA
initialize <- expression({
n <- rep(1, nobs)
mustart <- rep(mean(y), length(y)) ###
})
structure(list(family = "quasipoisson", link = linktemp,
linkfun = stats$linkfun, linkinv = stats$linkinv, variance =
variance,
dev.resids = dev.resids, aic = aic, mu.eta = stats$mu.eta,
initialize = initialize, validmu = validmu, valideta =
stats$valideta),
class = "family")
}
On Thursday, Jan 16, 2003, at 15:45 Europe/London,
peter.fledelius at wgo.royalsun.com wrote:
> Dear R users
>
> I have been looking for functions that can deal with overdispersed
> poisson
> models. Some (one) of the observations are negative. According to
> actuarial
> literature (England & Verall, Stochastic Claims Reserving in General
> Insurance , Institute of Actiuaries 2002) this can be handled through
> the
> use of quasi likelihoods instead of normal likelihoods. The presence of
> negatives is not normal in a poisson model, however, we see them
> frequently
> in this type of data, and we would like to be able to fit the model
> anyway.
>
> At the moment R is complaining about negative values and the link
> function
> = log.
>
> My code looks like this. Do any of you know if this problem can be
> solved
> in R? Any suggestions are welcomed.
>
> Kind regards,
>
> Peter Fledelius (new R user)
>
> *********** Code ************
> paym <- c(5012, 3257, 2638, 898, 1734, 2642, 1828, 599, 54, 172,
> 106, 4179, 1111, 5270, 3116, 1817, -103, 673, 535,
> 3410, 5582, 4881, 2268, 2594, 3479, 649, 603,
> 5655, 5900, 4211, 5500, 2159, 2658, 984,
> 1092, 8473, 6271, 6333, 3786, 225,
> 1513, 4932, 5257, 1233, 2917,
> 557, 3463, 6956, 1368,
> 1351, 5596, 6165,
> 3133, 2262,
> 2063)
> alpha <- factor(c(1,1,1,1,1,1,1,1,1,1,
> 2,2,2,2,2,2,2,2,2,
> 3,3,3,3,3,3,3,3,
> 4,4,4,4,4,4,4,
> 5,5,5,5,5,5,
> 6,6,6,6,6,
> 7,7,7,7,
> 8,8,8,
> 9,9,
> 10))
> beta <- factor(c(1,2,3,4,5,6,7,8,9,10,
> 1,2,3,4,5,6,7,8,9,
> 1,2,3,4,5,6,7,8,
> 1,2,3,4,5,6,7,
> 1,2,3,4,5,6,
> 1,2,3,4,5,
> 1,2,3,4,
> 1,2,3,
> 1,2,
> 1))
> d.AD <- data.frame(paym, alpha, beta)
> glm.qD93 <- glm(paym ~ alpha + beta, family=quasipoisson())
> glm.qD93
> ************ Code end ***************
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> http://www.stat.math.ethz.ch/mailman/listinfo/r-help
>
Some other techniques for overdispersed and underdispersed count data
are described in:
COUNT DATA REGRESSION USING SERIES EXPANSIONS: WITH APPLICATIONS
by A. COLIN CAMERON and PER JOHANSSON
in JOURNAL OF APPLIED ECONOMETRICS, VOL 12, 203-223 (1997).
http://www.econ.uiuc.edu/~econ472/cameron97.pdf
peter.fledelius at wgo.royalsun.com wrote:
>Dear R users
>
>I have been looking for functions that can deal with overdispersed poisson
>models. Some (one) of the observations are negative. According to actuarial
>literature (England & Verall, Stochastic Claims Reserving in General
>Insurance , Institute of Actiuaries 2002) this can be handled through the
>use of quasi likelihoods instead of normal likelihoods. The presence of
>negatives is not normal in a poisson model, however, we see them frequently
>in this type of data, and we would like to be able to fit the model anyway.
>[...]
>