On Sun, 21 Oct 2012, Eiko Fried wrote:
> I am running 9 negative binomial regressions with count data.
>
> The nine models use 9 different dependent variables - items of a clinical
> screening instrument - and the same set of 5 predictors. The goal is to
> find out whether these predictors have differential effects on the items.
>
> For various reasons, one being that I want to avoid overfitting, I need
> to use the same type of model for all 9 regressions.
>
> The problem is that some of my dependent variables are overdispersed and
> others are not (dispersion estimates obtained using family=quasipoisson):
>
>      dispersion       p     theta
> S1        1.084   0.102      30.0
> S2        0.903   0.000    4125.0   PROBLEM
> S3        0.997   0.926    3754.0   PROBLEM
> S4        0.784   0.000   10283.0   PROBLEM
> S5        1.108   0.003       8.6
> S6        1.010   0.772    1287.0   PROBLEM
> S7        1.228   0.001       1.7
> S8        1.222   0.005       0.4
> S9        2.120   0.283       1.2
>
> So I thought using maximum likelihood Poisson models would provide wrong
> results, and that using negative binomial models would be the best way to
> go.
>
> Three models (the ones that are not overdispersed) fail with the
> following error:
>
>> m2.nb <- glm.nb(t0s2 ~ Sex + HisDep + FamHis + ZEFE + ZNeuro, data = data)
>
>> Error in while ((it <- it + 1) < limit && abs(del) > eps) { :
>>   missing value where TRUE/FALSE needed
>
> Looking at the theta iterations, the values grow too large, overflow
> to infinity, and then become NaN.
> This is an example for model S2 above, with a theta of 4125 and a
> dispersion value of 0.903, significantly smaller than 1 (p < .001):
>
>> theta.ml: iter76 theta =1.34118e+16
>> theta.ml: iter77 theta =3.0232e+16
>> theta.ml: iter78 theta =-Inf
>> theta.ml: iter79 theta =NaN
>
> The following warning accompanies this problem:
> 1: In sqrt(1/i) : NaNs produced
>
> How would you recommend dealing with these problems? Are the results
> reliable despite these errors?
>
> Again, I have to use a single type of model that fits all 9 regressions
> best overall - maybe I should default to maximum likelihood Poisson?
From the description of your response in earlier e-mails on this list, my
feeling is that this is caused by the type of response you have. It is not
really a count response and might be better modeled by an ordered response
model. Below I've simulated data from an ordered probit model and then
pretended they are count data. The results are the symptoms you describe:
underdispersion and non-convergence of theta in the NB model. Fitting an
ordered probit model avoids these problems for these data and (not
surprisingly) recovers the true parameters reasonably well.
hth,
Z
## glm.nb() and polr() below are in package MASS
library("MASS")
## random regressor and latent Gaussian variable
set.seed(1)
x <- runif(200)
y0 <- 5 * x + rnorm(200)
## collapse to factor with 5 levels
yf <- cut(y0, c(-Inf, 3, 4, 5, 6, Inf), labels = 0:4)
## pretend levels are count variables
yc <- as.numeric(as.character(yf))
## quasi-Poisson with underdispersion
summary(glm(yc ~ x, family = quasipoisson))
## NB with problems in theta estimation
summary(glm.nb(yc ~ x))
## ordered probit recovers true parameters
summary(polr(yf ~ x, method = "probit", Hess = TRUE))
## compare with the latent linear model
summary(lm(y0 ~ x))
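If you want a formal test of over/underdispersion rather than eyeballing the
quasi-Poisson dispersion estimate, one option (my addition, not part of the
original exchange) is dispersiontest() from package AER, applied to the
simulated data above:

```r
## Sketch (my addition): formally test the equidispersion assumption of a
## Poisson fit with dispersiontest() from package AER. The simulated data
## are re-created here so the snippet is self-contained.
library("AER")  # provides dispersiontest()

set.seed(1)
x <- runif(200)
y0 <- 5 * x + rnorm(200)
yc <- as.numeric(as.character(
  cut(y0, c(-Inf, 3, 4, 5, 6, Inf), labels = 0:4)))

## Poisson fit on the pretend-count response
fit.p <- glm(yc ~ x, family = poisson)

## score test of dispersion = 1 against dispersion != 1; for this ordinal
## response collapsed to counts, the quasi-Poisson fit above already
## suggested underdispersion
dispersiontest(fit.p, alternative = "two.sided")
```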
> Thank you
> Torvon
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>