Timothy_Handley at nps.gov
2010-Nov-29  19:34 UTC
[R] accuracy of GLM dispersion parameters
I'm confused as to the trustworthiness of the dispersion parameters reported by glm. Any help or advice would be greatly appreciated. Context: I'm interested in using a fitted GLM to make some predictions. Along with the predicted values, I'd also like to have estimates of variance for each of those predictions. For a Gamma-family model, I believe this can be done as Var[y] = dispersion parameter * predicted value ^ 2. Thus, I'm interested in knowing the dispersion parameter for this fitted model. Specifics: The summary function says that my fitted GLM has a dispersion parameter=15.8. On the other hand, the gamma.dispersion function (MASS) says that the GLM uses a dispersion parameter of 1.86. I could understand some modest difference, as the help for gamma.shape() says that the MASS functions return a more accurate dispersion value than summary(). However, these two numbers differ by a factor of 8, which is quite a lot. Is this normal? Would you folks expect such a large difference? Which value should I trust? R terminal excerpt:> summary(tempglm_g2)Call: glm(formula = precip_sbi ~ precip_oxx + precip_oxx_sq, family = Gamma(link = identity), data = w.combo, start = c(0.1, 0.4, 0.02)) Deviance Residuals: Min 1Q Median 3Q Max -2.99999 -1.63183 -1.00720 0.04878 8.93461 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.09236 0.04834 1.911 0.0583 . precip_oxx 0.26848 0.35891 0.748 0.4558 precip_oxx_sq 0.05138 0.13418 0.383 0.7024 --- Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 (Dispersion parameter for Gamma family taken to be 15.78978) Null deviance: 528.73 on 130 degrees of freedom Residual deviance: 305.81 on 128 degrees of freedom AIC: -100.33 Number of Fisher Scoring iterations: 5> library(MASS) > gamma.shape(tempglm_g2)Alpha: 0.53807358 SE: 0.05526108> gamma.dispersion(tempglm_g2)[1] 1.858482 Thanks, Tim Handley Research Assistant Channel Islands National Park (Will be working from both CHIS and SAMO) CHIS Phone: 805-658-5759 SAMO Phone: 805-370-2300 x2412
<Timothy_Handley <at> nps.gov> writes:> I'm confused as to the trustworthiness of the dispersion parameters > reported by glm. Any help or advice would be greatly appreciated. >[snip]> Specifics: The summary function says that my fitted GLM has a dispersion > parameter=15.8. On the other hand, the gamma.dispersion function (MASS) > says that the GLM uses a dispersion parameter of 1.86. I could understand > some modest difference, as the help for gamma.shape() says that the MASS > functions return a more accurate dispersion value than summary(). However, > these two numbers differ by a factor of 8, which is quite a lot. Is this > normal? Would you folks expect such a large difference? Which value should > I trust? > > R terminal excerpt: > > > > summary(tempglm_g2) > > Call: > glm(formula = precip_sbi ~ precip_oxx + precip_oxx_sq, family = Gamma(link > = identity), > data = w.combo, start = c(0.1, 0.4, 0.02)) > > Deviance Residuals: > Min 1Q Median 3Q Max > -2.99999 -1.63183 -1.00720 0.04878 8.93461 > > Coefficients: > Estimate Std. Error t value Pr(>|t|) > (Intercept) 0.09236 0.04834 1.911 0.0583 . > precip_oxx 0.26848 0.35891 0.748 0.4558 > precip_oxx_sq 0.05138 0.13418 0.383 0.7024 > --- > Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 > > (Dispersion parameter for Gamma family taken to be 15.78978) > > Null deviance: 528.73 on 130 degrees of freedom > Residual deviance: 305.81 on 128 degrees of freedom > AIC: -100.33 > > Number of Fisher Scoring iterations: 5 > > > library(MASS) > > gamma.shape(tempglm_g2) > > Alpha: 0.53807358 > SE: 0.05526108 > > gamma.dispersion(tempglm_g2) > [1] 1.858482 >I would trust the gamma.dispersion() result more, although I agree that the difference is worrisome. The way to look at this further would be to profile the dispersion parameter. As I recall there isn't such a built in option in MASS (profile.glm only profiles the coefficients), but you may be able to do it *approximately* like this: library(bbmle) m1 <- mle2(precip_sbi ~ dgamma(shape=a,scale=mu/a), parameters=list(mu~precip_oxx+precip_oxx_sq), data=w.combo, start=list(mu=0.1,a=2)) p1 <- profile(m1) plot(p1)
Timothy_Handley at nps.gov
2010-Nov-30  16:54 UTC
[R] accuracy of GLM dispersion parameters
>From Ben Bolker: > I would trust the gamma.dispersion() result more, although I >agree that the difference is worrisome. The way to look at this >further would be to profile the dispersion parameter. As I recall >there isn't such a built in option in MASS (profile.glm only >profiles the coefficients), but you may be able to do it >*approximately* like this: > >library(bbmle) >m1 <- mle2(precip_sbi ~ dgamma(shape=a,scale=mu/a), > parameters=list(mu~precip_oxx+precip_oxx_sq), > data=w.combo, start=list(mu=0.1,a=2)) >p1 <- profile(m1) >plot(p1)Ben: Thanks for the suggestion. I ran the code you sent, and according to the graphical profile, the 99% confidence interval for shape is [.4, .7], corresponding to a 99% interval for the dispersion parameter of [1.4, 2.5]. So your profile tool agrees with gamma.dispersion(). I now feel more somewhat more comfortable about gamma.dispersion(), but I'm still worried by the difference between summary() and gamma.dispersion(). While I have a basic understanding of GLM's, I don't understand why the reported value for dispersion would be 'crude'. Note that the word 'crude' comes from help(gamma.shape) {MASS}. If you or anyone else could help me further understand this issue, I'd greatly appreciate it. Tim Handley Research Assistant Channel Islands National Park (Will be working from both CHIS and SAMO) CHIS Phone: 805-658-5759 (Tue, Wed, Thu) SAMO Phone: 805-370-2300 x2412(Mon, Fri)