I have been reviewing GLM and LMER to sharpen up some course notes and
would like to ask for your advice.
1. Is there a test that can be used to check whether a particular
distribution (say Gaussian, Gamma, or Inverse Gaussian) is "more
appropriate" in a Generalized Linear Model? A theoretical reason to
choose one over the other is enough for me, but I've got some
skeptical students who want a "significance test." I've done some
simulations and find that the AIC is smaller if you guess the
correct distribution, but as long as you have the link and the
functional form of the right hand side correct, your parameter
estimates are not horribly affected by the choice of distribution.
If your data are Gamma, for example, the estimates from GLM-Normal are
just about as good. Do you think so?
A draft of my notes is on my new web site:
http://pj.freefaculty.org/ps707/GLM/GammaGLM-01.pdf
If you look at sections 7 and 8 (or the graphs on p. 9 and p. 17),
you might see what I mean. I still have work to do on this and
if you have pointers, please email me.
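
The kind of simulation described in item 1 could be sketched roughly as
follows. This is only an illustration under stated assumptions: the
sample size, seed, true coefficients, and Gamma shape are made-up
values, and the object names are hypothetical. It draws Gamma responses
with a log link, fits the same right hand side under three families,
and puts the coefficient estimates and AIC values side by side.

```r
# Simulate Gamma data with a log link (all numeric settings are arbitrary).
set.seed(1234)
n     <- 2000
x     <- runif(n, 0, 2)
mu    <- exp(1 + 0.5 * x)          # true mean: intercept 1, slope 0.5
shape <- 5                         # hypothetical Gamma shape parameter
y     <- rgamma(n, shape = shape, rate = shape / mu)   # E[y] = mu

# Same link and same right hand side; only the family differs.
fit_gamma <- glm(y ~ x, family = Gamma(link = "log"))
fit_gauss <- glm(y ~ x, family = gaussian(link = "log"))
fit_invg  <- glm(y ~ x, family = inverse.gaussian(link = "log"))

# Collect the estimates and AIC values in one table.
fits <- list(fit_gamma, fit_gauss, fit_invg)
comparison <- data.frame(
  family    = c("Gamma", "gaussian", "inverse.gaussian"),
  intercept = sapply(fits, function(f) unname(coef(f)[1])),
  slope     = sapply(fits, function(f) unname(coef(f)[2])),
  AIC       = sapply(fits, AIC)
)
print(comparison)
```

A single draw like this shows nothing about sampling variability, so in
practice the whole thing would be wrapped in a loop over many seeds.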
2. Suppose you fit a random effects model with lmer and you wonder if
the fit is "better" (I suppose in the sense of anova) than a glm that
is the same, except for the lack of a random effect.
Is there a test for that? I've been reading the thread in the r-help
list from December, 2005, in which people are asking for a standard
error or confidence interval for a variance component and the answer
seems to be that such a thing is not statistically meaningful.
Would you consider an example using survey data about attitudes toward
the police in US cities? The variable PLACE indicates which city
people live in. We wonder if there should be a random intercept to
represent diversity across cities. I want something that works like
anova() might. What to do?
> gl1 <- glm(RE2TRCOP ~ AGE + POLINT + HAPPY + EFFCOM + TRNEI + GRPETH +
TRBLK + BLKCHIEF + RCOPDIFF + RCOPRATE + RCOPBKIL + RCRIM3YR + RPCTBLAK,
family = binomial(link = logit), data = eldat)
> summary(gl1)
Call:
glm(formula = RE2TRCOP ~ AGE + POLINT + HAPPY + EFFCOM + TRNEI +
GRPETH + TRBLK + BLKCHIEF + RCOPDIFF + RCOPRATE + RCOPBKIL +
RCRIM3YR + RPCTBLAK, family = binomial(link = logit), data = eldat)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.5129  -0.5515  -0.4072  -0.2807   2.7975
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.549801   0.666762  -2.324 0.020106 *
AGE         -0.020215   0.005423  -3.728 0.000193 ***
POLINT      -0.148035   0.075748  -1.954 0.050666 .
HAPPY       -0.309656   0.114871  -2.696 0.007024 **
EFFCOM      -0.150475   0.088664  -1.697 0.089670 .
TRNEI        0.376409   0.091701   4.105 4.05e-05 ***
GRPETH       0.593949   0.205169   2.895 0.003792 **
TRBLK        0.575473   0.114896   5.009 5.48e-07 ***
BLKCHIEF    -0.233197   0.188521  -1.237 0.216093
RCOPDIFF     0.051088   0.150306   0.340 0.733938
RCOPRATE     0.169789   0.097540   1.741 0.081733 .
RCOPBKIL     0.147662   0.089269   1.654 0.098102 .
RCRIM3YR     0.189701   0.097590   1.944 0.051913 .
RPCTBLAK    -0.504626   0.175897  -2.869 0.004119 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1364.4 on 1694 degrees of freedom
Residual deviance: 1197.5 on 1681 degrees of freedom
AIC: 1225.5
Number of Fisher Scoring iterations: 5
> me1 <- lmer(RE2TRCOP ~ AGE + POLINT + HAPPY + EFFCOM + TRNEI + GRPETH +
TRBLK + BLKCHIEF + RCOPDIFF + RCOPRATE + RCOPBKIL + RCRIM3YR +
RPCTBLAK + (1 | PLACE), family = binomial(link = logit), data = eldat,
method = "Laplace")
Loading required package: lattice
> summary(me1)
Generalized linear mixed model fit using Laplace
Formula: RE2TRCOP ~ AGE + POLINT + HAPPY + EFFCOM + TRNEI + GRPETH +
TRBLK + BLKCHIEF + RCOPDIFF + RCOPRATE + RCOPBKIL + RCRIM3YR +
RPCTBLAK + (1 | PLACE)
Data: eldat
Family: binomial(logit link)
     AIC      BIC    logLik deviance
1227.446 1308.977 -598.7229 1197.446
Random effects:
 Groups Name        Variance  Std.Dev.
 PLACE  (Intercept) 0.0056343 0.075062
# of obs: 1695, groups: PLACE, 18
Estimated scale (compare to 1) 1.014432
Fixed effects:
              Estimate Std. Error z value  Pr(>|z|)
(Intercept) -1.5478617  0.6708198 -2.3074 0.0210315 *
AGE         -0.0202473  0.0054271 -3.7308 0.0001909 ***
POLINT      -0.1481644  0.0757952 -1.9548 0.0506069 .
HAPPY       -0.3102757  0.1149718 -2.6987 0.0069609 **
EFFCOM      -0.1501713  0.0887251 -1.6925 0.0905419 .
TRNEI        0.3757690  0.0918107  4.0929 4.261e-05 ***
GRPETH       0.5903173  0.2053184  2.8751 0.0040386 **
TRBLK        0.5754894  0.1149954  5.0045 5.602e-07 ***
BLKCHIEF    -0.2275911  0.1935195 -1.1761 0.2395698
RCOPDIFF     0.0483413  0.1541093  0.3137 0.7537626
RCOPRATE     0.1681531  0.1003416  1.6758 0.0937760 .
RCOPBKIL     0.1437347  0.0924614  1.5545 0.1200562
RCRIM3YR     0.1825188  0.1013626  1.8007 0.0717576 .
RPCTBLAK    -0.4959112  0.1814734 -2.7327 0.0062819 **
---
> anova(gl1,me1)
Analysis of Deviance Table
Model: binomial, link: logit
Response: RE2TRCOP
Terms added sequentially (first to last)
         Df Deviance Resid. Df Resid. Dev
NULL                      1694    1364.45
AGE       1    26.26      1693    1338.19
POLINT    1    14.18      1692    1324.02
HAPPY     1    25.80      1691    1298.21
EFFCOM    1     5.08      1690    1293.13
TRNEI     1    37.65      1689    1255.49
GRPETH    1     8.02      1688    1247.46
TRBLK     1    26.87      1687    1220.59
BLKCHIEF  1     1.36      1686    1219.23
RCOPDIFF  1    10.29      1685    1208.94
RCOPRATE  1     0.35      1684    1208.59
RCOPBKIL  1     0.71      1683    1207.88
RCRIM3YR  1     1.77      1682    1206.12
RPCTBLAK  1     8.63      1681    1197.48
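
To make the question concrete, here is a rough sketch of the sort of
comparison I have in mind, done by hand from the two deviances printed
above. It rests on assumptions I am not sure are justified: that the
Laplace-approximated deviance from lmer is on the same scale as the glm
deviance (which seems plausible for a 0/1 response, where the glm
deviance equals -2 times the log-likelihood), and that a chi-square
reference distribution is usable when the null value of the variance
sits on the boundary of the parameter space. The object names are
hypothetical.

```r
# Deviances copied from the summaries above.
dev_glm  <- 1197.48    # gl1: no random effect
dev_lmer <- 1197.446   # me1: random intercept for PLACE

# Likelihood-ratio style statistic for the one extra variance parameter.
lr_stat <- dev_glm - dev_lmer

# Naive reference distribution: chi-square with 1 degree of freedom.
p_naive <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

# Boundary-adjusted version: a 50:50 mixture of chi-square(0) and
# chi-square(1) is often suggested, which halves the naive p-value.
p_mixture <- 0.5 * p_naive

print(c(lr_stat = lr_stat, p_naive = p_naive, p_mixture = p_mixture))
```

Whether either p-value is trustworthy here is exactly what I am asking.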
Are there tests that can help you decide if the distribution of the
dependent variable is Gamma or inverse Gaussian, or Gaussian for that
matter? Is one supposed to use the AIC from the glm estimates to
decide which is best? The deviance is not comparable across families,
right?
--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas