On May 29, 2013, at 19:58, Ben Bolker wrote:
>
> I don't have a copy of Dobson (1990) from which the glm.D93 example is
> taken in example("glm"), but I'm strongly suspecting that these are
> made-up data rather than real data; the means of the responses within
> each treatment are _identical_ (equal to 16 2/3), so two of the
> parameters are estimated as being zero (within machine tolerance). (At
> this moment I don't understand why it is the means, rather than the
> geometric means, being identical that matters ...)
I don't have it to hand either (although I could get to it tomorrow -- a
colleague is teaching from it). However, I don't think there's anything
particularly fake about the data; they do seem to come from a designed
trial in which 3 groups of 50 people are assigned different treatments and a
3-level outcome is recorded.
So in reality it is a model for three multinomials, but there's a standard
"trick" whereby this has likelihood inference equivalent to that of a
multiplicative Poisson model. The likelihood for the Poisson model factorizes
into the marginal distribution of the treatment totals and the conditional
distribution of the table given the totals; the latter is the set of
multinomials. If you ensure a suitable separation of parameters and that the
model for the marginals is saturated, you end up with likelihood inference for
the conditional distributions. (This may require some pen-and-paper work to
become intelligible...)
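
In case it helps, here is a quick numerical check of that factorization (a
sketch; it assumes d.AD and glm.D93 as set up in example("glm"), and the
variable names are mine):

## Poisson likelihood = Poisson for the treatment totals
##                      * multinomials for the cells given the totals
y  <- matrix(d.AD$counts, 3, byrow = TRUE)  # rows = treatments
n  <- rowSums(y)                            # treatment totals (all 50)
p  <- colSums(y) / sum(y)                   # fitted outcome probabilities
mu <- outer(n, p)                           # fitted values of glm.D93
ll.pois <- sum(dpois(y, mu, log = TRUE))
ll.marg <- sum(dpois(n, n, log = TRUE))     # saturated model for the totals
ll.cond <- sum(sapply(1:3, function(i)
                dmultinom(y[i, ], prob = p, log = TRUE)))
all.equal(ll.pois, ll.marg + ll.cond)       # TRUE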
The ML parameters in a multiplicative Poisson model (lambda_ij = alpha_i *
beta_j) are proportional to the row and column means, and in the GLM
framework the parameters are the logs of those, subject to some resolution
of the overparametrization issue.
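
For instance (same assumptions as above), with the default treatment
contrasts the outcome coefficients are just log-ratios of the outcome
totals, and the treatment coefficients vanish because the three treatment
totals are equal:

cs <- colSums(matrix(d.AD$counts, 3, byrow = TRUE))  # outcome totals 63 40 47
log(cs[-1] / cs[1])  # -0.4543 -0.2930, matching coef(glm.D93)[2:3]
coef(glm.D93)        # treatment2, treatment3 are zero to machine tolerance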
The outcome parameters are likely of no interest, since they describe
log-ratios of frequencies of the outcome categories, so the whole thing is
about the goodness of fit, as measured by the

    Residual deviance: 5.1291 on 4 degrees of freedom

which is the LRT equivalent of the Pearson Chi-square for the table:
> chisq.test(matrix(d.AD$counts, 3))

        Pearson's Chi-squared test

data:  matrix(d.AD$counts, 3)
X-squared = 5.1732, df = 4, p-value = 0.27
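
Both statistics are easy to reproduce by hand (again a sketch; O and E are
my names for the observed table and the independence fit):

O <- matrix(d.AD$counts, 3, byrow = TRUE)
E <- outer(rowSums(O), colSums(O)) / sum(O)  # independence fit
2 * sum(O * log(O / E))  # 5.1291, the residual deviance (the LRT statistic)
sum((O - E)^2 / E)       # 5.1732, the Pearson X-squared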
In fact, Rao's score test for an interaction term is exactly the Pearson
Chi-square:
> anova(glm.D93, update(glm.D93,~outcome*treatment), test="Rao")
Analysis of Deviance Table

Model 1: counts ~ outcome + treatment
Model 2: counts ~ outcome + treatment + outcome:treatment
  Resid. Df Resid. Dev Df Deviance    Rao Pr(>Chi)
1         4     5.1291
2         0     0.0000  4   5.1291 5.1732     0.27
>
> This therefore feels like a somewhat strange (i.e. non-generic)
> example to use (although I know it's been that way for a long time).
>
> Perhaps more importantly, the example illustrates the use of glm()
> without a data= argument, which I think should not generally be
> encouraged. I would prefer to see the example written as:
>
> d.AD <- data.frame(
>   counts = c(18, 17, 15, 20, 10, 20, 25, 13, 12),
>   outcome = gl(3, 1, 9),
>   treatment = gl(3, 3))
>
> print(d.AD)
>
> glm.D93 <- glm(counts ~ outcome + treatment, family = poisson(),
>                data = d.AD)
>
> Thoughts?
>
> Ben Bolker
>
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes@cbs.dk  Priv: PDalgd@gmail.com