On May 29, 2013, at 19:58, Ben Bolker wrote:
>
> I don't have a copy of Dobson (1990) from which the glm.D93 example is
> taken in example("glm"), but I'm strongly suspecting that these are
> made-up data rather than real data; the means of the responses within
> each treatment are _identical_ (equal to 16 2/3), so two of the
> parameters are estimated as being zero (within machine tolerance). (At
> this moment I don't understand why it is the means, rather than the
> geometric means, being identical that matters ...)
I don't have it to hand either (although I could get to it tomorrow -- a
colleague is teaching from it). However, I don't think there's anything
particularly fake about the data; they do seem to come from a designed
trial in which 3 groups of 50 people are assigned different treatments and a
3-level outcome is recorded.
So in reality it is a model for three multinomials, but there's a standard
"trick" whereby this has likelihood inference equivalent to that of a
multiplicative Poisson model. The likelihood for the Poisson model factorizes
into the marginal distribution of the treatment totals and the conditional
distribution of the table given the totals; the latter is the set of
multinomials. If you ensure a suitable separation of parameters and that the
model for the marginals is saturated, you end up with likelihood inference for
the conditional distributions. (This may require some pen-and-paper work to
become intelligible...)
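
In case it helps, here is a quick numerical check of that factorization (a
sketch; it assumes d.AD and glm.D93 as set up in example("glm"), and the
variable names are mine):

## Poisson likelihood = Poisson for the treatment totals
##                      * multinomials for the cells given the totals
y  <- matrix(d.AD$counts, 3, byrow = TRUE)  # rows = treatments
n  <- rowSums(y)                            # treatment totals (all 50)
p  <- colSums(y) / sum(y)                   # fitted outcome probabilities
mu <- outer(n, p)                           # fitted values of glm.D93
ll.pois <- sum(dpois(y, mu, log = TRUE))
ll.marg <- sum(dpois(n, n, log = TRUE))     # saturated model for the totals
ll.cond <- sum(sapply(1:3, function(i)
                dmultinom(y[i, ], prob = p, log = TRUE)))
all.equal(ll.pois, ll.marg + ll.cond)       # TRUE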
The ML parameters in a multiplicative Poisson model (lambda_ij = alpha_i *
beta_j) are proportional to the row and column means, and in the GLM
framework the parameters are the logs of those, subject to some resolution
of the overparametrization issue.
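
For instance (same assumptions as above), with the default treatment
contrasts the outcome coefficients are just log-ratios of the outcome
totals, and the treatment coefficients vanish because the three treatment
totals are equal:

cs <- colSums(matrix(d.AD$counts, 3, byrow = TRUE))  # outcome totals 63 40 47
log(cs[-1] / cs[1])  # -0.4543 -0.2930, matching coef(glm.D93)[2:3]
coef(glm.D93)        # treatment2, treatment3 are zero to machine tolerance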
The outcome parameters are likely of no interest, since they describe
log-ratios of frequencies of the outcome categories, so the whole thing is
about the goodness of fit, as measured by the

    Residual deviance: 5.1291 on 4 degrees of freedom

which is the LRT equivalent of the Pearson Chi-square for the table:
> chisq.test(matrix(d.AD$counts, 3))

        Pearson's Chi-squared test

data:  matrix(d.AD$counts, 3)
X-squared = 5.1732, df = 4, p-value = 0.27
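
Both statistics are easy to reproduce by hand (again a sketch; O and E are
my names for the observed table and the independence fit):

O <- matrix(d.AD$counts, 3, byrow = TRUE)
E <- outer(rowSums(O), colSums(O)) / sum(O)  # independence fit
2 * sum(O * log(O / E))  # 5.1291, the residual deviance (the LRT statistic)
sum((O - E)^2 / E)       # 5.1732, the Pearson X-squared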
In fact, Rao's score test for an interaction term is exactly the Pearson
Chi-square:
> anova(glm.D93, update(glm.D93,~outcome*treatment), test="Rao")
Analysis of Deviance Table

Model 1: counts ~ outcome + treatment
Model 2: counts ~ outcome + treatment + outcome:treatment
  Resid. Df Resid. Dev Df Deviance    Rao Pr(>Chi)
1         4     5.1291
2         0     0.0000  4   5.1291 5.1732     0.27
>
> This therefore feels like a somewhat strange (i.e. non-generic)
> example to use (although I know it's been that way for a long time).
>
> Perhaps more importantly, the example illustrates the use of glm()
> without a data= argument, which I think should not generally be
> encouraged. I would prefer to see the example written as:
>
> d.AD <- data.frame(
>   counts = c(18, 17, 15, 20, 10, 20, 25, 13, 12),
>   outcome = gl(3, 1, 9),
>   treatment = gl(3, 3))
>
> print(d.AD)
>
> glm.D93 <- glm(counts ~ outcome + treatment, family = poisson(),
>                data = d.AD)
>
> Thoughts?
>
> Ben Bolker
>
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes@cbs.dk  Priv: PDalgd@gmail.com