Hi,

when I apply a glm() model in two ways, first with the response in a two-column matrix specification with successes and failures:

y <- matrix(c(
  5, 1,
  3, 3,
  2, 2,
  0, 4), ncol = 2, byrow = TRUE)

X <- data.frame(x1 = factor(c(1,1,0,0)),
                x2 = factor(c(0,1,0,1)))

glm(y ~ x1 + x2, data = X, family = "binomial")

and second with a model matrix that has full rows (i.e. has as many rows as real observations) and represents identical data:

Xf <- data.frame(x1 = factor(rep(c(1,1,0,0), rowSums(y))),
                 x2 = factor(rep(c(0,1,0,1), rowSums(y))))
yf <- factor(rep(rep(0:1, 4), t(y)))

glm(yf ~ x1 + x2, data = Xf, family = "binomial")

we will find that the number of degrees of freedom and the AIC etc. differ -- I'd expect them to be identical (as the coefficient estimates and such things are).

Maybe I am confused tonight, hence I do not file it as a bug report right away and wait to be enlightened ...

Thanks and best wishes,
Uwe
Prof Brian Ripley
2011-Jan-11 18:01 UTC
[R] glm specification where response is a 2col matrix
Your first model is a binomial glm with 4 observations of 6, 6, 4 and 4 trials. Your second model is a Bernoulli glm with 20 observations of one trial each. The saturated models are different, as are the likelihoods (unsurprising, given the data are different): the binomial model has combinatorial factors (e.g. choose(6,5)*choose(6,3)*choose(4,2)) that the Bernoulli model does not have, so the AICs differ.

I am not sure where these issues of aggregating Bernoulli trials are explained (nor am I near my books), but this is a common question.

-- 
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
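The explanation above can be made concrete with a short sketch (a follow-up illustration using the data from the question, not part of the original thread). The Bernoulli log-likelihood omits the log binomial coefficients sum(lchoose(n_i, s_i)), so logLik and AIC shift by a constant, while the residual degrees of freedom count 4 versus 20 observations:

```r
# The aggregated (binomial) fit: 4 observations of 6, 6, 4, 4 trials.
y <- matrix(c(5, 1,
              3, 3,
              2, 2,
              0, 4), ncol = 2, byrow = TRUE)
X <- data.frame(x1 = factor(c(1, 1, 0, 0)),
                x2 = factor(c(0, 1, 0, 1)))
fit.binom <- glm(y ~ x1 + x2, data = X, family = binomial)

# The disaggregated (Bernoulli) fit: 20 observations of one trial each.
Xf <- data.frame(x1 = factor(rep(c(1, 1, 0, 0), rowSums(y))),
                 x2 = factor(rep(c(0, 1, 0, 1), rowSums(y))))
# Note: column 1 of y holds the successes, and glm() treats the *first*
# factor level as failure, so the 1s must line up with column 1 here.
# (The original post's rep(0:1, ...) coding flips the labels, which
# negates the coefficients but leaves the maximized likelihood unchanged.)
yf <- factor(rep(rep(1:0, 4), t(y)))
fit.bern <- glm(yf ~ x1 + x2, data = Xf, family = binomial)

# The constant: sum of the log binomial coefficients, here
# lchoose(6,5) + lchoose(6,3) + lchoose(4,2) + lchoose(4,0) = log(720).
const <- sum(lchoose(rowSums(y), y[, 1]))

coef(fit.binom) - coef(fit.bern)                  # ~ 0: same estimates
c(df.residual(fit.binom), df.residual(fit.bern))  # 1 vs 17
as.numeric(logLik(fit.binom) - logLik(fit.bern))  # equals const
AIC(fit.bern) - AIC(fit.binom)                    # equals 2 * const
```

Both models have 3 parameters, so the AIC gap is exactly twice the log-likelihood gap; likelihood-ratio tests between nested models agree across the two specifications because the constant cancels in differences.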