Hi,

when I apply a glm() model in two ways, first with the response in a two-column matrix specification with successes and failures:

y <- matrix(c(
  5, 1,
  3, 3,
  2, 2,
  0, 4), ncol = 2, byrow = TRUE)

X <- data.frame(x1 = factor(c(1,1,0,0)),
                x2 = factor(c(0,1,0,1)))

glm(y ~ x1 + x2, data = X, family = "binomial")

and second with a model matrix that has full rows (i.e. has as many rows as real observations) and represents identical data:

Xf <- data.frame(x1 = factor(rep(c(1,1,0,0), rowSums(y))),
                 x2 = factor(rep(c(0,1,0,1), rowSums(y))))
yf <- factor(rep(rep(0:1, 4), t(y)))

glm(yf ~ x1 + x2, data = Xf, family = "binomial")

we will find that the number of degrees of freedom and the AIC etc. differ -- I'd expect them to be identical (as the coefficient estimates and such things are).

Maybe I am confused tonight, hence I do not file it as a bug report right away and wait to be enlightened ...

Thanks and best wishes,
Uwe
Prof Brian Ripley
2011-Jan-11 18:01 UTC
[R] glm specification where response is a 2col matrix
Your first model is a binomial glm with 4 observations of 6, 6, 4 and 4 trials. Your second model is a Bernoulli glm with 20 observations of one trial each. The saturated models are different, as are the likelihoods (unsurprising, given the data are different): the binomial model has combinatorial factors (e.g. choose(6,5)*choose(6,3)*choose(4,2)) that the Bernoulli model does not have, so the AICs differ.

I am not sure where these issues of aggregating Bernoulli trials are explained (nor am I near my books), but this is a common question.

-- 
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
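The explanation above can be made concrete with a short sketch (a follow-up illustration using the data from the question, not part of the original thread). The Bernoulli log-likelihood omits the log binomial coefficients sum(lchoose(n_i, s_i)), so logLik and AIC shift by a constant, while the residual degrees of freedom count 4 versus 20 observations:

```r
# The aggregated (binomial) fit: 4 observations of 6, 6, 4, 4 trials.
y <- matrix(c(5, 1,
              3, 3,
              2, 2,
              0, 4), ncol = 2, byrow = TRUE)
X <- data.frame(x1 = factor(c(1, 1, 0, 0)),
                x2 = factor(c(0, 1, 0, 1)))
fit.binom <- glm(y ~ x1 + x2, data = X, family = binomial)

# The disaggregated (Bernoulli) fit: 20 observations of one trial each.
Xf <- data.frame(x1 = factor(rep(c(1, 1, 0, 0), rowSums(y))),
                 x2 = factor(rep(c(0, 1, 0, 1), rowSums(y))))
# Note: column 1 of y holds the successes, and glm() treats the *first*
# factor level as failure, so the 1s must line up with column 1 here.
# (The original post's rep(0:1, ...) coding flips the labels, which
# negates the coefficients but leaves the maximized likelihood unchanged.)
yf <- factor(rep(rep(1:0, 4), t(y)))
fit.bern <- glm(yf ~ x1 + x2, data = Xf, family = binomial)

# The constant: sum of the log binomial coefficients, here
# lchoose(6,5) + lchoose(6,3) + lchoose(4,2) + lchoose(4,0) = log(720).
const <- sum(lchoose(rowSums(y), y[, 1]))

coef(fit.binom) - coef(fit.bern)                  # ~ 0: same estimates
c(df.residual(fit.binom), df.residual(fit.bern))  # 1 vs 17
as.numeric(logLik(fit.binom) - logLik(fit.bern))  # equals const
AIC(fit.bern) - AIC(fit.binom)                    # equals 2 * const
```

Both models have 3 parameters, so the AIC gap is exactly twice the log-likelihood gap; likelihood-ratio tests between nested models agree across the two specifications because the constant cancels in differences.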