Christophe Dutang
2024-Jun-14 06:12 UTC
[R] Column names of model.matrix's output with contrast.arg
Dear list,

Changing the default contrasts used in glm() made me aware of how model.matrix() sets column names.

With the default contrasts, model.matrix() uses the level values to name the columns. However, with other contrasts, model.matrix() uses the level indexes. I don't see anything in the documentation about this, and such behavior does not seem natural to me.

Any comment is welcome. An example is below.

Kind regards, Christophe

#example from ?glm
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- paste0("O", gl(3,1,9))
treatment <- paste0("T", gl(3,3))

#default contr.treatment: columns outcomeO2, outcomeO3 (level names)
X3 <- model.matrix(counts ~ outcome + treatment)
#contr.sum and contr.helmert: columns outcome1, outcome2 (level indexes)
X4 <- model.matrix(counts ~ outcome + treatment, contrasts.arg = list("outcome"="contr.sum"))
X5 <- model.matrix(counts ~ outcome + treatment, contrasts.arg = list("outcome"="contr.helmert"))

#check with original factor
cbind.data.frame(X3, outcome)
cbind.data.frame(X4, outcome)
cbind.data.frame(X5, outcome)

#same issue with glm
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())
glm.D94 <- glm(counts ~ outcome + treatment, family = poisson(), contrasts = list("outcome"="contr.sum"))
glm.D95 <- glm(counts ~ outcome + treatment, family = poisson(), contrasts = list("outcome"="contr.helmert"))

coef(glm.D93)
coef(glm.D94)
coef(glm.D95)

#check linear predictor: X %*% coef should reproduce predict() on the link scale
cbind(X3 %*% coef(glm.D93), predict(glm.D93))
cbind(X4 %*% coef(glm.D94), predict(glm.D94))
cbind(X5 %*% coef(glm.D95), predict(glm.D95))

-------------------------------------------------
Christophe DUTANG
LJK, Ensimag, Grenoble INP, UGA, France
ILB research fellow
Web: http://dutangc.free.fr
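The difference seems to come from the contrast matrices themselves rather than from model.matrix(). A quick check in a plain R session, using the same level labels as the example above, shows which base contr.* functions attach column names; when they are absent, model.matrix() appears to fall back to 1, 2, ... as in X4 and X5.

```r
# Levels of the 'outcome' factor from the example above
lev <- c("O1", "O2", "O3")

# contr.treatment keeps the non-baseline level names as column names
colnames(contr.treatment(lev))  # "O2" "O3"

# contr.sum and contr.helmert return matrices without column names
colnames(contr.sum(lev))        # NULL
colnames(contr.helmert(lev))    # NULL

# contr.poly labels its columns by polynomial degree instead of by level
colnames(contr.poly(3))         # ".L" ".Q"
```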
Peter Dalgaard
2024-Jun-14 09:45 UTC
[R] Column names of model.matrix's output with contrast.arg
You're at the mercy of the various contr.XXX functions. They may or may not set the colnames on the matrices that they generate. The rationale for (not) setting them is not perfectly transparent, but you obviously cannot use level names on contr.poly, so it uses .L, .Q, etc.

In MASS, contr.sdif is careful about labeling the columns with the levels that are being diff'ed.

For contr.treatment, there is a straightforward connection to 0/1 dummy variables, so level names there are natural. One could use levels in contr.sum and contr.helmert, but it might confuse users, since the comparisons are with the average of all levels (contr.sum) or of the preceding levels (contr.helmert), not with a single reference level. (It can be quite confusing when coding is +1 for male and -1 for female, so that the gender difference is twice the coefficient.)

-pd

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
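If level-based labels are wanted anyway, one option is to hand model.matrix() (or glm()) a contrast matrix that already carries column names, since those names appear to be what ends up in the design-matrix column names. This is a minimal sketch under that assumption; named_contr_sum() is a hypothetical helper, not part of stats. Per the caveat above, each labelled coefficient is still a deviation from the average of all levels, not a comparison with a reference level.

```r
# Hypothetical helper (not in stats): contr.sum() with the first k-1 levels
# used as column names. Column j is +1 for level j and -1 for the last level.
named_contr_sum <- function(levels) {
  cm <- contr.sum(levels)                  # rows are already named by level
  colnames(cm) <- levels[-length(levels)]  # label each column by its +1 level
  cm
}

# Data from the example in the thread
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- factor(paste0("O", gl(3, 1, 9)))
treatment <- factor(paste0("T", gl(3, 3)))

# A named matrix in contrasts.arg gives level-based column names
X4n <- model.matrix(counts ~ outcome + treatment,
                    contrasts.arg = list(outcome = named_contr_sum(levels(outcome))))
colnames(X4n)
# "(Intercept)" "outcomeO1" "outcomeO2" "treatmentT2" "treatmentT3"

# glm() passes its 'contrasts' argument on to model.matrix(), so the same works there
fit <- glm(counts ~ outcome + treatment, family = poisson(),
           contrasts = list(outcome = named_contr_sum(levels(outcome))))
names(coef(fit))
```

The numeric coding is identical to contr.sum, so only the labels change; the fitted values and coefficient values should match the glm.D94 fit.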
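The point about +1/-1 coding can be checked numerically. A minimal sketch with invented toy data (the values are made up purely for illustration): with a two-level factor under contr.sum, the single coefficient is half the difference between the two group means, and it is labelled sex1, an index, as in the original question.

```r
# Invented toy data: two groups of three observations
y   <- c(10, 12, 14, 20, 22, 24)
sex <- factor(rep(c("female", "male"), each = 3))

# Sum-to-zero coding: with alphabetical levels, 'female' is +1 and 'male' is -1
contr.sum(levels(sex))
fit_lm <- lm(y ~ sex, contrasts = list(sex = "contr.sum"))

group_means <- tapply(y, sex, mean)                            # female 12, male 22
diff_means  <- group_means[["female"]] - group_means[["male"]] # -10
coef(fit_lm)[["sex1"]]                                         # -5, half the difference
```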