This is an example from chapter 11 of the 6th edition of Devore's engineering statistics text. It happens to be a balanced data set in two factors but the calculations will also work for unbalanced data. I create a factor called 'cell' from the text representation of the Variety level and the Density level using '/' as the separator character. The coefficients for the linear model 'Yield ~ cell - 1' are exactly the cell means.> library(Devore6) > data(xmp11.07) > str(xmp11.07)`data.frame': 36 obs. of 3 variables: $ Yield : num 10.5 9.2 7.9 12.8 11.2 13.3 12.1 12.6 14 10.8 ... $ Variety: Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ... $ Density: Factor w/ 4 levels "1","2","3","4": 1 1 1 2 2 2 3 3 3 4 ...> xmp11.07$cell = with(xmp11.07, factor(paste(Variety, Density, sep = '/'))) > xtabs(~ Variety + Density, xmp11.07)Density Variety 1 2 3 4 1 3 3 3 3 2 3 3 3 3 3 3 3 3 3> means = xtabs(Yield ~ Variety+Density, xmp11.07)/xtabs(~ Variety + Density, xmp11.07) > meansDensity Variety 1 2 3 4 1 9.200000 12.433333 12.900000 10.800000 2 8.933333 12.633333 14.500000 12.766667 3 16.300000 18.100000 19.933333 18.166667> coef(fm1 <- lm(Yield ~ cell - 1, xmp11.07))cell1/1 cell1/2 cell1/3 cell1/4 cell2/1 cell2/2 cell2/3 cell2/4 9.200000 12.433333 12.900000 10.800000 8.933333 12.633333 14.500000 12.766667 cell3/1 cell3/2 cell3/3 cell3/4 16.300000 18.100000 19.933333 18.166667 The residual sum of squares for this model is the same as that for the model with crossed terms> deviance(fm1)[1] 38.04> deviance(fm2 <- lm(Yield ~ Variety * Density, xmp11.07))[1] 38.04 but the coefficients are quite different because they represent a different parameterization.> coef(fm2)(Intercept) Variety2 Variety3 Density2 9.20000000 -0.26666667 7.10000000 3.23333333 Density3 Density4 Variety2:Density2 Variety3:Density2 3.70000000 1.60000000 0.46666667 -1.43333333 Variety2:Density3 Variety3:Density3 Variety2:Density4 Variety3:Density4 1.86666667 -0.06666667 2.23333333 0.26666667 I hope this answers your question. Sorry for the delay in responding to you. "Francisco Vergara" <gerifalte28 at hotmail.com> writes:> Many thanks for your reply. An example would be very helpful to make > sure that I understand correctly what you described.
Thanks a lot for your reply. This helps a lot! Just to confirm, using lm this model will give me the mean yield value for each cell in the two way array. Now if I want to obtain the mean of group means (like a SS type III approach) using LME (since I have random effects in the model) how can I parametrize this? I could definitivelly use xtabs in a two-way case but in my case I have 2 other (continuous) covariates that are potential confounders in the model so I need to keep them to obtain the corrected means. I added a continuous variable (NewVar) to the dataset Newxmp11.07 and obtaned a model with the covariate (fm4) and another without it (fm3)>Newxmp11.07<-fix(xmp11.07) >str(Newxmp11.07)`data.frame': 36 obs. of 4 variables: $ Yield : num 10.5 9.2 7.9 12.8 11.2 13.3 12.1 12.6 14 10.8 ... $ Variety: Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ... $ Density: Factor w/ 4 levels "1","2","3","4": 1 1 1 2 2 2 3 3 3 4 ... $ NewVar : num 10 9 7 12 11 11 12 11 15 16 ...>fm3<-gls(Yield~-1+Variety + Density, xmp11.07) >fm4<-gls(Yield~-1+Variety + Density+NewVar, Newxmp11.07) >fm3Coefficients: Variety1 Variety2 Variety3 Density2 Density3 Density4 8.922222 9.797222 15.713889 2.911111 4.300000 2.433333 Degrees of freedom: 36 total; 30 residual Residual standard error: 1.239243>fm4Coefficients: Variety1 Variety2 Variety3 Density2 Density3 Density4 8.75757265 9.70589316 15.43347009 2.88152564 4.32186752 2.40117521 NewVar 0.01157692 Degrees of freedom: 36 total; 29 residual Residual standard error: 1.252991 fm4 gives me the mean of the group means for all the varieties but apparently it gives me the treatment contrasts for the densities. If I change the order of the factors in the model specification I get>coef(fm5<-gls(Yield~-1+ Density+Variety+NewVar, Newxmp11.07))Density1 Density2 Density3 Density4 Variety2 Variety3 8.75757265 11.63909829 13.07944017 11.15874786 0.94832051 6.67589744 NewVar 0.01157692 This, just like fm4 will include the original intercept value in Density 1 which is not the actual density 1 mean. What am I missing? I am sorry if these questions are very basic but I want to make sure that I understand what I am doing. I guess that this is the price that I am paying for having used in the past packages like SAS where you just ask for lsmeans and the software will give you a "black box" answer! Best regards Francisco>From: Douglas Bates <bates at stat.wisc.edu> >To: "Francisco Vergara" <gerifalte28 at hotmail.com> >CC: r-help at r-project.org >Subject: Re: [R] Example of cell means model >Date: 15 Oct 2003 08:40:33 -0500 > >This is an example from chapter 11 of the 6th edition of Devore's >engineering statistics text. It happens to be a balanced data set in >two factors but the calculations will also work for unbalanced data. > >I create a factor called 'cell' from the text representation of the >Variety level and the Density level using '/' as the separator >character. The coefficients for the linear model 'Yield ~ cell - 1' >are exactly the cell means. > > > library(Devore6) > > data(xmp11.07) > > str(xmp11.07) >`data.frame': 36 obs. of 3 variables: > $ Yield : num 10.5 9.2 7.9 12.8 11.2 13.3 12.1 12.6 14 10.8 ... > $ Variety: Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ... > $ Density: Factor w/ 4 levels "1","2","3","4": 1 1 1 2 2 2 3 3 3 4 ... > > xmp11.07$cell = with(xmp11.07, factor(paste(Variety, Density, sep = >'/'))) > > xtabs(~ Variety + Density, xmp11.07) > Density >Variety 1 2 3 4 > 1 3 3 3 3 > 2 3 3 3 3 > 3 3 3 3 3 > > means = xtabs(Yield ~ Variety+Density, xmp11.07)/xtabs(~ Variety + >Density, xmp11.07) > > means > Density >Variety 1 2 3 4 > 1 9.200000 12.433333 12.900000 10.800000 > 2 8.933333 12.633333 14.500000 12.766667 > 3 16.300000 18.100000 19.933333 18.166667 > > coef(fm1 <- lm(Yield ~ cell - 1, xmp11.07)) > cell1/1 cell1/2 cell1/3 cell1/4 cell2/1 cell2/2 cell2/3 >cell2/4 > 9.200000 12.433333 12.900000 10.800000 8.933333 12.633333 14.500000 >12.766667 > cell3/1 cell3/2 cell3/3 cell3/4 >16.300000 18.100000 19.933333 18.166667 > >The residual sum of squares for this model is the same as that for the >model with crossed terms > > > deviance(fm1) >[1] 38.04 > > deviance(fm2 <- lm(Yield ~ Variety * Density, xmp11.07)) >[1] 38.04 > >but the coefficients are quite different because they represent a >different parameterization. > > > coef(fm2) > (Intercept) Variety2 Variety3 Density2 > 9.20000000 -0.26666667 7.10000000 3.23333333 > Density3 Density4 Variety2:Density2 Variety3:Density2 > 3.70000000 1.60000000 0.46666667 -1.43333333 >Variety2:Density3 Variety3:Density3 Variety2:Density4 Variety3:Density4 > 1.86666667 -0.06666667 2.23333333 0.26666667 > >I hope this answers your question. Sorry for the delay in >responding to you. > >"Francisco Vergara" <gerifalte28 at hotmail.com> writes: > > > Many thanks for your reply. An example would be very helpful to make > > sure that I understand correctly what you described._________________________________________________________________ Surf and talk on the phone at the same time with broadband Internet access. Get high-speed for as low as $29.95/month (depending on the local service providers in your area). https://broadband.msn.com
Many thansk to Dr. Bates to take time follow on this conversation. Could anybody follow up on the question on what the SAS lsmeans function exactly does and how to reproduce that on R? Thanks Francisco>From: Douglas Bates <bates at stat.wisc.edu> >To: "Francisco Vergara" <gerifalte28 at hotmail.com> >CC: r-help at r-project.org >Subject: Re: [R] Example of cell means model >Date: 16 Oct 2003 08:38:35 -0500 > >"Francisco Vergara" <gerifalte28 at hotmail.com> writes: > > > Thanks a lot for your reply. This helps a lot! > > Just to confirm, using lm this model will give me the mean yield value > > for each cell in the two way array. Now if I want to obtain the mean > > of group means (like a SS type III approach) using LME (since I have > > random effects in the model) how can I parametrize this? > >Didn't I just answer that? > > > I could definitivelly use xtabs in a two-way case but in my case I > > have 2 other (continuous) covariates that are potential confounders in > > the model so I need to keep them to obtain the corrected means. > > I added a continuous variable (NewVar) to the dataset Newxmp11.07 and > > obtaned a model with the covariate (fm4) and another without it (fm3) > >I think you misinterpreted what I wrote. I used xtabs to show an >alternative, direct calculation of the cell means. The purpose was to >to compare the result with the coefficients in the model fit with the >new factor constructed as I described. > > > >Newxmp11.07<-fix(xmp11.07) > > >str(Newxmp11.07) > > `data.frame': 36 obs. of 4 variables: > > $ Yield : num 10.5 9.2 7.9 12.8 11.2 13.3 12.1 12.6 14 10.8 ... > > $ Variety: Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ... > > $ Density: Factor w/ 4 levels "1","2","3","4": 1 1 1 2 2 2 3 3 3 4 ... > > $ NewVar : num 10 9 7 12 11 11 12 11 15 16 ... > > > > >fm3<-gls(Yield~-1+Variety + Density, xmp11.07) > > >fm4<-gls(Yield~-1+Variety + Density+NewVar, Newxmp11.07) > >Why are you using gls for these models? > >Why do you not generate a factor to achieve the cell means model that >(apparently) you want? > > > >fm3 > > Coefficients: > > Variety1 Variety2 Variety3 Density2 Density3 Density4 > > 8.922222 9.797222 15.713889 2.911111 4.300000 2.433333 > > > > Degrees of freedom: 36 total; 30 residual > > Residual standard error: 1.239243 > > > > >fm4 > > Coefficients: > > Variety1 Variety2 Variety3 Density2 Density3 Density4 > > 8.75757265 9.70589316 15.43347009 2.88152564 4.32186752 2.40117521 > > NewVar > > 0.01157692 > > > > Degrees of freedom: 36 total; 29 residual > > Residual standard error: 1.252991 > > > > fm4 gives me the mean of the group means for all the varieties but > > apparently it gives me the treatment contrasts for the densities. If > > I change the order of the factors in the model specification I get > >Please read what I wrote. > > > >coef(fm5<-gls(Yield~-1+ Density+Variety+NewVar, Newxmp11.07)) > > Density1 Density2 Density3 Density4 Variety2 Variety3 > > 8.75757265 11.63909829 13.07944017 11.15874786 0.94832051 6.67589744 > > NewVar > > 0.01157692 > > > > This, just like fm4 will include the original intercept value in > > Density 1 which is not the actual density 1 mean. What am I missing? > > I am sorry if these questions are very basic but I want to make sure > > that I understand what I am doing. I guess that this is the price that > > I am paying for having used in the past packages like SAS where you > > just ask for lsmeans and the software will give you a "black box" > > answer! > >Yes. I think it would be interesting to ask people who use the >results from lsmeans to explain what the results represent. My guess >is that less than 1% of the people who use lsmeans know what they in >fact are. > >I won't be able to continue this conversation. I am very busy right >putting new facilities into R. Perhaps others on the list will be >able to respond to your questions. >_________________________________________________________________ Try MSN Messenger 6.0 with integrated webcam functionality!