I am wondering how to interpret the parameter estimates that lm() reports in this sort of situation:

y = round(rnorm(n=24,mean=5,sd=2),2)
A = gl(3,2,24,labels=c("one","two","three"))
B = gl(4,6,24,labels=c("i","ii","iii","iv"))
# Make both observations for A=1, B=4 missing
y[19] = NA
y[20] = NA
data.frame(y,A,B)
nonadd = lm(y ~ A * B)

> summary(nonadd)

Call:
lm(formula = y ~ A * B)

Residuals:
       Min         1Q     Median         3Q        Max
-3.555e+00 -7.675e-01 -6.939e-17  7.675e-01  3.555e+00

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    3.755      1.667   2.252   0.0457 *
Atwo           1.655      2.358   0.702   0.4974
Athree         3.330      2.358   1.412   0.1856
Bii            1.435      2.358   0.609   0.5552
Biii           2.055      2.358   0.871   0.4021
Biv           -1.635      2.358  -0.693   0.5025
Atwo:Bii      -1.145      3.335  -0.343   0.7378
Athree:Bii    -4.535      3.335  -1.360   0.2011
Atwo:Biii     -3.230      3.335  -0.969   0.3536
Athree:Biii   -2.105      3.335  -0.631   0.5408
Atwo:Biv       1.655      3.335   0.496   0.6295
Athree:Biv        NA         NA      NA       NA
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.358 on 11 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared: 0.2797,     Adjusted R-squared: -0.3752
F-statistic: 0.4271 on 10 and 11 DF,  p-value: 0.9044

> fitted(nonadd)
    1     2     3     4     5     6     7     8     9    10    11
3.755 3.755 5.410 5.410 7.085 7.085 5.190 5.190 5.700 5.700 3.985
   12    13    14    15    16    17    18    21    22    23    24
3.985 5.810 5.810 4.235 4.235 7.035 7.035 5.430 5.430 5.450 5.450

> t(model.matrix(nonadd)%*%coef(nonadd))
      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 21 22 23 24
[1,] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

I guess that the parameter estimates reported are linear combinations of the cell means, but which linear combinations and how does lm() decide what parameters to report?

Cheers, Murray

-- 
Dr Murray Jorgensen      http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: maj at waikato.ac.nz                                Fax 7 838 4155
Phone  +64 7 838 4773 wk    Home +64 7 825 0441   Mobile  021 0200 8350
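
A note on the all-NA row in the last command of the post: it appears to come from the single NA coefficient propagating through the matrix product, since NA times 0 is still NA in R. The sketch below rebuilds the example and shows that setting the undefined coefficient to 0 recovers the fitted values. The seed and the names X, b and b0 are assumptions added for illustration; the post sets no seed, so the numbers will not match the thread.

```r
set.seed(1)  # assumption: the original post sets no seed
y <- round(rnorm(n = 24, mean = 5, sd = 2), 2)
A <- gl(3, 2, 24, labels = c("one", "two", "three"))
B <- gl(4, 6, 24, labels = c("i", "ii", "iii", "iv"))
y[19:20] <- NA                      # empties the cell A = one, B = iv
nonadd <- lm(y ~ A * B)

X <- model.matrix(nonadd)           # rows 19 and 20 are dropped
b <- coef(nonadd)                   # contains one NA (Athree:Biv)

# One NA in b is enough to make every entry of X %*% b NA
print(all(is.na(X %*% b)))

# Treating the undefined coefficient as 0 reproduces fitted(nonadd)
b0 <- ifelse(is.na(b), 0, b)
print(all.equal(drop(X %*% b0), fitted(nonadd)))
```

In practice fitted() or predict() is the simpler route; the manual product is only useful for seeing how the reported coefficients combine.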

Does this help at all?

<after your code...>

> contrasts(A)
      two three
one     0     0
two     1     0
three   0     1

> contrasts(B)
    ii iii iv
i    0   0  0
ii   1   0  0
iii  0   1  0
iv   0   0  1

> contrasts(A:B)
          one:ii one:iii one:iv two:i two:ii two:iii two:iv three:i three:ii three:iii three:iv
one:i          0       0      0     0      0       0      0       0        0         0        0
one:ii         1       0      0     0      0       0      0       0        0         0        0
one:iii        0       1      0     0      0       0      0       0        0         0        0
one:iv         0       0      1     0      0       0      0       0        0         0        0
two:i          0       0      0     1      0       0      0       0        0         0        0
two:ii         0       0      0     0      1       0      0       0        0         0        0
two:iii        0       0      0     0      0       1      0       0        0         0        0
two:iv         0       0      0     0      0       0      1       0        0         0        0
three:i        0       0      0     0      0       0      0       1        0         0        0
three:ii       0       0      0     0      0       0      0       0        1         0        0
three:iii      0       0      0     0      0       0      0       0        0         1        0
three:iv       0       0      0     0      0       0      0       0        0         0        1

-- 
David

On Aug 2, 2009, at 6:40 PM, Murray Jorgensen wrote:

> I am wondering how to interpret the parameter estimates that lm()
> reports in this sort of situation:
> [...]
> I guess that the parameter estimates reported are linear
> combinations of the cell means, but which linear combinations and
> how does lm() decide what parameters to report?
>
> Cheers, Murray

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
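
The treatment contrasts above determine the 12 columns of the model matrix for y ~ A * B. With the cell one:iv empty, only 11 cell means can be estimated, so one column has to be linearly dependent on the others. The sketch below checks that dependency directly. The seed and the names X and dep are assumptions for illustration, and the remark about column order reflects my understanding that lm() leaves a column undefined when it is dependent on the columns before it; it is not something stated in the thread.

```r
set.seed(1)  # assumption: the original post sets no seed
y <- round(rnorm(n = 24, mean = 5, sd = 2), 2)
A <- gl(3, 2, 24, labels = c("one", "two", "three"))
B <- gl(4, 6, 24, labels = c("i", "ii", "iii", "iv"))
y[19:20] <- NA                      # empties the cell A = one, B = iv
nonadd <- lm(y ~ A * B)

X <- model.matrix(nonadd)
print(ncol(X))        # number of columns in the design
print(nonadd$rank)    # numerical rank found by lm()

# Every remaining observation with B = iv has A = two or A = three,
# so the Biv column is the sum of the two iv interaction columns.
dep <- X[, "Biv"] - X[, "Atwo:Biv"] - X[, "Athree:Biv"]
print(all(dep == 0))

# The last column involved in the dependency is the one reported as NA
print(names(coef(nonadd))[is.na(coef(nonadd))])
```

Reordering the factor levels (for example with relevel()) would change which column comes last and hence which coefficient is reported as NA, without changing the fitted values.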

Murray Jorgensen wrote:
> I am wondering how to interpret the parameter estimates that lm()
> reports in this sort of situation:
> [...]
> I guess that the parameter estimates reported are linear combinations of
> the cell means, but which linear combinations and how does lm() decide
> what parameters to report?
>
> Cheers, Murray

What's the problem?
The parameters are defined as usual for the two-way layout:

The intercept is the fitted value in the top left corner.

The A coefficients are the fitted values in the first column minus the intercept. The B coefficients vice versa.

The interaction coefficients are the fitted values minus the sum of the intercept and the corresponding A and B coefficients.

One interaction coefficient is set missing because you have no data, but except for that, the fitted values equal the cell means.

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
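
The description above can be checked against the table of cell means. The sketch below does that for a few coefficients. One wrinkle it makes visible: because the cell one:iv has no data and Athree:Biv is the coefficient left undefined, Biv cannot be "fitted value minus intercept" in row one; in this fit it works out as the iv-versus-i difference in row three. The seed and the names m, b, same and checks are assumptions for illustration, so the numbers differ from those in the thread.

```r
set.seed(1)  # assumption: the original post sets no seed
y <- round(rnorm(n = 24, mean = 5, sd = 2), 2)
A <- gl(3, 2, 24, labels = c("one", "two", "three"))
B <- gl(4, 6, 24, labels = c("i", "ii", "iii", "iv"))
y[19:20] <- NA                      # empties the cell A = one, B = iv
nonadd <- lm(y ~ A * B)

# Cell means with A in rows and B in columns; the empty cell one:iv is NA
m <- tapply(y, list(A, B), mean)
b <- coef(nonadd)
same <- function(u, v) isTRUE(all.equal(unname(u), unname(v)))

checks <- c(
  intercept = same(b["(Intercept)"], m["one", "i"]),
  Atwo      = same(b["Atwo"], m["two", "i"] - m["one", "i"]),
  Bii       = same(b["Bii"], m["one", "ii"] - m["one", "i"]),
  Atwo_Bii  = same(b["Atwo:Bii"],
                   m["two", "ii"] - m["two", "i"] - m["one", "ii"] + m["one", "i"]),
  # one:iv is empty, so Biv is pinned down through row "three" instead
  Biv       = same(b["Biv"], m["three", "iv"] - m["three", "i"]),
  Atwo_Biv  = same(b["Atwo:Biv"],
                   (m["two", "iv"] - m["two", "i"]) -
                     (m["three", "iv"] - m["three", "i"])),
  dropped   = is.na(b[["Athree:Biv"]])
)
print(checks)
```

The same arithmetic applies to the numbers in the original post: 5.450 - 7.085 = -1.635 for Biv, and (5.430 - 5.410) - (-1.635) = 1.655 for Atwo:Biv.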