I am playing with a 1-way ANOVA with and without the "-1" option. I have a simple cooked-up example below, but it behaves the same way on a more complex real example. From what I can tell:

1) the estimated means of the different levels are correctly estimated either way (although reported as means with the -1 and as contrasts without the -1, as expected)
2) the residuals are identical (in this contrived example they differ slightly due to numeric instability, but in a more real-world example they truly are identical)
3) BUT the r2/F/p-values are different (in my real-world example they are drastically different)

How can a model that gets the same parameter estimates on the same data, leading to the same residuals, end up with different r2/F/p-values? I suspect it has something to do with the difference in the model.matrix (see below), but that only confused me further about how the two fits get the same parameter estimates, without really clearing up why the r2's are different. Any help on this is greatly appreciated!

> x<-as.factor(c(1,1,1,2,2,2))
> y<-c(1.1,1.0,0.9,2.0,2.1,1.9)
> summary(lm(y~x))

Call:
lm(formula = y ~ x)

Residuals:
         1          2          3          4          5          6
 1.000e-01 -4.980e-16 -1.000e-01  8.538e-18  1.000e-01 -1.000e-01

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.00000    0.05774   17.32 6.52e-05 ***
x2           1.00000    0.08165   12.25 0.000255 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1 on 4 degrees of freedom
Multiple R-Squared: 0.974,     Adjusted R-squared: 0.9675
F-statistic:   150 on 1 and 4 DF,  p-value: 0.0002552

> summary(lm(y~x-1))

Call:
lm(formula = y ~ x - 1)

Residuals:
         1          2          3          4          5          6
 1.000e-01  5.027e-16 -1.000e-01  4.405e-20  1.000e-01 -1.000e-01

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
x1  1.00000    0.05774   17.32 6.52e-05 ***
x2  2.00000    0.05774   34.64 4.14e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1 on 4 degrees of freedom
Multiple R-Squared: 0.9973,     Adjusted R-squared: 0.996
F-statistic:   750 on 2 and 4 DF,  p-value: 7.073e-06

> m2nc=lm(y~x-1)
> m2wc=lm(y~x)
> resid(m2nc)
            1             2             3             4             5             6
 1.000000e-01  5.026734e-16 -1.000000e-01  4.404571e-20  1.000000e-01 -1.000000e-01
> resid(m2wc)
            1             2             3             4             5             6
 1.000000e-01 -4.980012e-16 -1.000000e-01  8.538092e-18  1.000000e-01 -1.000000e-01
> model.matrix(m2nc)
  x1 x2
1  1  0
2  1  0
3  1  0
4  0  1
5  0  1
6  0  1
attr(,"assign")
[1] 1 1
attr(,"contrasts")
attr(,"contrasts")$x
[1] "contr.treatment"
> model.matrix(m2wc)
  (Intercept) x2
1           1  0
2           1  0
3           1  0
4           1  1
5           1  1
6           1  1
attr(,"assign")
[1] 0 1
attr(,"contrasts")
attr(,"contrasts")$x
[1] "contr.treatment"

Brian McGill
Dept of Biology, McGill University
Stewart Biology Bldg, 1205 ave Docteur Penfield
Montreal, QC H3A 1B1 CANADA
(514) 398-6417
http://www.brianmcgill.org
mail@brianmcgill.org
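As a quick check of point 2 above, the two fits can be compared directly. This is a small sketch using the m2nc and m2wc objects from the session above; all.equal() compares numeric vectors up to a small tolerance, so the tiny numerical differences in the near-zero residuals are ignored:

all.equal(fitted(m2wc), fitted(m2nc))  # should be TRUE: both fits return the group means
all.equal(resid(m2wc), resid(m2nc))    # should be TRUE: residuals agree up to floating-point noise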
Berwin A Turlach
2008-Feb-08 06:08 UTC
[R] Don't understand removing constant on 1-way ANOVA
G'day Brian,

On Thu, 07 Feb 2008 17:56:07 -0500
Brian McGill <brian.mcgill at mcgill.ca> wrote:

> I am playing with a 1-way ANOVA with and without the "-1" option.
>
> [...]
>
> From what I can tell:
> 1) the estimated means of the different levels are correctly
> estimated either way (although reported as means with the -1 and as
> contrasts without the -1 as expected)
> 2) the residuals are identical (in this contrived example they differ
> slightly due to numeric instability but in a more real-world example
> they truly are identical)
> 3) BUT the r2/F/p-value are different (in my real-world example they
> are drastically different)
>
> How can a model that gets the same parameter estimates on the same
> data leading to the same residuals get different r2/F/p-value?

In an R session, type "help(summary.lm)" and, under the section "Value", read about the way r.squared is calculated. Note that if your model specifies a "-1" (or a "+0"), then the model is assumed to have no intercept. Or, rather, the null model (i.e. the model with all covariates removed) is taken to be a response of 0 + noise rather than mean + noise. This should explain the differences that you see.

R does not check whether a vector of 1s is in the column space of the design matrix, so it cannot base the decision about whether the model has an intercept (i.e. whether the null model should be "mu + noise" or "0 + noise") on that test; it goes by the formula alone. This is a design decision, presumably made to be compatible with S, and, obviously, it also makes the implementation of summary.lm easier, I guess. :)

Occasionally I use "-1" in the formula to construct a specific design matrix and, hence, get estimates and standard errors for quantities that are of interest. It is a bit annoying to have to refit the model without the "-1" to get the r2/F that I want, but one has to live with it. I would prefer that the decision on whether the null model has an intercept were based on whether a vector of 1s is in the column space of the design matrix, rather than on whether the formula contains a "-1", but I can also see the argument for the alternative preference.

But I just noticed that, if you so wish, you can be subversive:

> fm1 <- lm(y~x-1)
> attr(fm1$terms, "intercept") <- 1
> summary(fm1)

Call:
lm(formula = y ~ x - 1)

Residuals:
         1          2          3          4          5          6
 1.000e-01  3.608e-16 -1.000e-01 -3.469e-18  1.000e-01 -1.000e-01

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
x1  1.00000    0.05774   17.32 6.52e-05 ***
x2  2.00000    0.05774   34.64 4.14e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1 on 4 degrees of freedom
Multiple R-Squared: 0.974,     Adjusted R-squared: 0.9675
F-statistic:   150 on 1 and 4 DF,  p-value: 0.0002552

Hope this helps.

Cheers,

        Berwin

=========================== Full address =============================
Berwin A Turlach                            Tel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability        +65 6516 6650 (self)
Faculty of Science                          FAX : +65 6872 3919
National University of Singapore
6 Science Drive 2, Blk S16, Level 7         e-mail: statba at nus.edu.sg
Singapore 117546                            http://www.stat.nus.edu.sg/~statba
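To make the difference concrete, here is a short sketch (assuming the y, m2wc and m2nc objects from the original post are still in the workspace) that recomputes R-squared by hand following the formula documented in help(summary.lm): the residual sum of squares is essentially identical for the two fits, but it is compared against the total sum of squares about mean(y) when the model has an intercept term, and against the raw sum of squares about 0 when it does not.

rss <- sum(resid(m2wc)^2)        # about 0.04; resid(m2nc) gives the same value up to rounding
1 - rss / sum((y - mean(y))^2)   # null model mean + noise: ~0.974, matches summary(m2wc)
1 - rss / sum(y^2)               # null model 0 + noise:    ~0.9973, matches summary(m2nc)

The F statistics differ for the same reason: the no-intercept fit is tested against the 0 + noise null model, which is why the output jumps from F = 150 on 1 and 4 DF to F = 750 on 2 and 4 DF even though the fitted values are the same.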