Hello list,

I'm a little confused about the R2 and adjusted R2 values reported by lm() when I try to fix an intercept. When using +0 or -1 in the formula I have found that the standard error generally increases (as I would expect) but the R2 also increases (which seems counter intuitive). I've pasted a short test script below to illustrate. I do realise that many will say I shouldn't be fixing the intercept anyway but I'd appreciate knowing if this is a problem in the software or with my own logic.

> x=1:100
> y= 20 + 0.8*(x+20*rnorm(100))
> mod1 = lm(y ~ x)
> summary(mod1)

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-41.332  -9.885   1.191  12.842  34.067

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 19.30668    3.02193   6.389 5.64e-09 ***
x            0.82630    0.05195  15.905  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 15 on 98 degrees of freedom
Multiple R-Squared: 0.7208,     Adjusted R-squared: 0.7179
F-statistic:   253 on 1 and 98 DF,  p-value: < 2.2e-16

> mod2 = lm(y ~ 0 + x)
> summary(mod2)

Call:
lm(formula = y ~ 0 + x)

Residuals:
    Min      1Q  Median      3Q     Max
-34.049  -6.728   6.364  18.292  47.323

Coefficients:
  Estimate Std. Error t value Pr(>|t|)
x  1.11446    0.03053   36.51   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 17.76 on 99 degrees of freedom
Multiple R-Squared: 0.9308,     Adjusted R-squared: 0.9301
F-statistic:  1333 on 1 and 99 DF,  p-value: < 2.2e-16

I'm running R on Windows XP and have been rolling back from version 2.8 to see if it is a version issue, but I'm back to 2.1 and I am still getting the same output.

Cheers,
Glenn

______________________________________________
Glenn Newnham
CSIRO Sustainable Ecosystems
Private Bag 10
Clayton South, VIC 3169, Australia
______________________________________________
G'day Glenn,

On Tue, 2 Dec 2008 12:53:44 +1100 <Glenn.Newnham at csiro.au> wrote:

> I'm a little confused about the R2 and adjusted R2 values reported by
> lm() when I try to fix an intercept. When using +0 or -1 in the
> formula I have found that the standard error generally increases (as
> I would expect) but the R2 also increases (which seems counter
> intuitive).

?summary.lm

In particular the part:

r.squared: R^2, the 'fraction of variance explained by the model',

                  R^2 = 1 - Sum(R[i]^2) / Sum((y[i]- y*)^2),

          where y* is the mean of y[i] if there is an intercept and
          zero otherwise.

So the two R^2 values are measured against different baselines (the mean of y for mod1, zero for mod2) and are not directly comparable.

> I do realise that many will say I shouldn't be fixing the intercept
> anyway

Quite true; except if there are very good reasons. I have seen regression through the origin being misused to obtain a large R^2 and a significant coefficient when there were none.

Cheers,

        Berwin

=========================== Full address =============================
Berwin A Turlach                            Tel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability        +65 6516 6650 (self)
Faculty of Science                          FAX : +65 6872 3919
National University of Singapore
6 Science Drive 2, Blk S16, Level 7          e-mail: statba at nus.edu.sg
Singapore 117546                    http://www.stat.nus.edu.sg/~statba
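
For anyone wanting to check the definition quoted from ?summary.lm above, here is a minimal sketch that recomputes both R^2 values by hand. The seed and the variable names (rss1, r2_centered, and so on) are my own assumptions for illustration and were not in the original post, so the numbers will differ from Glenn's output.

```r
# Sketch: recompute the two R^2 definitions from ?summary.lm by hand.
set.seed(1)  # assumption: arbitrary seed, not part of the original script
x <- 1:100
y <- 20 + 0.8 * (x + 20 * rnorm(100))

mod1 <- lm(y ~ x)      # with intercept
mod2 <- lm(y ~ 0 + x)  # forced through the origin

rss1 <- sum(residuals(mod1)^2)
rss2 <- sum(residuals(mod2)^2)

# With an intercept, residuals are compared against deviations from mean(y).
r2_centered <- 1 - rss1 / sum((y - mean(y))^2)

# Without an intercept, they are compared against deviations from zero.
r2_uncentered <- 1 - rss2 / sum(y^2)

# Both should agree with what summary.lm() reports.
print(all.equal(r2_centered, summary(mod1)$r.squared))
print(all.equal(r2_uncentered, summary(mod2)$r.squared))

# Putting mod2 on the same baseline as mod1 (deviations from mean(y)):
r2_mod2_centered <- 1 - rss2 / sum((y - mean(y))^2)
print(c(mod1 = r2_centered, mod2_same_baseline = r2_mod2_centered))
```

Measured against the same baseline, the no-intercept fit cannot do better than the fit with an intercept, because mod1 includes mod2 as a special case and so its residual sum of squares is never larger. The jump in the reported R^2 comes from the larger denominator Sum(y[i]^2), not from a better fit.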