Greetings Listers!

The R-squared value reported by summary of lm is calculated as

  1 - RSS/RSS_m

where RSS_m is the residual sum of squares of a minimal model. In
most cases, the minimal model is simply y = mean(y), but when a
constant is left out of the model, the minimal model is y = 0.
However, if you manually add a constant, R still considers y = 0 the
minimal model. This also causes different F stats, DF, and p values.

Is there a way to specify that the R-squared should be calculated
using y = mean(y)?

Here's an example:

> a <- rnorm(100,10,5)
> b <- rnorm(100,10,5)
> c <- rep(1,100)
> summary(lm(a~b))

Call:
lm(formula = a ~ b)

Residuals:
     Min       1Q   Median       3Q      Max
-11.8677  -3.4442  -0.5625   4.1099  10.5102

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.23724    1.05256   9.726 4.76e-16 ***
b           -0.02942    0.09818  -0.300    0.765
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.799 on 98 degrees of freedom
Multiple R-Squared: 0.0009153,  Adjusted R-squared: -0.009279
F-statistic: 0.08978 on 1 and 98 DF,  p-value: 0.7651

> summary(lm(a ~ b + c - 1))

Call:
lm(formula = a ~ b + c - 1)

Residuals:
     Min       1Q   Median       3Q      Max
-11.8677  -3.4442  -0.5625   4.1099  10.5102

Coefficients:
  Estimate Std. Error t value Pr(>|t|)
b -0.02942    0.09818  -0.300    0.765
c 10.23724    1.05256   9.726 4.76e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.799 on 98 degrees of freedom
Multiple R-Squared: 0.8146,     Adjusted R-squared: 0.8108
F-statistic: 215.3 on 2 and 98 DF,  p-value: < 2.2e-16

Thanks in advance.

tim

--
Tim Calkins
0406 753 997
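The two R-squared values in the post can be reproduced by hand, which makes the role of the minimal model explicit. The following is a minimal sketch, not taken from the thread: the seed and the names fit_int, fit_noint, rss, r2_mean and r2_zero are illustrative assumptions, and since the data are freshly simulated the numbers will differ from those shown above.

```r
set.seed(1)                       # arbitrary seed (assumption), for reproducibility only
a <- rnorm(100, 10, 5)
b <- rnorm(100, 10, 5)
c <- rep(1, 100)

fit_int   <- lm(a ~ b)            # implicit intercept
fit_noint <- lm(a ~ b + c - 1)    # constant supplied by hand, intercept removed

# Both fits span the same column space, so they share one residual sum of squares.
rss <- sum(residuals(fit_noint)^2)

r2_mean <- 1 - rss / sum((a - mean(a))^2)   # minimal model y = mean(y)
r2_zero <- 1 - rss / sum(a^2)               # minimal model y = 0

# Compare with what summary() reports for each formula.
all.equal(r2_mean, summary(fit_int)$r.squared)
all.equal(r2_zero, summary(fit_noint)$r.squared)
```

The only difference between the two values is the denominator: the sum of squares about mean(a) in one case and about zero in the other.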
"Tim Calkins" <tcalkins at gmail.com> writes:> Greetings Listers! > > the R-squared value reported by summary of lm is calculated as > > 1 - RSS/RSS_m > > where RSS_m is the residual sum of squares of a minimal model. In > most cases, the minimal model is simply y = mean(y), but when a > constant is left out of the model, the minimal model is y = 0. > However, if you manually add a constant, R still considers y = 0 the > minimal model. This also causes different F stats, DF, and p values.> Is there a way to specify that the R-squared should be calculated > using y = mean(y)?No. There's no structural way of discerning b and c in a ~ b + c - 1, short of an explicit check that c is a constant. So how would R know whether a ~ b - 1 or a ~ c - 1 is minimal? (And defining R-squared from non-nested models allows nastiness like values of R-squared larger than 1, so don't. You can define partial R-squared between any two nested models though, just not automatically.)> Here's an example: > > a <- rnorm(100,10,5) > > b <- rnorm(100,10,5) > > c <- rep(1,100)...> > summary(lm(a ~ b + c - 1)-- O__ ---- Peter Dalgaard ?ster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907