Hi,
  I would consider the calculation of r-squared in the following to be a
bug, but then, I've been wrong before. It seems that R looks to see if the
model contains an intercept term, and if it does not, computes r-squared in
a way I don't understand. To my mind, the following are two alternative
parametrizations of the same model, and should yield the same r-squared.

Any insight much appreciated

Simon.

> set.seed(10,kind=NULL)
> x <- runif(10)
> g <- gl(2,5)
> y <- runif(10)
>
> summary(lm(y ~ g*x))

Call:
lm(formula = y ~ g * x)

Residuals:
     Min       1Q   Median       3Q      Max
-0.35205 -0.14021  0.02486  0.13958  0.39671

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.3138     0.2749   1.141    0.297
g2           -0.1568     0.4339  -0.361    0.730
x             0.3556     0.6082   0.585    0.580
g2:x          0.3018     1.0522   0.287    0.784

Residual standard error: 0.276 on 6 degrees of freedom
Multiple R-Squared: 0.1491,     Adjusted R-squared: -0.2763
F-statistic: 0.3505 on 3 and 6 DF,  p-value: 0.7907

> summary(lm(y ~ g/x-1))

Call:
lm(formula = y ~ g/x - 1)

Residuals:
     Min       1Q   Median       3Q      Max
-0.35205 -0.14021  0.02486  0.13958  0.39671

Coefficients:
     Estimate Std. Error t value Pr(>|t|)
g1     0.3138     0.2749   1.141    0.297
g2     0.1570     0.3357   0.468    0.657
g1:x   0.3556     0.6082   0.585    0.580
g2:x   0.6574     0.8586   0.766    0.473

Residual standard error: 0.276 on 6 degrees of freedom
Multiple R-Squared: 0.8061,     Adjusted R-squared: 0.6769
F-statistic: 6.237 on 4 and 6 DF,  p-value: 0.02491

--please do not edit the information below--

Version:
 platform = i386-pc-mingw32
 arch = i386
 os = mingw32
 system = i386, mingw32
 status =
 major = 1
 minor = 8.0
 year = 2003
 month = 10
 day = 08
 language = R

Windows ME 4.90 (build 3000)

Search Path:
 .GlobalEnv, package:methods, package:ctest, package:mva, package:modreg,
 package:nls, package:ts, Autoloads, package:base

---
On Mon, 3 Nov 2003, Simon Wotherspoon wrote:

> I would consider the calculation of r-squared in the following to be a
> bug, but then, I've been wrong before. It seems that R looks to see if the
> model contains an intercept term, and if it does not, computes r-squared in
> a way I don't understand. To my mind, the following are two alternative
> parametrizations of the same model, and should yield the same r-squared.

But the minimal model contains an overall mean in the first case and not in
the second. Your models are 1+g+x+g:x and 0+g+g:x. So whereas they are
alternative parametrizations of the same full model, R^2 compares two
models, not just one: the fitted model and the minimal model.

You will find this explained on ?summary.lm (RTFM!) and several times in
the archives.

One solution is to ignore the R^2 lines (as I do and tell my students to
do). But we might consider labelling the printed output as say

Multiple R-Squared (no int): 0.8061,     Adjusted R-squared: 0.6769

to remind people.

> Any insight much appreciated
>
> Simon.
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
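
The point about the minimal model can be checked by hand. The following is a minimal sketch, assuming the behaviour described on ?summary.lm: with an intercept term the baseline is the mean-only model, without one it is the zero model. The names fit1, fit2, r2_centered and r2_uncentered are introduced here for illustration only, and the random numbers drawn may differ from those in the original post depending on the R version.

```r
# Recreate the data from the original post (values may differ across R versions).
set.seed(10)
x <- runif(10)
g <- gl(2, 5)
y <- runif(10)

fit1 <- lm(y ~ g * x)      # formula includes an intercept term
fit2 <- lm(y ~ g / x - 1)  # same fitted values, but no intercept term

# Both parametrizations span the same space, so the residuals agree.
rss <- sum(residuals(fit1)^2)

# Baseline "y ~ 1": total sum of squares about the mean.
r2_centered <- 1 - rss / sum((y - mean(y))^2)

# Baseline "y ~ 0": total sum of squares about zero.
r2_uncentered <- 1 - rss / sum(y^2)

# Compare with what summary.lm reports for each formula.
all.equal(r2_centered,   summary(fit1)$r.squared)
all.equal(r2_uncentered, summary(fit2)$r.squared)
```

The residual sum of squares is the same in both cases; only the denominator changes, which is why the second fit reports a much larger value.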
On 3 Nov 2003 at 17:53, Simon Wotherspoon wrote:

If you are really interested, you could write your own function to
calculate r-squared, deciding from the model matrix if the range space of
the model matrix contains a constant vector. If model is the model matrix
with n rows, one way is to compare the $rank components of

    qr(model)   and   qr(cbind(rep(1,n), model))

Kjetil Halvorsen

> Hi,
>   I would consider the calculation of r-squared in the following to be a
> bug, but then, I've been wrong before. It seems that R looks to see if the
> model contains an intercept term, and if it does not, computes r-squared in
> a way I don't understand. To my mind, the following are two alternative
> parametrizations of the same model, and should yield the same r-squared.
>
> Any insight much appreciated
>
> Simon.
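
The rank comparison suggested above could be sketched roughly as follows. This is one possible reading, not code from the thread: has_constant_in_range and r_squared are hypothetical helper names, the default tolerance of qr() is assumed to be adequate, and weighted fits are not handled.

```r
# Hypothetical helper: TRUE if the constant vector lies in the column
# space of X, i.e. adding a column of ones does not raise the rank.
has_constant_in_range <- function(X) {
  n <- nrow(X)
  qr(X)$rank == qr(cbind(rep(1, n), X))$rank
}

# Hypothetical helper: R^2 for an unweighted lm fit that centres y
# whenever the model is able to fit a constant, whatever the formula says.
r_squared <- function(fit) {
  y   <- model.response(model.frame(fit))
  rss <- sum(residuals(fit)^2)
  tss <- if (has_constant_in_range(model.matrix(fit))) {
    sum((y - mean(y))^2)
  } else {
    sum(y^2)
  }
  1 - rss / tss
}

# Same setup as the original post (values may differ across R versions).
set.seed(10)
x <- runif(10)
g <- gl(2, 5)
y <- runif(10)

fit1 <- lm(y ~ g * x)
fit2 <- lm(y ~ g / x - 1)

r_squared(fit1)
r_squared(fit2)  # agrees with fit1: g1 + g2 is the constant vector
```

For a model such as lm(y ~ x - 1), where no combination of columns gives a constant, the helper falls back to the uncentred sum of squares.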