k=lm(y~x)
summary(k)
returns R^2=0.9994

lm(y~x) is supposed to find the coefficients a and b in y=a*x+b.

l=lm(y~x+0)
summary(l)
returns R^2=0.9998

lm(y~x+0) is supposed to find the coefficient a in y=a*x+b while setting b=0.

The question is: why do I get a better R^2, when it should be the other way around?

I'm sorry to use the words "MS Excel" here, but I verified it in Excel and it gives:
R^2=0.9994 when y=a*x+b is used
R^2=0.99938 when y=a*x+0 is used

--
View this message in context: http://r.789695.n4.nabble.com/Strange-R-squared-possible-error-tp3382818p3382818.html
Sent from the R help mailing list archive at Nabble.com.
Hi Derek,

R^2 doesn't mean the same thing when you omit the intercept, as has been discussed on this list before. See http://r.789695.n4.nabble.com/lm-without-intercept-td3312429.html

Best,
Ista

On Wed, Mar 16, 2011 at 3:49 PM, derek <jan.kacaba at gmail.com> wrote:
> The question is why do I get better R^2, when it should be otherwise?

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
?summary.lm

The R^2 section explains that R^2 is computed differently depending on whether or not an intercept is in the model.

-- Bert

On Wed, Mar 16, 2011 at 12:49 PM, derek <jan.kacaba at gmail.com> wrote:
> The question is why do I get better R^2, when it should be otherwise?

--
Bert Gunter
Genentech Nonclinical Biostatistics
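As a minimal sketch of what that help page describes, the snippet below recomputes R^2 by hand for both kinds of model. It uses the built-in BOD data frame because the original poster's y and x were not posted, and the helper name r2_manual is made up for illustration. Per ?summary.lm, the total sum of squares is taken about the mean of y when there is an intercept, and about zero otherwise.

```r
# Illustrative only: BOD stands in for the unposted y and x.
fit_int   <- lm(demand ~ Time, data = BOD)      # with intercept
fit_noint <- lm(demand ~ Time + 0, data = BOD)  # intercept forced to zero

# Hypothetical helper: R^2 = 1 - RSS / TSS, where TSS is taken about
# the mean of y (intercept present) or about zero (no intercept).
r2_manual <- function(fit, y) {
  rss <- sum(residuals(fit)^2)
  has_intercept <- attr(terms(fit), "intercept") == 1
  tss <- if (has_intercept) sum((y - mean(y))^2) else sum(y^2)
  1 - rss / tss
}

r2_int   <- r2_manual(fit_int, BOD$demand)
r2_noint <- r2_manual(fit_noint, BOD$demand)

# Both hand-computed values should agree with what summary() reports.
print(all.equal(r2_int, summary(fit_int)$r.squared))
print(all.equal(r2_noint, summary(fit_noint)$r.squared))
print(c(with_intercept = r2_int, without_intercept = r2_noint))
```

On BOD the no-intercept model ends up with the larger R^2 even though its residual sum of squares is larger, because the two values are measured against different baselines. That is the same pattern as in the original question.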
lm(y~x+0) yields the regression on x without the constant, i.e., y = bx + e, not y = a + e.

derek <jan.kacaba@gmail.com>
Sent by: r-help-bounces@r-project.org
03/16/2011 03:49 PM
To: r-help@r-project.org
Subject: [R] Strange R squared, possible error

The question is why do I get better R^2, when it should be otherwise?
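A quick way to see which terms a formula actually fits is to look at the coefficient names. The sketch below again uses the built-in BOD data frame as a stand-in for the original data; the variable names b_full, b_zero and b_minus are only for illustration.

```r
# Illustrative only: BOD stands in for the original poster's data.
b_full  <- coef(lm(demand ~ Time, data = BOD))      # intercept and slope
b_zero  <- coef(lm(demand ~ Time + 0, data = BOD))  # slope only
b_minus <- coef(lm(demand ~ Time - 1, data = BOD))  # same model as "+ 0"

print(names(b_full))
print(names(b_zero))
print(all.equal(b_zero, b_minus))
```

Both "+ 0" and "- 1" remove the intercept from the formula, so the last two fits are the same model.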
<JLucke <at> ria.buffalo.edu> writes:
> lm(y~x+0) yields the regression on x without the constant, i.e., y=bx+e,
> not y = a +e

Would someone like to (please!) write this up and submit it to Kurt Hornik for inclusion in the R FAQ?

Ben Bolker
On Wed, Mar 16, 2011 at 3:49 PM, derek <jan.kacaba at gmail.com> wrote:
> The question is why do I get better R^2, when it should be otherwise?

The idea is that if you have a positive quantity that can be broken down into two nonnegative quantities, X = X1 + X2, then it makes sense to ask what proportion X1 is of X. For example, 10 = 6 + 4, and 6 is .6 of the total.

Now, in the case of a model with an intercept, it's a mathematical fact that the variance of the response equals the variance of the fitted values plus the variance of the residuals. Thus it makes sense to ask what fraction of the variance of the response is represented by the variance of the fitted values (this fraction is R^2).

But if there is no intercept, then that mathematical fact breaks down. That is, it's no longer true that the variance of the response equals the variance of the fitted values plus the variance of the residuals. So how meaningful is it to ask what proportion the variance of the fitted values is of the variance of the response in the first place? In this case we need to rethink the entire approach, which is why a different formula is required.

Also, maybe the real problem is not this at all. Perhaps you are not really trying to measure goodness of fit but rather to compare two particular models: one with an intercept and one without. In that case R^2 is not really what you want. Instead, use the R anova command.
For example, using the built-in BOD data frame:

> fm <- lm(demand ~ Time, BOD)
> fm0 <- lm(demand ~ Time - 1, BOD)
> anova(fm, fm0)
Analysis of Variance Table

Model 1: demand ~ Time
Model 2: demand ~ Time - 1
  Res.Df     RSS Df Sum of Sq      F  Pr(>F)
1      4  38.069
2      5 135.820 -1   -97.751 10.271 0.03275 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here we see that the residual sum of squares is much smaller for the full model than for the reduced model, and the difference is significant at the 3.275% level.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
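The variance decomposition described in the message above can be checked numerically. This is only a sketch on the same BOD models; the names gap_int, gap_noint and ss_gap_noint are invented for illustration. Each "gap" is the left-hand side of the identity minus the right-hand side, so a value near zero means the identity holds.

```r
fm  <- lm(demand ~ Time, data = BOD)      # with intercept
fm0 <- lm(demand ~ Time - 1, data = BOD)  # without intercept
y   <- BOD$demand

# With an intercept: var(y) = var(fitted) + var(residuals).
gap_int <- var(y) - (var(fitted(fm)) + var(residuals(fm)))

# Without an intercept the same identity no longer holds in general,
# because the residuals need not average to zero.
gap_noint <- var(y) - (var(fitted(fm0)) + var(residuals(fm0)))

# What does still hold without an intercept is the decomposition of the
# raw sum of squares (about zero rather than about the mean).
ss_gap_noint <- sum(y^2) - (sum(fitted(fm0)^2) + sum(residuals(fm0)^2))

print(c(gap_int = gap_int, gap_noint = gap_noint, ss_gap_noint = ss_gap_noint))
```

The first and third gaps should be zero up to rounding error, while the second is not. This is consistent with the no-intercept R^2 being defined from sums of squares about zero.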