Hi everybody,

Three questions about R-squared:
---------(1)----------- Does R2 always increase as variables are added?
---------(2)----------- Can R2 be greater than 1?
---------(3)----------- How is the R2 in summary(lm(y~x-1))$r.squared
calculated? It is different from
(r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))

I will illustrate these problems with the following code:

---------(1)----------- R2 doesn't always increase as variables are added

> x=matrix(rnorm(20),ncol=2)
> y=rnorm(10)
>
> lm=lm(y~1)
> y.hat=rep(1*lm$coefficients,length(y))
> (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
[1] 2.646815e-33
>
> lm=lm(y~x-1)
> y.hat=x%*%lm$coefficients
> (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
[1] 0.4443356
>
> ################ This is the biggest model, but its R2 is not the biggest. Why?
> lm=lm(y~x)
> y.hat=cbind(rep(1,length(y)),x)%*%lm$coefficients
> (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
[1] 0.2704789

---------(2)----------- R2 can be greater than 1

> x=rnorm(10)
> y=runif(10)
> lm=lm(y~x-1)
> y.hat=x*lm$coefficients
> (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
[1] 3.513865

---------(3)----------- How is the R2 in summary(lm(y~x-1))$r.squared
calculated? It is different from
(r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))

> x=matrix(rnorm(20),ncol=2)
> xx=cbind(rep(1,10),x)
> y=x%*%c(1,2)+rnorm(10)
> ### r2 calculated by lm(y~x)
> lm=lm(y~x)
> summary(lm)$r.squared
[1] 0.9231062
> ### r2 calculated by lm(y~xx-1)
> lm=lm(y~xx-1)
> summary(lm)$r.squared
[1] 0.9365253
> ### r2 calculated by me
> y.hat=xx%*%lm$coefficients
> (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
[1] 0.9231062

Thanks a lot for any cue :)

--
Junjie Li, klijunjie@gmail.com
Undergraduate in DEP of Tsinghua University
On 17/05/2007 7:02 AM, Junjie Li wrote:
> Hi everybody,
>
> Three questions about R-squared:
> ---------(1)----------- Does R2 always increase as variables are added?
> ---------(2)----------- Can R2 be greater than 1?
> ---------(3)----------- How is the R2 in summary(lm(y~x-1))$r.squared
> calculated? It is different from
> (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))

You are mixing models with intercepts with models without them. When you
do this, it doesn't make sense to compare the variability about the mean.
The model y = constant + error is not a special case of the model
y = slope*x + error.

Duncan Murdoch
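For example (a minimal sketch, with arbitrary simulated data): as long as
each model is nested in the next and all of them contain an intercept, R2
cannot decrease; the no-intercept fit sits outside this chain, so its
"R2" is computed against a different baseline and is not comparable.

set.seed(42)                          # arbitrary seed, for reproducibility
x <- matrix(rnorm(20), ncol = 2)
y <- rnorm(10)

summary(lm(y ~ 1))$r.squared          # 0 by definition
summary(lm(y ~ x[, 1]))$r.squared     # >= 0
summary(lm(y ~ x))$r.squared          # >= the previous value (nested)

summary(lm(y ~ x - 1))$r.squared      # different baseline; not comparable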
I know that "-1" indicates removing the intercept term. But my question is
why the intercept term CANNOT be treated as an ordinary variable, as when
we place a column consisting of 1s in the predictor matrix.

If I insist on comparing a model with an intercept against one without an
intercept in terms of adjusted R2, I now think the strategy is to use the
other definition of R-squared (and adjusted R-squared), in which
r.square=sum((y.hat)^2)/sum((y)^2).

Am I on the right track?

Thanks

Li Junjie

2007/5/19, Paul Lynch <plynchnlm@gmail.com>:
> In case you weren't aware, the meaning of the "-1" in y ~ x - 1 is to
> remove the intercept term that would otherwise be implied.
>     --Paul
>
> [original question quoted earlier in the thread snipped]

--
Junjie Li, klijunjie@gmail.com
Undergraduate in DEP of Tsinghua University
Hi Mark,

What I want to do, exactly, is compare a model with an intercept against
one without an intercept in terms of adjusted R2, since maximizing
adjusted R-squared is a variable-selection strategy. I know there are
other ways to do variable selection, but I really have to try this one.

Thanks.

2007/5/21, Leeds, Mark (IED) <Mark.Leeds@morganstanley.com>:
> Hi: You can put in the -1 and then create your own vector of 1's, which
> will be a "variable", but I'm not sure I understand what you want, and I
> don't think others do either, because I didn't see other responses. I
> don't mean to be offensive or rude, but can you explain what you want to
> do more clearly? If you do that, I'm sure you will get more responses.
>
> [earlier messages and corporate disclaimer snipped]

--
Junjie Li, klijunjie@gmail.com
Undergraduate in DEP of Tsinghua University
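One way to make that comparison consistent (a sketch only, under the
assumption that both models should be scored against the same zero
baseline; adj.r2.zero is a made-up helper name, not an existing function):

set.seed(1)                           # arbitrary simulated data
x <- rnorm(10)
y <- 2 * x + rnorm(10)

adj.r2.zero <- function(fit, y) {
  n  <- length(y)
  p  <- length(coef(fit))                       # estimated coefficients
  r2 <- 1 - sum(residuals(fit)^2) / sum(y^2)    # zero-baseline R2
  1 - (1 - r2) * n / (n - p)                    # penalize model size
}

adj.r2.zero(lm(y ~ x),     y)   # with intercept
adj.r2.zero(lm(y ~ x - 1), y)   # without intercept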
The answer to your question three is that the calculation of r-squared
in summary.lm does depend on whether or not an intercept is included in
the model. (Another part of the reason for your puzzlement is, I think,
that you are computing R-squared as SSR/SST, which is only valid when
the model has an intercept.)
The code is in summary.lm; here are the relevant excerpts (assuming your
model does not have weights):

    r <- z$residuals
    f <- z$fitted
    w <- z$weights
    if (is.null(w)) {
        mss <- if (attr(z$terms, "intercept"))
            sum((f - mean(f))^2)
        else sum(f^2)
        rss <- sum(r^2)
    }
    ...
    ans$r.squared <- mss/(mss + rss)
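So in both cases R2 = mss/(mss + rss); only mss changes with the
intercept. A quick hand check (a sketch with made-up data):

set.seed(7)
x <- rnorm(10)
y <- 1 + 2 * x + rnorm(10)

fit <- lm(y ~ x - 1)               # no intercept
f   <- fitted(fit)
r   <- residuals(fit)
mss <- sum(f^2)                    # no-intercept branch: no mean(f)
rss <- sum(r^2)

mss / (mss + rss)                  # matches the line below
summary(fit)$r.squared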
If you want to compare models with and without an intercept based on
R^2, then I suspect it's most appropriate to use the version of R^2 that
does not use a mean.
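Written out as a small helper (the name r2.zero is mine, just for
illustration):

## Zero-baseline R2: scores the model against the prediction yhat = 0
## instead of yhat = mean(y).
r2.zero <- function(y, y.hat) 1 - sum((y - y.hat)^2) / sum(y^2)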
It's also worthwhile thinking about what you are actually doing. I find
the most intuitive definition of R^2
(http://en.wikipedia.org/wiki/R_squared) is
R2 = 1 - SSE / SST
where SSE = sum_i (yhat_i - y_i)^2 (the sum of squared errors of your
model's predictions)
and SST = sum_i (y_i - mean(y))^2 (the sum of squared errors of an
intercept-only model's predictions).
This means that the standard definition of R2 effectively compares the
model with an intercept-only model. As the error in predictions goes
down, R2 goes up, and the model that uses the mean(y) as a prediction
(i.e., the intercept-only model) provides a scale for these errors.
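In other words, R2 = 1 - SSE/SST literally compares your model's squared
prediction error to that of the intercept-only fit. A short check
(sketch, arbitrary data):

set.seed(3)
x <- rnorm(10)
y <- 1 + x + rnorm(10)
fit  <- lm(y ~ x)
null <- lm(y ~ 1)                                     # intercept-only
1 - sum(residuals(fit)^2) / sum(residuals(null)^2)    # 1 - SSE/SST
summary(fit)$r.squared                                # same value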
If you think or know that the true mean of y is zero then it may be
appropriate to compare against a zero model rather than an
intercept-only model (in SST). And if the sample mean of y is quite
different from zero, and you compare a no-intercept model against an
intercept-only model, then you're going to get results that are not
easily interpreted.
Note that a common way of expressing and computing R^2 is as SSR/SST
(which you used). (Where SSR = sum_i (yhat_i - mean(y))^2 ). However,
this is only valid when the model has an intercept (i.e., SSR/SST = 1 -
SSE/SST ONLY when the model has an intercept.)
Here are some examples, based on your code:
> set.seed(1)
> data <- data.frame(x1=rnorm(10), x2=rnorm(10), y=rnorm(10), I=1)
>
> lm1 <- lm(y~1, data=data)
> summary(lm1)$r.squared
[1] 0
> y.hat <- fitted(lm1)
> sum((y.hat-mean(data$y))^2)/sum((data$y-mean(data$y))^2)
[1] 5.717795e-33
>
> # model with no intercept
> lm2 <- lm(y~x1+x2-1, data=data)
> summary(lm2)$r.squared
[1] 0.6332317
> y.hat <- fitted(lm2)
> # no-intercept version of R^2 (2 ways to compute)
> 1-sum((y.hat-data$y)^2)/sum((data$y)^2)
[1] 0.6332317
> sum((y.hat)^2)/sum((data$y)^2)
[1] 0.6332317
> # standard (assuming model has intercept) computations for R^2:
> SSE <- sum((y.hat - data$y)^2)
> SST <- sum((data$y - mean(data$y))^2)
> SSR <- sum((y.hat - mean(data$y))^2)
> 1 - SSE/SST
[1] 0.6252577
> # Note that SSR/SST != 1 - SSE/SST (because the model doesn't have
> # an intercept)
> SSR/SST
[1] 0.6616612
>
> # model with intercept included in data
> lm3 <- lm(y~x1+x2+I-1, data=data)
> summary(lm3)$r.squared
[1] 0.6503186
> y.hat <- fitted(lm3)
> # no-intercept version of R^2 (2 ways to compute)
> 1-sum((y.hat-data$y)^2)/sum((data$y)^2)
[1] 0.6503186
> sum((y.hat)^2)/sum((data$y)^2)
[1] 0.6503186
> # standard (assuming model has intercept) computations for R^2:
> SSE <- sum((y.hat - data$y)^2)
> SST <- sum((data$y - mean(data$y))^2)
> SSR <- sum((y.hat - mean(data$y))^2)
> 1 - SSE/SST
[1] 0.6427161
> SSR/SST
[1] 0.6427161
>
>
hope this helps,
Tony Plate
Disclaimer: I too do not have any degrees in statistics, but I'm 95%
sure the above is mostly correct :-) If there are any major mistakes,
I'm sure someone will point them out.
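P.S. One more note for the adjusted-R2 subthread: summary.lm applies the
same intercept switch when adjusting (df.int below is 1 with an intercept
and 0 without). A sketch of the check, worth verifying against your R
version's own source (stats:::summary.lm):

set.seed(9)
x <- rnorm(10)
y <- 1 + 2 * x + rnorm(10)

for (fml in list(y ~ x, y ~ x - 1)) {
  fit    <- lm(fml)
  n      <- length(y)
  rdf    <- df.residual(fit)
  df.int <- attr(terms(fit), "intercept")        # 1 or 0
  r2     <- summary(fit)$r.squared
  adj    <- 1 - (1 - r2) * ((n - df.int) / rdf)  # summary.lm's formula
  cat(deparse(fml), ": ", adj, " vs ",
      summary(fit)$adj.r.squared, "\n", sep = "")
}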
Junjie Li wrote:
> Hi everybody,
>
> Three questions about R-squared:
> ---------(1)----------- Does R2 always increase as variables are added?
> ---------(2)----------- Can R2 be greater than 1?
> ---------(3)----------- How is the R2 in summary(lm(y~x-1))$r.squared
> calculated? It is different from
> (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
>
> [rest of the original question snipped]