thr3ads.net - R devel - [Rd] Rsquared bug lm() (PR#10516) [Dec 2007]

lieven.clement at gmail.com

2007-Dec-14 13:10 UTC

[Rd] Rsquared bug lm() (PR#10516)

Full_Name: lieven clement
Version:  R version 2.4.0 Patched (2006-11-25 r39997)
OS: i486-pc-linux-gnu
Submission from: (NULL) (157.193.193.180)


summary.lm() does not calculate R^2 correctly for models without an intercept if
one of the predictor variables is a factor.
To avoid having one of the factor levels treated as a reference class,
you can use the -1 option in a formula. When you do, R^2 is not calculated
correctly.
> x1<-rnorm(100)
> x2<-c(rep(0,25),rep(10,25),rep(20,25),rep(30,25))
> y<-10*x1+x2+rnorm(100,0,4)
> x2<-as.factor(x2)
> lmtest<-lm(y~-1+x1+x2)
> summary(lmtest)$r.sq
[1] 0.9650201
> 1-sum(lmtest$res^2)/sum((y-mean(y))^2)
[1] 0.9342672

The R squared reported by summary is calculated as
> 1-sum(lmtest$res^2)/sum((y)^2)
[1] 0.9650201
apparently because summary.lm assumes the mean of y to be zero.

For a model with an intercept everything seems ok
> lmtest<-lm(y~x1+x2)
> summary(lmtest)$r.sq
[1] 0.9342672
> 1-sum(lmtest$res^2)/sum((y-mean(y))^2)
[1] 0.9342672
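
For readers who want to re-run the report, here is a minimal reproducible sketch of the same comparison. The seed and the helper name r2_against are assumptions added for illustration (the original post used no seed), so the printed values will differ from the numbers above.

```r
# Assumption: a fixed seed so the run is repeatable; the report used none.
set.seed(1)
level_means <- rep(c(0, 10, 20, 30), each = 25)
x1 <- rnorm(100)
y  <- 10 * x1 + level_means + rnorm(100, 0, 4)
x2 <- factor(level_means)

# Hypothetical helper: R^2 measured against an arbitrary baseline prediction.
r2_against <- function(fit, y, baseline) {
  1 - sum(residuals(fit)^2) / sum((y - baseline)^2)
}

fit_noint <- lm(y ~ -1 + x1 + x2)  # no intercept, one coefficient per level
fit_int   <- lm(y ~ x1 + x2)       # intercept plus reference-class coding

r2_noint_zero <- r2_against(fit_noint, y, 0)        # baseline y = 0
r2_noint_mean <- r2_against(fit_noint, y, mean(y))  # baseline mean(y)
r2_int_mean   <- r2_against(fit_int, y, mean(y))

# Compare with what summary() reports for each fit.
print(c(summary = summary(fit_noint)$r.squared,
        zero_baseline = r2_noint_zero,
        mean_baseline = r2_noint_mean))
print(c(summary = summary(fit_int)$r.squared,
        mean_baseline = r2_int_mean))
```

In this sketch, summary() agrees with the zero baseline for the model without an intercept and with the mean baseline for the model with one, which is the pattern described in the report.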

Duncan Murdoch

2007-Dec-14 14:18 UTC

On 12/14/2007 8:10 AM, lieven.clement at gmail.com wrote:
> Full_Name: lieven clement
> Version:  R version 2.4.0 Patched (2006-11-25 r39997)
> OS: i486-pc-linux-gnu
> Submission from: (NULL) (157.193.193.180)
>
>
> summary.lm() does not calculate R^2 accurately for models without
> intercepts if one of the predictor variables is a factor.
> In order to avoid one of the factor levels to be considered as a reference
> class you can use the -1 option in a formula. When you use this, R^2 is not
> correctly calculated.

This is not a bug.  A model without an intercept should be using y=0 as 
a reference.
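
One way to read the "y=0 as a reference" remark is as a comparison against the smallest model nested in the fitted one: without an intercept that is the model that always predicts 0, and with an intercept it is the model that always predicts mean(y). The sketch below is an illustration under that reading, not code from the thread; the data, seed, and variable names are assumptions.

```r
# Assumption: small simulated data set, used only for illustration.
set.seed(2)
x <- rnorm(50)
y <- 3 * x + 5 + rnorm(50)

fit_noint  <- lm(y ~ 0 + x)  # regression through the origin
null_noint <- lm(y ~ 0)      # empty model: every prediction is 0
fit_int    <- lm(y ~ x)      # regression with an intercept
null_int   <- lm(y ~ 1)      # intercept-only model: predicts mean(y)

# deviance() on an lm fit is its residual sum of squares.
r2_noint <- 1 - deviance(fit_noint) / deviance(null_noint)
r2_int   <- 1 - deviance(fit_int) / deviance(null_int)

# The terms object records whether the formula kept its intercept (1) or not (0).
has_intercept <- c(noint = attr(terms(fit_noint), "intercept"),
                   int   = attr(terms(fit_int), "intercept"))

print(has_intercept)
print(c(r2_noint = r2_noint, summary = summary(fit_noint)$r.squared))
print(c(r2_int = r2_int, summary = summary(fit_int)$r.squared))
```

Each R^2 computed against its own null model matches the value summary() reports, so the two definitions differ only in which null model is being improved upon.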

Duncan Murdoch
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

Thomas Lumley

2007-Dec-14 23:24 UTC

This is deliberate and as documented in ?summary.lm. It is not a bug.

     -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
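
For reference, the definition that the Details section of ?summary.lm gives can be written as follows. This is a paraphrase from memory rather than a quotation, with r_i denoting the residuals and y^* the reference value; the cases environment assumes the amsmath package.

```latex
R^2 = 1 - \frac{\sum_i r_i^2}{\sum_i (y_i - y^*)^2},
\qquad
y^* =
\begin{cases}
  \bar{y} & \text{if the model has an intercept} \\
  0       & \text{otherwise}
\end{cases}
```

With y^* = 0 the denominator is sum(y^2), which is the expression that reproduced 0.9650201 in the original report.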

lieven

2007-Dec-17 09:14 UTC

Basically, I fitted the model without an intercept to get an estimate for each
of my factor levels instead of using a reference class, so the factor acts as a
kind of hidden intercept.

I should have noticed that the behavior was documented in ?summary.lm.

Sorry for the inconvenience. 

Lieven



-- 
View this message in context:
http://www.nabble.com/Rsquared-bug-lm%28%29-%28PR-10516%29-tp14335791p14370172.html
Sent from the R devel mailing list archive at Nabble.com.
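
The "hidden intercept" described in this last message can be checked directly: with a factor in the model, dropping the intercept only changes how the coefficients are parametrized, not the fit itself. The sketch below assumes simulated data shaped like the original report (the seed is an added assumption) and recovers the mean-centred R^2 from the fit without an intercept.

```r
# Assumption: data shaped like the original report, with a fixed seed.
set.seed(3)
level_means <- rep(c(0, 10, 20, 30), each = 25)
x1 <- rnorm(100)
y  <- 10 * x1 + level_means + rnorm(100, 0, 4)
x2 <- factor(level_means)

fit_cellmeans <- lm(y ~ -1 + x1 + x2)  # one coefficient per factor level
fit_reference <- lm(y ~ x1 + x2)       # intercept plus contrasts to the first level

# Both design matrices span the same column space, so the fitted values agree.
same_fit <- isTRUE(all.equal(unname(fitted(fit_cellmeans)),
                             unname(fitted(fit_reference))))

# Mean-centred R^2 computed by hand from the no-intercept fit.
r2_centered <- 1 - sum(residuals(fit_cellmeans)^2) / sum((y - mean(y))^2)

print(same_fit)
print(c(by_hand = r2_centered,
        reference_fit = summary(fit_reference)$r.squared))
```

If the mean-centred R^2 is the quantity of interest, either computing it by hand as above or reading it off the fit with an intercept gives the same value in this sketch.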
