Full_Name: lieven clement
Version: R version 2.4.0 Patched (2006-11-25 r39997)
OS: i486-pc-linux-gnu
Submission from: (NULL) (157.193.193.180)

summary.lm() does not calculate R² accurately for models without intercepts if one of the predictor variables is a factor. To avoid having one of the factor levels treated as a reference class, you can use the -1 option in a formula. When you do this, R² is not calculated correctly.

> x1<-rnorm(100)
> x2<-c(rep(0,25),rep(10,25),rep(20,25),rep(30,25))
> y<-10*x1+x2+rnorm(100,0,4)
> x2<-as.factor(x2)
> lmtest<-lm(y~-1+x1+x2)
> summary(lmtest)$r.sq
[1] 0.9650201
> 1-sum(lmtest$res^2)/sum((y-mean(y))^2)
[1] 0.9342672

The R squared reported by summary is calculated as
> 1-sum(lmtest$res^2)/sum((y)^2)
[1] 0.9650201
apparently because summary.lm assumes the mean of y to be zero.

In case of an intercept model everything seems ok
> lmtest<-lm(y~x1+x2)
> summary(lmtest)$r.sq
[1] 0.9342672
> 1-sum(lmtest$res^2)/sum((y-mean(y))^2)
[1] 0.9342672

On 12/14/2007 8:10 AM, lieven.clement at gmail.com wrote:
> summary.lm() does not calculate R² accurately for models without intercepts if
> one of the predictor variables is a factor.
> In order to avoid one of the factor levels to be considered as a reference class
> you can use the -1 option in a formula. When you use this, R² is not correctly
> calculated.

This is not a bug. A model without an intercept should be using y=0 as a reference.

Duncan Murdoch

> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

This is deliberate and as documented in ?summary.lm. It is not a bug.

-thomas

On Fri, 14 Dec 2007 lieven.clement at gmail.com wrote:
> summary.lm() does not calculate R² accurately for models without intercepts if
> one of the predictor variables is a factor.

Thomas Lumley                   Assoc. Professor, Biostatistics
tlumley at u.washington.edu     University of Washington, Seattle
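
For reference, the definition given in ?summary.lm can be written as below. The notation is an assumption made here for readability (\hat{y}_i for the fitted values, y^* for the baseline); the help page itself uses plain-text notation.

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - y^*)^2},
\qquad
y^* =
\begin{cases}
  \bar{y} & \text{if the model has an intercept} \\
  0       & \text{otherwise}
\end{cases}
```

Since \sum_i y_i^2 = \sum_i (y_i - \bar{y})^2 + n\bar{y}^2, the denominator with y^* = 0 is never smaller than the centered one, which is why the no-intercept fit in the report shows the larger value (0.9650201 against 0.9342672) even though the residuals are the same.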

Basically, I fitted the model without an intercept to get an estimate for each of my factor levels instead of using a reference class, so I was in effect using a kind of hidden intercept. I should have noticed that the behavior was documented in ?summary.lm. Sorry for the inconvenience.

Lieven

Duncan Murdoch-2 wrote:
> This is not a bug. A model without an intercept should be using y=0 as
> a reference.
>
> Duncan Murdoch

View this message in context: http://www.nabble.com/Rsquared-bug-lm%28%29-%28PR-10516%29-tp14335791p14370172.html
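
The "hidden intercept" point can be checked directly. The following is a minimal sketch, not part of the original thread: the seed and the names g, fit0, fit1, rss, r2_uncentered and r2_centered are chosen here for illustration. It assumes the same simulated design as the report and shows that the two parameterisations give the same fit, so the centered R² can be computed by hand from the no-intercept model when that is the quantity wanted.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

x1 <- rnorm(100)
g  <- rep(c(0, 10, 20, 30), each = 25)   # numeric group means
y  <- 10 * x1 + g + rnorm(100, 0, 4)
x2 <- factor(g)

fit0 <- lm(y ~ -1 + x1 + x2)  # one coefficient per factor level, no intercept term
fit1 <- lm(y ~ x1 + x2)       # intercept plus treatment contrasts

# Both design matrices span the same column space, so the fits coincide.
same_fit <- isTRUE(all.equal(fitted(fit0), fitted(fit1)))

rss <- sum(residuals(fit0)^2)
r2_uncentered <- 1 - rss / sum(y^2)              # baseline y = 0
r2_centered   <- 1 - rss / sum((y - mean(y))^2)  # baseline mean(y)

# summary() uses the uncentered form for fit0 and the centered form for fit1.
all.equal(r2_uncentered, summary(fit0)$r.squared)
all.equal(r2_centered,   summary(fit1)$r.squared)
```

The design choice here is to keep the no-intercept formula for its per-level coefficients and compute the centered R² separately, rather than refitting with an intercept only to read off summary()$r.squared.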