Sandra Hawthorne
2010-Jun-11 08:16 UTC
[R] Calculation of r squared from a linear regression
Hi, I'm trying to verify the calculation of coefficient of determination (r squared) for linear regression. I've done the calculation manually with a simple test case and using the definition of r squared outlined in summary(lm) help. There seems to be a discrepancy between the what R produced and the manual calculation. Does anyone know why this is so? What does the multiple r squared reported in summary(lm) represent? # The test case: x <- c(1,2,3,4) y <- c(1.6,4.4,5.5,8.3) dummy <- data.frame(x, y) fm1 <- lm(y ~ x-1, data = dummy) summary(fm1) betax <- fm1$coeff[x] * sd(x) / sd(y) # cd is coefficient of determination cd <- betax * cor(y, x) Thanks.
On 2010-06-11 2:16, Sandra Hawthorne wrote:> Hi, > > I'm trying to verify the calculation of coefficient of determination (r squared) for linear regression. I've done the calculation manually with a simple test case and using the definition of r squared outlined in summary(lm) help. There seems to be a discrepancy between the what R produced and the manual calculation. Does anyone know why this is so? What does the multiple r squared reported in summary(lm) represent? > > # The test case: > x<- c(1,2,3,4) > y<- c(1.6,4.4,5.5,8.3) > dummy<- data.frame(x, y) > fm1<- lm(y ~ x-1, data = dummy) > summary(fm1) > betax<- fm1$coeff[x] * sd(x) / sd(y) > # cd is coefficient of determination > cd<- betax * cor(y, x)The discrepancy is due to incorrect manual calculation. You're using (incorrectly, at that) formulas for simple regression _with an intercept term_ whereas you model has _no_ intercept term. What summary.lm reports is clearly described on the help page. See r.squared in the Value section. -Peter Ehlers
Bernardo Rangel Tura
2010-Jun-11 10:25 UTC
[R] Calculation of r squared from a linear regression
On Fri, 2010-06-11 at 01:16 -0700, Sandra Hawthorne wrote:> Hi, > > I'm trying to verify the calculation of coefficient of determination (r squared) for linear regression. I've done the calculation manually with a simple test case and using the definition of r squared outlined in summary(lm) help. There seems to be a discrepancy between the what R produced and the manual calculation. Does anyone know why this is so? What does the multiple r squared reported in summary(lm) represent? > > # The test case: > x <- c(1,2,3,4) > y <- c(1.6,4.4,5.5,8.3) > dummy <- data.frame(x, y) > fm1 <- lm(y ~ x-1, data = dummy) > summary(fm1) > betax <- fm1$coeff[x] * sd(x) / sd(y) > # cd is coefficient of determination > cd <- betax * cor(y, x) > > Thanks.Sorry Sandra, But the problem in yours script. Look this x <- c(1,2,3,4) y <- c(1.6,4.4,5.5,8.3) dummy <- data.frame(x, y) fm1 <- lm(y ~ x, data = dummy) summary(fm1) Call: lm(formula = y ~ x, data = dummy) Residuals: 1 2 3 4 -0.17 0.51 -0.51 0.17 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -0.3500 0.6584 -0.532 0.6481 x 2.1200 0.2404 8.818 0.0126 * --- Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 Residual standard error: 0.5376 on 2 degrees of freedom Multiple R-squared: 0.9749, Adjusted R-squared: 0.9624 F-statistic: 77.76 on 1 and 2 DF, p-value: 0.01262 betax <- fm1$coeff[2] * sd(x) / sd(y) # cd is coefficient of determination cd <- betax * cor(y, x) cd x 0.974924 The formula "fm1$coeff[2] * sd(x) / sd(y)" is valid only the model have a intercept... -- Bernardo Rangel Tura, M.D,MPH,Ph.D National Institute of Cardiology Brazil
Hi, as pointed out previously, the problem is in using the canned routine (lm) without including an intercept term. Here is a working, generic example with commented code. #Simulate data x=rnorm(100) e=rnorm(100) y=x+e #Create X matrix with intercept X=cbind(1,x) #Projection matrix P=X%*%solve(t(X)%*%X)%*%t(X) #Fitted values fv=P%*%y #Run canned regression reg=lm(y~x) #Canned and hand computed fitted values cbind(fitted(reg),fv) #Are they all equal? all.equal(as.vector(as.numeric(fitted(reg))),as.vector(as.numeric(fv))) #This already implies that the R-squared is equal #Compute R-squared by hand R.sq=1-sum((y-fv)^2)/sum((y-mean(y))^2) #Is this equal to the R-squared from the canned routine? summary(reg)$r.squared==R.sq Daniel -- View this message in context: http://r.789695.n4.nabble.com/Calculation-of-r-squared-from-a-linear-regression-tp2251438p2255537.html Sent from the R help mailing list archive at Nabble.com.