Does anyone know of a literature reference, or a piece of code, that can help me calculate the amount of variation explained (R2 value) in a regression constrained to have a slope of 1 and an intercept of 0?

Thanks!

Sebastian

J. Sebastián Tello
Department of Biological Sciences
285 Life Sciences Building
Louisiana State University
Baton Rouge, LA, 70803
(225) 578-4284 (office and lab.)
On Mon, 3 Nov 2008, J. Sebastian Tello wrote:

> Does anyone know of a literature reference, or a piece of code that can
> help me calculate the amount of variation explained (R2 value), in a
> regression constrained to have a slope of 1 and an intercept of 0?

Sebastian,

In the future, please follow the posting guide or use help.request() to craft a better posting to this list.

Something like this is what you are after?

> x <- rnorm(100)
> y <- rnorm(100, x)
> # unexplained
> sum(residuals(lm(y ~ 0 + offset(x)))^2) / sum(y^2)
[1] 0.500178
> (sum(y^2) - sum(residuals(lm(y ~ 0 + offset(x)))^2)) / sum(y^2)
[1] 0.499822

Of course, I could have finessed the use of lm(), but why pass up an opportunity to show how the formula language handles this?

---

Be advised that this (fixing values of coefficients in two models and then comparing them) is a tricky business. You can get 'explained' values that are not in [0,1], which is a source of confusion to many. You can use RSiteSearch("R2 intercept") to find threads on this. The usual distribution theory for nested linear models does not apply. (Read: do not try to compute a p-value unless you have the assistance of a statistician who can explain this sentence.)

HTH,

Chuck

Charles C. Berry                            (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
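[Editor's note: Chuck's computation can be packaged as a small helper. This is only a sketch; the function name r2_fixed is mine, not from the thread. Since the fitted line is forced to be yhat = x, the residuals from lm(y ~ 0 + offset(x)) are simply y - x, and the "total" sum of squares is taken about 0 because the intercept is fixed at 0. As Chuck warns, the result need not lie in [0, 1].]

```r
# Sketch: proportion of sum(y^2) "explained" by the fixed line y = x
# (slope 1, intercept 0), matching the lm(y ~ 0 + offset(x)) computation above.
# The name r2_fixed is hypothetical; the value is NOT guaranteed to be in [0, 1].
r2_fixed <- function(x, y) {
  sse <- sum((y - x)^2)  # residuals from the constrained fit yhat = x
  sst <- sum(y^2)        # total about 0, since the intercept is fixed at 0
  1 - sse / sst
}

set.seed(1)
x <- rnorm(100)
y <- rnorm(100, x)
r2_fixed(x, y)  # roughly 0.5 for this simulation, as in Chuck's session
```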
On 4/11/2008, at 4:30 AM, J. Sebastian Tello wrote:

> Does anyone know of a literature reference, or a piece of code that
> can help me calculate the amount of variation explained (R2 value),
> in a regression constrained to have a slope of 1 and an intercept of 0?

The question is ``wrong''. The idea of ``amount of variation explained'' depends on decomposing the ``total sum of squares'' into two pieces --- the sum of squares of the residuals, and what is left over, which is the sum of squares ``explained by the model''. In the usual regression setting this is

    sum((y_i - ybar)^2) = sum((y_i - yhat_i)^2) + sum((yhat_i - ybar)^2)

or SST = SSE + SSR (T for total, E for error, R for regression), where yhat_i results from fitting the model by least squares. The R-squared value is SSR/SST or 1 - SSE/SST. (Or this quantity times 100%.)

However if you constrain the slope to be 1 and the intercept to be 0, then yhat_i = x_i and the foregoing identity does not hold. The problem is that the ``sum of squares left over'' can be negative (and hence not a sum of squares). I.e. in this case you have SST = SSE + something, where ``something'' is not necessarily a sum of squares. Thus you can have the ``amount of variation explained'' being negative!

E.g. take x_1 = -1, x_2 = 1, y_1 = 1, y_2 = -1, so that yhat_1 = -1 and yhat_2 = 1. In this setting the ``total sum of squares'' is 2, and the ``residual sum of squares'' is (1 - (-1))^2 + (-1 - 1)^2 = 8, so the ``amount of variation explained by the model'' is 2 - 8 = -6, or you could say that R-squared is -300%. (!!!)

Bottom line --- the R-squared concept makes no sense in this context. The R-squared concept is at best dubious, and should be used, if at all, only in the completely orthodox setting.

cheers,

Rolf Turner