J. Sebastian Tello
2008-Oct-31 23:30 UTC
[R] Estimating R2 value for a unit-slope regression
Dear all,

I need to estimate the amount of variation explained in a variable after simulations that produce a predictor in the same units as the dependent variable (numbers of species). Since the dependent and predictor variables measure the same quantity, I think the most appropriate analysis is a regression constrained to have an intercept of 0 and a slope of 1. I am trying to write a piece of R code to do this, but I am running into some problems, so I wanted to ask for your advice.

I have investigated 3 approaches, and I am including a jpg file showing the behaviour of these three pieces of code (R2 values as a function of the slope of the OLS regression). I also included the regular R2 value from the OLS regression for comparison (black symbols in figure).

R2 for a regression can be calculated by the formula R2 = (SSY - SSE)/SSY, so:

#1 Green symbols in figure
    SSY <- sum((y - mean(y))^2)   # total sum of squares of y
    SSE <- sum((y - x)^2)         # residuals around the 1:1 line
    R2 <- (SSY - SSE)/SSY

where y is the dependent variable and x is the predictor. However, I am running into trouble because sometimes the residual sum of squares (SSE) is larger than the sum of squares of the dependent variable (SSY), and I end up with negative R2 values, which of course make no sense.

Another way to write the same formula is R2 = SSR/(SSR + SSE), so:

#2 Blue symbols in figure
    SSR <- sum((x - mean(y))^2)   # predicted values around the mean of y
    SSE <- sum((y - x)^2)
    R2 <- SSR/(SSR + SSE)

This approach behaves better in the sense that it stays within the expected range of 0 to 1 and peaks when the slope is equal to 1, but its decay as the slope moves away from 1 is too slow. For example, when the slope is zero, the R2 value according to this formula is about 0.4.

The third and final approach that I have used is the one described by Romdal et al. 2005.
They use the second formula, R2 = SSR/(SSR + SSE), but (at least as I understand it) they take the regression sum of squares from a regular OLS fit, so the corresponding code would be:

#3 Red symbols in figure
    lm.y.x <- lm(y ~ x)
    SSR <- deviance(lm(y ~ 1)) - sum(residuals(lm.y.x)^2)   # SSY minus the OLS residual SS
    SSE <- sum((y - x)^2)
    R2 <- SSR/(SSR + SSE)

This of course also stays within the expected range of 0 to 1, but it has its own troubling behaviour: it does not peak at a slope of 1, it decreases faster at slopes less than 1 but not at slopes larger than 1, and it increases again at slopes less than 0 (as if a negative association between y and x were better than a flat line, which again makes no sense when the predictor is the same variable as the dependent).

Any advice, recommendations for appropriate literature, or pieces of code would be highly appreciated.

Best,

Sebastian

J. Sebastián Tello
Department of Biological Sciences
285 Life Sciences Building
Louisiana State University
Baton Rouge, LA, 70803
(225) 578-4284 (office and lab.)
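
For reference, the three calculations described in this post can be collected into one small function so they are easy to compare on the same data. This is only a sketch: the function name r2_unit_slope and the toy vectors below are made up for illustration and are not part of the original simulations.

```r
# Hypothetical helper (not from the original analysis): the three candidate
# R2 values for a fixed 1:1 line (intercept 0, slope 1).
r2_unit_slope <- function(y, x) {
  SSY <- sum((y - mean(y))^2)            # total sum of squares of y
  SSE <- sum((y - x)^2)                  # residuals around the 1:1 line
  SSR_line <- sum((x - mean(y))^2)       # approach 2: predictions around mean(y)
  SSR_ols <- SSY - deviance(lm(y ~ x))   # approach 3: regression SS of the OLS fit
  c(r2_1 = (SSY - SSE) / SSY,
    r2_2 = SSR_line / (SSR_line + SSE),
    r2_3 = SSR_ols / (SSR_ols + SSE))
}

# Toy data, assumed purely for illustration
y      <- c(2, 4, 6, 8, 10)
x_good <- y + c(0.5, -0.5, 0, 0.5, -0.5)   # predictor close to the 1:1 line
x_bad  <- rev(y)                           # predictor negatively related to y

res_good <- r2_unit_slope(y, x_good)
res_bad  <- r2_unit_slope(y, x_bad)
print(res_good)
print(res_bad)
```

With the reversed predictor, SSY is 40 and SSE is 160, so approach 1 gives (40 - 160)/40 = -3. A negative value here arises whenever the 1:1 line predicts y worse than the mean of y does, whereas approaches 2 and 3 stay between 0 and 1 by construction.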