I recently compared two different approaches to calculating the correlation of two variables, and I cannot explain the different results:

data(cars)
model <- lm(dist~speed,data=cars)
coef(model)
fitted.right <- model$fitted
fitted.wrong <- -17+5*cars$speed

When using the OLS fitted values, the lines below all return the same R2 value:

1-sum((cars$dist-fitted.right)^2)/sum((cars$dist-mean(cars$dist))^2)
cor(cars$dist,fitted.right)^2
(sum((cars$dist-mean(cars$dist))*(fitted.right-mean(fitted.right)))/(49*sd(cars$dist)*sd(fitted.right)))^2

However, when I use my estimated parameters to find the fitted values, "fitted.wrong", the first equation returns a much lower R2 value, which I would expect since the fit is worse, but the other lines return the same R2 that I get when using the OLS fitted values.

1-sum((cars$dist-fitted.wrong)^2)/sum((cars$dist-mean(cars$dist))^2)
cor(x=cars$dist,y=fitted.wrong)^2
(sum((cars$dist-mean(cars$dist))*(fitted.wrong-mean(fitted.wrong)))/(49*sd(cars$dist)*sd(fitted.wrong)))^2

I'm sure I'm missing something simple, but can someone explain the difference between these two methods of finding R2? Thanks.

Jon
Hi,

try

cor(fitted.right,fitted.wrong)

It should give 1, as both are linear functions of speed! Hence cor(cars$dist,fitted.right)^2 and cor(x=cars$dist,y=fitted.wrong)^2 must be the same.

HTH
d
________________________________________
From: R-help [r-help-bounces at r-project.org] on behalf of Jonathan Thayn [jthayn at ilstu.edu]
Sent: 21 February 2015, 22:42
To: r-help at r-project.org
Subject: [R] Correlation question

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
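The suggestion in this reply can be checked directly. The following is a minimal sketch, assuming only base R and the built-in cars data. It uses fitted() in place of model$fitted, and the r.speed, r.right and r.wrong names are introduced here purely for illustration. It relies on the fact that correlation is unchanged when a variable is shifted by a constant or rescaled by a positive factor.

```r
# Minimal sketch; assumes base R and the built-in cars data set only.
data(cars)
model <- lm(dist ~ speed, data = cars)
fitted.right <- fitted(model)            # OLS fitted values
fitted.wrong <- -17 + 5 * cars$speed     # hand-picked intercept and slope

# Both vectors are increasing linear functions of speed, so they are
# perfectly correlated with each other (1 up to floating-point error).
cor(fitted.right, fitted.wrong)

# Illustrative names (not from the original post): the correlation of dist
# with speed, and with each set of fitted values.
r.speed <- cor(cars$dist, cars$speed)
r.right <- cor(cars$dist, fitted.right)
r.wrong <- cor(cars$dist, fitted.wrong)

# All three agree, so the squared correlations agree as well.
all.equal(r.speed, r.right)
all.equal(r.right, r.wrong)
```

In other words, cor() only measures how well dist lines up with some increasing linear function of speed. It cannot tell whether the particular intercept and slope used to build the fitted values are any good.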
Of course! Thank you, I knew I was missing something painfully obvious. It seems, then, that this line

1-sum((cars$dist-fitted.wrong)^2)/sum((cars$dist-mean(cars$dist))^2)

is finding something other than the traditional correlation. I found it in a lecture introducing correlation but, now, I'm not sure what it is. It does do a better job of showing that the fitted.wrong variable is not a good prediction of the distance.

On Feb 21, 2015, at 4:36 PM, Kehl Dániel wrote:

> Hi,
>
> try
>
> cor(fitted.right,fitted.wrong)
>
> should give 1 as both are a linear function of speed! Hence cor(cars$dist,fitted.right)^2 and cor(x=cars$dist,y=fitted.wrong)^2 must be the same.
>
> HTH
> d
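On the open question of what that line computes: 1 - SSE/SST is the usual definition of the coefficient of determination. It compares the squared prediction errors with the squared deviations from the mean, so it depends on the actual intercept and slope. It coincides with the squared correlation between observed and fitted values when the fitted values come from an OLS fit with an intercept, because then the residuals sum to zero and are orthogonal to the fitted values. The sketch below illustrates this with base R and the built-in cars data. The helper names r2.sse and r2.cor are hypothetical, introduced only for this example.

```r
# Minimal sketch; assumes base R and the built-in cars data set only.
data(cars)
y <- cars$dist
model <- lm(dist ~ speed, data = cars)
fitted.right <- fitted(model)            # OLS fitted values
fitted.wrong <- -17 + 5 * cars$speed     # hand-picked intercept and slope

# Hypothetical helpers, named here for illustration only.
r2.sse <- function(y, yhat) 1 - sum((y - yhat)^2) / sum((y - mean(y))^2)
r2.cor <- function(y, yhat) cor(y, yhat)^2

# OLS fit: the two definitions agree.
c(sse = r2.sse(y, fitted.right), cor = r2.cor(y, fitted.right))

# Non-OLS fit: the squared correlation is unchanged, but 1 - SSE/SST drops.
c(sse = r2.sse(y, fitted.wrong), cor = r2.cor(y, fitted.wrong))

# The two conditions behind the equality hold for the OLS residuals ...
res.right <- y - fitted.right
c(sum = sum(res.right), cross = sum(res.right * fitted.right))  # both ~ 0

# ... and fail for the hand-picked line.
res.wrong <- y - fitted.wrong
c(sum = sum(res.wrong), cross = sum(res.wrong * fitted.wrong))
```

For judging how well a given set of predictions matches the data, the 1 - SSE/SST form is the more informative of the two, and it can even be negative when the predictions do worse than simply using mean(y).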