thr3ads.net - R help - [R] Correlation question [Feb 2015]


Jonathan Thayn

2015-Feb-21 21:42 UTC

[R] Correlation question

I recently compared two different approaches to calculating the correlation of
two variables, and I cannot explain the different results:

data(cars)
model <- lm(dist ~ speed, data = cars)
coef(model)
fitted.right <- fitted(model)          # OLS fitted values
fitted.wrong <- -17 + 5 * cars$speed   # my own intercept and slope


When using the OLS fitted values, the lines below all return the same R2 value:

1-sum((cars$dist-fitted.right)^2)/sum((cars$dist-mean(cars$dist))^2)
cor(cars$dist,fitted.right)^2
(sum((cars$dist-mean(cars$dist))*(fitted.right-mean(fitted.right)))/((nrow(cars)-1)*sd(cars$dist)*sd(fitted.right)))^2


However, when I use my own guessed parameters (intercept -17, slope 5) to
compute the fitted values, "fitted.wrong", the first expression returns a much
lower R2 value, which I would expect since the fit is worse, but the other two
lines return the same R2 that I get when using the OLS fitted values.

1-sum((cars$dist-fitted.wrong)^2)/sum((cars$dist-mean(cars$dist))^2)
cor(x=cars$dist,y=fitted.wrong)^2
(sum((cars$dist-mean(cars$dist))*(fitted.wrong-mean(fitted.wrong)))/((nrow(cars)-1)*sd(cars$dist)*sd(fitted.wrong)))^2


I'm sure I'm missing something simple, but can someone explain the
difference between these two methods of finding R2? Thanks.

Jon

Kehl Dániel

2015-Feb-21 22:36 UTC


Hi,

try

cor(fitted.right,fitted.wrong)

This should give 1, as both are increasing linear functions of speed. The
correlation coefficient is unchanged by such a transformation, so
cor(cars$dist,fitted.right)^2 and cor(x=cars$dist,y=fitted.wrong)^2 must be the
same.
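
To make the point concrete, here is a minimal sketch (the names r.speed, r.right, r.wrong and r.flipped are introduced only for illustration and are not from the original post). It checks that correlating dist with speed itself, with the OLS fitted values, or with any other increasing linear function of speed gives the same coefficient, and that a decreasing linear function only flips the sign:

```r
# Correlation ignores intercept and (positive) slope: every increasing
# linear function of speed has the same correlation with dist.
data(cars)
model <- lm(dist ~ speed, data = cars)
fitted.right <- fitted(model)
fitted.wrong <- -17 + 5 * cars$speed

r.speed <- cor(cars$dist, cars$speed)
r.right <- cor(cars$dist, fitted.right)
r.wrong <- cor(cars$dist, fitted.wrong)

all.equal(r.speed, r.right)   # TRUE
all.equal(r.right, r.wrong)   # TRUE

# A decreasing linear function of speed flips the sign of r,
# so r^2 is still unchanged. (The constants 100 and -5 are arbitrary.)
r.flipped <- cor(cars$dist, 100 - 5 * cars$speed)
all.equal(r.flipped, -r.right)   # TRUE
```

So the squared correlation cannot tell a good linear prediction from a bad one; it only measures how closely dist follows some straight line in speed.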

HTH
d

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Jonathan Thayn

2015-Feb-22 06:01 UTC


Of course! Thank you, I knew I was missing something painfully obvious. It
seems, then, that this line

1-sum((cars$dist-fitted.wrong)^2)/sum((cars$dist-mean(cars$dist))^2)

is finding something other than the traditional correlation. I found this in a
lecture introducing correlation, but now I'm not sure what it is. It does
do a better job of showing that the fitted.wrong variable is not a good
prediction of the distance.
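
For readers following along: the expression 1 - SSE/SST is usually called the coefficient of determination. It equals the squared correlation when the predictions are the OLS fitted values from a model with an intercept, but not for arbitrary predictions. The sketch below illustrates this on the cars data; r2.sse is a hypothetical helper name introduced here, not something from the thread or from base R.

```r
data(cars)
y <- cars$dist
model <- lm(dist ~ speed, data = cars)
fitted.right <- fitted(model)
fitted.wrong <- -17 + 5 * cars$speed

# Hypothetical helper: 1 - SSE/SST for any vector of predictions.
r2.sse <- function(obs, pred) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}

# For the OLS fit, all three quantities agree.
r2.sse(y, fitted.right)
cor(y, fitted.right)^2
summary(model)$r.squared

# For the hand-picked line, 1 - SSE/SST drops because it penalises the
# wrong intercept and slope, while the squared correlation does not move.
r2.sse(y, fitted.wrong)
cor(y, fitted.wrong)^2

# Predicting the mean for every point gives exactly 0, and predictions
# worse than the mean give a negative value.
r2.sse(y, rep(mean(y), length(y)))
r2.sse(y, rep(0, length(y)))
```

Because OLS minimises the sum of squared errors, no other intercept and slope can give a higher value of 1 - SSE/SST than the fitted model, which is why this measure exposes fitted.wrong as the poorer prediction.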



