Dear R list, I have one very elementary question regarding correlation between two variables.

x = c(44,46,46,47,45,43,45,44)
y = c(44,43,41,41,46,48,44,43)

> cov(x, y)
[1] -2.428571

However, if I try to calculate the covariance using the formula

covariance = sum((x-mean(x))*(y-mean(y)))/8   # no. of paired obs. = 8

or

covariance = sum(x*y)/8 - (mean(x)*mean(y))

it gives

covariance = 2.125

I am not able to figure out where I am going wrong with respect to the covariance formula. Kindly guide.

Regards

Vincy
Well, you don't have the correct denominator, i.e., n-1, with n denoting the sample size. Have a look at the *Details* section of the online help file for cov(), and try also

sum((x-mean(x))*(y-mean(y)))/7
cov(x, y)

I hope it helps.

Best,
Dimitris

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
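A minimal sketch in base R, using the data from the original post, that puts the two denominators side by side. The variable names (sxy, cov_n, cov_nm1, sx_n, sy_n) are illustrative choices, not anything from the thread; only standard base functions (sum, mean, cov, cor, all.equal) are used.

```r
# Data from the original post
x <- c(44, 46, 46, 47, 45, 43, 45, 44)
y <- c(44, 43, 41, 41, 46, 48, 44, 43)
n <- length(x)  # 8 paired observations

# Sum of cross-products of deviations from the means
sxy <- sum((x - mean(x)) * (y - mean(y)))

cov_n   <- sxy / n        # divide by n (biased form)
cov_nm1 <- sxy / (n - 1)  # divide by n - 1 (the form cov() uses)

cov_n    # [1] -2.125
cov_nm1  # [1] -2.428571

# cov() agrees with the n - 1 version
all.equal(cov(x, y), cov_nm1)

# The two forms differ only by the factor n / (n - 1)
all.equal(cov_nm1, cov_n * n / (n - 1))

# Correlation does not depend on the choice, as long as it is used
# consistently: the same factor appears in the standard deviations
sx_n <- sqrt(sum((x - mean(x))^2) / n)
sy_n <- sqrt(sum((y - mean(y))^2) / n)
all.equal(cor(x, y), cov_n / (sx_n * sy_n))
```

The last check is why the denominator only matters for the covariance (and variances) themselves: in the correlation coefficient the factor 1/n or 1/(n - 1) appears in both numerator and denominator and cancels.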
Dividing by 8 gives a biased estimator of the covariance. R's cov() function calculates the unbiased estimator (for i.i.d. observations), dividing by (sample size) - 1, i.e. 7 here.

Regards,
Kohta

--
View this message in context: http://r.789695.n4.nabble.com/Correlation-discrepancy-tp3762457p3762491.html
Sent from the R help mailing list archive at Nabble.com.
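The bias point can be illustrated with a small Monte Carlo sketch in base R. The simulation setup is an assumption chosen for illustration, not something from the thread: x and an independent noise term are standard normal and y = x + noise, so the true covariance is 1. With n = 8 the divide-by-n estimator has expectation (n - 1)/n = 0.875 times the true covariance, while the divide-by-(n - 1) estimator has expectation equal to it.

```r
# Illustrative assumption: y = x + e, x and e independent N(0, 1),
# so the true Cov(x, y) = Var(x) = 1
set.seed(1)
n    <- 8      # same sample size as in the thread
reps <- 20000  # number of simulated samples

est <- replicate(reps, {
  x <- rnorm(n)
  y <- x + rnorm(n)
  sxy <- sum((x - mean(x)) * (y - mean(y)))
  c(div_n = sxy / n, div_n_minus_1 = sxy / (n - 1))
})

# Average of each estimator across the simulated samples:
# div_n should sit near 0.875, div_n_minus_1 near the true value 1
rowMeans(est)
```

The exact averages depend on the seed; the point is only that the divide-by-n average settles below the true value by the factor (n - 1)/n.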
Dear Mr. Dimitris and Mr Harding, thanks a lot for your guidance. It will be interesting to find out how Excel deals with this formula. I will try it. Thanks again.

Regards
Ashok

--- On Tue, 8/23/11, ted.harding@wlandres.net <ted.harding@wlandres.net> wrote:

From: ted.harding@wlandres.net <ted.harding@wlandres.net>
Subject: Re: [R] Correlation discrepancy
To: r-help@r-project.org
Cc: "Vincy Pyne" <vincy_pyne@yahoo.ca>
Received: Tuesday, August 23, 2011, 11:38 AM

In addition, something has gone wrong, Vincy, with your data x,y between evaluating cov(x,y) and evaluating your explicit formula. If I repeat your commands:

x = c(44,46,46,47,45,43,45,44)
y = c(44,43,41,41,46,48,44,43)
cov(x, y)
# [1] -2.428571
sum((x-mean(x))*(y-mean(y)))/8
# [1] -2.125

which has the right sign and, when changed to incorporate the correct denominator (n-1 = 7) as suggested by Dimitris:

sum((x-mean(x))*(y-mean(y)))/7
# [1] -2.428571

gives exact agreement. With regard to your second formula, this should correspondingly be:

sum(x*y)/7 - (mean(x)*mean(y))*8/7
# [1] -2.428571

again agreeing exactly. Your result:

>> covariance = sum((x-mean(x))*(y-mean(y)))/8 # no. of paired obs. = 8
>>
>> or
>>
>> covariance = sum(x*y)/8-(mean(x)*mean(y))
>>
>> gives
>>
>> covariance = 2.125

agrees in numerical magnitude with the "1/8" form, but has the wrong sign. Or maybe you simply mis-typed "-2.125" as "2.125".

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding@wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 23-Aug-11  Time: 12:38:36
------------------------------ XFMail ------------------------------
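The corrected second formula (dividing by 7 and multiplying the product of means by 8/7) follows from a standard algebraic identity for the cross-product sum, written here with the sample covariance denoted $s_{xy}$:

$$
\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) = \sum_{i=1}^{n}x_i y_i - n\,\bar{x}\,\bar{y}
$$

$$
s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) = \frac{\sum_{i=1}^{n}x_i y_i}{n-1} - \frac{n}{n-1}\,\bar{x}\,\bar{y}
$$

With the thread's data, $n = 8$, $\sum x_i y_i = 15733$, $\bar{x} = 45$ and $\bar{y} = 43.75$, so

$$
s_{xy} = \frac{15733}{7} - \frac{8}{7}(45)(43.75) = \frac{15733 - 15750}{7} = -\frac{17}{7} \approx -2.428571,
$$

which matches cov(x, y). Dividing the same cross-product sum by $n = 8$ instead gives $-17/8 = -2.125$.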
Dear Mr Dimitris and Mr Harding, by mistake I typed my colleague's name (i.e. Ashok) while thanking you. Please excuse me for that.

Regards
Vincy