apw at us.ibm.com
2009-May-26 09:50 UTC
[Rd] Covariance calculation gives different answer than Excel (PR#13720)
Full_Name: Amos Waterland
Version: 2.8.1
OS: Ubuntu Linux
Submission from: (NULL) (68.175.8.163)

I calculated the covariance for a small data set as follows:

X <- c(1,2,3,4)
Y <- c(3,3,4,3)
cov(X,Y)
[1] 0.1666667

But when doing the computation with pencil and paper I get:

((-1.5)*(-0.25) + (-0.5)*(-0.25) + (0.5)*(0.75) + (1.5)*(-0.25))/4
[1] 0.125

Microsoft Excel 2003 covar() also gives 0.125. I suspect that you guys are
doing something like this:

((-1.5)*(-0.25) + (-0.5)*(-0.25) + (0.5)*(0.75) + (1.5)*(-0.25))/3
[1] 0.1666667

That is, you are dividing by N minus 1 rather than N. So who is correct?
Duncan Murdoch
2009-May-27 07:58 UTC
[Rd] Covariance calculation gives different answer than Excel (PR#13720)
On 26/05/2009 5:50 AM, apw at us.ibm.com wrote:
> Full_Name: Amos Waterland
> Version: 2.8.1
> OS: Ubuntu Linux
> Submission from: (NULL) (68.175.8.163)
>
>
> I calculated the covariance for a small data set as follows:
>
> X <- c(1,2,3,4)
> Y <- c(3,3,4,3)
> cov(X,Y)
> [1] 0.1666667
>
> But when doing the computation with pencil and paper I get:
>
> ((-1.5)*(-0.25) + (-0.5)*(-0.25) + (0.5)*(0.75) + (1.5)*(-0.25))/4
> [1] 0.125
>
> Microsoft Excel 2003 covar() also gives 0.125. I suspect that you guys are
> doing something like this:
>
> ((-1.5)*(-0.25) + (-0.5)*(-0.25) + (0.5)*(0.75) + (1.5)*(-0.25))/3
> [1] 0.1666667
>
> That is, you are dividing by N minus 1 rather than N. So who is correct?

Please don't claim something is a bug when you are not sure. cov() is clearly documented to use n-1 in the denominator. Excel (for their own reasons) uses n, which leads to surprises like var(x) != covar(x, x), because they use n-1 in their variance calculation.

Duncan Murdoch
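
For anyone wanting to reproduce both numbers in R, the following is a minimal sketch using the data from the report. The variable names sxy, sample_cov and pop_cov are illustrative only, not part of R; the n-denominator value is obtained by rescaling, on the assumption that this is the quantity Excel's covar() returns, as described above.

```r
# Data from the original report
X <- c(1, 2, 3, 4)
Y <- c(3, 3, 4, 3)
n <- length(X)

# Sum of cross-products of deviations from the means
sxy <- sum((X - mean(X)) * (Y - mean(Y)))

# Illustrative names (not part of R):
sample_cov <- sxy / (n - 1)   # n - 1 denominator, as documented for cov()
pop_cov    <- sxy / n         # n denominator, the pencil-and-paper value

# cov() agrees with the n - 1 version
print(all.equal(cov(X, Y), sample_cov))

# Rescaling cov() recovers the n-denominator value
print(cov(X, Y) * (n - 1) / n)

# In R, var() and cov() use the same denominator, so these agree
print(all.equal(var(X), cov(X, X)))
```

Because var() and cov() both divide by n-1 in R, var(x) and cov(x, x) are consistent with each other; the n-denominator figure is available by multiplying by (n-1)/n when that is what is wanted.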