Baoqiang Cao
2011-May-16 21:03 UTC
[R] about spearman and kendal correlation coefficient calculation in "cor"
Hi, I have the following two measurements stored in mat:> print(mat)[,1] [,2] [1,] -14.80976 -265.786 [2,] -14.92417 -54.724 [3,] -13.92087 -58.912 [4,] -9.11503 -115.580 [5,] -17.05970 -278.749 [6,] -25.23313 -219.513 [7,] -19.62465 -497.873 [8,] -13.92087 -659.486 [9,] -14.24629 -131.680 [10,] -20.81758 -604.961 [11,] -15.32194 -18.735 To calculate the ranking correlation, I used "cor":> cor(mat[,1], mat[,2], method="spearman")[1] 0.2551259> cor(mat[,1], mat[,2], method="kendal")[1] 0.1834940 However, when I tried to reproduce the two correlation coefficients by following their "defination", I got different results than from "cor". For Spearman, I got: od1 = order(mat[,1], decreasing=T) od2= order(mat[,2], decreasing=T)> 1-6*sum((od1-od2)^2)/(length(od1)^3-length(od1))[1] -0.2909091 This is different with from "cor", which is 0.255. For Kendal, I got: accord=0 disaccord=0 experi=mat[,1] target=mat[,2] N= length(experi) for(i in 1:(N-1)) { for(j in (i+1):N) { if((target[i] < target[j]) && (experi[i] < experi[j])) { accord=accord+1 } else if ((target[i] > target[j]) && (experi[i] > experi[j])) { disaccord=disaccord+1 } } }> (accord-disaccord)/(N*(N-1)/2)[1] -0.2181818 This is also different with from "cor", which is 0.183. Anybody could help me out explaining the right answer? Thanks in advance! Baoqiang
Thomas Lumley
2011-May-16 21:46 UTC
[R] about spearman and kendal correlation coefficient calculation in "cor"
On Tue, May 17, 2011 at 9:03 AM, Baoqiang Cao <bqcaomail at gmail.com> wrote:> Hi, > > I have the following two measurements stored in mat: > >> print(mat) > ? ? ? ? ? [,1] ? ? [,2] > ?[1,] -14.80976 -265.786 > ?[2,] -14.92417 ?-54.724 > ?[3,] -13.92087 ?-58.912 > ?[4,] ?-9.11503 -115.580 > ?[5,] -17.05970 -278.749 > ?[6,] -25.23313 -219.513 > ?[7,] -19.62465 -497.873 > ?[8,] -13.92087 -659.486 > ?[9,] -14.24629 -131.680 > [10,] -20.81758 -604.961 > [11,] -15.32194 ?-18.735 > > To calculate the ranking correlation, I used "cor": >> cor(mat[,1], mat[,2], method="spearman") > [1] 0.2551259 >> cor(mat[,1], mat[,2], method="kendal") > [1] 0.1834940 > > However, when I tried to reproduce the two correlation coefficients by > following their "defination", I got different results than from "cor". > For Spearman, I got: > > od1 = order(mat[,1], decreasing=T) > od2= order(mat[,2], decreasing=T) >> 1-6*sum((od1-od2)^2)/(length(od1)^3-length(od1)) > [1] -0.2909091You've confused order() and rank(), I think.> This is different with from "cor", which is 0.255. > > For Kendal, I got: > > accord=0 > disaccord=0 > > experi=mat[,1] > target=mat[,2] > N= length(experi) > for(i in 1:(N-1)) { > ? for(j in (i+1):N) { > ? ? ?if((target[i] < target[j]) && (experi[i] < experi[j])) { > ? ? ? ? accord=accord+1 > ? ? ?} else if ((target[i] > target[j]) && (experi[i] > experi[j])) { > ? ? ? ? disaccord=disaccord+1 > ? ? ?} > ? } > }And this is just wrong. Both if() clauses find concordant pairs (Yi<Yj) & (Xi < Zj), and (Yi>Yj) & (Xi>Xj), but you're calling half of them discordant and not counting the actual discordant pairs. A quicker and correct computation would use outer(). In the absence of ties, outer(x,x,"<")==outer(y,y,"<") gets you TRUE for concordant and FALSE for discordant pairs, and sum() adds them up. This counts each pair twice, so you need to rescale. -thomas>> (accord-disaccord)/(N*(N-1)/2) > [1] -0.2181818 > > This is also different with from "cor", which is 0.183. > > Anybody could help me out explaining the right answer? Thanks in > advance! > > Baoqiang > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Thomas Lumley Professor of Biostatistics University of Auckland