Hello R gurus, I want to calculate the Spearman rho between two ranked lists. I am getting results with cor.test that differ in comparison to my own spearman function: > my.spearman function(l1, l2) { if(length(l1) != length(l2)) stop("lists must have same length") r1 <- rank(l1) r2 <- rank(l2) dsq <- sapply(r1-r2,function(x) x^2) 1 - ((6 * sum(dsq)) / (length(l1) * (length(l1)^2 - 1))) } Perhaps I'm doing something wrong in that code, but it's a pretty straightforward calculation, so it's hard to see what, especially with rank() handling the ties correctly. One example difference: > a [1] 0.112761940 0.130260949 -0.010567817 -0.411906701 0.004588443 [6] -0.034337846 -0.148082981 -0.243724351 0.186690390 0.408983820 > b [1] 8 13 14 15 5 7 8 2 19 19 > cor.test(a,b,method="spearman") Spearman's rank correlation rho data: a and b S = 85, p-value = 0.1544 alternative hypothesis: true rho is not equal to 0 sample estimates: rho 0.4878139 Warning message: p-values may be incorrect due to ties in: cor.test.default(a, b, method = "spearman") > my.spearman(a,b) [1] 0.4909091 Which, as you can see, isn't quite the same. And also: > c [1] 0 0 0 0 0 0 0 0 0 0 > cor.test(a,c,method="spearman") Spearman's rank correlation rho data: a and c S = NA, p-value = NA alternative hypothesis: true rho is not equal to 0 sample estimates: rho NA Warning message: The standard deviation is zero in: cor(x, y, na.method, method == "kendall") > my.spearman(a,c) [1] 0.5 Any suggestions as to what I'm doing wrong? Thanks in advance, -- William Morgan wmorgan at mitre dot org
>>>>> "William" == William T Morgan <wmorgan at mitre.org> >>>>> on 15 Mar 2004 16:37:08 -0500 writes:William> Hello R gurus, William> I want to calculate the Spearman rho between two ranked lists. I am William> getting results with cor.test that differ in comparison to my own William> spearman function: >> my.spearman William> function(l1, l2) { William> if(length(l1) != length(l2)) stop("lists must have same length") William> r1 <- rank(l1) William> r2 <- rank(l2) William> dsq <- sapply(r1-r2,function(x) x^2) William> 1 - ((6 * sum(dsq)) / (length(l1) * (length(l1)^2 - 1))) William> } William> Perhaps I'm doing something wrong in that code, but it's a pretty William> straightforward calculation, so it's hard to see what, especially with William> rank() handling the ties correctly. Well, the "ties" in your example are really the "problem". The formula you use, 1 - 6 S(d^2) / (n^3 - n) ( d = r1 - r2 ; r{12} := rank(x{12}) ) is only equal to the more natural definition, cor(r1, r2), in the situation when there are no ties [plus in a few "lucky" situations with ties]. cor.test() and now cor(*, method = "spearman") in R have always used the correlation of the ranks. It seems that this needs to be documented, since you are right, the "1 - 6 S / (..)" formula is also in use as *definition* for Spearman's rank correlation. Martin Maechler <maechler at stat.math.ethz.ch> http://stat.ethz.ch/~maechler/ Seminar fuer Statistik, ETH-Zentrum LEO C16 Leonhardstr. 27 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-1-632-3408 fax: ...-1228 <><