JRitter@hhh.umn.edu
2004-Mar-03 21:01 UTC
[Rd] cor(..., method="spearman") or cor(..., method="kendall") (PR#6641)
Dear R maintainers,

R is great. Now that I have that out of the way, I believe I have encountered a bug, or at least an inconsistency, in how Spearman and Kendall rank correlations are handled. Specifically, cor() and cor.test() do not produce the same answer when the data contain NAs. cor() treats the NAs as data, while cor.test() eliminates them. The option use="complete.obs" has no effect on cor() with method="s" or "k".

An illustration follows. I'm running R for Windows, version 1.81 on a Pentium 4, Windows 2000.

Regards,
Joe Ritter

#==================================================================
> x = c(1,2,NA,3,5,88,NA)
> y = c(3,8,4,7,1,12,NA)
> cor(x,y,method="s")
[1] 0.4642857
> cor(x,y,method="s",use="c")
[1] 0.4642857
> cor.test(x,y,method="s")

        Spearman's rank correlation rho

data:  x and y
S = 14, p-value = 0.6833
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.3

> cor(na.omit(data.frame(x,y)),method="s",use="c")
    x   y
x 1.0 0.3
y 0.3 1.0
> cor(x,y,method="k")
[1] 0.3333333
> cor(x,y,method="k",use="complete.obs")
[1] 0.3333333
> cor.test(x,y,method="k")

        Kendall's rank correlation tau

data:  x and y
T = 6, p-value = 0.8167
alternative hypothesis: true tau is not equal to 0
sample estimates:
tau
0.2
Peter Dalgaard
2004-Mar-03 23:44 UTC
[Rd] cor(..., method="spearman") or cor(..., method="kendall") (PR#6641)
JRitter@hhh.umn.edu writes:

> Dear R maintainers,
>
> R is great. Now that I have that out of the way, I believe I have
> encountered a bug, or at least an inconsistency, in how Spearman and
> Kendall rank correlations are handled. Specifically, cor() and
> cor.test() do not produce the same answer when the data contain NAs.

We know... PR#6448 is the same thing. (The problem is that rank() follows sort() which by default sorts NA's to the end of the sorted vector. Thus NA's get a high rank and if both x and y have NA at the same time, a high spearman correlation is calculated.)

It is fixed in the patched R version and also in the development sources (soon to be 1.9.0).

--
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907
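The mechanism described above can be reproduced by hand. The following is only a sketch, not the code inside cor(): it ranks the reporter's x and y with NAs placed last, takes the ordinary (Pearson) correlation of those ranks, and compares the result with a Spearman correlation computed after dropping incomplete pairs. The names rx_old, ry_old, rho_old, ok and rho_ok are made up for illustration, and na.last = TRUE is passed explicitly because the default of rank() may differ between R versions.

```r
x <- c(1, 2, NA, 3, 5, 88, NA)
y <- c(3, 8, 4, 7, 1, 12, NA)

# Emulate the behaviour described in the reply: NAs are ranked last,
# so they receive the highest ranks instead of being dropped.
rx_old <- rank(x, na.last = TRUE)
ry_old <- rank(y, na.last = TRUE)

# Spearman's rho is the Pearson correlation of the ranks.
rho_old <- cor(rx_old, ry_old)

# Workaround: keep only the pairs where both x and y are observed,
# then compute the rank correlation on those pairs.
ok <- complete.cases(x, y)
rho_ok <- cor(x[ok], y[ok], method = "spearman")

print(rho_old)  # about 0.4642857, the value cor() reported
print(rho_ok)   # 0.3, the value cor.test() reported
```

Dropping the incomplete pairs before calling cor() is the same idea as the na.omit(data.frame(x,y)) call in the original report, and it avoids relying on use="complete.obs" in an unpatched version.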