Simon Anders
2011-Jan-21 18:13 UTC
[Rd] Possible bug in Spearman correlation with use="pairwise.complete.obs"
Hi,

I have just encountered a strange behaviour from 'cor' with regard to the treatment of NAs when calculating Spearman correlations. I guess it is a subtle bug.

If I understand the help page correctly, the two modes 'complete.obs' and 'pairwise.complete.obs' specify how to deal with NAs when calculating a correlation _matrix_. When calculating a single (scalar) correlation coefficient for two data vectors x and y, both should give the same result.

For Pearson correlation, this is in fact the case:

> x <- runif( 10 )
> y <- runif( 10 )
> y[5] <- NA

> cor( x, y, use="complete.obs" )
[1] 0.407858
> cor( x, y, use="pairwise.complete.obs" )
[1] 0.407858

For Spearman correlation, we do NOT get the same results:

> cor( x, y, method="spearman", use="complete.obs" )
[1] 0.3416009
> cor( x, y, method="spearman", use="pairwise.complete.obs" )
[1] 0.3333333

To see the likely reason for this possible bug, observe:

> goodobs <- !is.na(x) & !is.na(y)
> cor( rank(x)[goodobs], rank(y)[goodobs] )
[1] 0.3416009
> cor( rank(x[goodobs]), rank(y[goodobs]) )
[1] 0.3333333

I would claim that only the calculation resulting in 0.3333 is a proper Spearman correlation, while the line resulting in 0.3416 is not. After all, the following is not a complete set of ranks, because there are 9 observations with ranks drawn from 1 to 10, skipping 3:

> rank(x)[goodobs]
[1] 10 6 8 7 4 5 1 9 2

Would you hence agree that 'method="spearman"' with 'use="pairwise.complete.obs"' is incorrect?
Cheers
  Simon

> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=en_US.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] pspearman_0.2-5 SuppDists_1.1-8

loaded via a namespace (and not attached):
[1] tools_2.12.0

+---
| Dr. Simon Anders, Dipl.-Phys.
| European Molecular Biology Laboratory (EMBL), Heidelberg
| office phone +49-6221-387-8632
| preferred (permanent) e-mail: sanders at fs.tum.de
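The message uses random data, so its exact numbers cannot be reproduced. The following is a minimal deterministic sketch of the same effect. The vectors are made up by hand for illustration and are not from the post. It only contrasts the two orders of operation (rank, then subset, versus subset, then rank) and makes no claim about what any particular R version's cor() does internally.

```r
# Hypothetical data, chosen by hand so the result is deterministic.
x <- c(0.9, 0.1, 0.5, 0.3, 0.7)
y <- c(0.2, 0.8, NA, 0.4, 0.6)

# Pairs in which both values are observed.
goodobs <- !is.na(x) & !is.na(y)

# Rank first, then drop incomplete pairs.
# x keeps the ranks 5 1 2 4, so rank 3 is missing from the set.
rho_rank_first <- cor(rank(x, na.last = "keep")[goodobs],
                      rank(y, na.last = "keep")[goodobs])

# Drop incomplete pairs first, then rank.
# Both vectors now receive a complete set of ranks 1..4.
rho_drop_first <- cor(rank(x[goodobs]), rank(y[goodobs]))

print(c(rank_first = rho_rank_first, drop_first = rho_drop_first))
```

The two values differ because the Pearson correlation of the ranks depends on the spacing between them, and ranking before subsetting leaves a gap in the ranks of x.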