Dennis Fisher
2005-Aug-13 00:11 UTC
[R] R/S-Plus/SAS yield different results for Kendall-tau and Spearman nonparametric regression
Colleagues,
I ran some nonparametric rank-correlation tests (Spearman and Kendall) in R
(on RedHat Linux), and a colleague then repeated the analyses in SAS. When we
obtained different results, I tested S-Plus (on the same Linux box) and got
yet another set of results. I have reproduced the discrepancies with a small dataset:
DATA:
37.5   23
37.5   13
25     16
25     12
100    15
12.5   19
50     20
100    13
100    10
100    10
100    16
50     10
87.5   13
100    15
50     11
100    14
50     19
87.5   20
100    20
37.5   20
100    13
100    14
50     15
100    17
100    14
Code for S-Plus and R:
DATA <- read.table("NonparametricRegressionData")
cor.test(DATA[,1], DATA[,2], method = "spearman")
cor.test(DATA[,1], DATA[,2], method = "kendall")
-------------------------------------------------------------
S-Plus (version 6)
> cor.test(DATA[,1], DATA[,2], method = "spearman")
Spearman's rank correlation
data: DATA[, 1] and DATA[, 2]
normal-z = -1.1028, p-value = 0.2701
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
-0.2247199
> cor.test(DATA[,1], DATA[,2], method = "kendall")
Kendall's rank correlation tau
data: DATA[, 1] and DATA[, 2]
normal-z = -1.0583, p-value = 0.2899
alternative hypothesis: true tau is not equal to 0
sample estimates:
tau
-0.14
------------------------------------------------------------
R 2.1.1
> cor.test(DATA[,1], DATA[,2], method = "spearman")
Spearman's rank correlation rho
data: DATA[, 1] and DATA[, 2]
S = 3184, p-value = 0.2791
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
-0.2247199
Warning message:
p-values may be incorrect due to ties in: cor.test.default(DATA[, 1],
DATA[, 2], method = "spearman")
> cor.test(DATA[,1], DATA[,2], method = "kendall")
Kendall's rank correlation tau
data: DATA[, 1] and DATA[, 2]
z = -1.1948, p-value = 0.2322
alternative hypothesis: true tau is not equal to 0
sample estimates:
tau
-0.1705247
Warning message:
Cannot compute exact p-value with ties in: cor.test.default(DATA[,
1], DATA[, 2], method = "kendall")
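The S = 3184 in R's output appears to be a rescaling of rho rather than a separate statistic: S = (n^3 - n)(1 - rho)/6, which equals the sum of squared rank differences only when there are no ties. The minimal R sketch below checks this, entering the data from the post inline instead of reading the file and assuming column 1 is x and column 2 is y (as read.table() would read it). With these data the sum of squared midrank differences is 2971.5, not 3184, because of the ties. The last two computations are one common large-sample approximation (a t statistic with n - 2 degrees of freedom), included purely for comparison with the p-values in the thread; this is an illustrative assumption, not a statement about what R, S-Plus, or SAS computes internally.

```r
# Data from the post, entered row by row (assumed: column 1 = x, column 2 = y)
x <- c(37.5, 37.5, 25, 25, 100, 12.5, 50, 100, 100, 100, 100, 50, 87.5,
       100, 50, 100, 50, 87.5, 100, 37.5, 100, 100, 50, 100, 100)
y <- c(23, 13, 16, 12, 15, 19, 20, 13, 10, 10, 16, 10, 13,
       15, 11, 14, 19, 20, 20, 20, 13, 14, 15, 17, 14)
n <- length(x)

# Spearman's rho: Pearson correlation of the ranks (tied values get average ranks)
rho <- cor(rank(x), rank(y))

# R's "S" statistic, obtained by rescaling rho
S <- (n^3 - n) * (1 - rho) / 6

# Sum of squared rank differences: equal to S only when there are no ties
sum_d2 <- sum((rank(x) - rank(y))^2)

# Assumed for comparison only: large-sample t approximation with n - 2 df
t_stat <- rho * sqrt((n - 2) / (1 - rho^2))
p_t <- 2 * pt(-abs(t_stat), df = n - 2)

print(c(rho = rho, S = S, sum_d2 = sum_d2, t = t_stat, p = p_t))
```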
------------------------------------------
SAS
Spearman:
Rho: -0.22472
P: 0.2802
Kendall:
Tau: -0.17052
P: 0.2899
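All three programs report the same Spearman rho, and R and SAS report the same
Kendall tau, but S-Plus reports a different tau and the p-values do not all
agree, possibly because of how ties are handled (R warns about ties). Can
anyone enlighten me?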
Dennis Fisher
Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-415-564-2220
www.PLessThan.com
Peter Dalgaard
2005-Aug-15 03:18 UTC
[R] R/S-Plus/SAS yield different results for Kendall-tau and Spearman nonparametric regression
Dennis Fisher <fisher at plessthan.com> writes:

> Colleagues,
>
> I ran some nonparametric regressions in R (run in RedHat Linux), then
> a colleague repeated the analyses in SAS. When we obtained different
> results, I tested S-Plus (same Linux box). And, got yet different
> results. I replicated the results with a small dataset:
>
> DATA:

(They came across somewhat garbled, but we'll believe you...)

...

> Each of the programs yields some differences, possibly because of how
> ties are handled (R warns about this). Can anyone enlighten me?

Ties are certainly involved in the Spearman case. There are more accurate
expressions for the variance of the test statistic in the tied case than
the formula that R is using. As you see, the difference is not exactly
huge (at least for a small number of ties), but it is something that we
should get around to fixing. I assume that there is a similar issue with
Kendall's tau.

In addition, S-PLUS appears to modify the actual definition of the test
statistic, which might be a matter of taste. (Kendall's tau relies on
counting concordant and discordant pairs relative to the total number of
pairs, and with ties, some pairs will be undecided. You can either discard
such pairs or count them as zeros. S-PLUS appears to be doing the latter.
A quick test is to notice that x <- y <- rep(0:1,4) gives a tau that is
less than 1 in S-PLUS but gives 1 in R.)

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
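Dalgaard's two conventions, and the variance issue he suspects for Kendall's tau, can be checked numerically. The R sketch below is illustrative only: kendall_pairs() and kendall_z_tie_corrected() are hypothetical helper names, not functions from any package, and the data are entered inline assuming column 1 is x and column 2 is y. It counts concordant (C) and discordant (D) pairs by brute force, forms S = C - D, and then two versions of tau: tau-a divides S by all n(n - 1)/2 pairs (undecided pairs count as zeros), while tau-b divides by the geometric mean of the numbers of pairs not tied in x and not tied in y. It also computes a normal z in two ways: from the usual tie-corrected variance of S, and from the untied variance of tau, (4n + 10)/(9n(n - 1)), applied to tau-b. By our own arithmetic these give tau-a = -0.14 (the S-Plus value), tau-b of about -0.1705 (the R and SAS value), z of about -1.0583 with the tie correction (the S-Plus z) and about -1.1948 without it (the R z). Treat this as a consistency check on the posted numbers, not as documentation of what each package does internally.

```r
# Data from the post (assumed: column 1 = x, column 2 = y)
x <- c(37.5, 37.5, 25, 25, 100, 12.5, 50, 100, 100, 100, 100, 50, 87.5,
       100, 50, 100, 50, 87.5, 100, 37.5, 100, 100, 50, 100, 100)
y <- c(23, 13, 16, 12, 15, 19, 20, 13, 10, 10, 16, 10, 13,
       15, 11, 14, 19, 20, 20, 20, 13, 14, 15, 17, 14)

# Hypothetical helper: brute-force pair counting and two versions of tau
kendall_pairs <- function(x, y) {
  n <- length(x)
  conc <- 0
  disc <- 0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      s <- sign(x[i] - x[j]) * sign(y[i] - y[j])
      if (s > 0) {
        conc <- conc + 1   # concordant pair
      } else if (s < 0) {
        disc <- disc + 1   # discordant pair
      }                    # s == 0: tied in x and/or y (undecided pair)
    }
  }
  n0 <- n * (n - 1) / 2            # all pairs
  tx <- as.numeric(table(x))       # tie-group sizes in x
  ty <- as.numeric(table(y))       # tie-group sizes in y
  n1 <- sum(tx * (tx - 1) / 2)     # pairs tied in x
  n2 <- sum(ty * (ty - 1) / 2)     # pairs tied in y
  S <- conc - disc
  list(S = S,
       tau_a = S / n0,                           # undecided pairs count as zeros
       tau_b = S / sqrt((n0 - n1) * (n0 - n2)))  # tie-adjusted denominator
}

# Hypothetical helper: z = S / sd(S), using the usual tie-corrected variance of S
kendall_z_tie_corrected <- function(x, y) {
  n <- length(x)
  tx <- as.numeric(table(x))
  ty <- as.numeric(table(y))
  v <- (n * (n - 1) * (2 * n + 5) -
          sum(tx * (tx - 1) * (2 * tx + 5)) -
          sum(ty * (ty - 1) * (2 * ty + 5))) / 18 +
    sum(tx * (tx - 1) * (tx - 2)) * sum(ty * (ty - 1) * (ty - 2)) /
      (9 * n * (n - 1) * (n - 2)) +
    sum(tx * (tx - 1)) * sum(ty * (ty - 1)) / (2 * n * (n - 1))
  kendall_pairs(x, y)$S / sqrt(v)
}

n <- length(x)
res <- kendall_pairs(x, y)
z_ties <- kendall_z_tie_corrected(x, y)
# Untied variance of tau, (4n + 10) / (9n(n - 1)), applied to tau-b
z_untied <- res$tau_b / sqrt((4 * n + 10) / (9 * n * (n - 1)))

print(c(S = res$S, tau_a = res$tau_a, tau_b = res$tau_b,
        z_ties = z_ties, p_ties = 2 * pnorm(-abs(z_ties)),
        z_untied = z_untied, p_untied = 2 * pnorm(-abs(z_untied))))

# Dalgaard's quick test: identical 0/1 vectors with many ties
toy <- kendall_pairs(rep(0:1, 4), rep(0:1, 4))
print(c(tau_a = toy$tau_a, tau_b = toy$tau_b))
```

For the toy case from the reply, x <- y <- rep(0:1, 4), there are 16 concordant pairs, no discordant pairs, and 12 pairs tied in both variables, so tau-a = 16/28 (about 0.571) while tau-b = 1.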