thr3ads.net - R devel - [Rd] cor.test(method = spearman, exact = TRUE) not exact (PR#14095) [Nov 2009]

If this information is useful, please help other people find it:
Share via:

dsimcha at gmail.com

2009-Nov-30 03:00 UTC

[Rd] cor.test(method = spearman, exact = TRUE) not exact (PR#14095)

Full_Name: David Simcha
Version: 2.10
OS: Windows XP Home
Submission from: (NULL) (173.3.208.5)

> a <- c(1:10)
> b <- c(1:10)
> cor.test(a, b, method = "spearman", alternative =
"greater", exact = TRUE)
        Spearman's rank correlation rho

data:  a and b 
S = 0, p-value < 2.2e-16
alternative hypothesis: true rho is greater than 0 
sample estimates:
rho 
  1 
> 1 / factorial(10)[1] 2.755732e-07

Since we have perfect rank correlation and only one permutation out of 10! could
give this for N = 10, the p-value should be 1/10!.  Reading the code in prho.c,
it appears that the "exact" calculation uses the Edgeworth
approximation for N >
9.  This makes sense because, for similar examples with N <= 9, the results
are
as expected (1 / N!).

The "exact" p-value calculation is good enough for most practical
purposes, but
is clearly not exact.  Some informal testing I've done indicates that it can
even be less accurate than the "approximate" p-value calculation in
some cases.
I think it's absurd to call these p-values "exact" when they are
clearly based
on an asymptotic approximation that can be off by orders of magnitude in some
cases.

Petr Savicky

2009-Nov-30 11:58 UTC

head link

[Rd] cor.test(method = spearman, exact = TRUE) not exact (PR#14095)

On Mon, Nov 30, 2009 at 04:00:12AM +0100, dsimcha at gmail.com
wrote:> > a <- c(1:10)
> > b <- c(1:10)
> > cor.test(a, b, method = "spearman", alternative =
"greater", exact = TRUE)
> 
>         Spearman's rank correlation rho
> 
> data:  a and b 
> S = 0, p-value < 2.2e-16
> alternative hypothesis: true rho is greater than 0 
> sample estimates:
> rho 
>   1 
> 
> > 1 / factorial(10)
> [1] 2.755732e-07
> 
> Since we have perfect rank correlation and only one permutation out of 10!
could
> give this for N = 10, the p-value should be 1/10!.  Reading the code in
prho.c,
> it appears that the "exact" calculation uses the Edgeworth
approximation for N >
> 9.  This makes sense because, for similar examples with N <= 9, the
results are
> as expected (1 / N!).
The Edgeworth approximation is not exact, although the error is quite small.
Computing the exact values has time complexity roughly 2^n, if Ryser's
formula
for permanent is used. In cases, where one needs the exact values for small n,
it is possible to use the package pspearman at CRAN, which includes precomuted
values up to n=22. See also the paper
  M.A. van de Wiel and A. Di Bucchianico,
  Fast computation of the exact null distribution of Spearman's rho and
Page's L statistic
  for samples with and without ties, J. Stat. Plann. Inf. 92 (2001), pp.
133-145.
which deals with this question.

  library(pspearman)
  a <- c(1:10)
  b <- c(1:10)
  out <- spearman.test(a, b, alternative = "greater",
approximation="exact")
  out
  #         Spearman's rank correlation rho
  # data:  a and b 
  # S = 0, p-value = 2.756e-07
  # alternative hypothesis: true rho is greater than 0 
  # sample estimates:
  # rho 
  #   1 
  out$p.value - 1/factorial(10) # [1] 0

PS.

Seemingly Similar Threads

Search for more possibly parallel threads

R devel - Nov 2009 - cor.test(method = spearman, exact = TRUE) not exact (PR#14095)

[Rd] cor.test(method = spearman, exact = TRUE) not exact (PR#14095)

[Rd] cor.test(method = spearman, exact = TRUE) not exact (PR#14095)

Seemingly Similar Threads