jtk@cmp.uea.ac.uk
2005-Aug-23 13:15 UTC
[Rd] NAs by integer overflow in Spearman's test p-value (PR#8087)
Full_Name: Jan T. Kim Version: 2.1.0 (and better) OS: Linux Submission from: (NULL) (139.222.3.229) The p value in Spearman's test is NA if the length of x exceeds 46340, due to an integer overflow, occurring if length(n) > sqrt(2^31): > n <- 46341; > set.seed(1); > x <- runif(n); > y <- runif(n); > cor.test(x, y, method = "spearman"); Spearman's rank correlation rho data: x and y S = 1.654965e+13, p-value = NA alternative hypothesis: true rho is not equal to 0 sample estimates: rho 0.002199426 Warning message: NAs produced by integer overflow in: n * n The integer overflow occurs in src/library/stats/R/cor.test.R, and it can be fixed by converting n to double appropriately (see *** fix label ***, lines 110 onwards are shown): ## Use the test statistic S = sum(rank(x) - rank(y))^2 ## and AS 89 for obtaining better p-values than via the ## simple normal approximation. ## In the case of no ties, S = (1-rho) * (n^3-n)/6. pspearman <- function(q, n, lower.tail = TRUE) { if(n <= 1290) # n*(n^2 - 1) does not overflow .C("prho", as.integer(n), as.double(q + 1), p = double(1), integer(1), as.logical(lower.tail), PACKAGE = "stats")$p else { # for large n: aymptotic t_{n-2} n <- as.double(n); # *** fix *** r <- 1 - 6 * q / (n*(n*n - 1)) pt(r / sqrt((1 - r^2)/(n-2)), df = n-2, lower.tail= !lower.tail) } } q <- round((n^3 - n) * (1 - r) / 6) STATISTIC <- c(S = q) PVAL <- switch(alternative, "two.sided" = { p <- if(q > (n^3 - n) / 6) pspearman(q - 1, n, lower.tail = FALSE) else pspearman(q, n, lower.tail = TRUE) min(2 * p, 1) }, "greater" = pspearman(q, n, lower.tail = TRUE), "less" = pspearman(q - 1, n, lower.tail = FALSE)) if(TIES) warning("p-values may be incorrect due to ties") Inserting the typecast only in the pspearman function is a minimally invasive fix -- alternatively, replacing at line 17 n <- length(x) with n <- as.double(length(x)) achieves the same fix and may take care of other unnecessary integer overflows, but it may also introduce new problems e.g. in .C calls etc.
Possibly Parallel Threads
- (PR#8087) NAs by integer overflow in Spearman's test p-value
- the computation of exact p-value for the nonparametric cor-test with ties
- Spearman's rank correlation test (PR#13574)
- cor.test(method = spearman, exact = TRUE) not exact (PR#14095)
- Spearman's rank correlation test