ripley@stats.ox.ac.uk
2005-Aug-23 17:19 UTC
[Rd] (PR#8087) NAs by integer overflow in Spearman's test p-value
There is an even simpler way: someone wrote n*(n^2-1) as n*(n-1)*(n+1) and caused the problem. Your superfluous semicolons do definitely make your code harder to read. On Tue, 23 Aug 2005 jtk at cmp.uea.ac.uk wrote:> Full_Name: Jan T. Kim > Version: 2.1.0 (and better) > OS: Linux > Submission from: (NULL) (139.222.3.229) > > > The p value in Spearman's test is NA if the length of x exceeds 46340, due to > an integer overflow, occurring if length(n) > sqrt(2^31): > > > n <- 46341; > > set.seed(1); > > x <- runif(n); > > y <- runif(n); > > cor.test(x, y, method = "spearman"); > > Spearman's rank correlation rho > > data: x and y > S = 1.654965e+13, p-value = NA > alternative hypothesis: true rho is not equal to 0 > sample estimates: > rho > 0.002199426 > > Warning message: > NAs produced by integer overflow in: n * n > > The integer overflow occurs in src/library/stats/R/cor.test.R, and it can be > fixed > by converting n to double appropriately (see *** fix label ***, lines 110 > onwards > are shown): > > ## Use the test statistic S = sum(rank(x) - rank(y))^2 > ## and AS 89 for obtaining better p-values than via the > ## simple normal approximation. > ## In the case of no ties, S = (1-rho) * (n^3-n)/6. > pspearman <- function(q, n, lower.tail = TRUE) { > if(n <= 1290) # n*(n^2 - 1) does not overflow > .C("prho", > as.integer(n), > as.double(q + 1), > p = double(1), > integer(1), > as.logical(lower.tail), > PACKAGE = "stats")$p > else { # for large n: aymptotic t_{n-2} > n <- as.double(n); # *** fix *** > r <- 1 - 6 * q / (n*(n*n - 1)) > pt(r / sqrt((1 - r^2)/(n-2)), df = n-2, > lower.tail= !lower.tail) > } > } > q <- round((n^3 - n) * (1 - r) / 6) > STATISTIC <- c(S = q) > PVAL <- > switch(alternative, > "two.sided" = { > p <- if(q > (n^3 - n) / 6) > pspearman(q - 1, n, lower.tail = FALSE) > else > pspearman(q, n, lower.tail = TRUE) > min(2 * p, 1) > }, > "greater" = pspearman(q, n, lower.tail = TRUE), > "less" = pspearman(q - 1, n, lower.tail = FALSE)) > if(TIES) > warning("p-values may be incorrect due to ties") > > Inserting the typecast only in the pspearman function is a minimally invasive > fix -- alternatively, replacing at line 17 > > n <- length(x) > > with > > n <- as.double(length(x)) > > achieves the same fix and may take care of other unnecessary integer overflows, > but it may also introduce new problems e.g. in .C calls etc. > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595