r.hankin at noc.soton.ac.uk
2008-Jan-08 09:25 UTC
[Rd] fisher.test(... , simulate.p.value=TRUE) (PR#10558)
Full_Name: Robin Hankin Version: R-2.6.1 OS: macosx 10.5.1 Submission from: (NULL) (139.166.245.10) fisher.test() with a matrix a <- diag(1:3) has a p-value of 1/60 ~= 0.016666 Setting the simulate.p.value flag to TRUE gives a very incorrect answer:> fisher.test(a,simulate.p.value=TRUE)$p.value[1] 0.0004997501> fisher.test(a,simulate.p.value=TRUE,B=10000)$p.value[1] 9.999e-05>These values are repeatable [there is no variability]. The relevant line in fisher.test() is reproduced below; it is the one where PVAL is set: n <- sum(sc) STATISTIC <- -sum(lfactorial(x)) tmp <- .C("fisher_sim", as.integer(nr), as.integer(nc), as.integer(sr), as.integer(sc), as.integer(n), as.integer(B), integer(nr * nc), double(n + 1), integer(nc), results = double(B), PACKAGE = "stats")$results almost.1 <- 1 + 64 * .Machine$double.eps PVAL <- (1 + sum(tmp <= almost.1 * STATISTIC))/(B + 1) } the problem is sum(tmp <= almost.1 * STATISTIC). This line would be correct if tmp (and STATISTIC) were positive, but they are negative (or zero). The fisher.test() is returning 1/(B+1) because *no* element of tmp satisfies tmp <almost.1 * STATISTIC. Changing the line to read PVAL <- (1 + sum(tmp <= STATISTIC/almost.1))/(B + 1) fixes the problem:> myfish(a,simulate.p.value=TRUE,B=10000)$p.value[1] 0.01749825> myfish(a,simulate.p.value=TRUE,B=10000)$p.value[1] 0.01779822> myfish(a,simulate.p.value=TRUE,B=10000)$p.value[1] 0.01559844> myfish(a,simulate.p.value=TRUE,B=10000)$p.value[1] 0.01739826>See how the p-values agree with the theoretical value of 1/60 (more or less).