Kurt.Hornik@wu-wien.ac.at
2003-Aug-21 20:20 UTC
[Rd] The two chisq.test p values differ when the contingency table (PR#3896)
>>>>> dmurdoch writes:

>> Date: Wed, 16 Jul 2003 01:27:25 +0200 (MET DST)
>> From: shitao@ucla.edu

>> > x
>>      [,1] [,2]
>> [1,]  149  151
>> [2,]    1    8
>> > c2x<-chisq.test(x, simulate.p.value=T, B=100000)$p.value
>> > for(i in (1:20)){c2x<-c(c2x,chisq.test(x,
>> simulate.p.value=T,B=100000)$p.value)}
>> > c2tx<-chisq.test(t(x), simulate.p.value=T, B=100000)$p.value
>> > for(i in (1:20)){c2tx<-c(c2tx,chisq.test(t(x), simulate.p.value=T,
>> + B=100000)$p.value)}
>> > cbind(c2x,c2tx)
>>           c2x    c2tx
>>  [1,] 0.03711 0.01683
>>  [2,] 0.03717 0.01713

> The problem is in ctest/R/chisq.test.R, where the p-value is
> calculated as

>     STATISTIC <- sum((x - E) ^ 2 / E)
>     PARAMETER <- NA
>     PVAL <- sum(tmp$results >= STATISTIC) / B

> Here tmp$results is a collection of simulated chisquare values, but
> because of different rounding, the statistics corresponding to tables
> equal to the observed table are slightly smaller than the value
> calculated in STATISTIC, and effectively the p-value is calculated as

>     PVAL <- sum(tmp$results > STATISTIC) / B

> instead.

> What's the appropriate fix here?

>     PVAL <- sum(tmp$results > STATISTIC - .Machine$double.eps^0.5) / B

> works on this example, but is there something better?

Argh.  Very interesting ...

I think it works to use

    STATISTIC <- sum(sort((x - E) ^ 2 / E, decreasing = TRUE))

instead: this starts by summing the big values, and hence, if at all,
slightly 'underestimates' the real value, which is fine for the
comparisons.

Fix committed to r-devel.  Thanks for looking into this.

-k
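For anyone who wants to poke at the rounding issue without running the
full simulation, here is a minimal sketch in plain R. The helper names
chisq_stat, chisq_stat_sorted and pval_tol are hypothetical and exist
only for this illustration; they are not the code in
ctest/R/chisq.test.R and not necessarily what was committed. The sketch
computes the statistic for x and t(x) in both summation orders, then
shows how the tolerance-based comparison quoted above treats a simulated
value that falls a few ulps below the observed one.

```r
# Observed table from the report (filled column by column).
x <- matrix(c(149, 1, 151, 8), nrow = 2)

# Pearson statistic, summed in storage order as in the quoted code.
chisq_stat <- function(tab) {
  E <- outer(rowSums(tab), colSums(tab)) / sum(tab)  # expected counts
  sum((tab - E)^2 / E)
}

# Variant from the reply: add the largest terms first.
chisq_stat_sorted <- function(tab) {
  E <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  sum(sort((tab - E)^2 / E, decreasing = TRUE))
}

# Hypothetical helper: p-value with the sqrt(eps) tolerance quoted above.
pval_tol <- function(sim, stat, B = length(sim)) {
  sum(sim > stat - sqrt(.Machine$double.eps)) / B
}

s  <- chisq_stat(x)
st <- chisq_stat(t(x))

# Any difference here is pure rounding; it may be exactly zero or a few
# ulps, depending on platform and summation order.
print(s - st, digits = 17)
print(chisq_stat_sorted(x) - chisq_stat_sorted(t(x)), digits = 17)

# A made-up "simulated" vector: one value a hair below s, one clearly
# above, one clearly below.
sim <- c(s - 1e-14, s + 1, s - 1)
p_strict <- sum(sim >= s) / length(sim)  # misses the near-tie
p_tol    <- pval_tol(sim, s)             # counts the near-tie
```

With the made-up vector above, the strict >= comparison drops the value
that is only 1e-14 below s, while the tolerant version keeps it. That is
the same effect, in miniature, that makes the simulated p-values for x
and t(x) disagree.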
Apparently Analogous Threads
- The two chisq.test p values differ when the contingency table is transposed! (PR#3486)
- Why two chisq.test p values differ when the contingency
- Why two chisq.test p values differ when the contingency table is transposed?
- Numerical stability in chisq.test
- bug(?) in chisq.test