jerome@hivnet.ubc.ca
2003-Aug-30 02:04 UTC
[Rd] fisher.test() gives wrong confidence interval (PR#4019)
The problem occurs when the sample odds ratio is Inf, such as in the following example. Given the fact that both upper bounds of the two 95% confidence intervals are Inf, I would have expected that the two lower bounds be equal, but they aren't. x <- matrix(c(9,4,0,2),2,2) x # [,1] [,2] #[1,] 9 0 #[2,] 4 2 rbind("two.sided.95CI"=fisher.test(x)$conf.int, "greater.95CI"=fisher.test(x,alt="greater")$conf.int) # [,1] [,2] #two.sided.95CI 0.2985103 Inf #greater.95CI 0.4625314 Inf Using the noncentral hypergeometric distribution, we can calculate the probability mass of each possible table with same marginals as x. Ref.: Alan Agresti (1990). Categorical data analysis. New York: Wiley. Page 67. Hence, the result below suggests that the two-sided confidence interval has a confidence level of 97.5% as opposed to 95%. n11 <- 7:9 theta <- 0.2985103 choose(9,n11)*choose(15-9,13-n11)*theta^n11/ sum(choose(9,n11)*choose(15-9,13-n11)*theta^n11) #[1] 0.67344877 0.30154709 0.02500414 The 95% confidence interval with one-sided (greater) alternative appears to be correct. theta <- 0.4625314 choose(9,n11)*choose(15-9,13-n11)*theta^n11/ sum(choose(9,n11)*choose(15-9,13-n11)*theta^n11) #[1] 0.5608724 0.3891316 0.0499960 Sincerely, Jerome Asselin -- Jerome Asselin (Jérôme), Statistical Analyst British Columbia Centre for Excellence in HIV/AIDS St. Paul's Hospital, 608 - 1081 Burrard Street Vancouver, British Columbia, CANADA V6Z 1Y6 Email: jerome@hivnet.ubc.ca Phone: 604 806-9112 Fax: 604 806-9044
jerome@hivnet.ubc.ca
2003-Aug-30 02:48 UTC
[Rd] Re: fisher.test() gives wrong confidence interval (PR#4019)
Forgot to mention... I'm using R 1.7.1 on Red Hat Linux 7.2.
Peter Dalgaard BSA
2003-Aug-30 11:30 UTC
[Rd] fisher.test() gives wrong confidence interval (PR#4019)
jerome@hivnet.ubc.ca writes:> The problem occurs when the sample odds ratio is Inf, such as in the > following example. Given the fact that both upper bounds of the two 95% > confidence intervals are Inf, I would have expected that the two lower > bounds be equal, but they aren't. > > x <- matrix(c(9,4,0,2),2,2) > x > # [,1] [,2] > #[1,] 9 0 > #[2,] 4 2 > rbind("two.sided.95CI"=fisher.test(x)$conf.int, > "greater.95CI"=fisher.test(x,alt="greater")$conf.int) > # [,1] [,2] > #two.sided.95CI 0.2985103 Inf > #greater.95CI 0.4625314 Inf > > Using the noncentral hypergeometric distribution, we can calculate the > probability mass of each possible table with same marginals as x. > Ref.: Alan Agresti (1990). Categorical data analysis. New York: Wiley. > Page 67. > > Hence, the result below suggests that the two-sided confidence interval > has a confidence level of 97.5% as opposed to 95%. > > n11 <- 7:9 > theta <- 0.2985103 > choose(9,n11)*choose(15-9,13-n11)*theta^n11/ > sum(choose(9,n11)*choose(15-9,13-n11)*theta^n11) > #[1] 0.67344877 0.30154709 0.02500414 > > The 95% confidence interval with one-sided (greater) alternative appears > to be correct. > > theta <- 0.4625314 > choose(9,n11)*choose(15-9,13-n11)*theta^n11/ > sum(choose(9,n11)*choose(15-9,13-n11)*theta^n11) > #[1] 0.5608724 0.3891316 0.0499960I don't think this is a bug, insofar as the problem is solvable at all. The confidence interval consists of those x for which the test of OR==x is not rejected at the 95% level. In the two sided case, you need to count also the tables that differ in the "opposite direction", and we unceremoniously assume that their probability is the same. For some small tables like this one the assumption is false since there are only three tables consistent with the marginals. However, this is not invariably the case, and trying to solve the test problem exactly runs into the issue that the exact two-sided p-value for OR==x is not a nice function of x (it has discontinuities and may be non-monotone). Consider this case, and you'll see part of the point fisher.test(matrix(c(3,0,0,3),2),alt="g") -- O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk) FAX: (+45) 35327907
Kurt Hornik
2003-Aug-30 12:44 UTC
[Rd] fisher.test() gives wrong confidence interval (PR#4019)
>>>>> jerome writes:> The problem occurs when the sample odds ratio is Inf, such as in the > following example. Given the fact that both upper bounds of the two 95% > confidence intervals are Inf, I would have expected that the two lower > bounds be equal, but they aren't.> x <- matrix(c(9,4,0,2),2,2) > x > # [,1] [,2] > #[1,] 9 0 > #[2,] 4 2 > rbind("two.sided.95CI"=fisher.test(x)$conf.int, > "greater.95CI"=fisher.test(x,alt="greater")$conf.int) > # [,1] [,2] > #two.sided.95CI 0.2985103 Inf > #greater.95CI 0.4625314 Inf> Using the noncentral hypergeometric distribution, we can calculate the > probability mass of each possible table with same marginals as x. > Ref.: Alan Agresti (1990). Categorical data analysis. New York: Wiley. > Page 67.> Hence, the result below suggests that the two-sided confidence interval > has a confidence level of 97.5% as opposed to 95%.> n11 <- 7:9 > theta <- 0.2985103 > choose(9,n11)*choose(15-9,13-n11)*theta^n11/ > sum(choose(9,n11)*choose(15-9,13-n11)*theta^n11) > #[1] 0.67344877 0.30154709 0.02500414> The 95% confidence interval with one-sided (greater) alternative appears > to be correct.> theta <- 0.4625314 > choose(9,n11)*choose(15-9,13-n11)*theta^n11/ > sum(choose(9,n11)*choose(15-9,13-n11)*theta^n11) > #[1] 0.5608724 0.3891316 0.0499960Right. When going for a confidence interval that is to include the parameter estimate and this is already at the boundary of the parameter range, then there is no point finding mass even further away as there obviously cannot be any. I changed the code (r-devel) to CINT <- switch(alternative, less = c(0, ncp.U(x, 1 - conf.level)), greater = c(ncp.L(x, 1 - conf.level), Inf), two.sided = { if(ESTIMATE == 0) c(0, ncp.U(x, 1 - conf.level)) else if(ESTIMATE == Inf) c(ncp.L(x, 1 - conf.level), Inf) else { alpha <- (1 - conf.level) / 2 c(ncp.L(x, alpha), ncp.U(x, alpha)) } }) which seems to fix the problem. Thanks, -k