thr3ads.net - R devel - [Rd] fisher.test() gives wrong confidence interval (PR#4019) [Aug 2003]

If this information is useful, please help other people find it:
Share via:

jerome@hivnet.ubc.ca

2003-Aug-30 02:04 UTC

[Rd] fisher.test() gives wrong confidence interval (PR#4019)

The problem occurs when the sample odds ratio is Inf, such as in the 
following example. Given the fact that both upper bounds of the two 95% 
confidence intervals are Inf, I would have expected that the two lower 
bounds be equal, but they aren't.

x <- matrix(c(9,4,0,2),2,2)
x
#     [,1] [,2]
#[1,]    9    0
#[2,]    4    2
rbind("two.sided.95CI"=fisher.test(x)$conf.int,
"greater.95CI"=fisher.test(x,alt="greater")$conf.int)
#                    [,1] [,2]
#two.sided.95CI 0.2985103  Inf
#greater.95CI   0.4625314  Inf

Using the noncentral hypergeometric distribution, we can calculate the 
probability mass of each possible table with same marginals as x.
Ref.: Alan Agresti (1990). Categorical data analysis. New York: Wiley. 
Page 67.

Hence, the result below suggests that the two-sided confidence interval 
has a confidence level of 97.5% as opposed to 95%.

n11 <- 7:9
theta <- 0.2985103
choose(9,n11)*choose(15-9,13-n11)*theta^n11/
  sum(choose(9,n11)*choose(15-9,13-n11)*theta^n11)
#[1] 0.67344877 0.30154709 0.02500414

The 95% confidence interval with one-sided (greater) alternative appears 
to be correct.

theta <- 0.4625314
choose(9,n11)*choose(15-9,13-n11)*theta^n11/
  sum(choose(9,n11)*choose(15-9,13-n11)*theta^n11)
#[1] 0.5608724 0.3891316 0.0499960

Sincerely,
Jerome Asselin

-- 

Jerome Asselin (Jérôme), Statistical Analyst
British Columbia Centre for Excellence in HIV/AIDS
St. Paul's Hospital, 608 - 1081 Burrard Street
Vancouver, British Columbia, CANADA V6Z 1Y6
Email: jerome@hivnet.ubc.ca
Phone: 604 806-9112   Fax: 604 806-9044

jerome@hivnet.ubc.ca

2003-Aug-30 02:48 UTC

head link

[Rd] Re: fisher.test() gives wrong confidence interval (PR#4019)

Forgot to mention...
I'm using R 1.7.1 on Red Hat Linux 7.2.

Peter Dalgaard BSA

2003-Aug-30 11:30 UTC

head link

[Rd] fisher.test() gives wrong confidence interval (PR#4019)

jerome@hivnet.ubc.ca writes:
> The problem occurs when the sample odds ratio is Inf, such as in the 
> following example. Given the fact that both upper bounds of the two 95% 
> confidence intervals are Inf, I would have expected that the two lower 
> bounds be equal, but they aren't.
> 
> x <- matrix(c(9,4,0,2),2,2)
> x
> #     [,1] [,2]
> #[1,]    9    0
> #[2,]    4    2
> rbind("two.sided.95CI"=fisher.test(x)$conf.int,
> "greater.95CI"=fisher.test(x,alt="greater")$conf.int)
> #                    [,1] [,2]
> #two.sided.95CI 0.2985103  Inf
> #greater.95CI   0.4625314  Inf
> 
> Using the noncentral hypergeometric distribution, we can calculate the 
> probability mass of each possible table with same marginals as x.
> Ref.: Alan Agresti (1990). Categorical data analysis. New York: Wiley. 
> Page 67.
> 
> Hence, the result below suggests that the two-sided confidence interval 
> has a confidence level of 97.5% as opposed to 95%.
> 
> n11 <- 7:9
> theta <- 0.2985103
> choose(9,n11)*choose(15-9,13-n11)*theta^n11/
>   sum(choose(9,n11)*choose(15-9,13-n11)*theta^n11)
> #[1] 0.67344877 0.30154709 0.02500414
> 
> The 95% confidence interval with one-sided (greater) alternative appears 
> to be correct.
> 
> theta <- 0.4625314
> choose(9,n11)*choose(15-9,13-n11)*theta^n11/
>   sum(choose(9,n11)*choose(15-9,13-n11)*theta^n11)
> #[1] 0.5608724 0.3891316 0.0499960
I don't think this is a bug, insofar as the problem is solvable at
all. The confidence interval consists of those x for which the test of
OR==x is not rejected at the 95% level. In the two sided case, you
need to count also the tables that differ in the "opposite direction",
and we unceremoniously assume that their probability is the same. 

For some small tables like this one the assumption is false since
there are only three tables consistent with the marginals. However,
this is not invariably the case, and trying to solve the test problem
exactly runs into the issue that the exact two-sided p-value for OR==x
is not a nice function of x (it has discontinuities and may be
non-monotone).

Consider this case, and you'll see part of the point
fisher.test(matrix(c(3,0,0,3),2),alt="g")

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907

Kurt Hornik

2003-Aug-30 12:44 UTC

head link

[Rd] fisher.test() gives wrong confidence interval (PR#4019)

>>>>> jerome  writes:
> The problem occurs when the sample odds ratio is Inf, such as in the 
> following example. Given the fact that both upper bounds of the two 95% 
> confidence intervals are Inf, I would have expected that the two lower 
> bounds be equal, but they aren't.
> x <- matrix(c(9,4,0,2),2,2)
> x
> #     [,1] [,2]
> #[1,]    9    0
> #[2,]    4    2
> rbind("two.sided.95CI"=fisher.test(x)$conf.int,
> "greater.95CI"=fisher.test(x,alt="greater")$conf.int)
> #                    [,1] [,2]
> #two.sided.95CI 0.2985103  Inf
> #greater.95CI   0.4625314  Inf
> Using the noncentral hypergeometric distribution, we can calculate the
> probability mass of each possible table with same marginals as x.
> Ref.: Alan Agresti (1990). Categorical data analysis. New York: Wiley.
> Page 67.
> Hence, the result below suggests that the two-sided confidence interval 
> has a confidence level of 97.5% as opposed to 95%.
> n11 <- 7:9
> theta <- 0.2985103
> choose(9,n11)*choose(15-9,13-n11)*theta^n11/
>   sum(choose(9,n11)*choose(15-9,13-n11)*theta^n11)
> #[1] 0.67344877 0.30154709 0.02500414
> The 95% confidence interval with one-sided (greater) alternative appears 
> to be correct.
> theta <- 0.4625314
> choose(9,n11)*choose(15-9,13-n11)*theta^n11/
>   sum(choose(9,n11)*choose(15-9,13-n11)*theta^n11)
> #[1] 0.5608724 0.3891316 0.0499960
Right.  When going for a confidence interval that is to include the
parameter estimate and this is already at the boundary of the parameter
range, then there is no point finding mass even further away as there
obviously cannot be any.

I changed the code (r-devel) to

        CINT <- switch(alternative,
                       less = c(0, ncp.U(x, 1 - conf.level)),
                       greater = c(ncp.L(x, 1 - conf.level), Inf),
                       two.sided = {
                           if(ESTIMATE == 0)
                               c(0, ncp.U(x, 1 - conf.level))
                           else if(ESTIMATE == Inf)
                               c(ncp.L(x, 1 - conf.level), Inf)
                           else {
                               alpha <- (1 - conf.level) / 2
                               c(ncp.L(x, alpha), ncp.U(x, alpha))
                           }
                       })

which seems to fix the problem.

Thanks,

-k

Possibly Parallel Threads

Search for more maybe matching threads

R devel - Aug 2003 - fisher.test() gives wrong confidence interval (PR#4019)

[Rd] fisher.test() gives wrong confidence interval (PR#4019)

[Rd] Re: fisher.test() gives wrong confidence interval (PR#4019)

[Rd] fisher.test() gives wrong confidence interval (PR#4019)

[Rd] fisher.test() gives wrong confidence interval (PR#4019)

Possibly Parallel Threads