thr3ads.net - R help - [R] Is it safe? Cochran etc [Oct 2004]

If this information is useful, please help other people find it:
Share via:

Dan Bolser

2004-Oct-09 17:18 UTC

[R] Is it safe? Cochran etc

I have the following contingency table

dat <- matrix(c(1,506,13714,878702),nr=2)

And I want to test if their is an association between events 

A:{a,not(a)} and B:{b,not(b)}

        | b   | not(b) |
--------+-----+--------+
 a      |   1 |  13714 |
--------+-----+--------+
 not(a) | 506 | 878702 |
--------+-----+--------+

I am worried that prop.test and chisq.test are not valid given the low
counts and low probabilites associated with 'sucess' in each category.

Is it safe to use them, and what is the alternative? (given that
fisher.test can't handle this data... hold the phone...

I just found fisher.test can handle this data if the test is one-tailed
and not two-tailed.

I don't understand the difference between chisq.test, prop.test and
fisher.test when the hybrid=1 option is used for the fisher.test.

I was using the binomial distribution to test the 'extremity' of the
observed data, but now I think I know why that is inapropriate, however,
with the binomial (and its approximation) at least I know what I am
doing. And I can do it in perl easily...

Generally, how should I calculate fisher.test in perl (i.e. what are its
principles). When is it safe to approximate fisher to chisq?

I cannot get insight into this problem...

How come if I do...

dat <- matrix(c(50,60,100,100),nr=2)

prop.test(dat)$p.value
chisq.test(dat)$p.value
fisher.test(dat)$p.value

I get 

[1] 0.5173269
[1] 0.5173269
[1] 0.4771358

When I looked at the binomial distribution and the normal approximation
thereof with similar counts I never had a p-value difference > 0.004

I am so fed up with this problem :(

Dan Bolser

2004-Oct-09 18:06 UTC

head link

[R] Is it safe? Cochran etc

Why can't I just use Log odds? Does the standard error of the logs score
depend on a similar chisq assumption?



On Sat, 9 Oct 2004, Dan Bolser wrote:
>
>I have the following contingency table
>
>dat <- matrix(c(1,506,13714,878702),nr=2)
>
>And I want to test if their is an association between events 
>
>A:{a,not(a)} and B:{b,not(b)}
>
>        | b   | not(b) |
>--------+-----+--------+
> a      |   1 |  13714 |
>--------+-----+--------+
> not(a) | 506 | 878702 |
>--------+-----+--------+
>
>I am worried that prop.test and chisq.test are not valid given the low
>counts and low probabilites associated with 'sucess' in each
category.
>
>Is it safe to use them, and what is the alternative? (given that
>fisher.test can't handle this data... hold the phone...
>
>I just found fisher.test can handle this data if the test is one-tailed
>and not two-tailed.
>
>I don't understand the difference between chisq.test, prop.test and
>fisher.test when the hybrid=1 option is used for the fisher.test.
>
>I was using the binomial distribution to test the 'extremity' of the
>observed data, but now I think I know why that is inapropriate, however,
>with the binomial (and its approximation) at least I know what I am
>doing. And I can do it in perl easily...
>
>Generally, how should I calculate fisher.test in perl (i.e. what are its
>principles). When is it safe to approximate fisher to chisq?
>
>I cannot get insight into this problem...
>
>How come if I do...
>
>dat <- matrix(c(50,60,100,100),nr=2)
>
>prop.test(dat)$p.value
>chisq.test(dat)$p.value
>fisher.test(dat)$p.value
>
>I get 
>
>[1] 0.5173269
>[1] 0.5173269
>[1] 0.4771358
>
>When I looked at the binomial distribution and the normal approximation
>thereof with similar counts I never had a p-value difference > 0.004
>
>I am so fed up with this problem :(
>
>______________________________________________
>R-help at stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
>

Kjetil Brinchmann Halvorsen

2004-Oct-09 19:16 UTC

head link

[R] Is it safe? Cochran etc

Dan Bolser wrote:
>I have the following contingency table
>
>dat <- matrix(c(1,506,13714,878702),nr=2)
>
>And I want to test if their is an association between events 
>
>A:{a,not(a)} and B:{b,not(b)}
>
>        | b   | not(b) |
>--------+-----+--------+
> a      |   1 |  13714 |
>--------+-----+--------+
> not(a) | 506 | 878702 |
>--------+-----+--------+
>
>I am worried that prop.test and chisq.test are not valid given the low
>counts and low probabilites associated with 'sucess' in each
category.
>  
>
 > test <- matrix( c(1,506, 13714, 878702), 2,2)
 > test
     [,1]   [,2]
[1,]    1  13714
[2,]  506 878702
 > chisq.test(test)

        Pearson's Chi-squared test with Yates' continuity correction

data:  test
X-squared = 5.1584, df = 1, p-value = 0.02313

 > chisq.test(test,sim=TRUE)

        Pearson's Chi-squared test with simulated p-value (based on 2000
        replicates)

data:  test
X-squared = 6.0115, df = NA, p-value = 0.02099

 > chisq.test(test,sim=TRUE, B=200000)
     # To rule out simulation uncertainty
        Pearson's Chi-squared test with simulated p-value (based on 200000
        replicates)

data:  test
X-squared = 6.0115, df = NA, p-value = 0.01634

 > N <- sum(test)
 > rows <- rowSums(test)
 > cols <- colSums(test)
 > E <- rows %o% cols/N
 > E
           [,1]      [,2]
[1,]   7.787351  13707.21
[2,] 499.212649 878708.79
 >

None of the expected'eds are lesser than 5, an often  used rule of thumb 
(which might even be
to conservative. Just to check on the distribution:

rows <- round(rows)
cols <- round(cols)

 > pvals <- sapply(r2dtable(100000, rows, cols), function(x) 
chisq.test(x)$p.value)
 > hist(pvals)
     # not very good approximation to uniform histogram, but:
 > sum(pvals < 0.05)/100000
[1] 0.03068
 > sum(pvals < 0.01)/100000
[1] 0.00669

So the true levels are not very far from the calculated by te chisq 
approximation, and it seems safe to use it.

All of this to show that with R you are not anymore dependent on old 
"rules os thumb",
you can investigate for yourself.

Kjetil
>Is it safe to use them, and what is the alternative? (given that
>fisher.test can't handle this data... hold the phone...
>
>I just found fisher.test can handle this data if the test is one-tailed
>and not two-tailed.
>
>I don't understand the difference between chisq.test, prop.test and
>fisher.test when the hybrid=1 option is used for the fisher.test.
>
>I was using the binomial distribution to test the 'extremity' of the
>observed data, but now I think I know why that is inapropriate, however,
>with the binomial (and its approximation) at least I know what I am
>doing. And I can do it in perl easily...
>
>Generally, how should I calculate fisher.test in perl (i.e. what are its
>principles). When is it safe to approximate fisher to chisq?
>
>I cannot get insight into this problem...
>
>How come if I do...
>
>dat <- matrix(c(50,60,100,100),nr=2)
>
>prop.test(dat)$p.value
>chisq.test(dat)$p.value
>fisher.test(dat)$p.value
>
>I get 
>
>[1] 0.5173269
>[1] 0.5173269
>[1] 0.4771358
>
>When I looked at the binomial distribution and the normal approximation
>thereof with similar counts I never had a p-value difference > 0.004
>
>I am so fed up with this problem :(
>
>______________________________________________
>R-help at stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
>
>
>  
>

-- 

Kjetil Halvorsen.

Peace is the most effective weapon of mass construction.
               --  Mahdi Elmandjra

Possibly Parallel Threads

Search for more apparently analagous threads

R help - Oct 2004 - Is it safe? Cochran etc

[R] Is it safe? Cochran etc

[R] Is it safe? Cochran etc

[R] Is it safe? Cochran etc

Possibly Parallel Threads