thr3ads.net - R devel - [Rd] chisq.test with simulate.p.value=TRUE (PR#13292) [Nov 2008]

If this information is useful, please help other people find it:
Share via:

constant at unb.br

2008-Nov-16 00:00 UTC

[Rd] chisq.test with simulate.p.value=TRUE (PR#13292)

Full_Name: Reginaldo Constantino
Version: 2.8.0
OS: Ubuntu Hardy (32 bit, kernel 2.6.24)
Submission from: (NULL) (189.61.88.2)


For many tables, chisq.test with simulate.p.value=TRUE gives a p value that is
obviously incorrect and inversely proportional to the number of replicates:
> data(HairEyeColor)
> x <- margin.table(HairEyeColor, c(1, 2))
> chisq.test(x,simulate.p.value=TRUE,B=2000)        Pearson's Chi-squared test with simulated p-value (based on 2000
        replicates)
data:  x
X-squared = 138.2898, df = NA, p-value = 0.0004998
> chisq.test(x,simulate.p.value=TRUE,B=10000)X-squared = 138.2898, df = NA, p-value = 1e-04
> chisq.test(x,simulate.p.value=TRUE,B=100000)X-squared = 138.2898, df = NA, p-value = 1e-05
> chisq.test(x,simulate.p.value=TRUE,B=1000000)X-squared = 138.2898, df = NA, p-value = 1e-06
...

Also tested the same R version under Windows XP and got the same results.

Berwin A Turlach

2008-Nov-17 08:23 UTC

head link

[Rd] chisq.test with simulate.p.value=TRUE (PR#13292)

G'day Reginaldo, 

On Sun, 16 Nov 2008 01:00:09 +0100 (CET)
constant at unb.br wrote:
> Full_Name: Reginaldo Constantino
> Version: 2.8.0
> OS: Ubuntu Hardy (32 bit, kernel 2.6.24)
> Submission from: (NULL) (189.61.88.2)
> 
> 
> For many tables, chisq.test with simulate.p.value=TRUE gives a p
> value that is obviously incorrect and inversely proportional to the
> number of replicates:
Why is this Monte-Carlo p-value obviously incorrect?  

In your example, did you look at the observed ChiSquare statistics?
Any idea how likely it is to observe a value that is at least as
extreme as the one observed?

Essentially, you are doing B Monte-Carlo simulations and in none of
these simulations do you obtain a statistic that is at least as extreme
as the one that you have observed.  So your Monte-Carlo p-value ends up
to be 1/(B+1).  

I do not see any problem or bug here.
> > data(HairEyeColor)
> > x <- margin.table(HairEyeColor, c(1, 2))
> > chisq.test(x,simulate.p.value=TRUE,B=2000)
>         Pearson's Chi-squared test with simulated p-value (based on
> 2000 replicates)
> data:  x
> X-squared = 138.2898, df = NA, p-value = 0.0004998
> 
> > chisq.test(x,simulate.p.value=TRUE,B=10000)
> X-squared = 138.2898, df = NA, p-value = 1e-04
> 
> > chisq.test(x,simulate.p.value=TRUE,B=100000)
> X-squared = 138.2898, df = NA, p-value = 1e-05
> 
> > chisq.test(x,simulate.p.value=TRUE,B=1000000)
> X-squared = 138.2898, df = NA, p-value = 1e-06
> ...
> 
> Also tested the same R version under Windows XP and got the same
> results.
Cheers,

	Berwin

=========================== Full address ============================Berwin A
Turlach                            Tel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability        +65 6516 6650 (self)
Faculty of Science                          FAX : +65 6872 3919       
National University of Singapore     
6 Science Drive 2, Blk S16, Level 7          e-mail: statba at nus.edu.sg
Singapore 117546                    stat.nus.edu.sg/~statba

Ben Bolker

2008-Nov-18 04:22 UTC

head link

[Rd] chisq.test with simulate.p.value=TRUE (PR#13292)

<constant <at> unb.br> writes:
> 
> Full_Name: Reginaldo Constantino
> Version: 2.8.0
> OS: Ubuntu Hardy (32 bit, kernel 2.6.24)
> Submission from: (NULL) (189.61.88.2)
> 
> For many tables, chisq.test with simulate.p.value=TRUE gives a p value that
is
> obviously incorrect and inversely proportional to the number of replicates:
> 
> > data(HairEyeColor)
> > x <- margin.table(HairEyeColor, c(1, 2))
> > chisq.test(x,simulate.p.value=TRUE,B=2000)
>         Pearson's Chi-squared test with simulated p-value (based on
2000
>         replicates)
> data:  x
> X-squared = 138.2898, df = NA, p-value = 0.0004998
> 
> > chisq.test(x,simulate.p.value=TRUE,B=10000)
> X-squared = 138.2898, df = NA, p-value = 1e-04
> 
> > chisq.test(x,simulate.p.value=TRUE,B=100000)
> X-squared = 138.2898, df = NA, p-value = 1e-05
> 
> > chisq.test(x,simulate.p.value=TRUE,B=1000000)
> X-squared = 138.2898, df = NA, p-value = 1e-06
> ...
> 
> Also tested the same R version under Windows XP and got the same results.
> 
  Could you explain why this is wrong?
The data are extremely unlikely under the null hypothesis
(the standard chisq.test() gives p<2.2e-16), so the result
of the simulation protocol is always 1/(B+1); that is, as
is standard with these protocols, the observed value is added
to the ensemble of simulations.
  Why is the p value "obviously incorrect"?

  cheers
   Ben Bolker

Ben Bolker

2008-Nov-19 13:22 UTC

head link

[Rd] chisq.test with simulate.p.value=TRUE (PR#13292)

<constant <at> unb.br> writes:
> For many tables, chisq.test with simulate.p.value=TRUE gives a p value that
is
> obviously incorrect and inversely proportional to the number of replicates:
> 
> > data(HairEyeColor)
> > x <- margin.table(HairEyeColor, c(1, 2))
> > chisq.test(x,simulate.p.value=TRUE,B=2000)
>         Pearson's Chi-squared test with simulated p-value (based on
2000
>         replicates)
> data:  x
> X-squared = 138.2898, df = NA, p-value = 0.0004998
> 
> > chisq.test(x,simulate.p.value=TRUE,B=10000)
> X-squared = 138.2898, df = NA, p-value = 1e-04
> 
> > chisq.test(x,simulate.p.value=TRUE,B=100000)
> X-squared = 138.2898, df = NA, p-value = 1e-05
> 
> > chisq.test(x,simulate.p.value=TRUE,B=1000000)
> X-squared = 138.2898, df = NA, p-value = 1e-06
> ...
> 
  Tried to answer this the other day but the answer must
have gotten lost.  The standard analytical chi-squared test
here gives p<2.2e-16 (i.e. very very small).  The values given
above, up to limited display of significant digits, are
precisely 1/(B+1); that is, the simulated chi-squared values
are never less than the observed chi-squared statistic (the
observed value itself is included in the ensemble, so the
p-value is given as 1/(B+1) rather that <1/B; you can read
about the reasons for this elsewhere [?]).  Bottom line:
why do you think these results are "obviously incorrect"?

  Ben Bolker

Reasonably Related Threads

Search for more possibly parallel threads

R devel - Nov 2008 - chisq.test with simulate.p.value=TRUE (PR#13292)

[Rd] chisq.test with simulate.p.value=TRUE (PR#13292)

[Rd] chisq.test with simulate.p.value=TRUE (PR#13292)

[Rd] chisq.test with simulate.p.value=TRUE (PR#13292)

[Rd] chisq.test with simulate.p.value=TRUE (PR#13292)

Reasonably Related Threads