dgrove@fhcrc.org
2005-Oct-20  06:08 UTC
[Rd] numerical issues in chisq.test(simulate=TRUE) (PR#8224)
Hi, This report deals with p-values coming from chisq.test using the simulate.p=TRUE option. The issue is numerical accuracy and was brought up in previous bug reports 3486 and 3896. The bug was considered fixed but apparently was only mostly fixed. Just the typical problem of two values that are mathematically equal not ending up numerically equivalent. Consider this series of three 2x2 tables: [1,] 1 7 [2,] 0 15 [1,] 1 7 [2,] 0 16 [1,] 1 7 [2,] 0 17 The pvals returned from chisq.test(m, sim=TRUE)$p.value are 0.3543228, 0.0004997501 and 0.3273363 respectively. The 2nd seems a bit unlikely, huh? I checked into it and the value I'm getting for the statistic (calculated in R code) is 4*.Machine$double.eps less than the value (which should be equal) that is returned from the C-code that does the simulation. Code for creating/testing the three matrices shown above: m <- matrix(c(1,0,7,15),2,2) ; chisq.test(m, sim=TRUE)$p.value m <- matrix(c(1,0,7,16),2,2) ; chisq.test(m, sim=TRUE)$p.value m <- matrix(c(1,0,7,17),2,2) ; chisq.test(m, sim=TRUE)$p.value Running SuSE9.3 on a AMD Athlon4000+> versionplatform i686-pc-linux-gnu arch i686 os linux-gnu system i686, linux-gnu status Patched major 2 minor 1.1 year 2005 month 07 day 29 language R Thanks, Doug Douglas Grove Statistical Research Associate Fred Hutchinson Cancer Research Center Seattle WA 98109
Simone Giannerini
2005-Oct-20  08:10 UTC
[Rd] numerical issues in chisq.test(simulate=TRUE) (PR#8224)
Hi, I obtain the same result under Win. XP SP2 on AMD 64 3700+ platform i386-pc-mingw32 arch i386 os mingw32 system i386, mingw32 status major 2 minor 2.0 year 2005 month 10 day 06 svn rev 35749 language R> m <- matrix(c(1,0,7,15),2,2) ; chisq.test(m, sim=TRUE)$p.value[1] 0.3598201> m <- matrix(c(1,0,7,16),2,2) ; chisq.test(m, sim=TRUE)$p.value[1] 0.0004997501> m <- matrix(c(1,0,7,17),2,2) ; chisq.test(m, sim=TRUE)$p.value[1] 0.3403298 Ciao Simone On 10/20/05, dgrove at fhcrc.org <dgrove at fhcrc.org> wrote:> Hi, > > This report deals with p-values coming from chisq.test using > the simulate.p=TRUE option. The issue is numerical accuracy > and was brought up in previous bug reports 3486 and 3896. > The bug was considered fixed but apparently was only mostly > fixed. Just the typical problem of two values that are > mathematically equal not ending up numerically equivalent. > > Consider this series of three 2x2 tables: > > [1,] 1 7 > [2,] 0 15 > > [1,] 1 7 > [2,] 0 16 > > [1,] 1 7 > [2,] 0 17 > > > The pvals returned from chisq.test(m, sim=TRUE)$p.value are > 0.3543228, 0.0004997501 and 0.3273363 respectively. > > The 2nd seems a bit unlikely, huh? > > I checked into it and the value I'm getting for the statistic > (calculated in R code) is 4*.Machine$double.eps less than the > value (which should be equal) that is returned from the C-code > that does the simulation. > > > Code for creating/testing the three matrices shown above: > m <- matrix(c(1,0,7,15),2,2) ; chisq.test(m, sim=TRUE)$p.value > m <- matrix(c(1,0,7,16),2,2) ; chisq.test(m, sim=TRUE)$p.value > m <- matrix(c(1,0,7,17),2,2) ; chisq.test(m, sim=TRUE)$p.value > > > Running SuSE9.3 on a AMD Athlon4000+ > > > > version > platform i686-pc-linux-gnu > arch i686 > os linux-gnu > system i686, linux-gnu > status Patched > major 2 > minor 1.1 > year 2005 > month 07 > day 29 > language R > > > Thanks, > Doug > > > Douglas Grove > Statistical Research Associate > Fred Hutchinson Cancer Research Center > Seattle WA 98109 > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >-- ______________________________________________________ Simone Giannerini Dipartimento di Scienze Statistiche "Paolo Fortunati" Universita' di Bologna Via delle belle arti 41 - 40126 Bologna, ITALY Tel: +39 051 2098248 Fax: +39 051 232153 E-mail: giannerini at stat.unibo.it
maechler@stat.math.ethz.ch
2005-Oct-20  09:00 UTC
[Rd] numerical issues in chisq.test(simulate=TRUE) (PR#8224)
Thank you, Douglas (and Simone) for the bug report.>>>>> "Simone" == Simone Giannerini <sgiannerini at gmail.com> >>>>> on Thu, 20 Oct 2005 10:10:01 +0200 writes:Simone> Hi, Simone> I obtain the same result under Win. XP SP2 on AMD 64 3700+ Simone> platform i386-pc-mingw32 Simone> arch i386 Simone> os mingw32 Simone> system i386, mingw32 Simone> status Simone> major 2 Simone> minor 2.0 Simone> year 2005 Simone> month 10 Simone> day 06 Simone> svn rev 35749 Simone> language R >> m <- matrix(c(1,0,7,15),2,2) ; chisq.test(m, sim=TRUE)$p.value Simone> [1] 0.3598201 >> m <- matrix(c(1,0,7,16),2,2) ; chisq.test(m, sim=TRUE)$p.value Simone> [1] 0.0004997501 >> m <- matrix(c(1,0,7,17),2,2) ; chisq.test(m, sim=TRUE)$p.value Simone> [1] 0.3403298 So, it's only the middle matrix giving problems for you. It doesn't for me on a AMD 64-bit platform, even for 1000 simulations:> m <- cbind(1:0, c(7,16)) > summary(p <- replicate(1000, chisq.test(m, sim=TRUE)$p.value))Min. 1st Qu. Median Mean 3rd Qu. Max. 0.3043 0.3268 0.3338 0.3337 0.3405 0.3663 but it does show the problem on 32-bit one. A fix is easy and will be in R-patched (and R-devel of course) soon. Martin Maechler, ETH Zurich Simone> Ciao Simone> Simone Simone> On 10/20/05, dgrove at fhcrc.org <dgrove at fhcrc.org> wrote: >> Hi, >> >> This report deals with p-values coming from chisq.test using >> the simulate.p=TRUE option. The issue is numerical accuracy >> and was brought up in previous bug reports 3486 and 3896. >> The bug was considered fixed but apparently was only mostly >> fixed. Just the typical problem of two values that are >> mathematically equal not ending up numerically equivalent. >> >> Consider this series of three 2x2 tables: >> >> [1,] 1 7 >> [2,] 0 15 >> >> [1,] 1 7 >> [2,] 0 16 >> >> [1,] 1 7 >> [2,] 0 17 >> >> >> The pvals returned from chisq.test(m, sim=TRUE)$p.value are >> 0.3543228, 0.0004997501 and 0.3273363 respectively. >> >> The 2nd seems a bit unlikely, huh? >> >> I checked into it and the value I'm getting for the statistic >> (calculated in R code) is 4*.Machine$double.eps less than the >> value (which should be equal) that is returned from the C-code >> that does the simulation. >> >> >> Code for creating/testing the three matrices shown above: >> m <- matrix(c(1,0,7,15),2,2) ; chisq.test(m, sim=TRUE)$p.value >> m <- matrix(c(1,0,7,16),2,2) ; chisq.test(m, sim=TRUE)$p.value >> m <- matrix(c(1,0,7,17),2,2) ; chisq.test(m, sim=TRUE)$p.value >> >> >> Running SuSE9.3 on a AMD Athlon4000+ >> >> >> > version >> platform i686-pc-linux-gnu >> arch i686 >> os linux-gnu >> system i686, linux-gnu >> status Patched >> major 2 >> minor 1.1 >> year 2005 >> month 07 >> day 29 >> language R >> >> >> Thanks, >> Doug >> >> >> Douglas Grove >> Statistical Research Associate >> Fred Hutchinson Cancer Research Center >> Seattle WA 98109
dgrove@fhcrc.org
2005-Oct-20  16:39 UTC
[Rd] numerical issues in chisq.test(simulate=TRUE) (PR#8224)
Yes, only the middle matrix was problematic. Others were included to show what the value should (approx) be. Sorry that I didn't mention I was using 32 bit. There are of course very easy fixes to this, just wasn't sure what was the "best" approach in this situation (i.e. wasn't sure if the usual tolerance of sqrt(.Machine$double.eps) was too large. Thanks much, Doug On Thu, 20 Oct 2005, Martin Maechler wrote:> Thank you, Douglas (and Simone) for the bug report. > > >>>>> "Simone" == Simone Giannerini <sgiannerini at gmail.com> > >>>>> on Thu, 20 Oct 2005 10:10:01 +0200 writes: > > Simone> Hi, > Simone> I obtain the same result under Win. XP SP2 on AMD 64 3700+ > > Simone> platform i386-pc-mingw32 > Simone> arch i386 > Simone> os mingw32 > Simone> system i386, mingw32 > Simone> status > Simone> major 2 > Simone> minor 2.0 > Simone> year 2005 > Simone> month 10 > Simone> day 06 > Simone> svn rev 35749 > Simone> language R > > >> m <- matrix(c(1,0,7,15),2,2) ; chisq.test(m, sim=TRUE)$p.value > Simone> [1] 0.3598201 > > >> m <- matrix(c(1,0,7,16),2,2) ; chisq.test(m, sim=TRUE)$p.value > Simone> [1] 0.0004997501 > > >> m <- matrix(c(1,0,7,17),2,2) ; chisq.test(m, sim=TRUE)$p.value > Simone> [1] 0.3403298 > > So, it's only the middle matrix giving problems for you. > > It doesn't for me on a AMD 64-bit platform, even for 1000 simulations: > > > m <- cbind(1:0, c(7,16)) > > summary(p <- replicate(1000, chisq.test(m, sim=TRUE)$p.value)) > Min. 1st Qu. Median Mean 3rd Qu. Max. > 0.3043 0.3268 0.3338 0.3337 0.3405 0.3663 > > but it does show the problem on 32-bit one. > > A fix is easy and will be in R-patched (and R-devel of > course) soon. > > Martin Maechler, ETH Zurich > > > Simone> Ciao > Simone> Simone > > Simone> On 10/20/05, dgrove at fhcrc.org <dgrove at fhcrc.org> wrote: > >> Hi, > >> > >> This report deals with p-values coming from chisq.test using > >> the simulate.p=TRUE option. The issue is numerical accuracy > >> and was brought up in previous bug reports 3486 and 3896. > >> The bug was considered fixed but apparently was only mostly > >> fixed. Just the typical problem of two values that are > >> mathematically equal not ending up numerically equivalent. > >> > >> Consider this series of three 2x2 tables: > >> > >> [1,] 1 7 > >> [2,] 0 15 > >> > >> [1,] 1 7 > >> [2,] 0 16 > >> > >> [1,] 1 7 > >> [2,] 0 17 > >> > >> > >> The pvals returned from chisq.test(m, sim=TRUE)$p.value are > >> 0.3543228, 0.0004997501 and 0.3273363 respectively. > >> > >> The 2nd seems a bit unlikely, huh? > >> > >> I checked into it and the value I'm getting for the statistic > >> (calculated in R code) is 4*.Machine$double.eps less than the > >> value (which should be equal) that is returned from the C-code > >> that does the simulation. > >> > >> > >> Code for creating/testing the three matrices shown above: > >> m <- matrix(c(1,0,7,15),2,2) ; chisq.test(m, sim=TRUE)$p.value > >> m <- matrix(c(1,0,7,16),2,2) ; chisq.test(m, sim=TRUE)$p.value > >> m <- matrix(c(1,0,7,17),2,2) ; chisq.test(m, sim=TRUE)$p.value > >> > >> > >> Running SuSE9.3 on a AMD Athlon4000+ > >> > >> > >> > version > >> platform i686-pc-linux-gnu > >> arch i686 > >> os linux-gnu > >> system i686, linux-gnu > >> status Patched > >> major 2 > >> minor 1.1 > >> year 2005 > >> month 07 > >> day 29 > >> language R > >> > >> > >> Thanks, > >> Doug > >> > >> > >> Douglas Grove > >> Statistical Research Associate > >> Fred Hutchinson Cancer Research Center > >> Seattle WA 98109 >