skylab.gupta at gmail.com
2007-Oct-10 14:35 UTC
[Rd] pt inaccurate when x is close to 0 (PR#9945)
Full_Name: Skylab Gupta Version: R version 2.5.1 (2007-06-27) OS: Windows XP Submission from: (NULL) (216.82.144.137) Hello, I have been playing around with the statistical distributions in R. I think the computations for the cumulative distribution function of the students t distribution in R are not very accurate. For instance, the cdf of a students t distribution with 13 degrees of freedom at 1e-4 is reported in R as "0.5000391350986764"; from Mathematica, it seems the correct value is "0.50003913510150055", only about 9 accurate digits reported in R. I also did the following from within R: ------------- df<-seq(1,100,by=1) y<-pt(1e-4,df) z<-c(0.50003183098839799,0.50003535533895194,0.50003675525997071,0.50003749999985481,0.50003796066840744,0.50003827327749706,0.50003849914427922,0.50003866990364754,0.50003880349244212,0.50003891083995444,0.50003899897813187,0.50003907263208447,0.50003913510150055,0.50003918874627440,0.50003923531785055,0.50003927612461441,0.50003931217478748,0.50003934425324170,0.50003937297989520,0.50003939886014204,0.50003942229165621,0.50003944360703978,0.50003946308016112,0.50003948094039441,0.50003949738053710,0.50003951256485324,0.50003952663295181,0.50003953969680248,0.50003955185925653,0.50003956322006460,0.50003957385523301,0.50003958382054481,0.50003959318443636,0.50003960200394315,0.50003961032679112,0.50003961818144815,0.50003962562026172,0.50003963266089213,0.50003963934773465,0.50003964569404735,0.50003965173577758,0.50003965749688895,0.50003966298323521,0.50003966823056478,0.50003967322766096,0.50003967801868676,0.50003968260005904,0.50003968700228751,0.50003969121916547,0.500 03969526955183,0.50003969915340063,0.50003970290428668,0.50003970650705731,0.50003970997149927,0.50003971332909936,0.50003971654204993,0.50003971964040972,0.50003972264367180,0.50003972553808163,0.50003972835715427,0.50003973106835642,0.50003973370765664,0.50003973624942966,0.50003973868896101,0.50003974107556448,0.50003974338818691,0.50003974563557085,0.50003974781567961,0.50003974993203681,0.50003975199594708,0.50003975399737965,0.50003975593675354,0.50003975782715593,0.50003975966389691,0.50003976145762119,0.50003976321975063,0.50003976489560775,0.50003976655049909,0.50003976818673812,0.50003976975798736,0.50003977127434285,0.50003977277055756,0.50003977423495483,0.50003977566285773,0.50003977705769798,0.50003977841313474,0.50003977975147973,0.50003978102874791,0.50003978230822732,0.50003978356836509,0.50003978477872879,0.50003978596096421,0.50003978713049724,0.50003978827577344,0.50003978935715154,0.50003979045422919,0.50003979153680134,0.50003979256756137,0.500039793589 57851,0.50003979462027492) plot(df,(y-z)/z, type="s") ------------- In the above R code, df contains the 100 integers between 1-100, y contains the cdf of the students t distribution computed at 1e-4 from R, for all the df degrees of freedom; and z contains the correct values (to 17 decimal digits) of the students t distribution cdf at 1e-4 computed from Mathematica; when I plot the relative errors between the computed values from Mathematica and R, it seems the relative errors are large; we get only about 10-12 digits of accuracy from R rather than about 15 digits (all this assuming that the Mathematica computed values are correct). This happens for all values close to 0 where the cdf is evaluated. I am working on Windows XP, and I installed a precompiled binary version of R. The following information might also be useful: ---------------> sessionInfo()R version 2.5.1 (2007-06-27) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] "stats" "graphics" "grDevices" "utils" "datasets" "methods" "base"> versionplatform i386-pc-mingw32 arch i386 os mingw32 system i386, mingw32 status major 2 minor 5.1 year 2007 month 06 day 27 svn rev 42083 language R version.string R version 2.5.1 (2007-06-27) --------------- Is there a reason for this loss of accuracy, or am I missing something here? Thanks.
On 10/10/2007 10:35 AM, skylab.gupta at gmail.com wrote:> Full_Name: Skylab Gupta > Version: R version 2.5.1 (2007-06-27) > OS: Windows XP > Submission from: (NULL) (216.82.144.137) > > > Hello, > > I have been playing around with the statistical distributions in R. I think the > computations for the cumulative distribution function of the students t > distribution in R are not very accurate. > > For instance, the cdf of a students t distribution with 13 degrees of freedom at > 1e-4 is reported in R as "0.5000391350986764"; from Mathematica, it seems the > correct value is "0.50003913510150055", only about 9 accurate digits reported in > R.As Charles Berry told you when this was posted to R-help, it looks as though it is Mathematica that is inaccurate. For example, I would expect this plot to be smooth, and it is not in either R or Mathematica, but R is at least monotone: # the Mathematica values plot(diff(z[80:100]), type='l') The R values plot(diff(pt(1e-4, df=80:100)), type='l')> > I also did the following from within R: > > ------------- > df<-seq(1,100,by=1) > y<-pt(1e-4,df) > z<-c(0.50003183098839799,0.50003535533895194,0.50003675525997071,0.50003749999985481,0.50003796066840744,0.50003827327749706,0.50003849914427922,0.50003866990364754,0.50003880349244212,0.50003891083995444,0.50003899897813187,0.50003907263208447,0.50003913510150055,0.50003918874627440,0.50003923531785055,0.50003927612461441,0.50003931217478748,0.50003934425324170,0.50003937297989520,0.50003939886014204,0.50003942229165621,0.50003944360703978,0.50003946308016112,0.50003948094039441,0.50003949738053710,0.50003951256485324,0.50003952663295181,0.50003953969680248,0.50003955185925653,0.50003956322006460,0.50003957385523301,0.50003958382054481,0.50003959318443636,0.50003960200394315,0.50003961032679112,0.50003961818144815,0.50003962562026172,0.50003963266089213,0.50003963934773465,0.50003964569404735,0.50003965173577758,0.50003965749688895,0.50003966298323521,0.50003966823056478,0.50003967322766096,0.50003967801868676,0.50003968260005904,0.50003968700228751,0.50003969121916547,0.500> 03969526955183,0.50003969915340063,0.50003970290428668,0.50003970650705731,0.50003970997149927,0.50003971332909936,0.50003971654204993,0.50003971964040972,0.50003972264367180,0.50003972553808163,0.50003972835715427,0.50003973106835642,0.50003973370765664,0.50003973624942966,0.50003973868896101,0.50003974107556448,0.50003974338818691,0.50003974563557085,0.50003974781567961,0.50003974993203681,0.50003975199594708,0.50003975399737965,0.50003975593675354,0.50003975782715593,0.50003975966389691,0.50003976145762119,0.50003976321975063,0.50003976489560775,0.50003976655049909,0.50003976818673812,0.50003976975798736,0.50003977127434285,0.50003977277055756,0.50003977423495483,0.50003977566285773,0.50003977705769798,0.50003977841313474,0.50003977975147973,0.50003978102874791,0.50003978230822732,0.50003978356836509,0.50003978477872879,0.50003978596096421,0.50003978713049724,0.50003978827577344,0.50003978935715154,0.50003979045422919,0.50003979153680134,0.50003979256756137,0.500039793589> 57851,0.50003979462027492) > > plot(df,(y-z)/z, type="s") > ------------- > > In the above R code, df contains the 100 integers between 1-100, y contains the > cdf of the students t distribution computed at 1e-4 from R, for all the df > degrees of freedom; and z contains the correct values (to 17 decimal digits) of > the students t distribution cdf at 1e-4 computed from Mathematica; when I plot > the relative errors between the computed values from Mathematica and R, it seems > the relative errors are large; we get only about 10-12 digits of accuracy from R > rather than about 15 digits (all this assuming that the Mathematica computed > values are correct).It seems you are making a bad assumption. Duncan Murdoch This happens for all values close to 0 where the cdf is> evaluated. > > I am working on Windows XP, and I installed a precompiled binary version of R. > The following information might also be useful: > > --------------- >> sessionInfo() > R version 2.5.1 (2007-06-27) > i386-pc-mingw32 > > locale: > LC_COLLATE=English_United States.1252;LC_CTYPE=English_United > States.1252;LC_MONETARY=English_United > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 > > attached base packages: > [1] "stats" "graphics" "grDevices" "utils" "datasets" "methods" > "base" > >> version > platform i386-pc-mingw32 > arch i386 > os mingw32 > system i386, mingw32 > status > major 2 > minor 5.1 > year 2007 > month 06 > day 27 > svn rev 42083 > language R > version.string R version 2.5.1 (2007-06-27) > --------------- > > Is there a reason for this loss of accuracy, or am I missing something here? > Thanks. > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel
On Wed, 10 Oct 2007, Duncan Murdoch wrote:>> As Charles Berry told you when this was posted to R-help, it looks as > though it is Mathematica that is inaccurate. For example, I would > expect this plot to be smooth, and it is not in either R or Mathematica, > but R is at least monotone:I did check this on Maple by integrating the density function and got about 11-12 digits accuracy at df=88. I also tried replacing 1e-4 with an exactly-representable number (3/32768), with a similar accuracy level. pbeta() claims 14-digit accuracy and is called by pt() with 1/(1+x^2/n) as its argument. I would have thought that the 1/(1+x^2/n) could easily be responsible for this sort of accuracy loss when x=1e-4. If so, it may not be easy to improve. -thomas