Dear list, I'm computing a p-value for the Student test and discover some inconsistencies with the cdf pt(). The observed statistic is 11.23995 for 95 observations, so the p-value is very small> t_score <- 11.23995 > n <- 95 > print(pt(t_score, df = n-2, lower=FALSE), digits=22)[1] 2.539746620181247991746e-19> print(integrate(dt, lower=t_score, upper=Inf, df=n-2)$value, digits = 22)[1] 2.539746631161970791961e-19 But if I compute with pt(lower=TRUE), I got 0> print(1-pt(t_score, df = n-2, lower=TRUE), digits=22)[1] 0 Indeed, the p-value is lower than the epsilon machine> pt(t_score, df = n-2, lower=FALSE) < .Machine$double.eps[1] TRUE Using the square of t statistic which follows a Fisher distribution, I got the same issue:> print(pf(z, 1, n-2, lower=FALSE), digits=22)[1] 5.079493240362495983491e-19> print(integrate(df, lower=z, upper=Inf, df1=1, df2=n-2)$value, digits = 22)[1] 5.079015231299358486828e-19> print(1-pf(z, 1, n-2, lower=TRUE), digits=22)[1] 0 When using the t.test() function, the p-value is naturally printed : p-value < 2.2e-16. Any comment is welcome. Christophe> R.version_ platform aarch64-apple-darwin20 arch aarch64 os darwin20 system aarch64, darwin20 status major 4 minor 5.1 year 2025 month 06 day 13 svn rev 88306 language R version.string R version 4.5.1 (2025-06-13) nickname Great Square Root ------------------------------------------------- Christophe DUTANG LJK, Ensimag, Grenoble INP, UGA, France ILB research fellow Web: http://dutangc.free.fr ------------------------------------------------- [[alternative HTML version deleted]]
>>>>> Christophe Dutang >>>>> on Sat, 25 Oct 2025 11:45:42 +0200 writes:> Dear list, > I'm computing a p-value for the Student test and discover some inconsistencies with the cdf pt(). > The observed statistic is 11.23995 for 95 observations, so the p-value is very small >> t_score <- 11.23995 >> n <- 95 >> print(pt(t_score, df = n-2, lower=FALSE), digits=22) > [1] 2.539746620181247991746e-19 >> print(integrate(dt, lower=t_score, upper=Inf, df=n-2)$value, digits = 22) > [1] 2.539746631161970791961e-19 > But if I compute with pt(lower=TRUE), I got 0 >> print(1-pt(t_score, df = n-2, lower=TRUE), digits=22) > [1] 0 > Indeed, the p-value is lower than the epsilon machine >> pt(t_score, df = n-2, lower=FALSE) < .Machine$double.eps > [1] TRUE > Using the square of t statistic which follows a Fisher distribution, I got the same issue: >> print(pf(z, 1, n-2, lower=FALSE), digits=22) > [1] 5.079493240362495983491e-19 >> print(integrate(df, lower=z, upper=Inf, df1=1, df2=n-2)$value, digits = 22) > [1] 5.079015231299358486828e-19 >> print(1-pf(z, 1, n-2, lower=TRUE), digits=22) > [1] 0 > When using the t.test() function, the p-value is naturally printed : p-value < 2.2e-16. > Any comment is welcome. > Christophe >> R.version > _ > platform aarch64-apple-darwin20 > arch aarch64 > os darwin20 > system aarch64, darwin20 > status > major 4 > minor 5.1 > year 2025 > month 06 > day 13 > svn rev 88306 > language R > version.string R version 4.5.1 (2025-06-13) > nickname Great Square Root > ------------------------------------------------- > Christophe DUTANG > LJK, Ensimag, Grenoble INP, UGA, France > ILB research fellow > Web: http://dutangc.free.fr > ------------------------------------------------- It seems to me you are wondering about the fact that you cannot distinguish in double precision the number 1 from a number that mathematically is 1 - 2.54*10^{-19}. Are you serious?
? Sat, 25 Oct 2025 11:45:42 +0200 Christophe Dutang <dutangc at gmail.com> ?????:> Indeed, the p-value is lower than the epsilon machine > > > pt(t_score, df = n-2, lower=FALSE) < .Machine$double.eps > [1] TRUEWhich means that for lower=TRUE, there will not be enough digits in R's numeric() type to represent the 5*10^-19 subtracted from 1 and approximately 16 zeroes. Instead, you can verify your answer by asking for the logarithm of the number that is too close to 1, thus retaining more significant digits: print( -expm1(pt(t_score, df = n-2, lower=TRUE, log.p = TRUE)), digits=16 ) # [1] 2.539746620181249e-19 print(pt(t_score, df = n-2, lower=FALSE), digits=16) # [1] 2.539746620181248e-19 expm1(.) computes exp(.)-1 while retaining precision for numbers that are too close to 0, for which exp() would otherwise return 1. See the links in https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f for a more detailed explanation. -- Best regards, Ivan (flipping the "days since referring to R FAQ 7.31" sign back to 0)