I agree with many the sentiments about the wisdom of computing very small p-values (although the example below may win some kind of a prize: I've seen people talking about p-values of the order of 10^(-2000), but never 10^(-(10^8)) !). That said, there are a several tricks for getting more reasonable sums of very small probabilities. The first is to scale the p-values by dividing the *largest* of the probabilities, then do the (p/sum(p)) computation, then multiply the result (I'm sure this is described/documented somewhere). More generally, there are methods for computing sums on the log scale, e.g. https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.misc.logsumexp.html I don't know where this has been implemented in the R ecosystem, but this sort of computation is the basis of the "Brobdingnag" package for operating on very large ("Brobdingnagian") and very small ("Lilliputian") numbers. On 2019-06-21 6:58 p.m., jing hua zhao wrote:> Hi Peter, Rui, Chrstophe and Gabriel, > > Thanks for your inputs -- the use of qnorm(., log=TRUE) is a good point in line with pnorm with which we devised log(p) as > > log(2) + pnorm(-abs(z), lower.tail = TRUE, log.p = TRUE) > > that could do really really well for large z compared to Rmpfr. Maybe I am asking too much since > > z <-20000 >> Rmpfr::format(2*pnorm(mpfr(-abs(z),100),lower.tail=TRUE,log.p=FALSE)) > [1] "1.660579603192917090365313727164e-86858901" > > already gives a rarely seen small p value. I gather I also need a multiple precision exp() and their sum since exp(z^2/2) is also a Bayes Factor so I get log(x_i )/sum_i log(x_i) instead. To this point, I am obliged to clarify, see https://statgen.github.io/gwas-credible-sets/method/locuszoom-credible-sets.pdf. > > I agree many feel geneticists go to far with small p values which I would have difficulty to argue againston the other hand it is also expected to see these in a non-genetic context. For instance the Framingham study was established in 1948 just got $34m for six years on phenotypewide association which we would be interesting to see. > > Best wishes, > > > Jing Hua > > > ________________________________ > From: peter dalgaard <pdalgd at gmail.com> > Sent: 21 June 2019 16:24 > To: jing hua zhao > Cc: Rui Barradas; r-devel at r-project.org > Subject: Re: [Rd] Calculation of e^{z^2/2} for a normal deviate z > > You may want to look into using the log option to qnorm > > e.g., in round figures: > >> log(1e-300) > [1] -690.7755 >> qnorm(-691, log=TRUE) > [1] -37.05315 >> exp(37^2/2) > [1] 1.881797e+297 >> exp(-37^2/2) > [1] 5.314068e-298 > > Notice that floating point representation cuts out at 1e+/-308 or so. If you want to go outside that range, you may need explicit manipulation of the log values. qnorm() itself seems quite happy with much smaller values: > >> qnorm(-5000, log=TRUE) > [1] -99.94475 > > -pd > >> On 21 Jun 2019, at 17:11 , jing hua zhao <jinghuazhao at hotmail.com> wrote: >> >> Dear Rui, >> >> Thanks for your quick reply -- this allows me to see the bottom of this. I was hoping we could have a handle of those p in genmoics such as 1e-300 or smaller. >> >> Best wishes, >> >> >> Jing Hua >> >> ________________________________ >> From: Rui Barradas <ruipbarradas at sapo.pt> >> Sent: 21 June 2019 15:03 >> To: jing hua zhao; r-devel at r-project.org >> Subject: Re: [Rd] Calculation of e^{z^2/2} for a normal deviate z >> >> Hello, >> >> Well, try it: >> >> p <- .Machine$double.eps^seq(0.5, 1, by = 0.05) >> z <- qnorm(p/2) >> >> pnorm(z) >> # [1] 7.450581e-09 1.228888e-09 2.026908e-10 3.343152e-11 5.514145e-12 >> # [6] 9.094947e-13 1.500107e-13 2.474254e-14 4.080996e-15 6.731134e-16 >> #[11] 1.110223e-16 >> p/2 >> # [1] 7.450581e-09 1.228888e-09 2.026908e-10 3.343152e-11 5.514145e-12 >> # [6] 9.094947e-13 1.500107e-13 2.474254e-14 4.080996e-15 6.731134e-16 >> #[11] 1.110223e-16 >> >> exp(z*z/2) >> # [1] 9.184907e+06 5.301421e+07 3.073154e+08 1.787931e+09 1.043417e+10 >> # [6] 6.105491e+10 3.580873e+11 2.104460e+12 1.239008e+13 7.306423e+13 >> #[11] 4.314798e+14 >> >> >> p is the smallest possible such that 1 + p != 1 and I couldn't find >> anything to worry about. >> >> >> R version 3.6.0 (2019-04-26) >> Platform: x86_64-pc-linux-gnu (64-bit) >> Running under: Ubuntu 19.04 >> >> Matrix products: default >> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0 >> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0 >> >> locale: >> [1] LC_CTYPE=pt_PT.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=pt_PT.UTF-8 LC_COLLATE=pt_PT.UTF-8 >> [5] LC_MONETARY=pt_PT.UTF-8 LC_MESSAGES=pt_PT.UTF-8 >> [7] LC_PAPER=pt_PT.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=pt_PT.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods >> [7] base >> >> other attached packages: >> >> [many packages loaded] >> >> >> Hope this helps, >> >> Rui Barradas >> >> ?s 15:24 de 21/06/19, jing hua zhao escreveu: >>> Dear R-developers, >>> >>> I am keen to calculate exp(z*z/2) with z=qnorm(p/2) and p is very small. I wonder if anyone has experience with this? >>> >>> Thanks very much in advance, >>> >>> >>> Jing Hua >>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________________________ >>> R-devel at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-devel >>> >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-devel at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel > > -- > Peter Dalgaard, Professor, > Center for Statistics, Copenhagen Business School > Solbjerg Plads 3, 2000 Frederiksberg, Denmark > Phone: (+45)38153501 > Office: A 4.23 > Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com > > > > > > > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >
include/Rmath.h declares a set of 'logspace' functions for use at the C level. I don't think there are core R functions that call them. /* Compute the log of a sum or difference from logs of terms, i.e., * * log (exp (logx) + exp (logy)) * or log (exp (logx) - exp (logy)) * * without causing overflows or throwing away too much accuracy: */ double Rf_logspace_add(double logx, double logy); double Rf_logspace_sub(double logx, double logy); double Rf_logspace_sum(const double *logx, int nx); Bill Dunlap TIBCO Software wdunlap tibco.com On Sun, Jun 23, 2019 at 1:40 AM Ben Bolker <bbolker at gmail.com> wrote:> > I agree with many the sentiments about the wisdom of computing very > small p-values (although the example below may win some kind of a prize: > I've seen people talking about p-values of the order of 10^(-2000), but > never 10^(-(10^8)) !). That said, there are a several tricks for > getting more reasonable sums of very small probabilities. The first is > to scale the p-values by dividing the *largest* of the probabilities, > then do the (p/sum(p)) computation, then multiply the result (I'm sure > this is described/documented somewhere). More generally, there are > methods for computing sums on the log scale, e.g. > > > https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.misc.logsumexp.html > > I don't know where this has been implemented in the R ecosystem, but > this sort of computation is the basis of the "Brobdingnag" package for > operating on very large ("Brobdingnagian") and very small > ("Lilliputian") numbers. > > > On 2019-06-21 6:58 p.m., jing hua zhao wrote: > > Hi Peter, Rui, Chrstophe and Gabriel, > > > > Thanks for your inputs -- the use of qnorm(., log=TRUE) is a good point > in line with pnorm with which we devised log(p) as > > > > log(2) + pnorm(-abs(z), lower.tail = TRUE, log.p = TRUE) > > > > that could do really really well for large z compared to Rmpfr. Maybe I > am asking too much since > > > > z <-20000 > >> Rmpfr::format(2*pnorm(mpfr(-abs(z),100),lower.tail=TRUE,log.p=FALSE)) > > [1] "1.660579603192917090365313727164e-86858901" > > > > already gives a rarely seen small p value. I gather I also need a > multiple precision exp() and their sum since exp(z^2/2) is also a Bayes > Factor so I get log(x_i )/sum_i log(x_i) instead. To this point, I am > obliged to clarify, see > https://statgen.github.io/gwas-credible-sets/method/locuszoom-credible-sets.pdf > . > > > > I agree many feel geneticists go to far with small p values which I > would have difficulty to argue againston the other hand it is also expected > to see these in a non-genetic context. For instance the Framingham study > was established in 1948 just got $34m for six years on phenotypewide > association which we would be interesting to see. > > > > Best wishes, > > > > > > Jing Hua > > > > > > ________________________________ > > From: peter dalgaard <pdalgd at gmail.com> > > Sent: 21 June 2019 16:24 > > To: jing hua zhao > > Cc: Rui Barradas; r-devel at r-project.org > > Subject: Re: [Rd] Calculation of e^{z^2/2} for a normal deviate z > > > > You may want to look into using the log option to qnorm > > > > e.g., in round figures: > > > >> log(1e-300) > > [1] -690.7755 > >> qnorm(-691, log=TRUE) > > [1] -37.05315 > >> exp(37^2/2) > > [1] 1.881797e+297 > >> exp(-37^2/2) > > [1] 5.314068e-298 > > > > Notice that floating point representation cuts out at 1e+/-308 or so. If > you want to go outside that range, you may need explicit manipulation of > the log values. qnorm() itself seems quite happy with much smaller values: > > > >> qnorm(-5000, log=TRUE) > > [1] -99.94475 > > > > -pd > > > >> On 21 Jun 2019, at 17:11 , jing hua zhao <jinghuazhao at hotmail.com> > wrote: > >> > >> Dear Rui, > >> > >> Thanks for your quick reply -- this allows me to see the bottom of > this. I was hoping we could have a handle of those p in genmoics such as > 1e-300 or smaller. > >> > >> Best wishes, > >> > >> > >> Jing Hua > >> > >> ________________________________ > >> From: Rui Barradas <ruipbarradas at sapo.pt> > >> Sent: 21 June 2019 15:03 > >> To: jing hua zhao; r-devel at r-project.org > >> Subject: Re: [Rd] Calculation of e^{z^2/2} for a normal deviate z > >> > >> Hello, > >> > >> Well, try it: > >> > >> p <- .Machine$double.eps^seq(0.5, 1, by = 0.05) > >> z <- qnorm(p/2) > >> > >> pnorm(z) > >> # [1] 7.450581e-09 1.228888e-09 2.026908e-10 3.343152e-11 5.514145e-12 > >> # [6] 9.094947e-13 1.500107e-13 2.474254e-14 4.080996e-15 6.731134e-16 > >> #[11] 1.110223e-16 > >> p/2 > >> # [1] 7.450581e-09 1.228888e-09 2.026908e-10 3.343152e-11 5.514145e-12 > >> # [6] 9.094947e-13 1.500107e-13 2.474254e-14 4.080996e-15 6.731134e-16 > >> #[11] 1.110223e-16 > >> > >> exp(z*z/2) > >> # [1] 9.184907e+06 5.301421e+07 3.073154e+08 1.787931e+09 1.043417e+10 > >> # [6] 6.105491e+10 3.580873e+11 2.104460e+12 1.239008e+13 7.306423e+13 > >> #[11] 4.314798e+14 > >> > >> > >> p is the smallest possible such that 1 + p != 1 and I couldn't find > >> anything to worry about. > >> > >> > >> R version 3.6.0 (2019-04-26) > >> Platform: x86_64-pc-linux-gnu (64-bit) > >> Running under: Ubuntu 19.04 > >> > >> Matrix products: default > >> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0 > >> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0 > >> > >> locale: > >> [1] LC_CTYPE=pt_PT.UTF-8 LC_NUMERIC=C > >> [3] LC_TIME=pt_PT.UTF-8 LC_COLLATE=pt_PT.UTF-8 > >> [5] LC_MONETARY=pt_PT.UTF-8 LC_MESSAGES=pt_PT.UTF-8 > >> [7] LC_PAPER=pt_PT.UTF-8 LC_NAME=C > >> [9] LC_ADDRESS=C LC_TELEPHONE=C > >> [11] LC_MEASUREMENT=pt_PT.UTF-8 LC_IDENTIFICATION=C > >> > >> attached base packages: > >> [1] stats graphics grDevices utils datasets methods > >> [7] base > >> > >> other attached packages: > >> > >> [many packages loaded] > >> > >> > >> Hope this helps, > >> > >> Rui Barradas > >> > >> ?s 15:24 de 21/06/19, jing hua zhao escreveu: > >>> Dear R-developers, > >>> > >>> I am keen to calculate exp(z*z/2) with z=qnorm(p/2) and p is very > small. I wonder if anyone has experience with this? > >>> > >>> Thanks very much in advance, > >>> > >>> > >>> Jing Hua > >>> > >>> [[alternative HTML version deleted]] > >>> > >>> ______________________________________________ > >>> R-devel at r-project.org mailing list > >>> https://stat.ethz.ch/mailman/listinfo/r-devel > >>> > >> > >> [[alternative HTML version deleted]] > >> > >> ______________________________________________ > >> R-devel at r-project.org mailing list > >> https://stat.ethz.ch/mailman/listinfo/r-devel > > > > -- > > Peter Dalgaard, Professor, > > Center for Statistics, Copenhagen Business School > > Solbjerg Plads 3, 2000 Frederiksberg, Denmark > > Phone: (+45)38153501 > > Office: A 4.23 > > Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com > > > > > > > > > > > > > > > > > > > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-devel at r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-devel > > > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >[[alternative HTML version deleted]]
Martin Maechler
2019-Jun-24 08:37 UTC
[Rd] Calculation of e^{z^2/2} for a normal deviate z
>>>>> William Dunlap via R-devel >>>>> on Sun, 23 Jun 2019 10:34:47 -0700 writes: >>>>> William Dunlap via R-devel >>>>> on Sun, 23 Jun 2019 10:34:47 -0700 writes:> include/Rmath.h declares a set of 'logspace' functions for use at the C > level. I don't think there are core R functions that call them. > /* Compute the log of a sum or difference from logs of terms, i.e., > * > * log (exp (logx) + exp (logy)) > * or log (exp (logx) - exp (logy)) > * > * without causing overflows or throwing away too much accuracy: > */ > double Rf_logspace_add(double logx, double logy); > double Rf_logspace_sub(double logx, double logy); > double Rf_logspace_sum(const double *logx, int nx); > Bill Dunlap > TIBCO Software > wdunlap tibco.com Yes, indeed, thank you, Bill! But they *have* been in use by core R functions for a long time in pgamma, pbeta and related functions. [and I have had changes in *hyper.c where logspace_add() is used too, for several years now (since 2015) but I no longer know which concrete accuracy problem that addresses, so have not yet committed it] Martin Maechler ETH Zurich and R Core Team > On Sun, Jun 23, 2019 at 1:40 AM Ben Bolker <bbolker at gmail.com> wrote: >> >> I agree with many the sentiments about the wisdom of computing very >> small p-values (although the example below may win some kind of a prize: >> I've seen people talking about p-values of the order of 10^(-2000), but >> never 10^(-(10^8)) !). That said, there are a several tricks for >> getting more reasonable sums of very small probabilities. The first is >> to scale the p-values by dividing the *largest* of the probabilities, >> then do the (p/sum(p)) computation, then multiply the result (I'm sure >> this is described/documented somewhere). More generally, there are >> methods for computing sums on the log scale, e.g. >> >> >> https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.misc.logsumexp.html >> >> I don't know where this has been implemented in the R ecosystem, but >> this sort of computation is the basis of the "Brobdingnag" package for >> operating on very large ("Brobdingnagian") and very small >> ("Lilliputian") numbers. >> >> >> On 2019-06-21 6:58 p.m., jing hua zhao wrote: >> > Hi Peter, Rui, Chrstophe and Gabriel, >> > >> > Thanks for your inputs -- the use of qnorm(., log=TRUE) is a good point >> in line with pnorm with which we devised log(p) as >> > >> > log(2) + pnorm(-abs(z), lower.tail = TRUE, log.p = TRUE) >> > >> > that could do really really well for large z compared to Rmpfr. Maybe I >> am asking too much since >> > >> > z <-20000 >> >> Rmpfr::format(2*pnorm(mpfr(-abs(z),100),lower.tail=TRUE,log.p=FALSE)) >> > [1] "1.660579603192917090365313727164e-86858901" >> > >> > already gives a rarely seen small p value. I gather I also need a >> multiple precision exp() and their sum since exp(z^2/2) is also a Bayes >> Factor so I get log(x_i )/sum_i log(x_i) instead. To this point, I am >> obliged to clarify, see >> https://statgen.github.io/gwas-credible-sets/method/locuszoom-credible-sets.pdf >> . >> > >> > I agree many feel geneticists go to far with small p values which I >> would have difficulty to argue againston the other hand it is also expected >> to see these in a non-genetic context. For instance the Framingham study >> was established in 1948 just got $34m for six years on phenotypewide >> association which we would be interesting to see. >> > >> > Best wishes, >> > >> > >> > Jing Hua >> > >> > >> > ________________________________ >> > From: peter dalgaard <pdalgd at gmail.com> >> > Sent: 21 June 2019 16:24 >> > To: jing hua zhao >> > Cc: Rui Barradas; r-devel at r-project.org >> > Subject: Re: [Rd] Calculation of e^{z^2/2} for a normal deviate z >> > >> > You may want to look into using the log option to qnorm >> > >> > e.g., in round figures: >> > >> >> log(1e-300) >> > [1] -690.7755 >> >> qnorm(-691, log=TRUE) >> > [1] -37.05315 >> >> exp(37^2/2) >> > [1] 1.881797e+297 >> >> exp(-37^2/2) >> > [1] 5.314068e-298 >> > >> > Notice that floating point representation cuts out at 1e+/-308 or so. If >> you want to go outside that range, you may need explicit manipulation of >> the log values. qnorm() itself seems quite happy with much smaller values: >> > >> >> qnorm(-5000, log=TRUE) >> > [1] -99.94475 >> > >> > -pd >> > >> >> On 21 Jun 2019, at 17:11 , jing hua zhao <jinghuazhao at hotmail.com> >> wrote: >> >> >> >> Dear Rui, >> >> >> >> Thanks for your quick reply -- this allows me to see the bottom of >> this. I was hoping we could have a handle of those p in genmoics such as >> 1e-300 or smaller. >> >> >> >> Best wishes, >> >> >> >> >> >> Jing Hua >> >> >> >> ________________________________ >> >> From: Rui Barradas <ruipbarradas at sapo.pt> >> >> Sent: 21 June 2019 15:03 >> >> To: jing hua zhao; r-devel at r-project.org >> >> Subject: Re: [Rd] Calculation of e^{z^2/2} for a normal deviate z >> >> >> >> Hello, >> >> >> >> Well, try it: >> >> >> >> p <- .Machine$double.eps^seq(0.5, 1, by = 0.05) >> >> z <- qnorm(p/2) >> >> >> >> pnorm(z) >> >> # [1] 7.450581e-09 1.228888e-09 2.026908e-10 3.343152e-11 5.514145e-12 >> >> # [6] 9.094947e-13 1.500107e-13 2.474254e-14 4.080996e-15 6.731134e-16 >> >> #[11] 1.110223e-16 >> >> p/2 >> >> # [1] 7.450581e-09 1.228888e-09 2.026908e-10 3.343152e-11 5.514145e-12 >> >> # [6] 9.094947e-13 1.500107e-13 2.474254e-14 4.080996e-15 6.731134e-16 >> >> #[11] 1.110223e-16 >> >> >> >> exp(z*z/2) >> >> # [1] 9.184907e+06 5.301421e+07 3.073154e+08 1.787931e+09 1.043417e+10 >> >> # [6] 6.105491e+10 3.580873e+11 2.104460e+12 1.239008e+13 7.306423e+13 >> >> #[11] 4.314798e+14 >> >> >> >> >> >> p is the smallest possible such that 1 + p != 1 and I couldn't find >> >> anything to worry about. >> >> >> >> >> >> R version 3.6.0 (2019-04-26) >> >> Platform: x86_64-pc-linux-gnu (64-bit) >> >> Running under: Ubuntu 19.04 >> >> >> >> Matrix products: default >> >> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0 >> >> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0 >> >> >> >> locale: >> >> [1] LC_CTYPE=pt_PT.UTF-8 LC_NUMERIC=C >> >> [3] LC_TIME=pt_PT.UTF-8 LC_COLLATE=pt_PT.UTF-8 >> >> [5] LC_MONETARY=pt_PT.UTF-8 LC_MESSAGES=pt_PT.UTF-8 >> >> [7] LC_PAPER=pt_PT.UTF-8 LC_NAME=C >> >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> >> [11] LC_MEASUREMENT=pt_PT.UTF-8 LC_IDENTIFICATION=C >> >> >> >> attached base packages: >> >> [1] stats graphics grDevices utils datasets methods >> >> [7] base >> >> >> >> other attached packages: >> >> >> >> [many packages loaded] >> >> >> >> >> >> Hope this helps, >> >> >> >> Rui Barradas >> >> >> >> ?s 15:24 de 21/06/19, jing hua zhao escreveu: >> >>> Dear R-developers, >> >>> >> >>> I am keen to calculate exp(z*z/2) with z=qnorm(p/2) and p is very >> small. I wonder if anyone has experience with this? >> >>> >> >>> Thanks very much in advance, >> >>> >> >>> >> >>> Jing Hua >> >>> >> >>> [[alternative HTML version deleted]] >> >>> >> >>> ______________________________________________ >> >>> R-devel at r-project.org mailing list >> >>> https://stat.ethz.ch/mailman/listinfo/r-devel >> >>> >> >> >> >> [[alternative HTML version deleted]] >> >> >> >> ______________________________________________ >> >> R-devel at r-project.org mailing list >> >> https://stat.ethz.ch/mailman/listinfo/r-devel >> > >> > -- >> > Peter Dalgaard, Professor, >> > Center for Statistics, Copenhagen Business School >> > Solbjerg Plads 3, 2000 Frederiksberg, Denmark >> > Phone: (+45)38153501 >> > Office: A 4.23 >> > Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > [[alternative HTML version deleted]] >> > >> > ______________________________________________ >> > R-devel at r-project.org mailing list >> > https://stat.ethz.ch/mailman/listinfo/r-devel >> > >> >> ______________________________________________ >> R-devel at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel >> > [[alternative HTML version deleted]] > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel