Andreï V. Kostyrka
2019-Apr-12 14:52 UTC
[Rd] [r-devel] integrate over an infinite region produces wrong results depending on scaling
Dear all,

This is the first time I am posting to the r-devel list. On StackOverflow, it was suggested that this strange behaviour of integrate() looks more like a bug. I am providing a short version of the question (the full one, with plots: https://stackoverflow.com/q/55639401).

Suppose one wants to integrate a function that is just a product of two density functions (e.g. gamma). The support of the random variable is (-Inf, 0]. The scale parameter of the distribution is quite small (around 0.01), so the standard integration routine would often fail to integrate a function that is non-zero only on a very small section of the negative half-line (like [-0.02, -0.01], where it takes huge values, and is almost 0 everywhere else). R's integrate() would often return the machine epsilon as the result. So I stretch the function around zero by the inverse of the scale parameter, compute the integral, and then divide it by the scale. Sometimes this rescaling also failed, so I tried both if the first result was very small.

Today, integration of the rescaled function suddenly yielded a value of 1.5 instead of 3.5 (not even zero). The MWE is below.

cons <- -0.020374721416129591
sc <- 0.00271245601724757383
sh <- 5.704
f <- function(x, numstab = 1) dgamma(cons - x * numstab, shape = sh, scale = sc) * dgamma(-x * numstab, shape = sh, scale = sc) * numstab

curve(f, -0.06, 0, n = 501, main = "Unscaled f", bty = "n")
curve(f(x, sc), -0.06 / sc, 0, n = 501, main = "Scaled f", bty = "n")

sum(f(seq(-0.08, 0, 1e-6))) * 1e-6              # Checking by summation: 3.575294
sum(f(seq(-30, 0, 1e-4), numstab = sc)) * 1e-4  # True value: 3.575294
str(integrate(f, -Inf, 0))                      # Gives 3.575294
# $ value       : num 3.58
# $ abs.error   : num 1.71e-06
# $ subdivisions: int 10
str(integrate(f, -Inf, 0, numstab = sc))
# $ value       : num 1.5      # What?!
# $ abs.error   : num 0.000145 # What?!
# $ subdivisions: int 2

It stops at just two subdivisions! The problem is that I cannot try various stabilising multipliers for the function, because I have to compute this integral thousands of times for thousands of parameter values on thousands of sample windows for hundreds of models, so even on a super-computer cluster this takes weeks. Besides that, reducing rel.tol to 1e-5 or 1e-6 helped a bit, but I am not sure whether this guarantees success (and reducing it to 1e-7 slowed the computations down in some cases). I have also looked at the Fortran code of the quadrature just to see the integration rule, and was left wondering.

How can I make sure that the integration routine does not produce such wrong results for such a function, while the integration remains fast?

Yours sincerely,
Andreï V. Kostyrka
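The post describes the try-unscaled-then-rescaled fallback only in prose, so here is a minimal sketch of that logic, assuming the f and sc from the MWE above. The helper name integrate_fallback and the 1e-8 cutoff for "very small" are illustrative choices, not part of the original code.

integrate_fallback <- function(f, sc, tiny = 1e-8) {
    # Unscaled attempt; for small sc this often degenerates to ~ machine epsilon.
    v1 <- integrate(f, -Inf, 0)$value
    if (v1 > tiny) return(v1)
    # Rescaled attempt: f already multiplies by numstab (the Jacobian of the
    # substitution x -> x / numstab), so no extra division by the scale is needed.
    integrate(f, -Inf, 0, numstab = sc)$value
}
integrate_fallback(f, sc)

Note that, as the MWE shows, the rescaled call itself can return 1.5 rather than something near zero, so a smallness check alone cannot catch that failure mode.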
William Dunlap
2019-Apr-14 16:52 UTC
[Rd] [r-devel] integrate over an infinite region produces wrong results depending on scaling
integrate(f, xmin, xmax) will have problems when f(x) is 0 over large parts of (xmin, xmax). It doesn't have any clues as to where the non-zero regions are. It computes f(x) at 21 points at each step, and if all of those are zero (or some other constant?) for a few steps, it calls it a day. If you can narrow down the integration interval to the interesting part of the function's domain, you will get better results.

By the way, here is a way to see where integrate(f) evaluates f() (the keep.xy = TRUE argument doesn't seem to do anything):

debugIntegrate <- function(f) {
    n_calls <- 0
    x_args <- list()
    other_args <- list()
    value <- list()
    function(x, ...) {
        n_calls <<- n_calls + 1
        x_args[[n_calls]] <<- x
        other_args[[n_calls]] <<- list(...)
        v <- f(x, ...)
        value[[n_calls]] <<- v
        v
    }
}

str(integrate(DF <- debugIntegrate(f), -Inf, 0, numstab = sc))
# List of 5
#  $ value       : num 1.5
#  $ abs.error   : num 0.000145
#  $ subdivisions: int 2
#  $ message     : chr "OK"
#  $ call        : language integrate(f = DF <- debugIntegrate(f), lower = -Inf, upper = 0, numstab = sc)
#  - attr(*, "class")= chr "integrate"

curve(f(x, sc), min(unlist(environment(DF)$x_args)), 0, n = 501, main = "Scaled f", bty = "n")
with(environment(DF), points(unlist(x_args), unlist(value)))

Bill Dunlap
TIBCO Software
wdunlap tibco.com
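Following this advice, here is a sketch of how one might bound the interesting region of the f() from the original post before calling integrate(), using gamma quantiles. The eps cutoff and the use of the union of the two density factors' effective supports are assumptions of mine, not something from the thread.

# The integrand is a product of two gamma densities, in -x and in cons - x,
# so its mass lies inside the union of the two factors' effective supports.
eps  <- 1e-12
q_lo <- qgamma(eps,     shape = sh, scale = sc)
q_hi <- qgamma(1 - eps, shape = sh, scale = sc)
lower <- min(-q_hi, cons - q_hi)  # leftmost point with non-negligible mass
upper <- max(-q_lo, cons - q_lo)  # rightmost such point (still <= 0)
str(integrate(f, lower, upper))                          # finite interval, unscaled
str(integrate(f, lower / sc, upper / sc, numstab = sc))  # same region, rescaled

With bounds this tight, the quadrature rule starts out sampling inside the spike instead of on a long flat stretch of zeros.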
Rui Barradas
2019-Jun-04 18:41 UTC
[Rd] [r-devel] integrate over an infinite region produces wrong results depending on scaling
Hello,

A solution is to use package cubature: both the unscaled and the scaled versions return the correct result, 3.575294. But the performance penalty is big, because the number of function evaluations is much larger.

library(cubature)
cubintegrate(f, -Inf, 0, method = "hcubature")
#$integral
#[1] 3.575294
#
#$error
#[1] 1.494242e-07
#
#$neval
#[1] 375
#
#$returnCode
#[1] 0

cubintegrate(f, -Inf, 0, method = "hcubature", numstab = sc)
#$integral
#[1] 3.575294
#
#$error
#[1] 1.064195e-05
#
#$neval
#[1] 255
#
#$returnCode
#[1] 0

library(microbenchmark)
microbenchmark(
  base1 = integrate(f, -Inf, 0),
  base2 = integrate(f, -Inf, 0, numstab = sc),
  cuba1 = cubintegrate(f, -Inf, 0, method = "hcubature"),
  cuba2 = cubintegrate(f, -Inf, 0, method = "hcubature", numstab = sc)
)

Hope this helps,

Rui Barradas
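If the cubature cost matters at the scale of the original problem, one possibility (my own sketch, not something proposed in the thread) is to pay for hcubature only when the two cheap base-R answers disagree, which is exactly the failure symptom seen above. It assumes f, sc, and library(cubature) from the code above; the helper name and the 1% agreement tolerance are illustrative.

integrate_checked <- function(f, sc, tol = 0.01) {
    v1 <- integrate(f, -Inf, 0)$value                # unscaled, fast
    v2 <- integrate(f, -Inf, 0, numstab = sc)$value  # rescaled, fast
    # If the two quick answers agree to within tol, trust them; otherwise fall
    # back to the slower but reliable cubature result. Caveat: if both quick
    # calls fail the same way (e.g. both return ~0), this check cannot see it.
    if (abs(v1 - v2) <= tol * max(abs(v1), abs(v2), .Machine$double.eps))
        return(v1)
    cubintegrate(f, -Inf, 0, method = "hcubature")$integral
}
integrate_checked(f, sc)  # 3.58 vs 1.5 disagree, so cubature runs: 3.575294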