While working on 'random walk' applications, I got interested in
optimizing noisy objective functions. As an (artificial) example, the
following is the Rosenbrock function, where Gaussian noise with
standard deviation `sd = 0.01` is added to the function value.

    fn <- function(x)
        (1 + rnorm(1, sd = 0.01)) * adagio::fnRosenbrock(x)

To smooth out the noise, define another function `fnk(x, k = 1)` that
calls `fn` k times and returns the mean of those k function values.

    fnk <- function(x, k = 1) {   # fnk(x) is the same as fn(x)
        rv <- 0.0
        for (i in 1:k) rv <- rv + fn(x)
        return(rv / k)
    }

Applying several optimization solvers to these noisy and smoothed
objective functions gives, for instance, the following results.
(The starting point is always `rep(0.1, 5)`, the maximal number of
iterations 5000, the relative tolerance 1e-12, and a run counts as
successful if the function value at the reported minimum is below 1e-06.)

     k     nmk    anms   neldermead  ucminf  optim_BFGS
    ---------------------------------------------------
     1     0.21   0.32   0.13        0.00    0.00
     3     0.52   0.63   0.50        0.00    0.00
    10     0.81   0.91   0.87        0.00    0.00

    Solvers: nmk = dfoptim::nmk, anms = pracma::anms  [both Nelder-Mead codes]
             neldermead = nloptr::neldermead,
             ucminf = ucminf::ucminf, optim_BFGS = optim with method "BFGS"

Read the table as follows: `nmk` is successful in 21% of the trials,
while, for example, `optim` never comes close to the true minimum.

I think it is reasonable to expect gradient-based methods to do poorly
with noisy objectives, though I did not expect them to fail so clearly.
On the other hand, the Nelder-Mead implementations do quite well if
there is not too much noise.

In real-world applications it will often not be possible to repeat the
same measurement several times, so we then have to live with `k = 1`.
In my applications with long 'random walks', doing the calculations
several times in a row really takes some time.

QUESTION: What could be other approaches to minimize noisy functions?

I looked through some "Stochastic Programming" tutorials and did not
find them very helpful for this situation. Of course, I may have looked
into these works too superficially.
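For reference, a minimal sketch of how such a success rate could be
estimated, here for `dfoptim::nmk` with k = 1; the number of trials and
the control settings (`maxfeval`, `tol`) are illustrative assumptions,
not necessarily the exact setup behind the table above.

    ## Sketch: estimate the success rate of dfoptim::nmk on the noisy
    ## objective with k = 1 (settings are illustrative).
    library(dfoptim)

    fn  <- function(x) (1 + rnorm(1, sd = 0.01)) * adagio::fnRosenbrock(x)
    fnk <- function(x, k = 1) mean(replicate(k, fn(x)))

    n.trials <- 100
    success  <- logical(n.trials)
    for (i in seq_len(n.trials)) {
        res <- nmk(rep(0.1, 5), fnk, k = 1,
                   control = list(maxfeval = 5000, tol = 1e-12))
        success[i] <- res$value < 1e-6
    }
    mean(success)    # estimated success rate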
This is a huge topic. Differential evolution (the DEoptim package) would
be one good starting point; there is a simulated annealing method built
into optim() (method = "SANN"), but it usually requires significant
tuning. Genetic algorithms are another option. You could also look at
the NLopt list of algorithms at
https://nlopt.readthedocs.io/en/latest/NLopt_Algorithms/ , focusing on
the options for derivative-free global optimization, and then use them
via the nloptr package.

Good luck ...
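A minimal sketch of the DEoptim suggestion, applied to the noisy `fn`
from the original post; the bounds, population size, and iteration
count are illustrative assumptions.

    ## Sketch: differential evolution on the noisy objective.
    library(DEoptim)

    fn <- function(x) (1 + rnorm(1, sd = 0.01)) * adagio::fnRosenbrock(x)

    set.seed(1)
    res <- DEoptim(fn, lower = rep(-5, 5), upper = rep(5, 5),
                   control = DEoptim.control(NP = 50, itermax = 500,
                                             trace = FALSE))
    res$optim$bestval    # best (noisy) objective value found
    res$optim$bestmem    # corresponding parameter vector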
Thanks, Ben. For certain reasons I would *not* like to apply global
optimization solvers, e.g., because of the higher dimensions and longer
running times involved. I was hoping for suggestions from the
"Stochastic Programming" side.

And please, never suggest `optim` with method "SANN". See the
optimization cheatsheet I wrote with John Nash:

    "NOTE: CG (John is the author!) and SANN are NOT recommended."
    https://github.com/hwborchers/CheatSheets/blob/main/Base%20R%20Optim%20Cheatsheet.pdf

Hans W.
More to provide another perspective, I'll give the citation of some
work that Harry Joe and I did over two decades ago:

    @Article{JoeNash2003,
      author  = {Joe, Harry and Nash, John C.},
      title   = {Numerical optimization and surface estimation with
                 imprecise function evaluations},
      journal = {Statistics and Computing},
      year    = {2003},
      volume  = {13},
      pages   = {277--286},
    }

Essentially, this fits a quadratic approximately by regression,
assuming the returned objective is imprecise. It is NOT good for high
dimensions, of course, and it is bedeviled by needing some idea of the
scale of the imprecision, i.e., the noise amplitude. However, it does
work for some applications. Harry had some success with Monte Carlo
evaluation of multidimensional integrals, optimizing crude quadratures;
that is, multiple crude quadratures could be more efficient than a
single precise quadrature.

However, this approach is not one that can be applied blindly. There
are all sorts of issues about which point cloud to keep as the "fit
model, move to model minimum, add points, delete points" process
evolves.

JN
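To make the idea concrete, here is an illustrative sketch of a single
"fit a quadratic by regression and move to its minimum" step on the
noisy objective. The helper `quad_step()` and its settings (sampling
radius, number of points) are hypothetical and only gesture at the
approach; this is not the Joe and Nash (2003) algorithm itself.

    ## Sketch: fit a full quadratic to noisy function values by least
    ## squares and step to the minimizer of the fitted model.
    fn <- function(x) (1 + rnorm(1, sd = 0.01)) * adagio::fnRosenbrock(x)

    quad_step <- function(fn, x0, radius = 0.5, npts = 80) {
        d <- length(x0)
        ## sample points uniformly in a box of half-width 'radius' around x0
        X <- sweep(matrix(runif(npts * d, -radius, radius), npts, d), 2, x0, "+")
        y <- apply(X, 1, fn)
        ## regressors: linear terms and all products x_i * x_j with i <= j
        lin  <- X
        quad <- t(apply(X, 1, function(x) outer(x, x)[upper.tri(diag(d), diag = TRUE)]))
        fit  <- lm(y ~ lin + quad)
        b <- coef(fit)[2:(d + 1)]            # linear coefficients
        q <- coef(fit)[-(1:(d + 1))]         # quadratic coefficients
        H <- matrix(0, d, d)                 # Hessian of the fitted quadratic
        H[upper.tri(H, diag = TRUE)] <- q
        H <- H + t(H)                        # off-diagonals q_ij, diagonal 2*q_ii
        drop(-solve(H, b))                   # minimizer of the fitted model
    }

    set.seed(1)
    x1 <- quad_step(fn, rep(0.1, 5))   # one move toward the model minimum

A full method along these lines would also have to check that the
fitted quadratic is convex, manage the point cloud, and iterate, as
described above.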
On Sun, 13 Aug 2023, Hans W writes:
> QUESTION: What could be other approaches to minimize noisy functions?

Since Nelder-Mead worked /relatively/ well: in my experience, its
results can sometimes be improved by restarting it, i.e.
re-initializing the simplex. See this note,
http://enricoschumann.net/R/restartnm.htm , which incidentally also
uses the Rosenbrock function.

Just for the record, I think Differential Evolution (DE) can handle
such problems well, though it will usually need more iterations. As
computational "proof" ;-) I ran DE 50 times for the k == 1 case and
each time recorded whether the resulting objective-function value is
below 1e-6. I let DE take many more function evaluations (a population
of 100 times 500 generations = 50000 function evaluations), but it
reaches a value below 1e-6 in all 50 cases.

    library("NMOF")

    ofv.below.threshold <- logical(50)
    for (i in seq_along(ofv.below.threshold)) {
        sol <- DEopt(fnk,
                     algo = list(nP  = 100,
                                 nG  = 500,
                                 min = rep( 5, 5),
                                 max = rep(10, 5)))
        ofv.below.threshold[i] <- sol$OFvalue < 1e-6
    }
    sum(ofv.below.threshold)/length(ofv.below.threshold)

(These 50 runs take less than half a minute on my machine.)

Note that I have deliberately initialized the population in the range
5 to 10, i.e. well away from the optimum.

--
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net
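For completeness, a minimal sketch of the restart idea mentioned above,
using `dfoptim::nmk` on the noisy objective; the number of restarts and
the control settings are illustrative assumptions, not taken from the
linked note.

    ## Sketch: rerun Nelder-Mead from its own solution a few times,
    ## re-initializing the simplex at each restart.
    library(dfoptim)

    fn <- function(x) (1 + rnorm(1, sd = 0.01)) * adagio::fnRosenbrock(x)

    x <- rep(0.1, 5)
    for (r in 1:5) {                       # a handful of restarts
        res <- nmk(x, fn, control = list(tol = 1e-12, maxfeval = 5000))
        x <- res$par                       # restart from the previous solution
    }
    res$value   # objective value after the final restart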