Recently Marie Boehnstedt reported a bug in the nlm() function for function minimization when both gradient and hessian are provided. She has provided a work-around for some cases, and it seems this will get incorporated into the R function eventually.

However, despite the great number of packages on CRAN, there does not appear to be a straightforward Newton approach to function minimization. This may be because providing the code for a hessian (the matrix of second derivatives) is a lot of work and error-prone. (R could also use some good tools for Automatic Differentiation.) I have also noted that a number of researchers try to implement textbook methods and run into trouble when the mathematics and the computing are not quite in sync. Therefore, I wrote a simple safeguarded Newton method and put a small package on R-forge at

    https://r-forge.r-project.org/R/?group_id=395

Note that Newton's method is usually presented as a way to solve nonlinear equations. For function minimization, we apply it to solve g(x) = 0, where g is the gradient and x is the vector of parameters. In part, the safeguards ensure we reduce the function f(x) at each step, to avoid some of the difficulties that may arise from a non-positive-definite hessian.

In the package, I have a very simple quadratic test, the Rosenbrock test function and the Wood test function. The method fails on the last function -- the hessian is not positive definite where it stops.

Before submitting this package to CRAN, I would like to see its behaviour on other test problems, but am lazy enough to wish to avoid creating the hessian code. If anyone has such code, it would be very welcome. Please contact me off-list. If I get some workable examples that are open for public view, I'll report back here.

John Nash
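[For readers unfamiliar with the approach, here is a minimal sketch of a safeguarded Newton iteration of the kind described above: a plain Newton step on g(x) = 0 plus a backtracking line search so that f(x) is reduced at every step. This is an illustration written for this archive, not the code in the R-forge package; the names snewton_sketch, fn, gr and he are invented for the example.]

    ## Minimal safeguarded Newton sketch (illustrative only).
    ## fn: objective f(x); gr: gradient g(x); he: hessian H(x); x0: start vector.
    snewton_sketch <- function(x0, fn, gr, he, maxit = 100, tol = 1e-8) {
        x <- x0; it <- 0
        for (it in seq_len(maxit)) {
            g <- gr(x)
            if (sqrt(sum(g^2)) < tol) break       # gradient small enough: stop
            H <- he(x)
            d <- tryCatch(solve(H, -g),           # Newton direction solves H d = -g
                          error = function(e) -g) # fall back if H is singular
            if (sum(d * g) >= 0) d <- -g          # not a descent direction: fall back
            f0 <- fn(x); step <- 1
            while (fn(x + step * d) > f0 && step > 1e-10)
                step <- step / 2                  # backtrack until f is reduced
            x <- x + step * d
        }
        list(par = x, value = fn(x), iterations = it)
    }

    ## Example with the Rosenbrock function and its analytic derivatives:
    fr  <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
    grr <- function(x) c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),
                          200 * (x[2] - x[1]^2))
    hrr <- function(x) matrix(c(1200 * x[1]^2 - 400 * x[2] + 2, -400 * x[1],
                                -400 * x[1], 200), 2, 2)
    snewton_sketch(c(-1.2, 1), fr, grr, hrr)

(A real implementation would use an Armijo-type sufficient-decrease test rather than the simple halving shown here.)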
Martin Maechler
2017-Apr-19 12:56 UTC
[R] Safeguarded Newton method for function minimization
>>>>> J C Nash <profjcnash at gmail.com>
>>>>>     on Tue, 18 Apr 2017 13:32:52 -0400 writes:

> Recently Marie Boehnstedt reported a bug in the nlm()
> function for function minimization when both gradient and
> hessian are provided.

Indeed, on R's Bugzilla here:
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17249

> She has provided a work-around for some cases and it seems
> this will get incorporated into the R function eventually.

Indeed... the first part -- fixing the wrong choldc() in the C code --
has been in my version of R-devel for a while now.

See my follow-up
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17249#c4 and #c5
including my attachment, which is an extended version of Marie's original:
https://bugs.r-project.org/bugzilla/attachment.cgi?id=2246

As that mentions _and_ shows: fixing choldc() solves the problem
for the 2D Rosenbrock example, but _not_ the 4D Wood example.

> However, despite the great number of packages on CRAN,
> there does not appear to be a straightforward Newton
> approach to function minimization. This may be because
> providing the code for a hessian (the matrix of second
> derivatives) is a lot of work and error-prone.
> (R could also use some good tools for Automatic Differentiation).

The last part of what you say above is not at all true:
R -- and S (& S+ | S-PLUS) before it! -- has always had deriv() and deriv3(),
and my attachment above _shows_ you how to use deriv3() to get both the
gradient and the hessian, via a version of automatic differentiation,
completely effortlessly!

For the ease of readers, that part here, with an example:

##' Wood function (4 arguments 'x1' ... 'x4')
fwood <- function(x1,x2,x3,x4) {
    100*(x1^2-x2)^2 + (1-x1)^2 + 90*(x3^2-x4)^2 + (1-x3)^2 +
        10.1*((1-x2)^2 + (1-x4)^2) + 19.8*(1-x2)*(1-x4)
}

## automatically construct correct gradient and hessian:
woodf.gh <- function(x) {
    stopifnot(is.numeric(x))
    woodGH <- deriv3(body(fwood)[[2]], c("x1","x2","x3","x4"),
                     function.arg = TRUE)
    if (length(x) == 4)
        woodGH(x[1], x[2], x[3], x[4])
    else if (is.matrix(x) && ncol(x) == 4)
        woodGH(x[,1], x[,2], x[,3], x[,4])
    else stop("'x' must have length 4 or be a matrix with 4 columns")
}

and now we get the function f(x), gradient g(x) and Hessian H(x) for
    x = (0 0 0 0),
    x = (1 1 1 1), and
    x = (1 2 3 4)
with such a simple call:

> woodf.gh(rbind(0, 1, 1:4))
[1]   42.0    0.0 2514.4
attr(,"gradient")
       x1    x2   x3     x4
[1,]   -2 -40.0   -2  -40.0
[2,]    0   0.0    0    0.0
[3,] -400 279.6 5404 -819.6
attr(,"hessian")
, , x1

      x1   x2 x3 x4
[1,]   2    0  0  0
[2,] 802 -400  0  0
[3,] 402 -400  0  0

, , x2

       x1    x2 x3   x4
[1,]    0 220.2  0 19.8
[2,] -400 220.2  0 19.8
[3,] -400 220.2  0 19.8

, , x3

     x1 x2   x3    x4
[1,]  0  0    2     0
[2,]  0  0  722  -360
[3,]  0  0 8282 -1080

, , x4

     x1   x2    x3    x4
[1,]  0 19.8     0 200.2
[2,]  0 19.8  -360 200.2
[3,]  0 19.8 -1080 200.2
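[A possible follow-up to Martin's example, added here and not part of his message: nlm() uses "gradient" and "hessian" attributes on the value returned by the objective function, so the deriv3()-built woodf.gh() above can be wrapped for nlm() directly. The wrapper name woodf.nlm and the explicit coercions are assumptions for this sketch; deriv3() returns the gradient as a 1 x 4 matrix and the hessian as a 1 x 4 x 4 array when given a single parameter vector.]

    ## Sketch: wrap woodf.gh() so its value carries the attributes nlm() expects.
    woodf.nlm <- function(x) {
        r <- woodf.gh(x)                      # value with gradient/hessian attributes
        val <- as.numeric(r)
        attr(val, "gradient") <- as.numeric(attr(r, "gradient"))  # 1 x 4 -> length-4 vector
        attr(val, "hessian")  <- matrix(attr(r, "hessian"), 4, 4) # 1 x 4 x 4 -> 4 x 4 matrix
        val
    }
    ## e.g.  nlm(woodf.nlm, c(-3, -1, -3, -1), check.analyticals = TRUE)
    ## (a common starting point for the Wood test; as discussed in this thread,
    ##  nlm() may still terminate early where the hessian is not positive definite)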
I should have given a more detailed explanation about Automatic Differentiation. As Martin points out, there are some AD elements in deriv() etc., but to my view they are related more to symbolic differentiation, and not in the same space as AD Model Builder or its successor whose name eludes me at the moment, nor like ADIFOR or ADOLC, which can provide the derivative of a subprogram. However, Martin's example of the Wood function shows what can be done, and it is a valuable case. Indeed, it shows a syntax of which I was previously unaware! Because it is fairly complicated, I'm not sure I would say "effortlessly", but I will certainly be copying it. Thanks. Previously, I have done the steps partly manually to get to the functional form, as in Chapter 10 of my Nonlinear Parameter Optimization book.

Note that package nlsr, by Duncan Murdoch and me, has functions written entirely in R that are similar to deriv() and deriv3(). There is a facility to extend the derivatives table to new functions, and we can also generate the R gradient functions from expressions.

I'll take this chance to note that the Wood problem is NOT solved by nlm() with analytic hessian, nor by a safeguarded or unsafeguarded Newton. I also ran a version of the Fortran code UNCMIN, which is the origin of nlm(), and got a similar early termination. But the hessian when these codes halt is not positive definite, so we are outside the assumed conditions for these methods. Dealing with a non-positive-definite hessian in function minimization has created a small industry in Ph.D.s for researchers in optimization.

Thanks again for Martin's work.

JN
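[For readers wondering what handling a non-positive-definite hessian can look like in practice, one common textbook device is to shift the hessian towards positive definiteness before taking the Newton step, for example by adding a multiple of the identity until a Cholesky factorization succeeds (a Levenberg/Marquardt-style modification). The sketch below is an illustration in base R only; it is not code from nlm(), UNCMIN, or the R-forge package, and the name modified_newton_step is invented for it.]

    ## Return a Newton-like step even when the hessian H is not positive
    ## definite, by trying H + lambda*I with increasing lambda until chol()
    ## succeeds.
    modified_newton_step <- function(H, g, lambda0 = 1e-4, growth = 10, maxtry = 25) {
        lambda <- 0
        for (k in seq_len(maxtry)) {
            R <- tryCatch(chol(H + diag(lambda, nrow(H))), error = function(e) NULL)
            if (!is.null(R))                    # Cholesky worked: H + lambda*I is pos. def.
                return(backsolve(R, forwardsolve(t(R), -g)))  # solve (H + lambda*I) d = -g
            lambda <- if (lambda == 0) lambda0 else growth * lambda
        }
        -g                                      # last resort: steepest-descent direction
    }

    ## Example with an indefinite 2 x 2 "hessian" (eigenvalues 3 and -1):
    H <- matrix(c(1, 2, 2, 1), 2, 2)
    g <- c(1, -1)
    modified_newton_step(H, g)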