Recently Marie Boehnstedt reported a bug in the nlm() function for function minimization when both gradient and hessian are provided. She has provided a work-around for some cases, and it seems this will get incorporated into the R function eventually.

However, despite the great number of packages on CRAN, there does not appear to be a straightforward Newton approach to function minimization. This may be because providing the code for a hessian (the matrix of second derivatives) is a lot of work and error-prone. (R could also use some good tools for Automatic Differentiation.) I have also noted that a number of researchers try to implement textbook methods and run into trouble when the mathematics and the computing are not quite in sync. Therefore, I wrote a simple safeguarded Newton method and put a small package on R-forge at

    https://r-forge.r-project.org/R/?group_id=395

Note that Newton's method is usually presented as a way to solve nonlinear equations. For function minimization, we apply it to solve g(x) = 0, where g is the gradient and x is the vector of parameters. In part, the safeguards ensure we reduce the function f(x) at each step, to avoid some of the difficulties that may arise from a non-positive-definite hessian.

In the package, I have a very simple quadratic test, the Rosenbrock test function and the Wood test function. The method fails on the last function -- the hessian is not positive definite where it stops.

Before submitting this package to CRAN, I would like to see its behaviour on other test problems, but am lazy enough to wish to avoid creating the hessian code. If anyone has such code, it would be very welcome. Please contact me off-list. If I get some workable examples that are open for public view, I'll report back here.

John Nash
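[For readers unfamiliar with the approach, here is a minimal sketch of a safeguarded Newton iteration of the kind described above: a plain Newton step on g(x) = 0 plus a backtracking line search so that f(x) is reduced at every step. This is an illustration written for this archive, not the code in the R-forge package; the names snewton_sketch, fn, gr and he are invented for the example.]

    ## Minimal safeguarded Newton sketch (illustrative only).
    ## fn: objective f(x); gr: gradient g(x); he: hessian H(x); x0: start vector.
    snewton_sketch <- function(x0, fn, gr, he, maxit = 100, tol = 1e-8) {
        x <- x0; it <- 0
        for (it in seq_len(maxit)) {
            g <- gr(x)
            if (sqrt(sum(g^2)) < tol) break       # gradient small enough: stop
            H <- he(x)
            d <- tryCatch(solve(H, -g),           # Newton direction solves H d = -g
                          error = function(e) -g) # fall back if H is singular
            if (sum(d * g) >= 0) d <- -g          # not a descent direction: fall back
            f0 <- fn(x); step <- 1
            while (fn(x + step * d) > f0 && step > 1e-10)
                step <- step / 2                  # backtrack until f is reduced
            x <- x + step * d
        }
        list(par = x, value = fn(x), iterations = it)
    }

    ## Example with the Rosenbrock function and its analytic derivatives:
    fr  <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
    grr <- function(x) c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),
                          200 * (x[2] - x[1]^2))
    hrr <- function(x) matrix(c(1200 * x[1]^2 - 400 * x[2] + 2, -400 * x[1],
                                -400 * x[1], 200), 2, 2)
    snewton_sketch(c(-1.2, 1), fr, grr, hrr)

(A real implementation would use an Armijo-type sufficient-decrease test rather than the simple halving shown here.)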
Martin Maechler
2017-Apr-19 12:56 UTC
[R] Safeguarded Newton method for function minimization
>>>>> J C Nash <profjcnash at gmail.com>
>>>>>     on Tue, 18 Apr 2017 13:32:52 -0400 writes:

> Recently Marie Boehnstedt reported a bug in the nlm()
> function for function minimization when both gradient and
> hessian are provided.

Indeed, on R's Bugzilla here:
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17249

> She has provided a work-around for some cases and it seems
> this will get incorporated into the R function eventually.

Indeed... the first part -- fixing the wrong choldc() in the C code --
has been in my version of R-devel for a while now.

See my follow-up
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17249#c4 and #c5
including my attachment, which is an extended version of Marie's original:
https://bugs.r-project.org/bugzilla/attachment.cgi?id=2246

As that mentions _and_ shows: fixing choldc() solves the problem
for the 2D Rosenbrock example, but _not_ the 4D Wood example.

> However, despite the great number of packages on CRAN,
> there does not appear to be a straightforward Newton
> approach to function minimization. This may be because
> providing the code for a hessian (the matrix of second
> derivatives) is a lot of work and error-prone.
> (R could also use some good tools for Automatic Differentiation).

The last part of what you say above is not at all true:
R -- and S (& S+ | S-PLUS) before it! -- has always had deriv() and deriv3(),
and my attachment above _shows_ you how to use deriv3() to get both the
gradient and the hessian, via a version of automatic differentiation,
completely effortlessly!

For the ease of readers, that part here, with an example:

##' Wood function (4 arguments 'x1' ... 'x4')
fwood <- function(x1,x2,x3,x4) {
    100*(x1^2-x2)^2 + (1-x1)^2 + 90*(x3^2-x4)^2 + (1-x3)^2 +
        10.1*((1-x2)^2 + (1-x4)^2) + 19.8*(1-x2)*(1-x4)
}

## automatically construct correct gradient and hessian:
woodf.gh <- function(x) {
    stopifnot(is.numeric(x))
    woodGH <- deriv3(body(fwood)[[2]], c("x1","x2","x3","x4"),
                     function.arg = TRUE)
    if (length(x) == 4)
        woodGH(x[1], x[2], x[3], x[4])
    else if (is.matrix(x) && ncol(x) == 4)
        woodGH(x[,1], x[,2], x[,3], x[,4])
    else stop("'x' must have length 4 or be a matrix with 4 columns")
}

and now we get the function f(x), gradient g(x) and Hessian H(x) for
    x = (0 0 0 0),
    x = (1 1 1 1), and
    x = (1 2 3 4)
with such a simple call:

> woodf.gh(rbind(0, 1, 1:4))
[1]   42.0    0.0 2514.4
attr(,"gradient")
       x1    x2   x3     x4
[1,]   -2 -40.0   -2  -40.0
[2,]    0   0.0    0    0.0
[3,] -400 279.6 5404 -819.6
attr(,"hessian")
, , x1

      x1   x2 x3 x4
[1,]   2    0  0  0
[2,] 802 -400  0  0
[3,] 402 -400  0  0

, , x2

       x1    x2 x3   x4
[1,]    0 220.2  0 19.8
[2,] -400 220.2  0 19.8
[3,] -400 220.2  0 19.8

, , x3

     x1 x2   x3    x4
[1,]  0  0    2     0
[2,]  0  0  722  -360
[3,]  0  0 8282 -1080

, , x4

     x1   x2    x3    x4
[1,]  0 19.8     0 200.2
[2,]  0 19.8  -360 200.2
[3,]  0 19.8 -1080 200.2
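[A possible follow-up to Martin's example, added here and not part of his message: nlm() uses "gradient" and "hessian" attributes on the value returned by the objective function, so the deriv3()-built woodf.gh() above can be wrapped for nlm() directly. The wrapper name woodf.nlm and the explicit coercions are assumptions for this sketch; deriv3() returns the gradient as a 1 x 4 matrix and the hessian as a 1 x 4 x 4 array when given a single parameter vector.]

    ## Sketch: wrap woodf.gh() so its value carries the attributes nlm() expects.
    woodf.nlm <- function(x) {
        r <- woodf.gh(x)                      # value with gradient/hessian attributes
        val <- as.numeric(r)
        attr(val, "gradient") <- as.numeric(attr(r, "gradient"))  # 1 x 4 -> length-4 vector
        attr(val, "hessian")  <- matrix(attr(r, "hessian"), 4, 4) # 1 x 4 x 4 -> 4 x 4 matrix
        val
    }
    ## e.g.  nlm(woodf.nlm, c(-3, -1, -3, -1), check.analyticals = TRUE)
    ## (a common starting point for the Wood test; as discussed in this thread,
    ##  nlm() may still terminate early where the hessian is not positive definite)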
I should have given a more detailed explanation about Automatic Differentiation. As Martin points out, there are some AD elements in deriv() etc., but to my view they are related more to symbolic differentiation, and not in the same space as AD Model Builder or its successor whose name eludes me at the moment, nor like ADIFOR or ADOLC, which can provide the derivative of a subprogram. However, Martin's example of the Wood function shows what can be done, and it is a valuable case. Indeed, it shows a syntax of which I was previously unaware! Because it is fairly complicated, I'm not sure I would say "effortlessly", but I will certainly be copying it. Thanks. Previously, I have done the steps partly manually to get to the functional form, as in Chapter 10 of my Nonlinear Parameter Optimization book.

Note that package nlsr, by Duncan Murdoch and me, has functions written entirely in R that are similar to deriv() and deriv3(). There is a facility to extend the derivatives table to new functions, and we can also generate the R gradient functions from expressions.

I'll take this chance to note that the Wood problem is NOT solved by nlm() with analytic hessian, nor by a safeguarded or unsafeguarded Newton. I also ran a version of the Fortran code UNCMIN, which is the origin of nlm(), and got a similar early termination. But the hessian when these codes halt is not positive definite, so we are outside the assumed conditions for these methods. Dealing with a non-positive-definite hessian in function minimization has created a small industry in Ph.D.s for researchers in optimization.

Thanks again for Martin's work.

JN
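[For readers wondering what handling a non-positive-definite hessian can look like in practice, one common textbook device is to shift the hessian towards positive definiteness before taking the Newton step, for example by adding a multiple of the identity until a Cholesky factorization succeeds (a Levenberg/Marquardt-style modification). The sketch below is an illustration in base R only; it is not code from nlm(), UNCMIN, or the R-forge package, and the name modified_newton_step is invented for it.]

    ## Return a Newton-like step even when the hessian H is not positive
    ## definite, by trying H + lambda*I with increasing lambda until chol()
    ## succeeds.
    modified_newton_step <- function(H, g, lambda0 = 1e-4, growth = 10, maxtry = 25) {
        lambda <- 0
        for (k in seq_len(maxtry)) {
            R <- tryCatch(chol(H + diag(lambda, nrow(H))), error = function(e) NULL)
            if (!is.null(R))                    # Cholesky worked: H + lambda*I is pos. def.
                return(backsolve(R, forwardsolve(t(R), -g)))  # solve (H + lambda*I) d = -g
            lambda <- if (lambda == 0) lambda0 else growth * lambda
        }
        -g                                      # last resort: steepest-descent direction
    }

    ## Example with an indefinite 2 x 2 "hessian" (eigenvalues 3 and -1):
    H <- matrix(c(1, 2, 2, 1), 2, 2)
    g <- c(1, -1)
    modified_newton_step(H, g)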