Ravi Varadhan
2015-Nov-15 17:46 UTC
[R] Cautioning optim() users about "Nelder-Mead" default - (originally) Optim instability
Hi John,

My main point is not about Nelder-Mead per se. It is *primarily* about the Nelder-Mead implementation in optim(). Users of optim() should be cautioned about the default algorithm and advised to consider alternatives such as "BFGS" in optim(), or other implementations of Nelder-Mead.

Best regards,
Ravi
________________________________________
From: ProfJCNash <profjcnash at gmail.com>
Sent: Sunday, November 15, 2015 12:21 PM
To: Ravi Varadhan; 'r-help at r-project.org'; lorenzo.isella at gmail.com
Cc: bhh at xs4all.nl; Gabor Grothendieck
Subject: Re: Cautioning optim() users about "Nelder-Mead" default - (originally) Optim instability

Not contradicting Ravi's message, but I wouldn't say Nelder-Mead is "bad" per se. Its issues are that it assumes the parameters are all on the same scale, and the termination (not convergence) test can't use gradients, so it tends to get "near" the optimum very quickly -- say, in only 10% of the computational effort -- and then spends an awful amount of effort deciding it's got there. It often does poorly when the function has nearly "flat" zones, e.g., a long valley with very low slope.

So my message is still that Nelder-Mead is an unfortunate default -- it was chosen, I believe, because it is generally robust and doesn't need gradients. BFGS really should use accurate gradients, preferably computed analytically, so it would only be a good default in that case or with very good approximate gradients (which are costly computationally).

However, if you understand what NM is doing, and use it accordingly, it is a valuable tool. I generally use it as a first try, BUT I turn on the trace to watch what it is doing, as a way to learn a bit about the function I am minimizing. Rarely would I use it as a production minimizer.

Best, JN

On 15-11-15 12:02 PM, Ravi Varadhan wrote:
> Hi,
>
> While I agree with the comments about paying attention to parameter
> scaling, a major issue here is that the default optimization algorithm,
> Nelder-Mead, is not very good. It is unfortunate that the optim
> implementation chose this as the "default" algorithm. I have seen several
> instances where people have come to me with poor results from using
> optim(), because they did not realize that the default algorithm is
> bad. We (John Nash and I) have pointed this out before, but R Core
> has not addressed the issue, for backward-compatibility reasons.
>
> There is a better implementation of Nelder-Mead in the "dfoptim" package.
>
> require(dfoptim)
> mm_def1 <- nmk(par = par_ini1, min.perc_error, data = data)
> mm_def2 <- nmk(par = par_ini2, min.perc_error, data = data)
> mm_def3 <- nmk(par = par_ini3, min.perc_error, data = data)
> print(mm_def1$par)
> print(mm_def2$par)
> print(mm_def3$par)
>
> In general, better implementations of optimization algorithms are
> available in packages such as "optimx" and "nloptr". It is unfortunate
> that most naïve users of optimization in R do not recognize this.
> Perhaps there should be a "message" in the optim help file that points
> this out to users.
>
> Hope this is helpful,
> Ravi
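A minimal, self-contained sketch of the alternatives mentioned above. The Rosenbrock test function and starting values are illustrative stand-ins, not the min.perc_error problem from the thread:

rosen <- function(x) (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2   # illustrative objective
p0 <- c(-1.2, 1)                                              # illustrative start

fit_nm   <- optim(p0, rosen)                  # default method is "Nelder-Mead"
fit_bfgs <- optim(p0, rosen, method = "BFGS") # the alternative suggested above

if (requireNamespace("dfoptim", quietly = TRUE)) {
  fit_nmk <- dfoptim::nmk(p0, rosen)          # a different Nelder-Mead implementation
  print(fit_nmk$par)
}
print(fit_nm$par)
print(fit_bfgs$par)

Comparing $par and $value across the fits, and across several starting values as in the nmk() calls above, is a quick way to see whether the default Nelder-Mead has stopped short of the optimum.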
ProfJCNash
2015-Nov-15 19:41 UTC
[R] Cautioning optim() users about "Nelder-Mead" default - (originally) Optim instability
Agreed on the default algorithm issue. That is important for users to know, and I'm happy to underline it. Also that CG (which is based on one of my codes) should be deprecated. BFGS (also based on one of my codes from long ago) does much better than I would ever have expected.

Over the years I've tried different Nelder-Mead implementations. I cannot say I've found any that is always better than the one in optim() (also based on an old code of mine), though nmkb() from the dfoptim package seems to do better a lot of the time, and it has a transformation method for bounds, which may be useful, though it does have the awkwardness that one cannot start on a bound. For testing a function, I don't think it makes a lot of difference which variant of NM one uses, as long as the trace is on to catch never-ending runs. For production use, it is a really good idea to try different methods on a sample of likely cases and choose a method that does well. That is the motivation for the optimx package, and for the opm() function of the newer optimz (on R-forge) that I'm still polishing. optimz has a function optimr() that has the same call as optim() but incorporates over a dozen optimizers via method = "(selected method)".

As a gradient-free choice, the Powell codes from minqa or other packages (there are several implementations) can sometimes have spectacular performance, but they also flub rather more regularly than Nelder-Mead in my experience. That is, when they are good, they are very very good, and when they are not they are horrid. (Plagiarism!)

JN
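A sketch of the "try several methods and compare" advice above, using the optimx package from CRAN; the optimz/optimr() code mentioned above was on R-forge and is not assumed here. The test function is again an illustrative stand-in:

if (requireNamespace("optimx", quietly = TRUE)) {
  library(optimx)
  rosen <- function(x) (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
  res <- optimx(par = c(-1.2, 1), fn = rosen,
                method = c("Nelder-Mead", "BFGS", "nlminb"))
  # adding "bobyqa" to the method vector would bring in one of the Powell codes
  # from the minqa package, if it is installed
  print(res)  # one row per method: parameters, value, counts, convergence code
}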
Mark Leeds
2015-Nov-15 20:05 UTC
[R] Cautioning optim() users about "Nelder-Mead" default - (originally) Optim instability
And just to add to John's comments, since he's too modest: in my experience, the algorithm in the Rvmmin package (written by John) shows great improvement over L-BFGS-B, so I don't use L-BFGS-B anymore. L-BFGS-B often has a dangerous convergence issue in that it can claim to converge when it hasn't, which, to me, is worse than not converging. Most likely it has to do with the link below, which was pointed out to me by John a while back.

http://www.ece.northwestern.edu/~morales/PSfiles/acm-remark.pdf
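A sketch of checking the convergence code rather than trusting the reported optimum, and of trying Rvmmin as an alternative to L-BFGS-B. The test function, its analytic gradient, and the starting value are illustrative, and Rvmmin's return fields are assumed here to mirror optim()'s:

rosen   <- function(x) (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
rosen_g <- function(x) c(-2 * (1 - x[1]) - 400 * x[1] * (x[2] - x[1]^2),
                         200 * (x[2] - x[1]^2))
p0 <- c(-1.2, 1)

fit_lbfgsb <- optim(p0, rosen, gr = rosen_g, method = "L-BFGS-B")
# convergence == 0 means the method *claims* success; per the note above,
# verify it, e.g. by checking that the gradient is near zero at the solution
cat("L-BFGS-B convergence:", fit_lbfgsb$convergence, "\n")
cat("gradient at solution:", rosen_g(fit_lbfgsb$par), "\n")

if (requireNamespace("Rvmmin", quietly = TRUE)) {
  fit_rvm <- Rvmmin::Rvmmin(p0, rosen, gr = rosen_g)
  cat("Rvmmin convergence:", fit_rvm$convergence, "\n")
  print(fit_rvm$par)
}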