Ravi Varadhan
2015-Nov-15 17:02 UTC
[R] Cautioning optim() users about "Nelder-Mead" default - (originally) Optim instability
Hi,

While I agree with the comments about paying attention to parameter scaling, a major issue here is that the default optimization algorithm, Nelder-Mead, is not very good. It is unfortunate that the optim() implementation chose this as the default algorithm. In several instances people have come to me with poor results from optim() because they did not realize that the default algorithm is bad. We (John Nash and I) have pointed this out before, but R core has not addressed the issue, for backward-compatibility reasons.

There is a better implementation of Nelder-Mead in the "dfoptim" package:

require(dfoptim)
mm_def1 <- nmk(par = par_ini1, min.perc_error, data = data)
mm_def2 <- nmk(par = par_ini2, min.perc_error, data = data)
mm_def3 <- nmk(par = par_ini3, min.perc_error, data = data)
print(mm_def1$par)
print(mm_def2$par)
print(mm_def3$par)

In general, better implementations of optimization algorithms are available in packages such as "optimx" and "nloptr". It is unfortunate that most naïve users of optimization in R do not recognize this. Perhaps there should be a message in the optim() help file that points this out to users.

Hope this is helpful,
Ravi
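A self-contained sketch of the comparison above (the Rosenbrock test function here is only a stand-in; the thread's min.perc_error objective and par_ini values are not reproduced):

## Classic Rosenbrock banana function, minimum at (1, 1)
rosen <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2

## optim() default (Nelder-Mead) vs. a gradient-based alternative
optim(c(-1.2, 1), rosen)$par                   # default Nelder-Mead
optim(c(-1.2, 1), rosen, method = "BFGS")$par  # often more accurate here

## The Nelder-Mead implementation from dfoptim, if installed
if (requireNamespace("dfoptim", quietly = TRUE)) {
  dfoptim::nmk(c(-1.2, 1), rosen)$par
}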
lorenzo.isella at gmail.com
2015-Nov-15 17:17 UTC
[R] Cautioning optim() users about "Nelder-Mead" default - (originally) Optim instability
Thanks a lot, Ravi. Indeed you best understood the point of my email.

I am perfectly aware that most optimization algorithms find local rather than global minima, and that therefore the choice of the initial parameters plays (at least in principle) a role. Nevertheless, my optimization problem is rather trivial, and I did not bother to look beyond the most basic tool in R for optimization.

What surprised me is that an algorithm different from the default one in optim() is extremely robust to a partially deliberate bad choice of the initial parameters, whereas the standard one is not.

You perfectly answered my question.

Regards

Lorenzo
ProfJCNash
2015-Nov-15 17:21 UTC
[R] Cautioning optim() users about "Nelder-Mead" default - (originally) Optim instability
Not contradicting Ravi's message, but I wouldn't say Nelder-Mead is "bad" per se. Its issues are that it assumes the parameters are all on the same scale, and that the termination (not convergence) test can't use gradients, so it tends to get "near" the optimum very quickly -- in, say, only 10% of the computational effort -- and then spends an awful lot of effort deciding it has got there. It often does poorly when the function has nearly "flat" zones, e.g., a long valley with very low slope.

So my message is still that Nelder-Mead is an unfortunate default -- it was chosen, I believe, because it is generally robust and doesn't need gradients. BFGS really should use accurate gradients, preferably computed analytically, so it would only be a good default in that case or with very good approximate gradients (which are costly computationally).

However, if you understand what NM is doing, and use it accordingly, it is a valuable tool. I generally use it as a first try, BUT I turn on the trace to watch what it is doing, as a way to learn a bit about the function I am minimizing. Rarely would I use it as a production minimizer.

Best, JN
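A minimal sketch of the two diagnostics mentioned above, using optim()'s documented control options (the badly scaled objective is just a made-up stand-in):

## Stand-in objective with badly mismatched parameter scales:
## p[1] lives near 1, p[2] lives near 3e-6
badly_scaled <- function(p) (p[1] - 1)^2 + (1e6 * p[2] - 3)^2

## Turn on the trace to watch what Nelder-Mead is doing
optim(c(0, 0), badly_scaled, control = list(trace = 1))

## Tell optim() about the scaling via parscale: the search is done
## on par/parscale, so unit steps become comparable across parameters
optim(c(0, 0), badly_scaled, control = list(parscale = c(1, 1e-6)))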
Ravi Varadhan
2015-Nov-15 17:46 UTC
[R] Cautioning optim() users about "Nelder-Mead" default - (originally) Optim instability
Hi John,

My main point is not about Nelder-Mead per se. It is *primarily* about the Nelder-Mead implementation in optim(). Users of optim() should be cautioned about the default algorithm, and they should consider alternatives such as "BFGS" in optim(), or other implementations of Nelder-Mead.

Best regards,
Ravi
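For readers who want to compare several solvers on one problem, a sketch using the "optimx" package mentioned earlier in the thread (the Rosenbrock stand-in again; package availability is assumed):

rosen <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2

if (requireNamespace("optimx", quietly = TRUE)) {
  ## One call, several methods; optimx returns a data frame with one
  ## row per method, which makes the comparison explicit
  optimx::optimx(c(-1.2, 1), rosen,
                 method = c("Nelder-Mead", "BFGS", "nlminb"))
}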