This issue must surely have been raised before, but I did not manage to find it in past postings.

In some cases, optim() and nlminb() declare a successful convergence, but the corresponding Hessian is not positive-definite. A simplified version of the original problem is given in the code which, for readability, is placed below this text. The example is built making use of package 'sn', but this is only required to set up the example: the question is about the outcome of the optimizers. At the end of the run, a certain point is declared to correspond to a minimum, since 'convergence=0' is reported, but the eigenvalues of the (numerically evaluated) Hessian matrix at that point are not all positive.

Any views on the cause of the problem? (i) the point does not correspond to a real minimum, (ii) it does give a minimum but the Hessian matrix is wrong, (iii) the eigenvalues are not right. ...and, in any case, how can one get to the real solution?

Adelchi Azzalini

#------------------
library(sn)   # version 0.4-18
data(ais, package="sn")
attach(ais)
X <- cbind(1, Ht, Wt)
y <- cbind(bmi, lbm)
detach(ais)

negLogLik <- function(vdp, x, y) {
  d <- ncol(y)
  p <- ncol(x)
  beta  <- matrix(vdp[1:(d*p)], p, d)
  Omega <- matrix(0, d, d)
  Omega[lower.tri(Omega, TRUE)] <- vdp[(p*d+1):(p*d+d*(d+1)/2)]
  Omega <- Omega + t(Omega) - diag(diag(Omega), d)
  if(any(eigen(Omega)$values <= 0)) return(Inf)
  omega <- sqrt(diag(Omega))
  alpha <- vdp[(p*d+d*(d+1)/2+1):(p*d+d*(d+1)/2+d)]
  nu    <- vdp[p*d+d*(d+1)/2+d+1]
  if(nu <= 0) return(Inf)
  logL <- sum(dmst(y, x %*% beta, Omega, alpha, nu, log=TRUE))
  return(-logL)
}

# run 1
vdp <- c(44, 0, 0, -4, 0, 0, 0.05, -0.5, 35, 0.5, -20, 3.5)
opt <- optim(vdp, negLogLik, method="BFGS", hessian=TRUE, x=X, y=y)
opt$value                  # [1] 625.3
opt$convergence            # [1] 0
eigen(opt$hessian)$values  # [1] 7.539e+07 1.523e+06 .... 5.684e-02 -4.516e-01
#---
# run 2
vdp <- c(44.17, -0.2441, 0.303, -3.620, 0.04044, 0.8906,
         0.0487, -0.5072, 36.33, 0.4445, -20.87, 3.5)
opt <- optim(vdp, negLogLik, method="BFGS", hessian=TRUE, x=X, y=y)
opt$value                  # [1] 599.7
opt$convergence            # [1] 0
eigen(opt$hessian)$values  # [1] 1.235e+08 .... 3.845e-02 -1.311e-03 -6.701e+02
#---
# run 3
vdp <- c(44.17, -0.2441, 0.303, -3.620, 0.04044, 0.8906,
         0.0487, -0.5072, 36.33, 0.4445, -20.87, 3.5)
opt <- optim(vdp, negLogLik, method="SANN", hessian=TRUE, x=X, y=y)
opt$value                  # [1] 599.8
opt$convergence            # [1] 0
eigen(opt$hessian)$values  # [1] 1.232e+08 .... 3.225e-02 -6.681e-02 -7.513e+02
#--
# run 4
vdp <- c(44.17, -0.2441, 0.303, -3.620, 0.04044, 0.8906,
         0.0487, -0.5072, 36.33, 0.4445, -20.87, 3.5)
opt <- optim(vdp, negLogLik, method="Nelder", hessian=TRUE, x=X, y=y)
opt$value                  # [1] 599.7
opt$convergence            # [1] 1
#--
# run 5
vdp <- c(44.17, -0.2441, 0.303, -3.620, 0.04044, 0.8906,
         0.0487, -0.5072, 36.33, 0.4445, -20.87, 3.5)
opt <- optim(vdp, negLogLik, method="CG", hessian=TRUE, x=X, y=y)
opt$value                  # [1] 599.7
opt$convergence            # [1] 0
eigen(opt$hessian)$values  # [1] 1.236e+08 3.026e+06 .... 3.801e-02 -2.348e-04 -7.344e+02
#--
# run 6
vdp <- c(44.17, -0.2441, 0.303, -3.620, 0.04044, 0.8906,
         0.0487, -0.5072, 36.33, 0.4445, -20.87, 3.5)
opt <- nlminb(vdp, negLogLik, x=X, y=y)
opt$obj                    # [1] 599.7
H <- optimHess(opt$par, negLogLik, x=X, y=y)
eigen(H)$values            # [1] 1.236e+08 3.041e+06 ... 4.090e-05 -7.176e+02

============
platform       x86_64-unknown-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          3
minor          0.2
year           2013
month          09
day            25
svn rev        63987
language       R
version.string R version 3.0.2 (2013-09-25)
nickname       Frisbee Sailing
If you run all the methods in package optimx, you will see results all over the western hemisphere. I suspect some nasty computational issues. Possibly the replacement of the function value with Inf whenever an eigenvalue of Omega or nu is non-positive is one source of this.

Note that Hessian eigenvalues are not used to determine convergence in optimization methods. If they were, nobody under 100 would ever get promoted from junior lecturer, because determining the Hessian from just the function requires two levels of approximate derivatives.

If you want to get this problem reliably solved, I think you will need to

1) sort out a way to avoid the Inf values -- can you constrain the parameters away from such areas, or at least not use Inf? Returning Inf messes up the gradient computation, and hence the optimizers, and also the final Hessian. (One possible reparameterization is sketched below, after the quoted message.)

2) work out an analytic gradient function.

JN

> Date: Mon, 16 Dec 2013 16:09:46 +0100
> From: Adelchi Azzalini <azzalini at stat.unipd.it>
> To: r-help at r-project.org
> Subject: [R] convergence=0 in optim and nlminb is real?
>
> In some cases, optim() and nlminb() declare a successful convergence,
> but the corresponding Hessian is not positive-definite. [...]
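A minimal sketch of one way to do 1) -- this is not code from the 'sn' package, and the function name negLogLik2 and the particular mapping are only illustrative. The idea is to reparameterize so that the constraints hold by construction: build Omega from a lower-triangular Cholesky factor whose diagonal is exponentiated, and optimize log(nu) instead of nu, so the objective never has to return Inf. It assumes the objects X and y (and dmst from 'sn') defined in the quoted post are in the workspace:

negLogLik2 <- function(vdp, x, y) {
  d <- ncol(y)
  p <- ncol(x)
  beta <- matrix(vdp[1:(d*p)], p, d)
  # lower-triangular Cholesky factor of Omega; exponentiating the diagonal
  # keeps it strictly positive, so Omega = L %*% t(L) is always positive definite
  L <- matrix(0, d, d)
  L[lower.tri(L, TRUE)] <- vdp[(p*d+1):(p*d+d*(d+1)/2)]
  diag(L) <- exp(diag(L))
  Omega <- L %*% t(L)
  alpha <- vdp[(p*d+d*(d+1)/2+1):(p*d+d*(d+1)/2+d)]
  nu <- exp(vdp[p*d+d*(d+1)/2+d+1])   # nu > 0 by construction
  -sum(dmst(y, x %*% beta, Omega, alpha, nu, log=TRUE))
}

With this mapping optim() only ever sees finite function values, so the finite-difference gradient and the final Hessian are computed on a smooth surface; the starting vector of course has to be supplied on the new scale (entries of the Cholesky factor with log-diagonal, and log of nu).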
The optimization algorithms did converge to a limit point, but not to a stationary point, i.e. a point in parameter space where the first- and second-order KKT conditions are satisfied. If you check the gradient at the solution, you will see that it is quite large in magnitude relative to 0. So, why did the algorithms declare convergence? Convergence is based on the absolute change in function value and/or the relative change in parameter values between consecutive iterations. This does not ensure that the KKT conditions are satisfied.

Now, to the real issue: your problem is ill-posed. As you can tell from the eigenvalues of the Hessian, they vary over 9 orders of magnitude. This may indicate a problem with the data, in that the log-likelihood is over-parametrized relative to the information in the data set. Get a better data set or formulate a simpler model, and the problem will disappear.

Best,
Ravi
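To see this concretely, both conditions can be checked at the returned point. A sketch, assuming 'opt', negLogLik, X and y from the original post are still in the workspace, and using package numDeriv for an accurate numerical gradient (the anonymous wrapper avoids a clash with grad()'s own argument named 'x'):

library(numDeriv)
g <- grad(function(p) negLogLik(p, x = X, y = y), opt$par)
max(abs(g))     # far from 0: the first-order condition fails
H <- optimHess(opt$par, negLogLik, x = X, y = y)
min(eigen(H, symmetric = TRUE, only.values = TRUE)$values)
                # negative: the second-order condition fails as well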
On Tue, 17 Dec 2013 15:21:57 +0000, Ravi Varadhan wrote:

RV> The optimization algorithms did converge to a limit point, but not
RV> to a stationary point, i.e. a point in parameter space where the
RV> first- and second-order KKT conditions are satisfied. If you check
RV> the gradient at the solution, you will see that it is quite large
RV> in magnitude relative to 0. So, why did the algorithms declare
RV> convergence? Convergence is based on the absolute change in function
RV> value and/or the relative change in parameter values between
RV> consecutive iterations. This does not ensure that the KKT conditions
RV> are satisfied.

This makes sense to me. Although I have indicated other possible explanations, the most plausible one was that the selected point is not a minimum, as you confirmed. As in many other cases, the stopping rule of an optimizer can be a delicate issue. However, since optim computes (on request) the Hessian matrix, checking its positive-definiteness before declaring successful convergence seems to me a reasonable thing for optim to do.

RV> Now, to the real issue: your problem is ill-posed. As you can tell
RV> from the eigenvalues of the Hessian, they vary over 9 orders of
RV> magnitude. This may indicate a problem with the data, in that the
RV> log-likelihood is over-parametrized relative to the information in
RV> the data set. Get a better data set or formulate a simpler model,
RV> and the problem will disappear.

I had noticed this aspect, i.e. the relative orders of magnitude of the eigenvalues. The model is not over-parameterized (in a formal sense), but in some cases maximization of the log-likelihood can be a delicate issue, yes. I am not specifically interested in fitting these data, nor any other data: I am working on an update of package 'sn'.

Thanks for your informative reply.

Adelchi

--
Adelchi Azzalini <azzalini at stat.unipd.it>
Dipart. Scienze Statistiche, Università di Padova, Italia
tel. +39 049 8274147, http://azzalini.stat.unipd.it/
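For what it is worth, the check suggested above is easy to bolt on after the fact. A sketch of a small wrapper (the name optimChecked and the tolerance are arbitrary, and this is not a change to optim() itself):

optimChecked <- function(par, fn, ..., tol = 1e-8) {
  opt <- optim(par, fn, ..., hessian = TRUE)
  ev <- eigen(opt$hessian, symmetric = TRUE, only.values = TRUE)$values
  # relative test of positive-definiteness of the numerical Hessian
  opt$hessianPD <- all(ev > tol * max(abs(ev)))
  if (opt$convergence == 0 && !opt$hessianPD)
    warning("convergence = 0 reported, but the Hessian is not positive definite")
  opt
}

opt <- optimChecked(vdp, negLogLik, method = "BFGS", x = X, y = y)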