This issue must surely have been raised before, but I did not manage to find it in past postings.

In some cases, optim() and nlminb() declare a successful convergence, but the corresponding Hessian is not positive-definite. A simplified version of the original problem is given in the code which, for readability, is placed below this text. The example is built making use of package 'sn', but this is only required to set up the example: the question is about the outcome of the optimizers. At the end of the run, a certain point is declared to correspond to a minimum, since 'convergence=0' is reported, but the eigenvalues of the (numerically evaluated) Hessian matrix at that point are not all positive.

Any views on the cause of the problem? (i) the point does not correspond to a real minimum, (ii) it does give a minimum but the Hessian matrix is wrong, (iii) the eigenvalues are not right. ...and, in any case, how can one get to the real solution?

Adelchi Azzalini

#------------------
library(sn)   # version 0.4-18
data(ais, package="sn")
attach(ais)
X <- cbind(1, Ht, Wt)
y <- cbind(bmi, lbm)
detach(ais)

negLogLik <- function(vdp, x, y) {
  d <- ncol(y)
  p <- ncol(x)
  beta  <- matrix(vdp[1:(d*p)], p, d)
  Omega <- matrix(0, d, d)
  Omega[lower.tri(Omega, TRUE)] <- vdp[(p*d+1):(p*d+d*(d+1)/2)]
  Omega <- Omega + t(Omega) - diag(diag(Omega), d)
  if(any(eigen(Omega)$values <= 0)) return(Inf)
  omega <- sqrt(diag(Omega))
  alpha <- vdp[(p*d+d*(d+1)/2+1):(p*d+d*(d+1)/2+d)]
  nu    <- vdp[p*d+d*(d+1)/2+d+1]
  if(nu <= 0) return(Inf)
  logL <- sum(dmst(y, x %*% beta, Omega, alpha, nu, log=TRUE))
  return(-logL)
}

# run 1
vdp <- c(44, 0, 0, -4, 0, 0, 0.05, -0.5, 35, 0.5, -20, 3.5)
opt <- optim(vdp, negLogLik, method="BFGS", hessian=TRUE, x=X, y=y)
opt$value                  # [1] 625.3
opt$convergence            # [1] 0
eigen(opt$hessian)$values  # [1] 7.539e+07 1.523e+06 .... 5.684e-02 -4.516e-01
#---
# run 2
vdp <- c(44.17, -0.2441, 0.303, -3.620, 0.04044, 0.8906,
         0.0487, -0.5072, 36.33, 0.4445, -20.87, 3.5)
opt <- optim(vdp, negLogLik, method="BFGS", hessian=TRUE, x=X, y=y)
opt$value                  # [1] 599.7
opt$convergence            # [1] 0
eigen(opt$hessian)$values  # [1] 1.235e+08 .... 3.845e-02 -1.311e-03 -6.701e+02
#---
# run 3
vdp <- c(44.17, -0.2441, 0.303, -3.620, 0.04044, 0.8906,
         0.0487, -0.5072, 36.33, 0.4445, -20.87, 3.5)
opt <- optim(vdp, negLogLik, method="SANN", hessian=TRUE, x=X, y=y)
opt$value                  # [1] 599.8
opt$convergence            # [1] 0
eigen(opt$hessian)$values  # [1] 1.232e+08 .... 3.225e-02 -6.681e-02 -7.513e+02
#--
# run 4
vdp <- c(44.17, -0.2441, 0.303, -3.620, 0.04044, 0.8906,
         0.0487, -0.5072, 36.33, 0.4445, -20.87, 3.5)
opt <- optim(vdp, negLogLik, method="Nelder", hessian=TRUE, x=X, y=y)
opt$value                  # [1] 599.7
opt$convergence            # [1] 1
#--
# run 5
vdp <- c(44.17, -0.2441, 0.303, -3.620, 0.04044, 0.8906,
         0.0487, -0.5072, 36.33, 0.4445, -20.87, 3.5)
opt <- optim(vdp, negLogLik, method="CG", hessian=TRUE, x=X, y=y)
opt$value                  # [1] 599.7
opt$convergence            # [1] 0
eigen(opt$hessian)$values  # [1] 1.236e+08 3.026e+06 .... 3.801e-02 -2.348e-04 -7.344e+02
#--
# run 6
vdp <- c(44.17, -0.2441, 0.303, -3.620, 0.04044, 0.8906,
         0.0487, -0.5072, 36.33, 0.4445, -20.87, 3.5)
opt <- nlminb(vdp, negLogLik, x=X, y=y)
opt$obj                    # [1] 599.7
H <- optimHess(opt$par, negLogLik, x=X, y=y)
eigen(H)$values            # [1] 1.236e+08 3.041e+06 ... 4.090e-05 -7.176e+02

============
platform       x86_64-unknown-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          3
minor          0.2
year           2013
month          09
day            25
svn rev        63987
language       R
version.string R version 3.0.2 (2013-09-25)
nickname       Frisbee Sailing
If you run all the methods in package optimx, you will see results all over the western hemisphere. I suspect some nasty computational issues. Possibly the replacement of the function value with Inf whenever an eigenvalue of Omega or nu is non-positive is one source of this.

Note that Hessian eigenvalues are not used to determine convergence in optimization methods. If they were, nobody under 100 would ever get promoted from junior lecturer, because determining the Hessian from just the function requires two levels of approximate derivatives.

If you want to get this problem reliably solved, I think you will need to

1) sort out a way to avoid the Inf values -- can you constrain the parameters away from such areas, or at least not use Inf? Returning Inf messes up the gradient computation, and hence the optimizers, and also the final Hessian. (One possible reparameterization is sketched below, after the quoted message.)

2) work out an analytic gradient function.

JN

> Date: Mon, 16 Dec 2013 16:09:46 +0100
> From: Adelchi Azzalini <azzalini at stat.unipd.it>
> To: r-help at r-project.org
> Subject: [R] convergence=0 in optim and nlminb is real?
>
> In some cases, optim() and nlminb() declare a successful convergence,
> but the corresponding Hessian is not positive-definite. [...]
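A minimal sketch of one way to do 1) -- this is not code from the 'sn' package, and the function name negLogLik2 and the particular mapping are only illustrative. The idea is to reparameterize so that the constraints hold by construction: build Omega from a lower-triangular Cholesky factor whose diagonal is exponentiated, and optimize log(nu) instead of nu, so the objective never has to return Inf. It assumes the objects X and y (and dmst from 'sn') defined in the quoted post are in the workspace:

negLogLik2 <- function(vdp, x, y) {
  d <- ncol(y)
  p <- ncol(x)
  beta <- matrix(vdp[1:(d*p)], p, d)
  # lower-triangular Cholesky factor of Omega; exponentiating the diagonal
  # keeps it strictly positive, so Omega = L %*% t(L) is always positive definite
  L <- matrix(0, d, d)
  L[lower.tri(L, TRUE)] <- vdp[(p*d+1):(p*d+d*(d+1)/2)]
  diag(L) <- exp(diag(L))
  Omega <- L %*% t(L)
  alpha <- vdp[(p*d+d*(d+1)/2+1):(p*d+d*(d+1)/2+d)]
  nu <- exp(vdp[p*d+d*(d+1)/2+d+1])   # nu > 0 by construction
  -sum(dmst(y, x %*% beta, Omega, alpha, nu, log=TRUE))
}

With this mapping optim() only ever sees finite function values, so the finite-difference gradient and the final Hessian are computed on a smooth surface; the starting vector of course has to be supplied on the new scale (entries of the Cholesky factor with log-diagonal, and log of nu).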
The optimization algorithms did converge to a limit point, but not to a stationary point, i.e. a point in parameter space where the first- and second-order KKT conditions are satisfied. If you check the gradient at the solution, you will see that it is quite large in magnitude relative to 0. So, why did the algorithms declare convergence? Convergence is based on the absolute change in function value and/or the relative change in parameter values between consecutive iterations. This does not ensure that the KKT conditions are satisfied.

Now, to the real issue: your problem is ill-posed. As you can tell from the eigenvalues of the Hessian, they vary over 9 orders of magnitude. This may indicate a problem with the data, in that the log-likelihood is over-parametrized relative to the information in the data set. Get a better data set or formulate a simpler model, and the problem will disappear.

Best,
Ravi
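To see this concretely, both conditions can be checked at the returned point. A sketch, assuming 'opt', negLogLik, X and y from the original post are still in the workspace, and using package numDeriv for an accurate numerical gradient (the anonymous wrapper avoids a clash with grad()'s own argument named 'x'):

library(numDeriv)
g <- grad(function(p) negLogLik(p, x = X, y = y), opt$par)
max(abs(g))     # far from 0: the first-order condition fails
H <- optimHess(opt$par, negLogLik, x = X, y = y)
min(eigen(H, symmetric = TRUE, only.values = TRUE)$values)
                # negative: the second-order condition fails as well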
On Tue, 17 Dec 2013 15:21:57 +0000, Ravi Varadhan wrote:

RV> The optimization algorithms did converge to a limit point, but not
RV> to a stationary point, i.e. a point in parameter space where the
RV> first- and second-order KKT conditions are satisfied. If you check
RV> the gradient at the solution, you will see that it is quite large
RV> in magnitude relative to 0. So, why did the algorithms declare
RV> convergence? Convergence is based on the absolute change in function
RV> value and/or the relative change in parameter values between
RV> consecutive iterations. This does not ensure that the KKT conditions
RV> are satisfied.

This makes sense to me. Although I have indicated other possible explanations, the most plausible one was that the selected point is not a minimum, as you confirmed. As in many other cases, the stopping rule of an optimizer can be a delicate issue. However, since optim computes (on request) the Hessian matrix, checking its positive-definiteness before declaring successful convergence seems to me a reasonable thing for optim to do.

RV> Now, to the real issue: your problem is ill-posed. As you can tell
RV> from the eigenvalues of the Hessian, they vary over 9 orders of
RV> magnitude. This may indicate a problem with the data, in that the
RV> log-likelihood is over-parametrized relative to the information in
RV> the data set. Get a better data set or formulate a simpler model,
RV> and the problem will disappear.

I had noticed this aspect, i.e. the relative orders of magnitude of the eigenvalues. The model is not over-parameterized (in a formal sense), but in some cases maximization of the log-likelihood can be a delicate issue, yes. I am not specifically interested in fitting these data, nor any other data: I am working on an update of package 'sn'.

Thanks for your informative reply.

Adelchi

--
Adelchi Azzalini <azzalini at stat.unipd.it>
Dipart. Scienze Statistiche, Università di Padova, Italia
tel. +39 049 8274147, http://azzalini.stat.unipd.it/
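For what it is worth, the check suggested above is easy to bolt on after the fact. A sketch of a small wrapper (the name optimChecked and the tolerance are arbitrary, and this is not a change to optim() itself):

optimChecked <- function(par, fn, ..., tol = 1e-8) {
  opt <- optim(par, fn, ..., hessian = TRUE)
  ev <- eigen(opt$hessian, symmetric = TRUE, only.values = TRUE)$values
  # relative test of positive-definiteness of the numerical Hessian
  opt$hessianPD <- all(ev > tol * max(abs(ev)))
  if (opt$convergence == 0 && !opt$hessianPD)
    warning("convergence = 0 reported, but the Hessian is not positive definite")
  opt
}

opt <- optimChecked(vdp, negLogLik, method = "BFGS", x = X, y = y)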