Dear R list members,

Since today seems to be the day for optimization questions, I have one that
has been puzzling me:

I've been doing some work on sem, my structural-equation modelling package.
The models that the sem function in this package fits are essentially
parametrizations of the multinormal distribution. The function uses optim
and nlm sequentially to maximize a multinormal likelihood. One of the
changes I've introduced is to use an analytic gradient rather than rely on
numerical derivatives. (If I can figure it out, I'd like to use an analytic
Hessian as well.)

I could provide additional details, but the question that I have is
straightforward. I expected that using an analytic gradient would make the
program faster and more stable. It *is* substantially faster, by up to an
order of magnitude on the problems that I've tried. In one case, however, a
model that converged (to the published solution) with numerical derivatives
failed to converge with analytic derivatives. I can program around the
problem, by having the program fall back to numerical derivatives when
convergence fails, but I was surprised by this result, and I'm concerned
that it reflects a programming problem or an error in my math. I suspect
that if I had made such an error, however, the other examples I tried would
not have worked so well.

So, my question is, is it possible in principle for an optimization to fail
using a correct analytic gradient but to converge with a numerical
gradient? If this is possible, is it a common occurrence?

Any help would be appreciated.

 John

-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
-----------------------------------------------------
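For concreteness, here is a minimal sketch (not the actual sem code) of how
an analytic gradient can be supplied to optim() and then to nlm();
negLogLik() and gradNegLogLik() are placeholder names standing in for the
multinormal objective and its derivatives:

    ## Placeholder objective and its analytic gradient, standing in for
    ## the multinormal likelihood (not the sem code itself):
    negLogLik <- function(theta) sum((theta - c(1, 2))^2)
    gradNegLogLik <- function(theta) 2 * (theta - c(1, 2))

    start <- c(0, 0)

    ## optim() takes the analytic gradient as the separate argument 'gr':
    res1 <- optim(start, fn = negLogLik, gr = gradNegLogLik, method = "BFGS")

    ## nlm() expects the gradient attached to the objective's return value
    ## as a "gradient" attribute:
    fWithGrad <- function(theta) {
        val <- negLogLik(theta)
        attr(val, "gradient") <- gradNegLogLik(theta)
        val
    }
    res2 <- nlm(fWithGrad, res1$par)   # refine from optim's parameter values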
On Sat, 24 Nov 2001, John Fox wrote:

> . . .
>
> So, my question is, is it possible in principle for an optimization to fail
> using a correct analytic gradient but to converge with a numerical
> gradient? If this is possible, is it a common occurrence?

It's possible but rare. You don't have a `correct analytic gradient', but a
numerical computation of it. Inaccurately computed gradients are a common
cause of convergence problems. You may need to adjust the tolerances.

It's also possible in principle that the optimizer takes a completely
different path from the starting point due to small differences in
calculated derivatives. It's worth trying starting near the expected answer
to rule this out.

--
Brian D. Ripley,                 ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford,            Tel: +44 1865 272861 (self)
1 South Parks Road,                   +44 1865 272860 (secr)
Oxford OX1 3TG, UK               Fax: +44 1865 272595
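A rough sketch of the kind of check this suggests: compare the coded
gradient against a central finite-difference approximation, and adjust the
tolerances if needed. checkGrad(), f, and g below are placeholder names,
not functions from sem or from base R:

    ## Compare an analytic gradient g against central finite differences of f:
    checkGrad <- function(f, g, theta, eps = 1e-6) {
        numeric <- sapply(seq_along(theta), function(i) {
            h <- rep(0, length(theta))
            h[i] <- eps
            (f(theta + h) - f(theta - h)) / (2 * eps)
        })
        analytic <- g(theta)
        cbind(analytic = analytic, numeric = numeric,
              rel.err = abs(analytic - numeric) / pmax(abs(numeric), 1e-10))
    }

    ## Tolerances can be tightened or loosened, e.g.:
    ## optim(start, fn, gr, method = "BFGS", control = list(reltol = 1e-12))
    ## nlm(fWithGrad, start, gradtol = 1e-8, steptol = 1e-8)

Note also that nlm() has a check.analyticals argument (TRUE by default)
that compares supplied gradients against its own numerical estimates.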
Dear Brian,

At 08:26 AM 11/25/2001 +0000, Prof Brian D Ripley wrote:
>On Sat, 24 Nov 2001, John Fox wrote:
>
>. . .
>
> > So, my question is, is it possible in principle for an optimization to fail
> > using a correct analytic gradient but to converge with a numerical
> > gradient? If this is possible, is it a common occurrence?
>
>It's possible but rare. You don't have a `correct analytic gradient', but
>a numerical computation of it. Inaccurately computed gradients are a
>common cause of convergence problems. You may need to adjust the
>tolerances.
>
>It's also possible in principle that the optimizer takes a completely
>different path from the starting point due to small differences in
>calculated derivatives. It's worth trying starting near the expected
>answer to rule this out.

I didn't describe it in my original post, but I had messed around a fair
bit with the problem before posting my question. (I say "messed around"
because I'm far from expert in numerical optimization.) Since reading your
response, I've checked over some of what I did, to make sure that I
remember it correctly. Without describing everything in tedious detail,
here are some of my results:

First, if I start the optimization right at the solution, I get the
solution back as a result. I took this as evidence that my calculation of
the gradient is probably OK. (And, as I said, I get the correct solution to
other problems.)

If I start reasonably near the solution, optim (which I use first) reports
convergence, but doesn't quite reach the solution; nlm (which starts with
the parameter values produced by optim) reports a return code of 3, which
corresponds to "last global step failed to locate a point lower than
estimate. Either estimate is an approximate local minimum of the function
or steptol is too small." Changing the steptol (and other arguments to nlm)
doesn't seem to help, however. (I do have a question about the fscale and
typsize arguments, which default respectively to 1 and a vector of 1's:
why are these available independently of the start values, from which they
could be inferred?)
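A sketch of the scaling arguments in question, using placeholder values (p0
standing for the parameter values returned by optim(), and fWithGrad for
the objective carrying a "gradient" attribute, as in the earlier sketch).
typsize and fscale are the typical magnitudes nlm() scales the parameters
and the objective by; presumably they are separate arguments because the
starting values can be zero, or otherwise unrepresentative of the typical
magnitudes:

    ## Placeholder values: p0 stands for the parameters returned by optim().
    p0 <- res1$par

    fit <- nlm(fWithGrad, p0,
               typsize = pmax(abs(p0), 1e-3),   # typical size of each parameter
               fscale  = abs(negLogLik(p0)),    # typical size of the objective
               steptol = 1e-10,                 # tighter relative step tolerance
               print.level = 2)                 # trace the iterations
    fit$code   # 3: last global step failed to find a lower point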
So that you can get a more concrete sense of what's going on, here's a
table of the different solutions (with the rows corresponding to
parameters):

              optim        nlm(1)        nlm(2)     start(1)      start(2)
lamb      4.9017942     4.9250519     5.3688830          5.0    18.6204419
gam1     -0.5382103    -0.5912515    -0.6299493         -1.0    -0.4252919
beta      0.6141337     0.6046611     0.5931075          1.0     0.5207707
gam2     -0.1992669    -0.2189711    -0.2408609         -0.2    -0.1805793
the1      3.8618249     3.5585071     3.6077990          4.0     2.0121677
the2      4.3781542     3.6819272     3.5949141          4.0     1.5921868
the3      1.6595465     2.4249510     2.9937057          3.0     2.2103193
the4    299.7290498   299.6466428   259.5756196        300.0   103.5671443
the5      1.2506907     0.8819633     0.9057823          1.0     0.5174292
psi1      5.9471507     5.8307768     5.6705004         61.0     0.8191268
psi2      4.6063684     4.5328785     4.5149762          5.0     0.6161997
phi       9.3785360     7.1702049     6.6162702          7.0     1.0000000
-------------------------------------------------------------
obj fn   55.74172      18.41530      13.48505

Here, the solutions labelled optim and nlm(1) use the supplied expression
for the gradient (your point that this too is a numerical approximation
seems obvious once stated, but I didn't consider it previously), while the
solution labelled nlm(2) uses the default numerical derivatives; the
start(1) column gives the start values that I specified "near" to the
solution nlm(2); the start(2) column gives the start values that the
program calculates itself if start values are not supplied; and the last
row gives the values of the objective function for each solution, scaled
as a chi-square statistic with 9 df. (When the start values in start(2)
are used, the solutions produced by optim and nlm(1) are different from
those given above, but the symptoms are the same -- e.g., optim reports
convergence, nlm returns a code of 3.)

I suspect that the problem is ill-conditioned in some way, but I haven't
been able to figure out how. I guess that I should investigate further. I
could supply other potentially relevant information, such as the hessian
at the solution, but I'm reluctant to impose further on your time, or that
of other list members.

Thanks for your help,
 John

-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
-----------------------------------------------------
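One way to probe the suspected ill-conditioning, sketched here with the
placeholder objective from the earlier sketch rather than the sem model:
examine the eigenvalues of the Hessian at the reported solution.

    ## Request the Hessian from optim() and look at its eigenvalues;
    ## near-zero (or negative) eigenvalues, or a huge ratio of largest to
    ## smallest eigenvalue, indicate an ill-conditioned problem:
    fit <- optim(start, fn = negLogLik, gr = gradNegLogLik,
                 method = "BFGS", hessian = TRUE)
    ev <- eigen(fit$hessian, symmetric = TRUE)$values
    ev
    max(ev) / min(ev)   # condition number of the Hessian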