Dear R list members,

Since today seems to be the day for optimization questions, I have one that
has been puzzling me:

I've been doing some work on sem, my structural-equation modelling package.
The models that the sem function in this package fits are essentially
parametrizations of the multinormal distribution. The function uses optim
and nlm sequentially to maximize a multinormal likelihood. One of the
changes I've introduced is to use an analytic gradient rather than rely on
numerical derivatives. (If I can figure it out, I'd like to use an analytic
Hessian as well.)

I could provide additional details, but the question that I have is
straightforward. I expected that using an analytic gradient would make the
program faster and more stable. It *is* substantially faster, by up to an
order of magnitude on the problems that I've tried. In one case, however, a
model that converged (to the published solution) with numerical derivatives
failed to converge with analytic derivatives. I can program around the
problem, by having the program fall back to numerical derivatives when
convergence fails, but I was surprised by this result, and I'm concerned
that it reflects a programming problem or an error in my math. I suspect
that if I had made such an error, however, the other examples I tried would
not have worked so well.

So, my question is, is it possible in principle for an optimization to fail
using a correct analytic gradient but to converge with a numerical
gradient? If this is possible, is it a common occurrence?

Any help would be appreciated.

 John

-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
-----------------------------------------------------
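For concreteness, here is a minimal sketch (not the actual sem code) of how
an analytic gradient can be supplied to optim() and then to nlm();
negLogLik() and gradNegLogLik() are placeholder names standing in for the
multinormal objective and its derivatives:

    ## Placeholder objective and its analytic gradient, standing in for
    ## the multinormal likelihood (not the sem code itself):
    negLogLik <- function(theta) sum((theta - c(1, 2))^2)
    gradNegLogLik <- function(theta) 2 * (theta - c(1, 2))

    start <- c(0, 0)

    ## optim() takes the analytic gradient as the separate argument 'gr':
    res1 <- optim(start, fn = negLogLik, gr = gradNegLogLik, method = "BFGS")

    ## nlm() expects the gradient attached to the objective's return value
    ## as a "gradient" attribute:
    fWithGrad <- function(theta) {
        val <- negLogLik(theta)
        attr(val, "gradient") <- gradNegLogLik(theta)
        val
    }
    res2 <- nlm(fWithGrad, res1$par)   # refine from optim's parameter values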
On Sat, 24 Nov 2001, John Fox wrote:

> . . .
>
> So, my question is, is it possible in principle for an optimization to fail
> using a correct analytic gradient but to converge with a numerical
> gradient? If this is possible, is it a common occurrence?

It's possible but rare. You don't have a `correct analytic gradient', but a
numerical computation of it. Inaccurately computed gradients are a common
cause of convergence problems. You may need to adjust the tolerances.

It's also possible in principle that the optimizer takes a completely
different path from the starting point due to small differences in
calculated derivatives. It's worth trying starting near the expected answer
to rule this out.

--
Brian D. Ripley,                 ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford,            Tel: +44 1865 272861 (self)
1 South Parks Road,                   +44 1865 272860 (secr)
Oxford OX1 3TG, UK               Fax: +44 1865 272595
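A rough sketch of the kind of check this suggests: compare the coded
gradient against a central finite-difference approximation, and adjust the
tolerances if needed. checkGrad(), f, and g below are placeholder names,
not functions from sem or from base R:

    ## Compare an analytic gradient g against central finite differences of f:
    checkGrad <- function(f, g, theta, eps = 1e-6) {
        numeric <- sapply(seq_along(theta), function(i) {
            h <- rep(0, length(theta))
            h[i] <- eps
            (f(theta + h) - f(theta - h)) / (2 * eps)
        })
        analytic <- g(theta)
        cbind(analytic = analytic, numeric = numeric,
              rel.err = abs(analytic - numeric) / pmax(abs(numeric), 1e-10))
    }

    ## Tolerances can be tightened or loosened, e.g.:
    ## optim(start, fn, gr, method = "BFGS", control = list(reltol = 1e-12))
    ## nlm(fWithGrad, start, gradtol = 1e-8, steptol = 1e-8)

Note also that nlm() has a check.analyticals argument (TRUE by default)
that compares supplied gradients against its own numerical estimates.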
Dear Brian,

At 08:26 AM 11/25/2001 +0000, Prof Brian D Ripley wrote:
>On Sat, 24 Nov 2001, John Fox wrote:
>
>. . .
>
> > So, my question is, is it possible in principle for an optimization to fail
> > using a correct analytic gradient but to converge with a numerical
> > gradient? If this is possible, is it a common occurrence?
>
>It's possible but rare. You don't have a `correct analytic gradient', but
>a numerical computation of it. Inaccurately computed gradients are a
>common cause of convergence problems. You may need to adjust the
>tolerances.
>
>It's also possible in principle that the optimizer takes a completely
>different path from the starting point due to small differences in
>calculated derivatives. It's worth trying starting near the expected
>answer to rule this out.

I didn't describe it in my original post, but I had messed around a fair
bit with the problem before posting my question. (I say "messed around"
because I'm far from expert in numerical optimization.) Since reading your
response, I've checked over some of what I did, to make sure that I
remember it correctly. Without describing everything in tedious detail,
here are some of my results:

First, if I start the optimization right at the solution, I get the
solution back as a result. I took this as evidence that my calculation of
the gradient is probably OK. (And, as I said, I get the correct solution to
other problems.)

If I start reasonably near the solution, optim (which I use first) reports
convergence, but doesn't quite reach the solution; nlm (which starts with
the parameter values produced by optim) reports a return code of 3, which
corresponds to "last global step failed to locate a point lower than
estimate. Either estimate is an approximate local minimum of the function
or steptol is too small." Changing the steptol (and other arguments to nlm)
doesn't seem to help, however. (I do have a question about the fscale and
typsize arguments, which default respectively to 1 and a vector of 1's:
why are these available independently of the start values, from which they
could be inferred?)
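A sketch of the scaling arguments in question, using placeholder values (p0
standing for the parameter values returned by optim(), and fWithGrad for
the objective carrying a "gradient" attribute, as in the earlier sketch).
typsize and fscale are the typical magnitudes nlm() scales the parameters
and the objective by; presumably they are separate arguments because the
starting values can be zero, or otherwise unrepresentative of the typical
magnitudes:

    ## Placeholder values: p0 stands for the parameters returned by optim().
    p0 <- res1$par

    fit <- nlm(fWithGrad, p0,
               typsize = pmax(abs(p0), 1e-3),   # typical size of each parameter
               fscale  = abs(negLogLik(p0)),    # typical size of the objective
               steptol = 1e-10,                 # tighter relative step tolerance
               print.level = 2)                 # trace the iterations
    fit$code   # 3: last global step failed to find a lower point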
So that you can get a more concrete sense of what's going on, here's a
table of the different solutions (with the rows corresponding to
parameters):

              optim        nlm(1)        nlm(2)     start(1)      start(2)
lamb      4.9017942     4.9250519     5.3688830          5.0    18.6204419
gam1     -0.5382103    -0.5912515    -0.6299493         -1.0    -0.4252919
beta      0.6141337     0.6046611     0.5931075          1.0     0.5207707
gam2     -0.1992669    -0.2189711    -0.2408609         -0.2    -0.1805793
the1      3.8618249     3.5585071     3.6077990          4.0     2.0121677
the2      4.3781542     3.6819272     3.5949141          4.0     1.5921868
the3      1.6595465     2.4249510     2.9937057          3.0     2.2103193
the4    299.7290498   299.6466428   259.5756196        300.0   103.5671443
the5      1.2506907     0.8819633     0.9057823          1.0     0.5174292
psi1      5.9471507     5.8307768     5.6705004         61.0     0.8191268
psi2      4.6063684     4.5328785     4.5149762          5.0     0.6161997
phi       9.3785360     7.1702049     6.6162702          7.0     1.0000000
-------------------------------------------------------------
obj fn   55.74172      18.41530      13.48505

Here, the solutions labelled optim and nlm(1) use the supplied expression
for the gradient (your point that this too is a numerical approximation
seems obvious once stated, but I didn't consider it previously), while the
solution labelled nlm(2) uses the default numerical derivatives; the
start(1) column gives the start values that I specified "near" to the
solution nlm(2); the start(2) column gives the start values that the
program calculates itself if start values are not supplied; and the last
row gives the values of the objective function for each solution, scaled
as a chi-square statistic with 9 df. (When the start values in start(2)
are used, the solutions produced by optim and nlm(1) are different from
those given above, but the symptoms are the same -- e.g., optim reports
convergence, nlm returns a code of 3.)

I suspect that the problem is ill-conditioned in some way, but I haven't
been able to figure out how. I guess that I should investigate further. I
could supply other potentially relevant information, such as the hessian
at the solution, but I'm reluctant to impose further on your time, or that
of other list members.

Thanks for your help,
 John

-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
-----------------------------------------------------
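One way to probe the suspected ill-conditioning, sketched here with the
placeholder objective from the earlier sketch rather than the sem model:
examine the eigenvalues of the Hessian at the reported solution.

    ## Request the Hessian from optim() and look at its eigenvalues;
    ## near-zero (or negative) eigenvalues, or a huge ratio of largest to
    ## smallest eigenvalue, indicate an ill-conditioned problem:
    fit <- optim(start, fn = negLogLik, gr = gradNegLogLik,
                 method = "BFGS", hessian = TRUE)
    ev <- eigen(fit$hessian, symmetric = TRUE)$values
    ev
    max(ev) / min(ev)   # condition number of the Hessian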