Hi all,

I've been working on improving R's optim() command, which does general purpose
unconstrained optimization.  Obviously, this is important for many statistics
computations, such as maximum likelihood, method of moments, etc.  I have
focused my efforts on the BFGS method, mainly because it best matches my
current projects.

Here's a quick summary of what I've done:
 * implemented my own version of BFGS in R,
       http://www.econ.upenn.edu/~clausen/computing/bfgs.zip
 * written a wrapper for the GNU Scientific Library's optimization function,
   multimin(),
       http://www.econ.upenn.edu/~clausen/computing/multimin.zip
 * written some tricky functions to compare implementations,
       http://www.econ.upenn.edu/~clausen/computing/tests.zip

My own implementation has several advantages over optim()'s implementation
(which you can see in the vmmin() function in
https://svn.r-project.org/R/trunk/src/main/optim.c):
 * the linesearch algorithm (More-Thuente) quickly finds a region of interest
   to zoom into.  Moreover, it strikes a much better balance between finding
   a point that adequately improves upon the old point and not wasting too
   much time looking for a much better point.  (Unlike optim(), it uses the
   standard Wolfe conditions with weak parameters -- see the sketch after
   this message.)
 * the linesearch algorithm uses interpolation, so it finds an acceptable
   point more quickly.
 * it implements "box" constraints.
 * the code is easier to understand and modify, partly because it's written
   in R.

Of course, this comes at the (slight?) overhead cost of being written in R.

The test suite above takes the first few functions from the paper

	Moré, Garbow, and Hillstrom, "Testing Unconstrained
	Optimization Software", ACM Trans Math Softw 7:1 (March 1981)

The test results appear below, where "*" means "computed the right solution",
and "!" means "got stuck".

	test                 optim   clausen   gsl
	--------------------------------------------------------------
	bard                 !
	beale
	brown-scaled
	freudenstein-roth
	gaussian             *
	helical-valley       *       *
	jennrich-sampson     *
	meyer                *
	powell-scaled        *
	rosenbrock           *

The table indicates that all three implementations of BFGS failed to compute
the right answer in most cases.  I suppose this means they are all quite
deficient.  Of course, this doesn't imply that they perform badly on real
statistics problems -- but in my limited experience with my crude econometric
models, they do perform badly.  Indeed, that's why I started investigating in
the first place.

For what it's worth, I think:
 * the optimization algorithms should be written in R -- the overhead is
   small compared to the cost of evaluating likelihood functions anyway, and
   is easily made up by the better algorithms that are possible.
 * it would be useful to keep a repository of interesting optimization
   problems relevant to R users.  Then R developers can evaluate
   "improvements".

Cheers,
Andrew
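The weak Wolfe conditions mentioned above take only a few lines of R to state.
This is a minimal sketch of the acceptance test a More-Thuente line search
applies, using the conventional constants c1 = 1e-4 and c2 = 0.9; it is not
taken from the bfgs.zip code, whose actual parameters may differ.

    ## Weak (standard) Wolfe conditions for a trial step 'alpha' along
    ## direction 'p' from point 'x'.  'f' is the objective and 'g' its
    ## gradient; c1 and c2 are conventional values, assumed here rather
    ## than taken from Andrew's implementation.
    wolfe.ok <- function(f, g, x, p, alpha, c1 = 1e-4, c2 = 0.9) {
        phi0  <- f(x)
        dphi0 <- sum(g(x) * p)          # directional derivative at alpha = 0
        xnew  <- x + alpha * p
        decrease  <- f(xnew) <= phi0 + c1 * alpha * dphi0   # sufficient decrease
        curvature <- sum(g(xnew) * p) >= c2 * dphi0          # curvature condition
        decrease && curvature
    }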
On 04/08/2007 1:12 AM, Andrew Clausen wrote:
> I've been working on improving R's optim() command, which does general
> purpose unconstrained optimization. [...]
> For what it's worth, I think:
> * the optimization algorithms should be written in R -- the overhead is
>   small compared to the cost of evaluating likelihood functions anyway,
>   and is easily made up by the better algorithms that are possible.
> * it would be useful to keep a repository of interesting optimization
>   problems relevant to R users.  Then R developers can evaluate
>   "improvements".

This is interesting work; thanks for doing it.

Could I make a suggestion?  Why not put together a package containing those
test optimization problems, and offer to include other interesting ones as
they arise?  You could also include your wrapper for the gsl function and
your own improvements to optim.

On your first point: I agree that a prototype implementation in R makes
sense, but I suspect there exist problems where the overhead would not be
negligible (e.g. ones where the user has written the objective function in C
for speed).
So I think you should keep in mind the possibility of moving the core of your
improvements to C once you are happy with them.

Duncan Murdoch

P.S. I dropped the help-gsl mailing list from the distribution, since my post
is mainly about R.
Hi Pat,

On Sat, Aug 04, 2007 at 09:59:57AM +0100, Patrick Burns wrote:
> Sounds like a good project.

Thanks :)

> How much extra overhead are you getting from the
> algorithm being in R?

On the Rosenbrock function (which is very quick to evaluate), here are the
system.time() results:

> system.time(bfgs(x0, f, g))
[1] 0.148 0.000 0.149 0.000 0.000
> system.time(optim(x0, f, g, method="BFGS"))
[1] 0.008 0.000 0.008 0.000 0.000

and the function evaluation counts:

> bfgs(x0, f, g)$counts
function gradient
      95       58
> optim(x0, f, g, method="BFGS")$counts
function gradient
     318      100

So the overhead is clearly much bigger, but still too small to matter for
most (?) applications.

Cheers,
Andrew

PS, my computer is an "Intel(R) Pentium(R) 4 CPU 2.80GHz" with a 1024 KB
cache, according to /proc/cpuinfo.
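For anyone who wants to repeat the timing above, the Rosenbrock objective,
its gradient, and the usual starting point are short to write down.  The
optim() lines below run as-is; bfgs() refers to the function in Andrew's
bfgs.zip, whose interface is assumed to match his calls above.

    ## Rosenbrock function, analytic gradient, and standard start.
    f  <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
    g  <- function(x) c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),
                         200 * (x[2] - x[1]^2))
    x0 <- c(-1.2, 1)

    system.time(optim(x0, f, g, method = "BFGS"))
    optim(x0, f, g, method = "BFGS")$counts
    ## system.time(bfgs(x0, f, g))       # assumes bfgs.zip has been sourced
    ## bfgs(x0, f, g)$counts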
Hi Andrew,

I have been working quite a bit with optim and friends on automated nonlinear
fitting, mainly for calibration.  All of the optimizers seem to have trouble
in the tricky situation of fitting a log-logistic model when the upper
asymptote is not well defined and a variance parameter also has to be
estimated.  Granted, this is not necessarily the optimizer's fault, but a
combination of objective function, convergence criteria, scaling, ...

But my point is that for automated fitting -- many starting values,
iterations with pseudo-likelihood for the variance function, over multiple
curves -- the overhead of a function written in R vs C can become
non-negligible even for "simple" univariate functions.  So I have to agree
with Duncan that R is very good for prototyping, but C would be preferable.

I can contribute some "hard" problems if you start the package.

Nicholas

> Date: Sat, 4 Aug 2007 01:12:31 -0400
> From: Andrew Clausen <clausen at econ.upenn.edu>
> Subject: [Rd] Optimization in R
> To: r-devel at r-project.org
> Cc: help-gsl at gnu.org
> [...]
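To make Nicholas's example concrete, here is a rough sketch of the kind of
objective he describes: a four-parameter log-logistic curve fitted by maximum
likelihood, with the residual standard deviation estimated jointly.  When the
data never approach the upper asymptote d, the surface is nearly flat in that
direction.  The parametrisation and all names below are illustrative, not
taken from his code.

    ## Four-parameter log-logistic mean function (b, c, d, e) and a
    ## Gaussian negative log-likelihood with log(sigma) as a fifth
    ## parameter -- an illustration of the "hard" calibration problems
    ## described above, not Nicholas's actual model.
    loglogistic <- function(x, b, c, d, e)
        c + (d - c) / (1 + exp(b * (log(x) - log(e))))

    negloglik <- function(par, x, y) {
        mu    <- loglogistic(x, par[1], par[2], par[3], par[4])
        sigma <- exp(par[5])            # optimize log(sigma) so it stays positive
        -sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
    }

    ## e.g. optim(c(1, 0, 10, 5, 0), negloglik, x = dose, y = response,
    ##            method = "BFGS")      # dose, response: hypothetical data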
Hi,

In my earlier post I alluded to a situation where that would be useful.  In
nlme there is a choice of optimizers, minpack.lm has Levenberg-Marquardt, and
nlminb has the PORT routines.  For the same starting values, different
optimizers will present different solutions, so having a common interface
would make fitting with multiple optimizers very attractive.

Also, the inverse Hessian would be useful; for cases where the Hessian is
ill-conditioned, a little regularization goes a long way.  I believe the
package accuracy has a nice solution.

Nicholas

On 04/08/2007 2:23 PM, Gabor Grothendieck wrote:
> For the same reason that generic functions exist.  They don't have
> a lot of common code but it makes easier to use.  Perhaps the argument
> is not as strong here since the class tends to be implicit whereas the
> method is explicit but it would still be a convenience.

Can you give other examples where we do this?  The ones I can think of
(graphics drivers and finalizers) involve a large amount of common machinery
that it's difficult or impossible for the user to duplicate.  That's not the
case here.

Duncan Murdoch
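A common interface of the kind being discussed does not need much code.  The
sketch below is only one possible shape for it: the wrapper name and its
arguments are made up, while optim() and nlminb() are the real back ends.

    ## A thin dispatcher that hands the same problem to different
    ## optimizers and returns results in one layout.  Illustrative only.
    optimize.with <- function(par, fn, gr = NULL,
                              optimizer = c("optim", "nlminb"), ...) {
        optimizer <- match.arg(optimizer)
        if (optimizer == "optim") {
            res <- optim(par, fn, gr, method = "BFGS", ...)
            list(par = res$par, value = res$value,
                 convergence = res$convergence)
        } else {
            res <- nlminb(par, fn, gradient = gr, ...)
            list(par = res$par, value = res$objective,
                 convergence = res$convergence)
        }
    }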
Hi guys,

I ran into the same problem and solved it by changing one line in Andrew's
Makefile.  Instead of:

gsl.so: vector.o multimin.o
	gcc -g -Wall -shared $^ -lgsl -o $@

use:

gsl.so: vector.o multimin.o
	gcc -g -Wall -shared $^ -o $@ -lgsl -lgslcblas -lm

Christophe

Thanks for the functions!

> I tried installing the multimin function.  To get it to compile, I needed
> to change the Makefile to reflect my path and to add the flag -fPIC
> in response to the error: "/usr/bin/ld: vector.o: relocation R_X86_64_32
> against `a local symbol' can not be used when making a shared object;
> recompile with -fPIC"
>
> However, I get the following running test.R:
>
> Error in dyn.load(x, as.logical(local), as.logical(now)) :
>   unable to load shared library '/home/mmorales/Desktop/multimin/gsl.so':
>   /usr/lib64/libgsl.so.0: undefined symbol: cblas_ctrmv
>
> I'm running R-2.5.1 compiled for x86_64 with a custom-built ATLAS.

--
Christophe Pouzat
Laboratoire de Physiologie Cerebrale, CNRS UMR 8118
UFR biomedicale de l'Universite Paris V
45, rue des Saints Peres, 75006 PARIS, France
tel: +33 (0)1 42 86 38 28  fax: +33 (0)1 42 86 38 30
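Once gsl.so links cleanly, it can be checked from R before running test.R.
The symbol name "multimin" is assumed from the wrapper and may differ.

    dyn.load("gsl.so")       # load the freshly built shared library
    is.loaded("multimin")    # TRUE if the wrapper's entry point resolved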
I am sorry for omitting a citation in my previous post.  The complete message
is as follows (my text unchanged).

PS I would like to add a remark and a question.

Remark.  There is a part of R which already allows the user to select among
several methods for the same task and also to add his own C code: random
number generation.  However, the interface for optimization is more complex.
In my opinion, looking for a unified interface is desirable, but it is a
research problem, not a suggestion for an immediate code modification.

Question.  Is there a way to optimize a function written in C using optim?
This would be very useful if the optimization needs a lot of iterations.  It
may be done by defining an R function which does nothing more than calling .C
with appropriate parameters, but this loses efficiency.  A more efficient
solution could be to add a specified entry point (or several, if derivatives
are also available), similar to the user-defined random number generator.
Then a parameter of optim could control whether the function to be optimized
is fn or the C entry point.

Petr Savicky.

On Sat, Aug 04, 2007 at 06:56:47PM -0400, Gabor Grothendieck wrote:
> I don't have an example of that but that does not make it less
> desirable.  If one wants to use method 1, 2 or 3 then one can
> use optim with a method= but if one wants to use methods 4
> or 5 then one must use an entirely different function.  Surely
> it would be better to be consistent from the user's viewpoint
> and allow all of them to work consistently through the same
> interface.
>
> On 8/4/07, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> > On 04/08/2007 2:53 PM, Gabor Grothendieck wrote:
> > > The example of generic functions.
> >
> > Show me an example where we have a list of ways to do a calculation
> > passed as an argument (analogous to the method argument of optim), where
> > the user is allowed to add his own function to the list.
> >
> > Duncan Murdoch
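The workaround Petr mentions -- an R function that only forwards to .C() --
looks roughly like this.  The routine name "myobj" and its argument list are
hypothetical; the point is that every evaluation still crosses the R/C
boundary, which is exactly the overhead he would like to remove.

    ## Hypothetical wrapper: the objective is compiled C code exposed as
    ## "myobj" and loaded with dyn.load(); optim() only ever sees the
    ## R-level wrapper fC().
    fC <- function(par)
        .C("myobj", as.integer(length(par)), as.double(par),
           value = double(1))$value

    ## optim(x0, fC, method = "BFGS")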
Those are small parts of the calculation, not the whole thing.

The original point was that optim() is a very thin wrapper around the code to
do the optimization.  I just don't see a need to make it more complicated so
it can be used to wrap other methods.  Authors of new optimization methods
can just create new functions, following the pattern set by optim(), and it
will be easier for almost everyone.

Duncan Murdoch

On 06/08/2007 11:18 PM, Andrew Robinson wrote:
> ... Variance and correlation model classes in nlme.
>
> Cheers
>
> Andrew
>
> On Mon, Aug 06, 2007 at 09:55:38PM -0500, hadley wickham wrote:
>> On 8/4/07, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>>> On 04/08/2007 2:53 PM, Gabor Grothendieck wrote:
>>>> The example of generic functions.
>>> Show me an example where we have a list of ways to do a calculation
>>> passed as an argument (analogous to the method argument of optim), where
>>> the user is allowed to add his own function to the list.
>> Bin width selection in hist?  Family functions for glm?  Those come
>> quickly to my mind, but I'm sure there are others.
>>
>> Hadley
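What Duncan describes -- a new method shipped as its own function that
follows optim()'s calling convention -- could look like the skeleton below.
The return components mirror optim()'s documented ones; the body is a
placeholder, not a working optimizer.

    ## Skeleton for a stand-alone optimizer that mimics optim()'s
    ## interface and return value.  Illustrative placeholder only.
    mymethod <- function(par, fn, gr = NULL, ..., control = list()) {
        ## ... the actual iterations would go here ...
        list(par         = par,
             value       = fn(par, ...),
             counts      = c(`function` = 1, gradient = 0),
             convergence = 1L,     # 1 = iteration limit reached, as in optim()
             message     = NULL)
    }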