Hi all,

I've been working on improving R's optim() command, which does general purpose
unconstrained optimization.  Obviously, this is important for many statistics
computations, such as maximum likelihood, method of moments, etc.  I have
focused my efforts on the BFGS method, mainly because it best matches my
current projects.

Here's a quick summary of what I've done:
 * implemented my own version of BFGS in R,
       http://www.econ.upenn.edu/~clausen/computing/bfgs.zip
 * written a wrapper for the GNU Scientific Library's optimization function,
   multimin(),
       http://www.econ.upenn.edu/~clausen/computing/multimin.zip
 * written some tricky functions to compare implementations,
       http://www.econ.upenn.edu/~clausen/computing/tests.zip

My own implementation has several advantages over optim()'s implementation
(which you can see in the vmmin() function in
https://svn.r-project.org/R/trunk/src/main/optim.c):
 * the linesearch algorithm (More-Thuente) quickly finds a region of interest
   to zoom into.  Moreover, it strikes a much better balance between finding
   a point that adequately improves upon the old point and not wasting too
   much time looking for a much better point.  (Unlike optim(), it uses the
   standard Wolfe conditions with weak parameters -- see the sketch after
   this message.)
 * the linesearch algorithm uses interpolation, so it finds an acceptable
   point more quickly.
 * it implements "box" constraints.
 * the code is easier to understand and modify, partly because it's written
   in R.

Of course, this comes at the (slight?) overhead cost of being written in R.

The test suite above takes the first few functions from the paper

	Moré, Garbow, and Hillstrom, "Testing Unconstrained
	Optimization Software", ACM Trans Math Softw 7:1 (March 1981)

The test results appear below, where "*" means "computed the right solution",
and "!" means "got stuck".

	test                 optim   clausen   gsl
	--------------------------------------------------------------
	bard                 !
	beale
	brown-scaled
	freudenstein-roth
	gaussian             *
	helical-valley       *       *
	jennrich-sampson     *
	meyer                *
	powell-scaled        *
	rosenbrock           *

The table indicates that all three implementations of BFGS failed to compute
the right answer in most cases.  I suppose this means they are all quite
deficient.  Of course, this doesn't imply that they perform badly on real
statistics problems -- but in my limited experience with my crude econometric
models, they do perform badly.  Indeed, that's why I started investigating in
the first place.

For what it's worth, I think:
 * the optimization algorithms should be written in R -- the overhead is
   small compared to the cost of evaluating likelihood functions anyway, and
   is easily made up by the better algorithms that are possible.
 * it would be useful to keep a repository of interesting optimization
   problems relevant to R users.  Then R developers can evaluate
   "improvements".

Cheers,
Andrew
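The weak Wolfe conditions mentioned above take only a few lines of R to state.
This is a minimal sketch of the acceptance test a More-Thuente line search
applies, using the conventional constants c1 = 1e-4 and c2 = 0.9; it is not
taken from the bfgs.zip code, whose actual parameters may differ.

    ## Weak (standard) Wolfe conditions for a trial step 'alpha' along
    ## direction 'p' from point 'x'.  'f' is the objective and 'g' its
    ## gradient; c1 and c2 are conventional values, assumed here rather
    ## than taken from Andrew's implementation.
    wolfe.ok <- function(f, g, x, p, alpha, c1 = 1e-4, c2 = 0.9) {
        phi0  <- f(x)
        dphi0 <- sum(g(x) * p)          # directional derivative at alpha = 0
        xnew  <- x + alpha * p
        decrease  <- f(xnew) <= phi0 + c1 * alpha * dphi0   # sufficient decrease
        curvature <- sum(g(xnew) * p) >= c2 * dphi0          # curvature condition
        decrease && curvature
    }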
On 04/08/2007 1:12 AM, Andrew Clausen wrote:
> I've been working on improving R's optim() command, which does general
> purpose unconstrained optimization. [...]
> For what it's worth, I think:
> * the optimization algorithms should be written in R -- the overhead is
>   small compared to the cost of evaluating likelihood functions anyway,
>   and is easily made up by the better algorithms that are possible.
> * it would be useful to keep a repository of interesting optimization
>   problems relevant to R users.  Then R developers can evaluate
>   "improvements".

This is interesting work; thanks for doing it.

Could I make a suggestion?  Why not put together a package containing those
test optimization problems, and offer to include other interesting ones as
they arise?  You could also include your wrapper for the gsl function and
your own improvements to optim.

On your first point: I agree that a prototype implementation in R makes
sense, but I suspect there exist problems where the overhead would not be
negligible (e.g. ones where the user has written the objective function in C
for speed).
So I think you should keep in mind the possibility of moving the core of your
improvements to C once you are happy with them.

Duncan Murdoch

P.S. I dropped the help-gsl mailing list from the distribution, since my post
is mainly about R.
Hi Pat,

On Sat, Aug 04, 2007 at 09:59:57AM +0100, Patrick Burns wrote:
> Sounds like a good project.

Thanks :)

> How much extra overhead are you getting from the
> algorithm being in R?

On the Rosenbrock function (which is very quick to evaluate), here are the
system.time() results:

> system.time(bfgs(x0, f, g))
[1] 0.148 0.000 0.149 0.000 0.000
> system.time(optim(x0, f, g, method="BFGS"))
[1] 0.008 0.000 0.008 0.000 0.000

and the function evaluation counts:

> bfgs(x0, f, g)$counts
function gradient
      95       58
> optim(x0, f, g, method="BFGS")$counts
function gradient
     318      100

So the overhead is clearly much bigger, but still too small to matter for
most (?) applications.

Cheers,
Andrew

PS, my computer is an "Intel(R) Pentium(R) 4 CPU 2.80GHz" with a 1024 KB
cache, according to /proc/cpuinfo.
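For anyone who wants to repeat the timing above, the Rosenbrock objective,
its gradient, and the usual starting point are short to write down.  The
optim() lines below run as-is; bfgs() refers to the function in Andrew's
bfgs.zip, whose interface is assumed to match his calls above.

    ## Rosenbrock function, analytic gradient, and standard start.
    f  <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
    g  <- function(x) c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),
                         200 * (x[2] - x[1]^2))
    x0 <- c(-1.2, 1)

    system.time(optim(x0, f, g, method = "BFGS"))
    optim(x0, f, g, method = "BFGS")$counts
    ## system.time(bfgs(x0, f, g))       # assumes bfgs.zip has been sourced
    ## bfgs(x0, f, g)$counts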
Hi Andrew,

I have been working quite a bit with optim and friends on automated nonlinear
fitting, mainly for calibration.  All of the optimizers seem to have trouble
in the tricky situation of fitting a log-logistic model when the upper
asymptote is not well defined and a variance parameter also has to be
estimated.  Granted, this is not necessarily the optimizer's fault, but a
combination of objective function, convergence criteria, scaling, ...

But my point is that for automated fitting -- many starting values,
iterations with pseudo-likelihood for the variance function, over multiple
curves -- the overhead of a function written in R vs C can become
non-negligible even for "simple" univariate functions.  So I have to agree
with Duncan that R is very good for prototyping, but C would be preferable.

I can contribute some "hard" problems if you start the package.

Nicholas

> Date: Sat, 4 Aug 2007 01:12:31 -0400
> From: Andrew Clausen <clausen at econ.upenn.edu>
> Subject: [Rd] Optimization in R
> To: r-devel at r-project.org
> Cc: help-gsl at gnu.org
> [...]
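To make Nicholas's example concrete, here is a rough sketch of the kind of
objective he describes: a four-parameter log-logistic curve fitted by maximum
likelihood, with the residual standard deviation estimated jointly.  When the
data never approach the upper asymptote d, the surface is nearly flat in that
direction.  The parametrisation and all names below are illustrative, not
taken from his code.

    ## Four-parameter log-logistic mean function (b, c, d, e) and a
    ## Gaussian negative log-likelihood with log(sigma) as a fifth
    ## parameter -- an illustration of the "hard" calibration problems
    ## described above, not Nicholas's actual model.
    loglogistic <- function(x, b, c, d, e)
        c + (d - c) / (1 + exp(b * (log(x) - log(e))))

    negloglik <- function(par, x, y) {
        mu    <- loglogistic(x, par[1], par[2], par[3], par[4])
        sigma <- exp(par[5])            # optimize log(sigma) so it stays positive
        -sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
    }

    ## e.g. optim(c(1, 0, 10, 5, 0), negloglik, x = dose, y = response,
    ##            method = "BFGS")      # dose, response: hypothetical data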
Hi,

In my earlier post I alluded to a situation where that would be useful.  In
nlme there is a choice of optimizers, minpack.lm has Levenberg-Marquardt, and
nlminb has the PORT routines.  For the same starting values, different
optimizers will present different solutions, so having a common interface
would make fitting with multiple optimizers very attractive.

Also, the inverse Hessian would be useful; for cases where the Hessian is
ill-conditioned, a little regularization goes a long way.  I believe the
package accuracy has a nice solution.

Nicholas

On 04/08/2007 2:23 PM, Gabor Grothendieck wrote:
> For the same reason that generic functions exist.  They don't have
> a lot of common code but it makes easier to use.  Perhaps the argument
> is not as strong here since the class tends to be implicit whereas the
> method is explicit but it would still be a convenience.

Can you give other examples where we do this?  The ones I can think of
(graphics drivers and finalizers) involve a large amount of common machinery
that it's difficult or impossible for the user to duplicate.  That's not the
case here.

Duncan Murdoch
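A common interface of the kind being discussed does not need much code.  The
sketch below is only one possible shape for it: the wrapper name and its
arguments are made up, while optim() and nlminb() are the real back ends.

    ## A thin dispatcher that hands the same problem to different
    ## optimizers and returns results in one layout.  Illustrative only.
    optimize.with <- function(par, fn, gr = NULL,
                              optimizer = c("optim", "nlminb"), ...) {
        optimizer <- match.arg(optimizer)
        if (optimizer == "optim") {
            res <- optim(par, fn, gr, method = "BFGS", ...)
            list(par = res$par, value = res$value,
                 convergence = res$convergence)
        } else {
            res <- nlminb(par, fn, gradient = gr, ...)
            list(par = res$par, value = res$objective,
                 convergence = res$convergence)
        }
    }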
Hi guys,

I ran into the same problem and solved it by changing one line in Andrew's
Makefile.  Instead of:

gsl.so: vector.o multimin.o
	gcc -g -Wall -shared $^ -lgsl -o $@

use:

gsl.so: vector.o multimin.o
	gcc -g -Wall -shared $^ -o $@ -lgsl -lgslcblas -lm

Christophe

Thanks for the functions!

> I tried installing the multimin function.  To get it to compile, I needed
> to change the Makefile to reflect my path and to add the flag -fPIC
> in response to the error: "/usr/bin/ld: vector.o: relocation R_X86_64_32
> against `a local symbol' can not be used when making a shared object;
> recompile with -fPIC"
>
> However, I get the following running test.R:
>
> Error in dyn.load(x, as.logical(local), as.logical(now)) :
>   unable to load shared library '/home/mmorales/Desktop/multimin/gsl.so':
>   /usr/lib64/libgsl.so.0: undefined symbol: cblas_ctrmv
>
> I'm running R-2.5.1 compiled for x86_64 with a custom-built ATLAS.

--
Christophe Pouzat
Laboratoire de Physiologie Cerebrale, CNRS UMR 8118
UFR biomedicale de l'Universite Paris V
45, rue des Saints Peres, 75006 PARIS, France
tel: +33 (0)1 42 86 38 28  fax: +33 (0)1 42 86 38 30
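Once gsl.so links cleanly, it can be checked from R before running test.R.
The symbol name "multimin" is assumed from the wrapper and may differ.

    dyn.load("gsl.so")       # load the freshly built shared library
    is.loaded("multimin")    # TRUE if the wrapper's entry point resolved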
I am sorry for omitting a citation in my previous post.  The complete message
is as follows (my text unchanged).

PS I would like to add a remark and a question.

Remark.  There is a part of R which already allows the user to select among
several methods for the same task and also to add his own C code: random
number generation.  However, the interface for optimization is more complex.
In my opinion, looking for a unified interface is desirable, but it is a
research problem, not a suggestion for an immediate code modification.

Question.  Is there a way to optimize a function written in C using optim?
This would be very useful if the optimization needs a lot of iterations.  It
may be done by defining an R function which does nothing more than calling .C
with appropriate parameters, but this loses efficiency.  A more efficient
solution could be to add a specified entry point (or several, if derivatives
are also available), similar to the user-defined random number generator.
Then a parameter of optim could control whether the function to be optimized
is fn or the C entry point.

Petr Savicky.

On Sat, Aug 04, 2007 at 06:56:47PM -0400, Gabor Grothendieck wrote:
> I don't have an example of that but that does not make it less
> desirable.  If one wants to use method 1, 2 or 3 then one can
> use optim with a method= but if one wants to use methods 4
> or 5 then one must use an entirely different function.  Surely
> it would be better to be consistent from the user's viewpoint
> and allow all of them to work consistently through the same
> interface.
>
> On 8/4/07, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> > On 04/08/2007 2:53 PM, Gabor Grothendieck wrote:
> > > The example of generic functions.
> >
> > Show me an example where we have a list of ways to do a calculation
> > passed as an argument (analogous to the method argument of optim), where
> > the user is allowed to add his own function to the list.
> >
> > Duncan Murdoch
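The workaround Petr mentions -- an R function that only forwards to .C() --
looks roughly like this.  The routine name "myobj" and its argument list are
hypothetical; the point is that every evaluation still crosses the R/C
boundary, which is exactly the overhead he would like to remove.

    ## Hypothetical wrapper: the objective is compiled C code exposed as
    ## "myobj" and loaded with dyn.load(); optim() only ever sees the
    ## R-level wrapper fC().
    fC <- function(par)
        .C("myobj", as.integer(length(par)), as.double(par),
           value = double(1))$value

    ## optim(x0, fC, method = "BFGS")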
Those are small parts of the calculation, not the whole thing.

The original point was that optim() is a very thin wrapper around the code to
do the optimization.  I just don't see a need to make it more complicated so
it can be used to wrap other methods.  Authors of new optimization methods
can just create new functions, following the pattern set by optim(), and it
will be easier for almost everyone.

Duncan Murdoch

On 06/08/2007 11:18 PM, Andrew Robinson wrote:
> ... Variance and correlation model classes in nlme.
>
> Cheers
>
> Andrew
>
> On Mon, Aug 06, 2007 at 09:55:38PM -0500, hadley wickham wrote:
>> On 8/4/07, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>>> On 04/08/2007 2:53 PM, Gabor Grothendieck wrote:
>>>> The example of generic functions.
>>> Show me an example where we have a list of ways to do a calculation
>>> passed as an argument (analogous to the method argument of optim), where
>>> the user is allowed to add his own function to the list.
>> Bin width selection in hist?  Family functions for glm?  Those come
>> quickly to my mind, but I'm sure there are others.
>>
>> Hadley
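What Duncan describes -- a new method shipped as its own function that
follows optim()'s calling convention -- could look like the skeleton below.
The return components mirror optim()'s documented ones; the body is a
placeholder, not a working optimizer.

    ## Skeleton for a stand-alone optimizer that mimics optim()'s
    ## interface and return value.  Illustrative placeholder only.
    mymethod <- function(par, fn, gr = NULL, ..., control = list()) {
        ## ... the actual iterations would go here ...
        list(par         = par,
             value       = fn(par, ...),
             counts      = c(`function` = 1, gradient = 0),
             convergence = 1L,     # 1 = iteration limit reached, as in optim()
             message     = NULL)
    }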