benedikt.gehr at ieu.uzh.ch
2010-Jun-16 05:35 UTC
[R] an alternative to R for nonlinear stat models
Hi, I implemented the age-structure model in Gove et al (2002) in R, which is a nonlinear statistical model. However, running the model in R was very slow. So Dave Fournier suggested using the AD Model Builder software package and helped me implement the model there. ADMB was incredibly fast in running the model: while running the model in R took 5-10 minutes, depending on the settings, in ADMB it took 1-2 seconds! I'm reporting this so that people who have performance issues with nonlinear statistical models in R will know that there is a good free alternative for more difficult problems. There is also a help platform equivalent to the one for R, and the people running it are extremely helpful. I hope this might help someone. cheers Beni
Hi Beni, Thanks for posting information that might be useful for people on the list. The only thing is that without a little more detail on how exactly you implemented things in R, we are left to guess whether the performance issues are a problem with R or with your particular implementation. The same task can often be implemented in R in two ways, where the first takes minutes and the second 1-2 seconds. Furthermore, you are giving us no option to defend R ;). cheers, Paul -- Drs. Paul Hiemstra Department of Physical Geography Faculty of Geosciences University of Utrecht Heidelberglaan 2 P.O. Box 80.115 3508 TC Utrecht Phone: +3130 274 3113 Mon-Tue Phone: +3130 253 5773 Wed-Fri http://intamap.geo.uu.nl/~paul http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
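Paul's point about two implementations of the same computation can be illustrated with a minimal sketch. It assumes a made-up exponential decay model with normal errors and simulated data (not the Gove et al. model, whose R code was not posted), and the names `nll_loop` and `nll_vec` are hypothetical. Both functions return the same negative log-likelihood; the loop version is usually the slower of the two, and an optimizer that calls the objective thousands of times multiplies that difference.

```r
# Simulated data for a hypothetical model: y = a * exp(-b * x) + noise.
set.seed(1)
n <- 1e5
x <- runif(n, 0, 10)
y <- 5 * exp(-0.3 * x) + rnorm(n, sd = 0.2)

# Version 1: negative log-likelihood accumulated one observation at a time.
nll_loop <- function(p) {
  total <- 0
  for (i in seq_along(x)) {
    mu <- p[1] * exp(-p[2] * x[i])
    total <- total - dnorm(y[i], mean = mu, sd = p[3], log = TRUE)
  }
  total
}

# Version 2: the same quantity computed with vectorized arithmetic.
nll_vec <- function(p) {
  mu <- p[1] * exp(-p[2] * x)
  -sum(dnorm(y, mean = mu, sd = p[3], log = TRUE))
}

p0 <- c(4, 0.2, 0.5)  # arbitrary parameter values: a, b, sd

# Time a single evaluation of each version.
t_loop <- system.time(v_loop <- nll_loop(p0))["elapsed"]
t_vec  <- system.time(v_vec  <- nll_vec(p0))["elapsed"]

print(c(loop_seconds = unname(t_loop), vectorized_seconds = unname(t_vec)))
print(all.equal(v_loop, v_vec))  # both versions should agree numerically
```

Timings depend on the machine and the R version, so none are quoted here. The sketch only shows why a comparison against ADMB is hard to interpret without seeing the R implementation.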
I'd echo Paul's sentiments that we need to know where and when and how R goes slowly. I've been working on optimization in many environments for many years (try ?optim and you'll find me mentioned!). So has Dave Fournier. AD Model Builder has some real strengths and we need Automatic Differentiation in R -- hence the Google Summer of Code project in which I'm mentoring Chillu (see the R Wiki for some info on this). But it will be a while before we have really good interfaces to such capability. There is already some capability for ADMB. Dave F. may even agree with me when I suggest that most users find ADMB, and indeed most optimization tools, quite an effort to use. Moreover, there's a general feeling that to "go fast" you need to "go C". Yet my package Rcgmin is all in R, and does large-n optimization via a more modern CG method than optim's CG (I've primary rights to complain as the latter is based on my own code). On some fairly common test problems it often goes extremely fast. We're talking 3 seconds to minimize a function of 5000 parameters on an Asus netbook (with analytic derivatives). BUT ... sometimes I can slow it down by just changing the starting vector. We're statisticians -- so let's make sure the sample is large enough and representative enough. But my main messages here: - we need to know about the problems and have them available to use as tests - we should aim for easy-to-use and consistent interfaces to the R tools so we are not constantly having to write "glue" code to get things to work, and additionally putting ourselves at risk of introducing errors.
JN
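JN's remarks about analytic derivatives and starting vectors can be tried with a small, hedged sketch. It uses base R's `optim()` with `method = "BFGS"` rather than Rcgmin, and the extended Rosenbrock function as a stand-in test problem; this is not the problem JN timed. The helper `run()`, the dimension `n`, and the three starting vectors are hypothetical choices made for illustration.

```r
# Extended Rosenbrock function and its analytic gradient (needs length(p) >= 2).
rosen <- function(p) {
  n <- length(p)
  sum(100 * (p[-1] - p[-n]^2)^2 + (1 - p[-n])^2)
}
rosen_grad <- function(p) {
  n <- length(p)
  g <- numeric(n)
  g[-n] <- -400 * p[-n] * (p[-1] - p[-n]^2) - 2 * (1 - p[-n])
  g[-1] <- g[-1] + 200 * (p[-1] - p[-n]^2)
  g
}

# Run optim() once and record time, final value, and function-call count.
# With gr = NULL, optim() falls back to finite-difference gradients.
run <- function(start, gr = NULL) {
  elapsed <- system.time(
    fit <- optim(start, rosen, gr = gr, method = "BFGS",
                 control = list(maxit = 5000))
  )["elapsed"]
  data.frame(seconds = unname(elapsed),
             value = fit$value,
             fn_calls = fit$counts[["function"]],
             convergence = fit$convergence)
}

n <- 20
starts <- list(standard = rep(c(-1.2, 1), n / 2),
               near     = rep(0.9, n),
               far      = rep(-3, n))

# One row per combination of starting vector and gradient type.
results <- do.call(rbind, lapply(names(starts), function(nm) {
  rbind(
    data.frame(start = nm, gradient = "analytic", run(starts[[nm]], rosen_grad)),
    data.frame(start = nm, gradient = "numeric",  run(starts[[nm]], NULL))
  )
}))
print(results)
```

A table like `results` makes it easy to see how much of a timing difference comes from the derivative information and how much from the starting point, which is the kind of sample JN asks for before drawing conclusions.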
As far as I can tell, Gove et al. (2002) might be a good example for benchmarking the optimization performance of R vs. ADMB. It would be great if expert R users/developers could tweak Beni's model so that the performance comparison is valid. The main purpose is not to see which is faster or more reliable, but to quantify how much performance is gained by moving from R to ADMB. I have just typed and uploaded an excerpt from an old but thorough benchmark, where ADMB outperformed Gauss, Matlab, and S-Plus (http://admb-project.org/community/benchmarks/optimization). The benchmark was performed in 1997 and an update would be very valuable, as software and hardware have improved since then. R is an interpreted language that can do just about anything. ADMB is a compiled language (thin layer on top of C++) that does optimization and nothing else. I don't think it is a realistic goal for R to match the optimizing performance of ADMB. I use R for most of my work, but crunch numbers with ADMB when computational speed and flexibility become an issue. As an analogy, data frames in R are great, along with functions like aggregate(), apply(), merge(), and xtabs(). But when the tables are too large and too many, it's time to delegate the problem to a relational database. In the case of model X, which is already implemented in R, it may take half an hour or half a month to convert it to ADMB, and the payoff depends on how often the model needs to be run. In the case of model Y, which is not yet implemented, some users may be quicker to implement it in R than in ADMB, so again it would be good to have an idea about the relative performance gain. Model Z may not run at all in R, due to its size and complexity. Although I started this email with "R vs. ADMB", my daily working environment is better described as "R and ADMB". I'm a regular contributor to both R (4 packages and a couple of functions in the base packages) and ADMB (dev core team). 
Others have contributed R packages to interface with ADMB (http://admb-project.org/community/admb-meeting-march-29-31/InterfacingADMBwithR.pdf/at_download/file), but for many the interface is just reading and writing text files. I fully appreciate the comfort and efficiency of the R working environment, and the benefits of performing most tasks inside the same environment. But the R community has no need to be on the defensive against ADMB, any more than against relational databases. If you work with computationally intensive models, I encourage you to try out ADMB (admb-project.org, free software) and hopefully end up contributing ideas and/or code. Best regards, Arni P.S. Phew. Future emails mentioning ADMB on r-help can be more brief and to the point, citing this message for details.
Windows ADMB-IDE installer: http://code.google.com/p/admb-project/downloads/list?q=ide*exe
Windows, Linux, and Mac OS standalone: http://code.google.com/p/admb-project/downloads/list?q=windows http://code.google.com/p/admb-project/downloads/list?q=linux http://code.google.com/p/admb-project/downloads/list?q=macos
ADMB modes for various editors, including Emacs and Vim: http://admb-project.org/community/editing-tools
It has been brought to my attention that the 1997 benchmark was updated in 2006, using R instead of S-Plus and, obviously, a newer computer. The result was that ADMB converged more than 1000 times faster than R. The model is an ecological population model, not unlike Gove et al. (2002). Anyone should be able to replicate the results, since the updated http://admb-project.org/community/benchmarks/optimization page has links to the ADMB code, R code, and the input data. The 2006 benchmark used the R optimizer nlminb(), so there are many other R optimizers that could be benchmarked today. Any improvements are of course welcome, although the current discussion may be focusing on the similar Gove et al. (2002) example. I should also correct what I said about ADMB doing "optimization and nothing else", since ADMB features like random effects and MCMC are beyond the standard features of optimizers. Arni
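Since the 2006 benchmark used `nlminb()` only, a rough sketch of how other base R optimizers could be timed on the same objective might look as follows. The model and data here are simulated and hypothetical (an exponential decay with normal errors, parameterized on the log scale); they are not the benchmark's population model, whose code and data are linked from the page above.

```r
# Simulated data for a hypothetical model: y = a * exp(-b * x) + noise.
set.seed(42)
x <- seq(0, 10, length.out = 200)
y <- 5 * exp(-0.3 * x) + rnorm(length(x), sd = 0.2)

# Negative log-likelihood; p holds log(a), log(b), log(sd) so that all
# three quantities stay positive during the search.
nll <- function(p) {
  mu <- exp(p[1]) * exp(-exp(p[2]) * x)
  -sum(dnorm(y, mean = mu, sd = exp(p[3]), log = TRUE))
}

start <- log(c(3, 0.5, 0.5))  # arbitrary starting values

# Time each optimizer on the identical objective and starting vector.
t_nlminb <- system.time(fit_nlminb <- nlminb(start, nll))["elapsed"]
t_bfgs   <- system.time(fit_bfgs <- optim(start, nll, method = "BFGS"))["elapsed"]
t_nm     <- system.time(fit_nm <- optim(start, nll, method = "Nelder-Mead"))["elapsed"]

comparison <- data.frame(
  optimizer = c("nlminb", "optim BFGS", "optim Nelder-Mead"),
  seconds   = unname(c(t_nlminb, t_bfgs, t_nm)),
  objective = c(fit_nlminb$objective, fit_bfgs$value, fit_nm$value)
)
print(comparison)
```

A single run on a small problem says little about timing. Repeating the runs, varying the starting vector, and using the benchmark's own model and data would be needed before comparing against the ADMB figures.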