dave fournier
2006-Nov-24 21:06 UTC
[R] Nonlinear statistical modeling -- a comparison of R and AD Model Builder
There has recently been some discussion on the list about AD Model Builder and the suitability of R for constructing the types of models used in fisheries management.

https://stat.ethz.ch/pipermail/r-help/2006-January/086841.html
https://stat.ethz.ch/pipermail/r-help/2006-January/086858.html

I think that many R users underestimate the numerical challenges that some of the typical nonlinear statistical models used in different fields present. R may not be a suitable platform for developing such models.

Around 10 years ago John Schnute, Laura Richards, and Norm Olsen of Canadian federal fisheries undertook an investigation comparing various statistical modeling packages on a simple age-structured statistical model of the type commonly used in fisheries. They compared AD Model Builder, Gauss, Matlab, and Splus. Unfortunately a working model could not be produced with Splus, so its times could not be included in the comparison. It is possible to produce a working model with the present-day version of R, so R can now be compared directly with AD Model Builder for this type of model.

I have put the results of the test, together with the original Schnute and Richards paper and the working R and AD Model Builder codes, on Otter's web site:

http://otter-rsch.ca/tresults.htm

The results are that AD Model Builder is roughly 1000 times faster than R for this problem. ADMB takes about 2 seconds to converge while R takes over 90 minutes.

This is a simple toy example. Real fisheries models are often hundreds of times more computationally intensive than this one.

Cheers,

Dave

--
David A. Fournier
P.O. Box 2040, Sidney, B.C. V8l 3S3 Canada
Phone/FAX 250-655-3364
http://otter-rsch.com
Mike Prager
2006-Nov-24 22:06 UTC
[R] Nonlinear statistical modeling -- a comparison of R and AD Model Builder
dave fournier <otter at otter-rsch.com> wrote:

> I think that many R users underestimate the numerical challenges that some of the typical nonlinear statistical models used in different fields present. R may not be a suitable platform for development for such models.
>
> Around 10 years ago John Schnute, Laura Richards, and Norm Olsen with Canadian federal fisheries undertook an investigation comparing various statistical modeling packages for a simple age-structured statistical model of the type commonly used in fisheries. [...] It is possible to produce a working model with the present day version of R so that R can now be directly compared with AD Model Builder for this type of model.
>
> The results are that AD Model Builder is roughly 1000 times faster than R for this problem. ADMB takes about 2 seconds to converge while R takes over 90 minutes.

Our group's experiences reflect, at least qualitatively, what Dave says above. We use R for analyzing results from models written in his AD Model Builder, and a couple of years ago we started programming one of our models directly in R. We quickly abandoned that idea because of lengthy execution time under R.

That is not a judgement of either piece of software. R and ADMB are designed for different types of task, and it seems to me that they complement each other well.

That experience was in part the genesis of our X2R software (now at CRAN -- pardon the plug), which saves results from ADMB models into a format that R can read as a list. We feel that we now have the best of both worlds: fast execution with ADMB, followed by the programming ease and excellent graphics of R for analysis of results and projections under numerous scenarios.

--
Mike Prager, NOAA, Beaufort, NC
* Opinions expressed are personal and not represented otherwise.
* Any use of tradenames does not constitute a NOAA endorsement.
Tony Plate
2006-Nov-24 22:19 UTC
[R] Nonlinear statistical modeling -- a comparison of R and AD Model Builder
Did you try supplying gradient information to nlminb? (I note that nlminb is used for the optimization, but I don't see any gradient information supplied to it.) I would suspect that supplying gradient information would greatly speed up the computation (as you note in comments at http://otter-rsch.ca/tresults.htm).

I'm curious -- when you say "R may not be a suitable platform for development for such models", what aspect of R do you feel is lacking? Is it the specific optimization routines available, or is it some other more general aspect?

Also, another optimization algorithm available in R is the "L-BFGS-B" method of optim() (in the stats package). I've had extremely good experiences with using this code in S-PLUS. It can take box constraints, and it can use gradient information. It is my first choice for most optimization problems, and I believe it is very widely used. Did you try using that optimization routine with this problem?

-- Tony Plate
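The point about gradient information can be made concrete by counting function evaluations. The following is a minimal sketch in Python, not the R script from the comparison: `toy_objective`, its hand-derived gradient, and the parameter count of 100 are all hypothetical stand-ins. It assumes that an optimizer given no gradient falls back on finite differences, which costs one extra objective evaluation per parameter for every gradient it needs.

```python
# Sketch: how many objective evaluations one forward-difference gradient costs.
# Everything here is a hypothetical stand-in, not the fisheries model.

class CountingObjective:
    """Wrap an objective function so that every call is counted."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.fn(x)


def toy_objective(x):
    # Smooth quadratic standing in for a negative log-likelihood.
    return sum((xi - i) ** 2 for i, xi in enumerate(x))


def toy_gradient(x):
    # Hand-derived gradient of toy_objective, used only as a reference.
    return [2.0 * (xi - i) for i, xi in enumerate(x)]


def forward_difference_gradient(f, x, h=1e-5):
    """Approximate the gradient with one base call plus one call per parameter."""
    f0 = f(x)
    grad = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += h
        grad.append((f(shifted) - f0) / h)
    return grad


n_params = 100  # hypothetical parameter count
x0 = [0.5] * n_params

counted = CountingObjective(toy_objective)
fd_grad = forward_difference_gradient(counted, x0)

evals_per_gradient = counted.calls  # n_params + 1 objective evaluations
max_abs_error = max(abs(a - b) for a, b in zip(fd_grad, toy_gradient(x0)))

print("objective evaluations for one gradient:", evals_per_gradient)
print("largest error against the hand-derived gradient:", max_abs_error)
```

With an exact gradient available, each optimizer iteration needs roughly one gradient call instead of `n_params + 1` objective calls, which is where the expected speed-up would come from. How much of the reported timing gap that accounts for in the actual model is not something this sketch can show.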
dave fournier
2006-Nov-24 22:33 UTC
[R] Nonlinear statistical modeling -- a comparison of R and AD Model Builder
> Did you try supplying gradient information to nlminb? (I note that nlminb is used for the optimization, but I don't see any gradient information supplied to it.) I would suspect that supplying gradient information would greatly speed up the computation (as you note in comments at http://otter-rsch.ca/tresults.htm).

Actually you should probably ask Norm Olsen these questions. I am not proficient in R and am just using his code. However, I can say that providing derivatives for such a model is a highly nontrivial exercise. As I said in my posting, the R script and data are available to anyone who feels that the exercise was not carried out properly and would like to improve on it.

Also, one does not need to provide derivatives to the AD Model Builder program.

Finally, suppose that you are very good at calculating derivatives and manage to get them right. Then someone else comes along who wants to modify the model. Unless they are also very good at calculating derivatives, there will be trouble.

> I'm curious -- when you say "R may not be a suitable platform for development for such models", what aspect of R do you feel is lacking? Is it the specific optimization routines available, or is it some other more general aspect?

2 seconds vs 90 minutes. For a real problem of this type the timings would probably be something like 10 minutes vs more than 2,700 minutes.

> Also, another optimization algorithm available in R is the "L-BFGS-B" method of optim() (in the stats package). I've had extremely good experiences with using this code in S-PLUS. It can take box constraints, and it can use gradient information. It is my first choice for most optimization problems, and I believe it is very widely used. Did you try using that optimization routine with this problem?
>
> -- Tony Plate

Dave

--
David A. Fournier
P.O. Box 2040, Sidney, B.C. V8l 3S3 Canada
Phone/FAX 250-655-3364
http://otter-rsch.com
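The remark that no derivatives need to be supplied refers to automatic differentiation, the "AD" in AD Model Builder. The sketch below is a generic, minimal reverse-mode implementation in Python, written only to illustrate the idea. It is not ADMB's code, and the `Var` class, the `var_exp` helper, the decay model, and the data are all hypothetical. The user writes only the objective; the gradient falls out of the recorded operations.

```python
import math


class Var:
    """A scalar that remembers how it was computed, for reverse-mode AD."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local partial derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = as_var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __sub__(self, other):
        other = as_var(other)
        return Var(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        other = as_var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def as_var(x):
    """Promote plain numbers to constant Var nodes."""
    return x if isinstance(x, Var) else Var(float(x))


def var_exp(x):
    e = math.exp(x.value)
    return Var(e, ((x, e),))


def backward(output):
    """Propagate d(output)/d(node) from the output back to every input."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)  # post-order: parents land before their consumers

    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # consumers first, then their parents
        for parent, local in node.parents:
            parent.grad += node.grad * local


# Hypothetical data and model: numbers decaying as n0 * exp(-z * t).
times = [0.0, 1.0, 2.0, 3.0, 4.0]
observed = [100.0, 60.0, 37.0, 22.0, 14.0]

n0, z = Var(90.0), Var(0.4)
objective = Var(0.0)
for t, obs in zip(times, observed):
    residual = n0 * var_exp(z * (-t)) - obs
    objective = objective + residual * residual  # sum of squared residuals

backward(objective)

# Hand-derived gradient, used only to check the automatic one.
pred = [90.0 * math.exp(-0.4 * t) for t in times]
check_n0 = sum(2.0 * (p - o) * math.exp(-0.4 * t)
               for p, o, t in zip(pred, observed, times))
check_z = sum(2.0 * (p - o) * (-t) * p
              for p, o, t in zip(pred, observed, times))

print("automatic:", n0.grad, z.grad)
print("by hand:  ", check_n0, check_z)
```

The maintenance argument in the message shows up directly here: changing the model means editing only the line that builds `residual`, whereas the two hand-derived check expressions would both have to be re-derived.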
H. Skaug
2006-Nov-26 11:02 UTC
[R] Nonlinear statistical modeling -- a comparison of R and AD Model Builder
Spencer,

I tried the mixed effects approach you suggest, using the random effects module of AD Model Builder (http://www.otter-rsch.ca/admbre/admbre.html). What are 94 unbounded parameters in Schnute et al. (1998) now become realizations of a Gaussian random variable, with the corresponding standard deviation being estimated as a parameter.

The approach works, but the computation time is increased substantially. This is understandable, however, as the computational problem is a very different one. The likelihood function now involves an integral in dimension 94, which I believe cannot be broken into a product of lower-dimensional integrals as is usual for clustered data (the reason being the recursive nature of the population dynamics).

hans

_______________________________

Spencer Graves wrote:

> Have you considered nonlinear mixed effects models for the types of problems considered in the comparison paper you cite? Those "benchmark trials" consider "T years of data ... for A age classes and the total number of parameters is m = T+A+5". Without knowing more about the problem, I suspect that the T year parameters and the A age class parameters might be better modeled as random effects. If this were done, the optimization problem would then involve 7 parameters: the 5 fixed-effect parameters suggested by the computation of "m" plus two variance parameters, one for the random "year" effects and another for the random "age class" effect. This would replace the problem of maximizing, e.g., a likelihood over T+A+5 parameters with one of maximizing a marginal likelihood over 2+5 parameters after integrating out the T and A random effects.
>
> These integrations may not be easy, and I might stick with the fixed-effects solution if I couldn't get answers in the available time using a model I thought would be theoretically more appropriate. Also, I might use the fixed-effects solution to get starting values for an attempt to maximize a more appropriate marginal likelihood. For the latter, I might first try 'nlme'. If that failed, I might explore Markov Chain Monte Carlo (MCMC). I have not done MCMC myself, but the "MCMCpack" R package looks like it might make it feasible for the types of problems considered in this comparison. The CRAN summary of that package led me to an Adobe Acrobat version of a PPT slide presentation that seemed to consider just this type of problem (e.g., http://mcmcpack.wustl.edu/files/MartinQuinnMCMCpackslides.pdf).
>
> Have you considered that?
> Hope this helps.
> Spencer Graves
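"Integrating out" random effects to obtain a marginal likelihood can be illustrated on a one-dimensional toy problem. The sketch below uses the Laplace approximation, a common way to approximate such integrals. Whether and how it applies to the 94-dimensional integral discussed above is not stated in the thread, and the model here (observations y_i ~ N(u, 1) sharing one random effect u ~ N(0, sigma^2)) and its data are hypothetical. This toy is deliberately Gaussian, so the approximation is exact and can be checked against the closed form; that would not hold for a nonlinear population model.

```python
import math


def log_joint(u, y, sigma):
    """log p(y | u) + log p(u) for y_i ~ N(u, 1) and u ~ N(0, sigma^2)."""
    log_lik = sum(-0.5 * math.log(2.0 * math.pi) - 0.5 * (yi - u) ** 2
                  for yi in y)
    log_prior = (-0.5 * math.log(2.0 * math.pi * sigma ** 2)
                 - 0.5 * (u / sigma) ** 2)
    return log_lik + log_prior


def laplace_log_marginal(y, sigma, newton_steps=20):
    """Approximate log of the integral of exp(log_joint) over u."""
    n = len(y)
    hess = -(n + 1.0 / sigma ** 2)  # second derivative of log_joint in u
    u = 0.0
    for _ in range(newton_steps):  # inner optimization: find the mode in u
        grad = sum(yi - u for yi in y) - u / sigma ** 2
        u -= grad / hess
    # Laplace: value at the mode plus a curvature correction.
    return (log_joint(u, y, sigma)
            + 0.5 * math.log(2.0 * math.pi)
            - 0.5 * math.log(-hess))


def exact_log_marginal(y, sigma):
    """Closed form: y is multivariate normal with covariance I + sigma^2 * 11'."""
    n = len(y)
    s1 = sum(y)
    s2 = sum(yi * yi for yi in y)
    quad = s2 - sigma ** 2 / (1.0 + n * sigma ** 2) * s1 ** 2
    return (-0.5 * n * math.log(2.0 * math.pi)
            - 0.5 * math.log(1.0 + n * sigma ** 2)
            - 0.5 * quad)


y_data = [0.8, 1.3, 0.2, 1.9, 1.1]  # hypothetical observations
sigma_u = 1.5                       # hypothetical random-effect std. deviation

approx = laplace_log_marginal(y_data, sigma_u)
exact = exact_log_marginal(y_data, sigma_u)
print("Laplace:", approx)
print("exact:  ", exact)
```

An outer optimizer would then maximize this marginal likelihood over the remaining parameters (here only `sigma_u`), repeating the inner optimization at every step. That nested structure is one plausible reason a random effects formulation costs more computation than the fixed-effects fit, even though the outer problem has far fewer parameters.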