Dear R-list,

maybe some of you could point me in the right direction:

Are you aware of any FREE Fortran or Java libraries/actual pieces of code that are VERY efficient (time-wise) in running the regular linear least-squares multiple regression? More specifically, I have to run small regression models (between 1 and 15 predictors) on samples of up to N=700 but thousands and thousands of them.

I am designing a simulation in R and running those regressions and R itself is way too slow. So, I am thinking of compiling the regression run itself in Fortran and Java and then calling it from R.

Thank you very much for any advice!

Dimitri Liakhovitski
MarketTools, Inc.
Dimitri.Liakhovitski at markettools.com
I would test the speed before making such an assumption. Note that lm.fit is faster than lm, and if the regressions all have the same x matrix then you can do many in one call by having y be a matrix.

On Mon, Sep 8, 2008 at 12:05 PM, Dimitri Liakhovitski <ld7631 at gmail.com> wrote:
> I am designing a simulation in R and running those regressions and R
> itself is way too slow. So, I am thinking of compiling the regression
> run itself in Fortran and Java and then calling it from R.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
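A minimal sketch of the "y as a matrix" suggestion, in case it helps. The sizes and the simulated data are assumptions made up for illustration; only lm.fit and lm come from the thread.

```r
# Sketch: many regressions that share one design matrix, fitted in one lm.fit call.
# Sizes and simulated data are illustrative assumptions.
set.seed(1)
n <- 700; p <- 15; n_models <- 1000
X <- cbind(1, matrix(rnorm(n * p), n, p))       # intercept column + 15 predictors
Y <- matrix(rnorm(n * n_models), n, n_models)   # one response per column

fit <- lm.fit(X, Y)        # one decomposition of X serves every column of Y
dim(fit$coefficients)      # 16 x 1000: one column of coefficients per regression

# Check the first regression against the usual lm() interface
b_lm <- coef(lm(Y[, 1] ~ X[, -1]))
max_diff <- max(abs(fit$coefficients[, 1] - b_lm))
```

This only applies when the predictors are identical across regressions; if each regression has its own x matrix, lm.fit has to be called once per model.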
Are you sure R's ways are not fast enough (there are many layers underneath lm)? For an example of how you might do this at the C/Fortran level, see the function lqs() in MASS.

On Mon, 8 Sep 2008, Dimitri Liakhovitski wrote:

> Are you aware of any FREE Fortran or Java libraries/actual pieces of
> code that are VERY efficient (time-wise) in running the regular linear
> least-squares multiple regression?

A lot of the effort is in getting the right answer fast, including for e.g. collinear inputs.

> I am designing a simulation in R and running those regressions and R
> itself is way too slow. So, I am thinking of compiling the regression
> run itself in Fortran and Java and then calling it from R.

I think Java is unlikely to be fast compared to the Fortran R itself uses. Have you profiled to find where the time is really being spent (both R and C/Fortran profiling if necessary)?

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
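To illustrate the point about collinear inputs, here is a small sketch of what R's own fitter does when one predictor is an exact linear combination of others. The variable names and simulated data are assumptions for illustration only.

```r
# Sketch: how lm.fit reports a rank-deficient design (simulated data).
set.seed(2)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2                          # exactly collinear with x1 and x2
X  <- cbind(intercept = 1, x1, x2, x3)
y  <- 1 + 2 * x1 - x2 + rnorm(n)

fit <- lm.fit(X, y)
fit$rank                               # 3, although X has 4 columns
fit$coefficients                       # the aliased column x3 is reported as NA
```

A hand-rolled routine that solves the normal equations directly would need its own handling for this case. For the profiling question, R's Rprof() and summaryRprof() are the usual starting point.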
Thank you for reminding me, Gabor. I forgot to mention: So far, I have run one test set of regressions using lm. It took R 270 sec. I need to run 1,800,000 of those, which would imply 15.4 years of computing time :)

I have not done the same for lm.fit because I am not sure how to get model R squared from lm.fit.

Dimitri

On Mon, Sep 8, 2008 at 12:17 PM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> I would test the speed before making such an assumption. Note that
> lm.fit is faster than lm and if they have the same x matrix then
> you can do many in one call by having y be a matrix.
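The extrapolation above can be checked in a couple of lines, and the same approach gives a rough lm versus lm.fit comparison. The 270 seconds and 1,800,000 runs are the figures from the message; the data sizes, the repetition count of 200, and the simulated data are assumptions, and the timings will differ from machine to machine.

```r
# Sketch: reproduce the 15.4-year estimate, then time lm() against lm.fit().
secs_per_set <- 270                    # one test set with lm(), as reported
n_sets       <- 1.8e6                  # number of sets still to run
years <- secs_per_set * n_sets / (365.25 * 24 * 3600)
round(years, 1)                        # 15.4

# Rough timing comparison on simulated data (illustrative sizes)
set.seed(3)
n <- 700; p <- 15
X <- cbind(1, matrix(rnorm(n * p), n, p))
y <- rnorm(n)
d <- data.frame(y = y, X[, -1])

t_lm    <- system.time(for (i in 1:200) lm(y ~ ., data = d))[["elapsed"]]
t_lmfit <- system.time(for (i in 1:200) lm.fit(X, y))[["elapsed"]]
c(lm = t_lm, lm.fit = t_lmfit)
```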
Try:

sum(lm.fit(x, y)$residuals^2)

On Mon, Sep 8, 2008 at 12:52 PM, Dimitri Liakhovitski <ld7631 at gmail.com> wrote:
> I have not done the same for lm.fit because I am not sure how to get
> model R squared from lm.fit.
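The expression above is the residual sum of squares rather than R squared itself. One more step gets there: for a model with an intercept, R squared is 1 - RSS/TSS, where TSS is the sum of squares of y about its mean. A minimal sketch on simulated data (the data and the coefficient values are assumptions for illustration):

```r
# Sketch: R squared from lm.fit output, checked against summary(lm(...)).
set.seed(4)
n <- 700; p <- 5
X <- cbind(1, matrix(rnorm(n * p), n, p))            # intercept included
y <- drop(X %*% c(1, 0.5, -0.3, 0.2, 0, 0)) + rnorm(n)

fit <- lm.fit(X, y)
rss <- sum(fit$residuals^2)       # residual sum of squares
tss <- sum((y - mean(y))^2)       # total sum of squares about the mean
r2  <- 1 - rss / tss

r2_lm <- summary(lm(y ~ X[, -1]))$r.squared
all.equal(r2, r2_lm)
```

Without an intercept in the model, summary.lm uses the uncentered total sum of squares, sum(y^2), so the formula would need adjusting to match.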
Although I, along with the others, believe there is probably an efficient R solution, the answer to your direct question can perhaps be found at http://www.fortran.com/. The free G95 Fortran compiler is at http://www.g95.org/

Joe

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Dimitri Liakhovitski
Sent: Monday, September 08, 2008 11:05 AM
To: R-Help List
Subject: [R] Question about multiple regression
Thanks a lot, everybody!

On Mon, Sep 8, 2008 at 3:11 PM, Lucke, Joseph F <Joseph.F.Lucke at uth.tmc.edu> wrote:
> Although I, along with the others, believe there is probably an efficient R
> solution, the answer to your direct question can perhaps be found at
> http://www.fortran.com/. The free G95 Fortran compiler is at
> http://www.g95.org/
> Joe

-- 
Dimitri Liakhovitski
MarketTools, Inc.
Dimitri.Liakhovitski at markettools.com
Hi Dimitri,

On Mon, 8 Sep 2008, Dimitri Liakhovitski wrote:

> Are you aware of any FREE Fortran or Java libraries/actual pieces of
> code that are VERY efficient (time-wise) in running the regular linear
> least-squares multiple regression?

You almost certainly want the LAPACK Fortran libraries, available at http://www.netlib.org/lapack/

...the function of interest to you is probably called "dgels": http://www.netlib.org/lapack/explore-html/dgels.f.html

...of course, this runs faster if you have a fast BLAS library installed. These exist in many forms, and may already be installed on your system.

--Adam
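Before writing any Fortran, it may be worth noting that a LAPACK-backed least-squares solve is reachable from R directly. The sketch below is not dgels itself: it assumes that qr(X, LAPACK = TRUE), which uses a LAPACK QR routine with column pivoting, is an acceptable stand-in for trying the idea out, and it uses simulated data.

```r
# Sketch: least squares through R's LAPACK-based QR, compared with lm.fit.
# This is a stand-in for experimentation, not a call to dgels.
set.seed(5)
n <- 700; p <- 15
X <- cbind(1, matrix(rnorm(n * p), n, p))
y <- rnorm(n)

qr_lapack <- qr(X, LAPACK = TRUE)                  # QR via LAPACK
b_lapack  <- as.numeric(qr.coef(qr_lapack, y))     # least-squares coefficients
b_lmfit   <- as.numeric(lm.fit(X, y)$coefficients)

max_diff <- max(abs(b_lapack - b_lmfit))           # should be near machine precision
```

Whether this is any faster than lm.fit for problems this small depends mostly on the BLAS in use, so it is worth timing on the actual machine.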