Dear R experts,

I have a case where I want to estimate very large regression models with many fixed effects: not just the mean type, but cross-fixed effects (years, months, locations, firms). There are many millions of observations and a few thousand variables (most of these variables are interaction fixed effects). Could someone please point me to packages, if any, that would help me estimate such models? Can these problems be split over many different cores?

Advice appreciated.

/iaw

----
Ivo Welch (ivo.welch@brown.edu, ivo.welch@gmail.com)
CV Starr Professor of Economics (Finance), Brown University
http://welch.econ.brown.edu/
Hi Ivo,

You might check out biglm. It is not clear to me how to parallelize a single model, but if you are running several models, you can of course run them in parallel (but you already know that). One thing that may help is to link R against an optimized, multithreaded BLAS such as ATLAS (I think you have to do this at compile time, but I could be gravely mistaken).

Another, possibly very silly, idea: if you are running many models with different combinations of your variables (a model-selection type of thing), then rather than fitting each model from the raw data every time, you could create a dataset with all variables of interest (including interactions) and calculate one huge covariance matrix and the means. Then you fit all your models from the covariance matrix. That could still be huge, and maybe not any more computationally efficient, but it would effectively reduce your working data from n x k to k x k (plus 1 x k for the vector of means, if you care about those).

Josh

On May 8, 2012, at 20:30, ivo welch <ivowel at gmail.com> wrote:
> dear R experts---now I have a case where I want to estimate very large
> regression models with many fixed effects [...]
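
The covariance-matrix idea above can be sketched in base R. This is only an illustration under stated assumptions: the data are simulated, the variable names (`x1`, `x2`, `x3`, `y`) and the helper `fit_from_cov` are made up for the example, and fixed-effect dummies would first have to be expanded into numeric columns before they could enter the covariance matrix. It recovers coefficients only; standard errors would additionally need n and the residual variance.

```r
# Simulated data; names and sizes are illustrative assumptions.
set.seed(1)
n <- 1000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - 0.5 * dat$x2 + rnorm(n)

# Computed once from the full data: k x k covariance matrix, 1 x k means.
S <- cov(dat)
m <- colMeans(dat)

# Hypothetical helper: OLS intercept and slopes for any subset of
# predictors, using only S and m (the raw data are not touched again).
fit_from_cov <- function(S, m, response, predictors) {
  Sxx <- S[predictors, predictors, drop = FALSE]
  Sxy <- S[predictors, response]
  slopes <- setNames(as.numeric(solve(Sxx, Sxy)), predictors)
  intercept <- m[[response]] - sum(slopes * m[predictors])
  c("(Intercept)" = intercept, slopes)
}

b_cov <- fit_from_cov(S, m, "y", c("x1", "x2"))
b_lm  <- coef(lm(y ~ x1 + x2, data = dat))

# The two sets of coefficients should agree up to numerical tolerance.
print(all.equal(unname(b_cov), unname(b_lm)))
```

Each further model then costs one k x k solve on a submatrix of `S` rather than a pass over all n rows, which is the n x k to k x k reduction described above.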
Your query is far too vague to answer; probably 90% of R packages qualify. As you are an economist, the obvious question is: have you looked at the CRAN Econometrics task view?

-- Bert

On Tue, May 8, 2012 at 8:30 PM, ivo welch <ivowel at gmail.com> wrote:
> dear R experts---now I have a case where I want to estimate very large
> regression models with many fixed effects [...]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info: Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm