A few weeks ago I sent a note to R-help inquiring about strategies for implementing lm fitting with large datasets using one or another of the new database schemes. Having received some encouragement, but no very concrete suggestions, I decided to proceed with a rather naive implementation using RMySQL.

The objective was to produce a version of lm(), say LM(), that would be able to estimate a model of infant birthweight based on a sample of 2.4 million observations on 18 variables from the 1997 U.S. Natality Survey, using only modest memory, say 150 MB. Obviously, trying to work with the entire dataset as one object in a case like this would require vastly more memory, hence the "object disorientation" orientation of our approach.

Alvaro Novo and I now have a working version of such an LM() function that satisfies our original objectives. It is available and described in detail at:

http://www.econ.uiuc.edu/~roger/research/rq/LM.html

The function provides fairly complete lm() functionality: formulae, weights, subsets, etc. Unfortunately, the interaction with MySQL is not as quick as we had hoped, so the above test case takes about 10 minutes on one of our Linux boxes. This is about equivalent to what would be required to read the data in ASCII using scan().

We would greatly appreciate hearing from anyone who might have suggestions about alternative strategies, particularly if they might be expected to yield significant efficiency gains in getting data from MySQL into R.

In fact, the LM implementation is really just a stalking horse for a parallel development of a quantile regression function of this type. For that, an efficient way of passing through the data to check the signs of the residuals is essential.
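For readers who want a concrete picture of the general idea, here is a minimal sketch of fitting least squares while holding only a small chunk of rows in memory. It is not the LM() code and makes no claim about how LM() works internally. It is written in Python with the standard-library sqlite3 module standing in for MySQL, and the table `births`, its columns `weight`, `age` and `smoker`, and all function names are hypothetical. One pass accumulates X'X and X'y chunk by chunk; a second pass counts the signs of the residuals, the kind of sweep mentioned above for quantile regression.

```python
import sqlite3


def chunks(conn, query, size=1000):
    """Yield lists of rows so that at most `size` rows are in memory at once."""
    cur = conn.execute(query)
    while True:
        rows = cur.fetchmany(size)
        if not rows:
            break
        yield rows


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def chunked_ols(conn, query, size=1000):
    """One pass over the data: accumulate X'X and X'y, then solve.

    Each row is (y, x1, ..., xk); an intercept column is added here.
    """
    xtx = xty = None
    p = 0
    for rows in chunks(conn, query, size):
        for y, *xs in rows:
            x = [1.0] + list(xs)
            if xtx is None:
                p = len(x)
                xtx = [[0.0] * p for _ in range(p)]
                xty = [0.0] * p
            for i in range(p):
                xty[i] += x[i] * y
                for j in range(p):
                    xtx[i][j] += x[i] * x[j]
    if xtx is None:
        raise ValueError("query returned no rows")
    return solve(xtx, xty)


def residual_signs(conn, query, beta, size=1000):
    """Second pass: count positive, zero and negative residuals."""
    pos = zero = neg = 0
    for rows in chunks(conn, query, size):
        for y, *xs in rows:
            fit = beta[0] + sum(b * x for b, x in zip(beta[1:], xs))
            r = y - fit
            if r > 0:
                pos += 1
            elif r < 0:
                neg += 1
            else:
                zero += 1
    return pos, zero, neg


def demo_db():
    """Build a tiny in-memory table (hypothetical, noise-free data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE births (weight REAL, age REAL, smoker REAL)")
    for age in range(18, 42):
        smoker = 1 if age % 3 == 0 else 0
        weight = 3000 + 10 * age - 200 * smoker
        conn.execute("INSERT INTO births VALUES (?, ?, ?)", (weight, age, smoker))
    return conn


if __name__ == "__main__":
    conn = demo_db()
    query = "SELECT weight, age, smoker FROM births"
    beta = chunked_ols(conn, query, size=7)
    print(beta)
    print(residual_signs(conn, query, beta, size=7))
```

Memory use here depends on the chunk size and the number of regressors, not on the number of observations. Solving the normal equations directly is the simplest choice for a sketch; a QR-based update would be numerically safer for ill-conditioned designs.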

url:    http://www.econ.uiuc.edu          Roger Koenker
email:  roger at ysidro.econ.uiuc.edu     Department of Economics
vox:    217-333-4558                      University of Illinois
fax:    217-244-6678                      Champaign, IL 61820