I'm looking for a formula for memory usage in standard regression; that is, if I have X rows with Y predictors, how much memory is needed? I'm speccing out a system, and I'd like to be able to get enough memory that we can do some fairly large regressions.

==Ed Freeman
The size of the model matrix X can be estimated approximately. It depends on the kind of data in the model matrix. For instance, floating-point ("numeric") values require more memory than integers: in R a double takes 8 bytes per cell and an integer takes 4 bytes. If your model matrix is sparse, you can use some hidden functions in the Matrix package for sparse model matrices and save a lot of memory in doing so, though I am not certain how to estimate memory requirements under such conditions.

________________________________________
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On Behalf Of efreeman [efreeman at blarg.net]
Sent: Monday, January 10, 2011 5:28 PM
To: r-help at r-project.org
Subject: [R] Memory Needed for Regression

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
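To make the per-cell storage cost concrete, here is a minimal sketch in base R that measures it with object.size(). The vector length and the toy data frame `d` are arbitrary choices for illustration, not anything from the original question.

```r
# Measure the storage cost per cell of the two basic numeric types in R.
n <- 1e6
x_int <- integer(n)   # integer storage
x_dbl <- numeric(n)   # double-precision ("numeric") storage

# object.size() reports bytes; the fixed header is negligible at this length.
per_cell_int <- as.numeric(object.size(x_int)) / n
per_cell_dbl <- as.numeric(object.size(x_dbl)) / n

# model.matrix() stores its result as doubles, so a dense model matrix
# costs roughly 8 bytes per cell (plus dimnames and attributes).
d <- data.frame(y = rnorm(100), a = rnorm(100), b = rnorm(100))  # toy data
mm <- model.matrix(y ~ a + b, data = d)
mm_type <- typeof(mm)

print(c(integer = per_cell_int, double = per_cell_dbl))
print(mm_type)
```

Row names and other attributes add overhead on top of the raw cells, so treat rows * columns * 8 bytes as a lower bound for a dense model matrix rather than an exact figure.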
On Jan 10, 2011, at 5:28 PM, efreeman wrote:

> I'm looking for a formula for memory usage in standard regression; that
> is, if I have X rows with Y predictors, how much memory is needed? I'm
> speccing out a system, and I'd like to be able to get enough memory
> that we can do some fairly large regressions.

Figure 10-12 bytes times X * Y as the size of the matrix or dataframe, and you will need 4-5 times that amount to do useful work. You can check my guesstimate on one of my objects:

> object.size(set1HLI)
5907427736 bytes
> nrow(set1HLI)
[1] 5325006
> length(set1HLI)
[1] 166
> 5907427736/5325006
[1] 1109.375
> 1109.375/166
[1] 6.682982

So I might have been a bit on the high side with my estimate for number of bytes per cell. I have a bunch of constructed factor variables that only take 4 bytes per "cell". The byte-to-cell ratio is 8 for "numeric" variables and 4 for "factor" or "integer" variables, plus variable amounts for character variables and "overhead".

With my other computer activities I end up needing about 24 GB, which can hold probably 10 regression models. Each model needs space for vectors of predicted values and residuals that are as long as the input, and they typically run around 300-500 MB.

David Winsemius, MD
West Hartford, CT
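The rule of thumb above (bytes per cell times rows times columns, then 4-5 times that for working space) can be written as a small helper. This is only a sketch of that guesstimate: the function name `estimate_regression_memory` and its arguments are hypothetical, and the defaults of 8 bytes per cell and a factor of 5 are assumptions taken from the upper end of the figures in the message.

```r
# Hypothetical helper: back-of-envelope memory estimate for a regression.
# bytes_per_cell: 8 for doubles, 4 for integers/factors (assumed default 8).
# workspace_factor: multiplier for copies made during fitting (assumed 5).
estimate_regression_memory <- function(n_rows, n_cols,
                                       bytes_per_cell = 8,
                                       workspace_factor = 5) {
  data_bytes <- n_rows * n_cols * bytes_per_cell
  list(
    data_gb    = data_bytes / 2^30,                    # the data alone
    working_gb = data_bytes * workspace_factor / 2^30  # data plus working copies
  )
}

# Dimensions of the data frame measured in the message above.
est <- estimate_regression_memory(n_rows = 5325006, n_cols = 166)
print(est)
```

Passing `bytes_per_cell = 6.7` would reproduce the measured ratio for a data frame that mixes numeric and factor columns, as in the object.size() check.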
On Mon, 10 Jan 2011, efreeman wrote:

> I'm looking for a formula for memory usage in standard regression; that
> is, if I have X rows with Y predictors, how much memory is needed? I'm
> speccing out a system, and I'd like to be able to get enough memory
> that we can do some fairly large regressions.

install.packages("biglm")
require(biglm)

Then see

?biglm

"biglm creates a linear model object that uses only p^2 memory for p variables. It can be updated with more data using update. This allows linear regression on data sets larger than memory."

If you want to get serious about this, look in Golub and Van Loan*. (Sorry, my copy is not at hand so I cannot be more specific. Maybe there is a section like "Updating Matrix Factorizations" that says what is needed.)

Also, see Algorithm AS274, Applied Statistics (1992) Vol. 41, No. 2, which is what biglm() refers to.

And maybe read the source code of biglm() if you are planning on using that package.

HTH,

Chuck

* @book{golub1996matrix,
    title={{Matrix computations}},
    author={Golub, G.H. and Van Loan, C.F.},
    isbn={0801854148},
    year={1996},
    publisher={Johns Hopkins Univ Pr}
  }

Charles C. Berry
Dept of Family/Preventive Medicine
cberry at tajo.ucsd.edu
UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/
La Jolla, San Diego 92093-0901
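As a rough illustration of the update approach described in ?biglm, the sketch below fits a model on one chunk and then feeds in the remaining chunks with update(). The simulated data, the chunk size, and the helper `make_chunk` are made up for the example; in practice each chunk would be read from a file or database. The code is skipped if biglm is not installed.

```r
coefs <- NULL  # stays NULL if biglm is unavailable

if (requireNamespace("biglm", quietly = TRUE)) {
  library(biglm)
  set.seed(1)

  # Hypothetical chunk generator standing in for reading a slice of a big file.
  make_chunk <- function(n) {
    x1 <- rnorm(n)
    x2 <- rnorm(n)
    data.frame(x1 = x1, x2 = x2, y = 1 + 2 * x1 - 3 * x2 + rnorm(n))
  }
  chunks <- lapply(1:5, function(i) make_chunk(10000))

  # Fit on the first chunk, then update with each remaining chunk.
  # Only one chunk plus the small model object is needed at any time.
  fit <- biglm(y ~ x1 + x2, data = chunks[[1]])
  for (chunk in chunks[-1]) {
    fit <- update(fit, chunk)
  }

  coefs <- coef(fit)
  print(coefs)
}
```

Here all chunks are held in a list only for convenience; the point of the approach is that they never need to be in memory together.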