hi -

i just started using R as i am trying to figure out how to perform a linear
regression on a huge matrix.

i am sure this topic has passed through the email list before, but i could
not find anything in the archives.

i have a matrix that is 2,000,000 x 170,000; the values right now are
arbitrary.

i try to allocate this on an x86_64 machine with 16G of ram and i get the
following:

> x <- matrix(0, 2000000, 170000);
Error in matrix(0, 2e+06, 170000) : too many elements specified

is R capable of handling data of this size? am i doing it wrong?

cheers
paul
On Wed, Jul 14, 2010 at 4:23 PM, paul s <r-project.org at queuemail.com> wrote:

> i have a matrix that is 2,000,000 x 170,000; the values right now are
> arbitrary.
>
> i try to allocate this on an x86_64 machine with 16G of ram and i get the
> following:
>
> > x <- matrix(0, 2000000, 170000);
> Error in matrix(0, 2e+06, 170000) : too many elements specified

R stores matrices and other data objects in memory. A matrix of that size
would require

> 2e+06*170000*8/2^30
[1] 2533.197

gigabytes of memory. Start looking for a machine with at least 5 terabytes
of memory (you will need more than one copy of the matrix to be stored) or,
probably easier, rethink your problem.

Results from a linear regression producing 170,000 coefficient estimates
are unlikely to be useful. The model matrix is essentially guaranteed to be
rank deficient.

> is R capable of handling data of this size? am i doing it wrong?
>
> cheers
> paul
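For illustration, that back-of-the-envelope figure generalises into a tiny
helper (the function name below is made up, not part of the thread):

    # GiB needed to hold a dense numeric (double, 8-byte) matrix in memory
    dense_gib <- function(nrow, ncol) nrow * ncol * 8 / 2^30

    dense_gib(2e6, 170000)   # ~2533 GiB, the figure quoted above
    dense_gib(2e6, 100)      # ~1.5 GiB: a 100-column version would fit in 16G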
paul s wrote:

> i have a matrix that is 2,000,000 x 170,000; the values right now are
> arbitrary.
>
> i try to allocate this on an x86_64 machine with 16G of ram and i get the
> following:
>
> > x <- matrix(0, 2000000, 170000);
> Error in matrix(0, 2e+06, 170000) : too many elements specified
>
> is R capable of handling data of this size? am i doing it wrong?

A quick calculation reveals that a matrix of that size requires about 2.7
TERAbytes of storage, so I'm a bit confused as to how you might expect to
fit it into 16GB of RAM...

However, even with terabytes of memory, you would be running into the
(current) limitation that a single vector in R can have at most 2^31 - 1,
i.e. roughly 2 billion, elements.

Yes, you could be doing it wrong, but what is "it"? If the matrix is
sparse, there are sparse matrix tools around...

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd.mes at cbs.dk   Priv: PDalgd at gmail.com
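If sparsity does apply, a minimal sketch of the idea using the Matrix
package (the triplets below are toy values; whether every downstream
operation copes with dimensions this large is a separate question):

    library(Matrix)

    # store only the (row, column, value) triplets of the non-zero entries
    i <- c(1, 5, 1000)        # row indices (toy values)
    j <- c(2, 3, 170000)      # column indices (toy values)
    v <- c(1.5, -2.0, 0.25)   # the non-zero values themselves
    x <- sparseMatrix(i = i, j = j, x = v, dims = c(2e6, 170000))

    object.size(x)   # grows with the number of non-zeros, not with nrow * ncol

Memory use is then driven by the number of non-zero entries rather than by
the full 2,000,000 x 170,000 extent.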
On 14/07/2010 5:23 PM, paul s wrote:

> i have a matrix that is 2,000,000 x 170,000; the values right now are
> arbitrary.
>
> > x <- matrix(0, 2000000, 170000);
> Error in matrix(0, 2e+06, 170000) : too many elements specified
>
> is R capable of handling data of this size? am i doing it wrong?

It is capable of handling large data, but not that large in a single
matrix. The limit on the number of entries in any vector (and matrices are
stored as vectors) is about 2^31, roughly 2 billion. Your matrix needs
about 340 billion entries, so it's too big.

(It won't all fit in memory, either: you've only got space for about 2
billion numeric values in your 16G of RAM, and you also need space for the
OS, etc. The OS can use disk space as virtual memory, so you might be able
to get that much address space, but it would be very, very slow.)

You need to break up the work into smaller pieces.

Duncan Murdoch
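A minimal sketch of one way to break the work into pieces: accumulate the
normal equations over blocks of rows, so only one block is in memory at a
time (the data here are simulated; in practice each block would be read
from disk, and packages such as biglm automate this kind of incremental
fit). Note this only pays off when the number of columns p is modest -- it
does not rescue a 170,000-column model, whose p x p crossproduct is itself
enormous.

    set.seed(1)
    p    <- 50                       # number of predictors (toy value)
    beta <- rnorm(p)                 # "true" coefficients for the simulation
    XtX  <- matrix(0, p, p)          # running sum of X'X
    Xty  <- numeric(p)               # running sum of X'y

    for (k in 1:20) {                # 20 blocks of 1,000 rows each
      X <- matrix(rnorm(1000 * p), 1000, p)
      y <- drop(X %*% beta) + rnorm(1000)
      XtX <- XtX + crossprod(X)           # add this block's X'X
      Xty <- Xty + drop(crossprod(X, y))  # add this block's X'y
    }

    beta_hat <- solve(XtX, Xty)      # least squares without holding all rows at once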
On 07/14/2010 06:10 PM, Douglas Bates wrote:

> R stores matrices and other data objects in memory. A matrix of that
> size would require
>
> > 2e+06*170000*8/2^30
> [1] 2533.197

great, that is my understanding as well..

> probably easier, rethink your problem.

yes. i am starting to do that now, as i have run into the same memory issue
with my own code and wanted to look at statistical packages for crunching
huge amounts of data.

> Results from a linear regression producing 170,000 coefficient estimates
> are unlikely to be useful. The model matrix is essentially guaranteed to
> be rank deficient.

interesting. i made a similar point: as the observations grew, plotting the
regression showed minimal impact on its characteristics. however, i am
working with an academic who claims all observations are needed to reflect
a pure hedonic index. is this what you mean by rank deficient?
http://en.wikipedia.org/wiki/Rank_%28linear_algebra%29

thank you for your response.

cheers
paul
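For what rank deficiency means concretely, a small toy illustration
(made-up numbers, not the actual data): when the model matrix has more
columns than it has linearly independent rows, some coefficients simply
cannot be estimated.

    set.seed(1)
    X <- matrix(rnorm(10 * 20), nrow = 10, ncol = 20)  # 10 observations, 20 predictors
    qr(X)$rank        # 10, not 20: the rank can never exceed min(nrow, ncol)

    y <- rnorm(10)
    coef(lm(y ~ X))   # lm() returns NA for the coefficients it cannot estimate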
If the system is sparse and you have a really large cluster to play with,
then maybe (emphasis) PETSc/TAO is the right combination of tools for your
problem.

http://www.mcs.anl.gov/petsc/petsc-as/

Christos