Dimitri Liakhovitski
2010-Aug-03 17:10 UTC
[R] limits of a data frame size for reading into R
I understand the question I am about to ask is rather vague and depends on the task and my PC memory. However, I'll give it a try:

Let's assume the goal is just to read the data frame into R and then do some simple analyses with it (e.g., multiple regression of some variables onto some - just a few - variables).

Is there a limit to the number of columns of a data frame that R can handle? I am asking because where I work many use SAS, and they are running into a limit of roughly 13,700 columns there.

Since I am asking - is there a limit to the number of rows?

Or is the correct way of asking the question: my PC's memory is X. The tab-delimited .txt file I am trying to read in has a size of YYY MB; can I read it in?

Thanks a lot!

--
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com
You probably don't want any single object to be larger than about 25% of physical memory, so that copies can be made during some processing. If you are running on a 32-bit system, which will limit you to at most 3 GB of memory, then your largest object should not be greater than about 800 MB. If you want to have 13,700 columns of numeric data (8 bytes per element), then each row would require about 110 KB (13,700 x 8 bytes), and an 800 MB object would therefore hold somewhat under 8,000 rows. On a 64-bit system you are probably limited only by how much you want to spend on memory.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
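The arithmetic in the reply above can be sketched as follows. This is only a rough illustration in Python (chosen because the calculation is plain arithmetic, not R-specific); the function names `object_budget`, `bytes_per_row`, and `max_rows` are hypothetical, and the sketch assumes every column is numeric at 8 bytes per value, ignores any per-object overhead, and counts 1 MB as 1024^2 bytes.

```python
BYTES_PER_NUMERIC = 8          # assumption: all columns are 8-byte numeric values
GB = 1024 ** 3


def object_budget(physical_memory_bytes, fraction=0.25):
    """Largest single object, using the ~25% rule of thumb from the reply."""
    return int(physical_memory_bytes * fraction)


def bytes_per_row(n_columns):
    """Bytes needed for one row of all-numeric data."""
    return n_columns * BYTES_PER_NUMERIC


def max_rows(budget_bytes, n_columns):
    """Whole rows that fit inside the given byte budget."""
    return budget_bytes // bytes_per_row(n_columns)


budget = object_budget(3 * GB)                 # 32-bit case: at most 3 GB
print(bytes_per_row(13_700))                   # 109600 bytes, i.e. about 110 KB
print(budget // 1024 ** 2)                     # 768 (MB), close to the quoted 800 MB
print(max_rows(budget, 13_700))                # 7347 rows
```

Under these assumptions the result lands a little below the round figure quoted in the reply, which is in line with its "about" wording. Real data frames with character or factor columns would need a different per-column size.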
Besides what Jim said, there is a limit of 2^31-1 on the number of elements in a vector. Data frames are vectors of vectors, so you can have at most 2^31-1 rows and 2^31-1 columns. Matrices are vectors, so they are limited to 2^31-1 elements in total. This is only likely to be a limitation on a 64-bit machine; on a 32-bit machine you'll run out of memory first.

Duncan Murdoch
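A small sketch of how the limits described in this reply differ for data frames and matrices. It is written in Python purely as arithmetic; the helper names `fits_data_frame` and `fits_matrix` are hypothetical, and the constant reflects the limit as stated in this 2010 thread (later R releases may behave differently, so treat it as an assumption rather than a statement about current R).

```python
VECTOR_LIMIT = 2 ** 31 - 1     # 2147483647, the per-vector element limit cited above


def fits_data_frame(n_rows, n_cols):
    """A data frame is a vector of column vectors: each dimension is checked separately."""
    return n_rows <= VECTOR_LIMIT and n_cols <= VECTOR_LIMIT


def fits_matrix(n_rows, n_cols):
    """A matrix is a single vector: the total element count is what matters."""
    return n_rows * n_cols <= VECTOR_LIMIT


# Hypothetical example: 200,000 rows by 13,700 numeric columns.
rows, cols = 200_000, 13_700
print(rows * cols)                     # 2740000000 elements
print(fits_data_frame(rows, cols))     # True
print(fits_matrix(rows, cols))         # False
print(rows * cols * 8 / 1024 ** 3)     # size in GB at 8 bytes per element
```

The example table would need over 20 GB at 8 bytes per element, which illustrates why the element limit is only likely to matter on a 64-bit machine: a 32-bit system runs out of memory long before reaching it.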