Hello All,

I have a very large data set (1.1GB) that I am trying to read into R. The
file is tab-delimited and contains headers; there are over 800 columns and
almost 700,000 rows. I am running R on Ubuntu 7.10 (Gutsy Gibbon), kernel
Linux 2.6.22-14-generic, with 3.1GB of RAM and an AMD Athlon(tm) 64
Processor 3200+. I installed R following the instructions on CRAN under
Linux-Ubuntu.

I need to be able to read the whole data set into R, but when I try right
now, it uses only 4.2GB of the swap space (50% of the 8.5GB currently
available) and won't go any further. I am new to Linux, but anxious to
learn. Is there a memory constraint with this build of R, or is this
something that can be fixed with hardware (like more RAM)? I thought that a
64-bit version of R would be able to handle data of this magnitude. Is
there a different version of Linux that is better for reading in large
data sets such as this one?

I know that databases can be used for large data, but I need to run
discriminant analysis or randomForest on all of the variables.

Any suggestions would be very much appreciated.

Sincerely,

Randy Griffiths
What type of data do you have? Will it be numeric or factors? If it is all
numeric, then you will need over 4GB just to hold one copy of the object
(700,000 * 800 * 8 bytes). That is only the final object; I don't know how
much additional space is required during the processing. What are you
going to do with all of it at once? Can you read it in in parts, store it
in a database, and then retrieve only the columns you need for processing?
A sketch of that chunked approach follows below. Your machine is probably
not large enough to hold even a single copy, and in any case you would
have to be using a 64-bit version of R.

On 3/4/08, Randy Griffiths <rgriff77 at gmail.com> wrote:
> I have a very large data set (1.1GB) that I am trying to read into R. The
> file is tab delimited and contains headers; there are over 800 columns
> and almost 700,000 rows.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
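A minimal sketch of the size arithmetic and the read-in-chunks-to-a-database
approach, assuming the DBI and RSQLite packages are installed; the file name
"bigfile.txt", table name "bigdata", chunk size, and the column names in the
final query are all placeholders:

## Back-of-the-envelope: one all-numeric copy of the data in doubles
700000 * 800 * 8 / 2^30   # about 4.2 GiB, before any copies made en route

## Chunked load into SQLite, so only the needed columns come back later
library(DBI)
library(RSQLite)

db  <- dbConnect(SQLite(), dbname = "bigdata.db")
con <- file("bigfile.txt", open = "r")
hdr <- scan(con, what = "", nlines = 1, sep = "\t", quiet = TRUE)
chunk <- 10000            # rows per read; tune to available RAM

repeat {
    ## read.table continues from where the previous read stopped on an
    ## open connection; it errors at end-of-file, which we treat as "done"
    dat <- tryCatch(read.table(con, header = FALSE, sep = "\t",
                               nrows = chunk, col.names = hdr,
                               comment.char = ""),
                    error = function(e) NULL)
    if (is.null(dat)) break
    dbWriteTable(db, "bigdata", dat, append = TRUE)
    if (nrow(dat) < chunk) break
}
close(con)

## Later, retrieve just the columns a given analysis needs:
xy <- dbGetQuery(db, "SELECT col1, col27 FROM bigdata")
dbDisconnect(db)

Specifying colClasses in the read.table() call would speed the load up
considerably, since R would not have to guess the type of each of the 800
columns for every chunk.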
A 64-bit version of R would be able to handle this (preferably with more
RAM), but you don't appear to have one. Check what .Machine$sizeof.pointer
says: I expect 4 (an example check is shown below).

On Tue, 4 Mar 2008, Randy Griffiths wrote:

> I am using the Ubuntu 7.10 Gutsy Gibbon version of R. I am using Kernel
> Linux 2.6.22-14-generic. I have 3.1GB of RAM with the AMD Athlon(tm) 64
> Processor 3200+. I downloaded R using the instructions from cran under
> Linux-Ubuntu.

That's too vague. Do you have an ix86 or x86_64 OS? I see i386 and amd64
builds on that page: which did you install?

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
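For reference, the check at the R prompt looks like this; the output shown
is what a 32-bit build reports, while a 64-bit build reports 8:

> .Machine$sizeof.pointer
[1] 4    # 4-byte pointers = 32-bit R; 8 would indicate a 64-bit build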