Hi all,

Have a problem. I am trying to read in a data set that has about 112,000,000 rows and 8 columns, and unsurprisingly it is too big for R to handle. The columns are made up of 2 integer columns and 6 logical columns. The text file is about 4.2 GB in size. I have 4 GB of RAM and 218 GB of available space on the hard drive. I tried the dumpDF function but the data were too big. I also tried bringing in the data in 10 sets of about 12,000,000 rows each. Are there other ways of getting around the size of the data?

Regards,

Lorcan
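(For scale: both integer and logical vectors take 4 bytes per element in R, so the bare table alone needs about 3.3 GiB before read.table makes any of its working copies. A quick check:

    rows <- 112e6
    cols <- 8                  # 2 integer + 6 logical columns
    rows * cols * 4 / 2^30     # ~3.34 GiB; each element stored in 4 bytes

That is why the full read fails on a 4 GB machine, and why a 32-bit R process, which can address roughly 2-3 GB, cannot hold it regardless.)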
First of all, try to determine the largest file you can read with an empty workspace. Once you have done that, break up your file into sets of that size and read them in. The next question is what you want to do with 112M rows of data. Can you process them a set at a time and then aggregate the results? I have no problem reading files with 10M rows on a 32-bit version of R on Windows with 3 GB of memory. So a little more information on "what is the problem you are trying to solve" would be useful.

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
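A minimal sketch of the chunked approach described above, assuming a tab-delimited file with no header and a hypothetical file name (adjust sep and header to match the actual file); fixing colClasses up front spares read.table its type-guessing pass and a good deal of memory:

    con <- file("bigfile.txt", open = "r")   # hypothetical file name
    chunk_rows <- 1e6
    col_types  <- c("integer", "integer", rep("logical", 6))
    totals <- NULL
    repeat {
      chunk <- tryCatch(
        read.table(con, nrows = chunk_rows, sep = "\t", header = FALSE,
                   colClasses = col_types),
        error = function(e) NULL)            # read.table errors at end of file
      if (is.null(chunk)) break
      # process this chunk, then keep only the aggregate, e.g. column sums
      totals <- if (is.null(totals)) colSums(chunk) else totals + colSums(chunk)
      if (nrow(chunk) < chunk_rows) break    # short chunk means we hit the end
    }
    close(con)

Because the connection stays open between calls, each read.table picks up where the previous one stopped, so only one chunk is ever in memory at a time.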
Hi,

You can try dbLoad() from the filehash package (the same package that provides the dumpDF you already tried). Not sure whether it will be successful.

A.K.
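Another route, if repeated subsetting rather than a single aggregation pass is the goal: push the file into an on-disk SQLite database and query only what is needed. This is a different technique from dbLoad(), sketched with the DBI and RSQLite packages and the same hypothetical file name and chunked reader as above:

    library(DBI)   # requires the RSQLite package as well
    db  <- dbConnect(RSQLite::SQLite(), "big.db")  # database lives on disk
    con <- file("bigfile.txt", open = "r")
    col_types <- c("integer", "integer", rep("logical", 6))
    first <- TRUE
    repeat {
      chunk <- tryCatch(
        read.table(con, nrows = 1e6, sep = "\t", header = FALSE,
                   colClasses = col_types),
        error = function(e) NULL)            # read.table errors at end of file
      if (is.null(chunk)) break
      dbWriteTable(db, "big", chunk, append = !first)  # first call creates the table
      first <- FALSE
      if (nrow(chunk) < 1e6) break
    }
    close(con)
    # pull in only the rows/columns a given analysis needs
    res <- dbGetQuery(db, "SELECT V1, V2 FROM big WHERE V3 LIMIT 1000000")
    dbDisconnect(db)

SQLite stores the logical columns as 0/1 integers, so conditions like WHERE V3 work directly, and the 112M rows never have to fit in RAM at once.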