Hi there,

I wish to read a 9.6GB .DAT file into R (64-bit R on a 64-bit Windows machine), then delete a substantial number of rows and convert the result to a .csv file. On the first attempt the computer crashed (at some point last night).

I'm rerunning this now and am closely monitoring processor/CPU/memory usage.

Setting aside the possibility that the crash was purely a hardware issue, is R equipped to handle this much data? The FAQ page says that 64-bit R can handle larger data sets than 32-bit.

I'm using the read.fwf function to read in the data. I don't have access to a database program (SQL, for instance).

Advice is most appreciated!

-- RHelpPlease
My opinion is that you should be spending your effort on setting up a SQL engine and importing the data there. If you have 32GB of RAM your current direction might work, but working with sampled data rather than population data seems pretty typical for statistical analysis anyway.

---------------------------------------------------------------------------
Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries, Software/Embedded Controllers)
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
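If sampling (as Jeff suggests) rather than processing every record is acceptable, one streaming approach is to keep every k-th line while reading the file through a connection, so the whole 9.6GB never sits in memory at once. This is only a rough, untested sketch: the file name, the field widths passed to read.fwf, and the sampling rate are invented placeholders.

    ## Sketch only: keep every 100th record of a large fixed-width file
    ## without loading the whole file.  File name and widths are placeholders.
    con  <- file("bigfile.dat", open = "rt")
    keep <- character(0)
    i    <- 0L
    repeat {
      lines <- readLines(con, n = 10000)              # read a chunk of raw lines
      if (length(lines) == 0L) break                  # end of file
      idx  <- which((i + seq_along(lines)) %% 100L == 0L)
      keep <- c(keep, lines[idx])                     # retain every 100th line
      i    <- i + length(lines)
    }
    close(con)
    ## parse the retained lines as fixed-width fields (widths are made up)
    sample_df <- read.fwf(textConnection(keep), widths = c(10, 5, 8, 2))

For a very large sample it would be better to write the kept lines to a file inside the loop rather than grow the vector in memory, but the idea is the same.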
Hi,

On Thu, Mar 8, 2012 at 1:19 PM, RHelpPlease <rrumple at trghcsolutions.com> wrote:
> I'm using the read.fwf function to read in the data. I don't have access to
> a database program (SQL, for instance).

Keep in mind that sqlite3 is just an `install.packages('RSQLite')` away ... and this StackOverflow thread might be useful w.r.t. SQLite performance and big database files:

http://stackoverflow.com/questions/784173

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
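For concreteness, a rough sketch of the RSQLite route Steve mentions: read the fixed-width file in chunks, append each chunk to an on-disk SQLite table, then pull back only the rows of interest with a single query. Untested, and the file names, field widths, column names, and WHERE clause are all invented placeholders.

    ## Minimal sketch -- widths, column names, and filter are placeholders.
    library(DBI)
    library(RSQLite)

    db     <- dbConnect(SQLite(), "bigfile.sqlite")   # on-disk database file
    con    <- file("bigfile.dat", open = "rt")
    widths <- c(10, 5, 8, 2)                          # placeholder field widths
    cols   <- c("id", "code", "value", "flag")        # placeholder column names

    repeat {
      lines <- readLines(con, n = 50000)              # one chunk of raw lines
      if (length(lines) == 0L) break                  # end of file
      tc    <- textConnection(lines)
      chunk <- read.fwf(tc, widths = widths, col.names = cols,
                        stringsAsFactors = FALSE)
      close(tc)
      if (dbExistsTable(db, "bigdata")) {
        dbWriteTable(db, "bigdata", chunk, append = TRUE)
      } else {
        dbWriteTable(db, "bigdata", chunk)            # first chunk creates the table
      }
    }
    close(con)

    ## pull back only the rows of interest; the condition is a placeholder
    subset_df <- dbGetQuery(db, "SELECT * FROM bigdata WHERE flag = 'X'")
    dbDisconnect(db)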
On Thu, Mar 8, 2012 at 6:19 PM, RHelpPlease <rrumple at trghcsolutions.com> wrote:
> I wish to read a 9.6GB .DAT file into R (64-bit R on 64-bit Windows machine)
> - to then delete a substantial number of rows & then convert to a .csv file.

If you are trying to delete a substantial number of rows as a one-off operation to get a smaller dataset, then you might be better off filtering the file with a tool like perl, awk, or sed - something that reads a line at a time, processes it, and perhaps writes a line of output. For example, suppose you only want lines where the 25th character is an 'X'. Then all you need is:

    awk 'substr($0,25,1)=="X"' < bigfile.dat > justX.dat

Here I've used awk to filter the input based on a condition. It never reads the whole file into memory, so memory usage isn't a problem. Awk for Windows is available, possibly as a native version or as part of Cygwin.

You could do a similar thing in R by opening a text connection to your file and reading one line at a time, writing the modified or selected lines to a new file (see the sketch below).

Barry
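A sketch of that connection-based approach in pure R, mirroring the awk one-liner above. The file names, chunk size, and the "25th character is X" condition are placeholders for whatever the real selection rule is.

    ## Sketch only: stream bigfile.dat, keep lines whose 25th character
    ## is "X", and write them to justX.dat without loading the whole file.
    infile  <- file("bigfile.dat", open = "rt")
    outfile <- file("justX.dat",  open = "wt")
    repeat {
      lines <- readLines(infile, n = 10000)            # read a manageable chunk
      if (length(lines) == 0L) break                   # stop at end of file
      keep  <- lines[substr(lines, 25, 25) == "X"]     # same test as the awk example
      if (length(keep) > 0L) writeLines(keep, outfile)
    }
    close(infile)
    close(outfile)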
Gabor Grothendieck replied (2012-Mar-09 00:04 UTC):
On Thu, Mar 8, 2012 at 1:19 PM, RHelpPlease <rrumple at trghcsolutions.com> wrote:
> I'm using the read.fwf function to read in the data. I don't have access to
> a database program (SQL, for instance).

    # the next line installs the sqldf package and all its dependencies, including SQLite
    install.packages("sqldf")

    library(sqldf)
    DF <- read.csv.sql("bigfile.csv",
                       sql = "select * from file where a > 3",
                       ...other args...)

The read.csv.sql line creates an SQLite database, creates an appropriate table layout for your data, reads your data into the table, performs the SQL statement, and only after all that reads the result into R. It then destroys the database it created. Replace "bigfile.csv" with the name of your file and "where a > 3" with your condition. The ...other args... part should specify the format of your file. See ?read.csv.sql and also http://sqldf.googlecode.com

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
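As a possible final step toward the original goal of a .csv file - assuming DF above ends up holding the reduced data set, and with "reduced.csv" as a placeholder name:

    ## write the filtered rows back out; DF is the result of read.csv.sql above
    write.csv(DF, "reduced.csv", row.names = FALSE)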