A friend of mine recently mentioned that he had painlessly imported a
data file with 8 columns and 500,000 rows into Matlab. When I tried
the same thing in R (both Unix and Windows variants) I had little
success. The Windows version hung for a very long time, until I
eventually more or less ran out of virtual memory; I tried to set the
proper memory allocations for the Unix version, but it never seemed
satisfied :]

I used read.table -- should I have used something else? Is it even
possible to work with files this large? I assume a memory-mapped
binary file would have been quite efficient (as opposed to an
in-memory parsed text file) -- is something like that even possible in
R?

--
Magnus Lie Hetland          The Anygui Project
http://hetland.org          http://anygui.org
Hi,

You should use scan() to read large ASCII tables. If you save a data
frame using save(), you get a binary file which loads pretty fast.
Note that similar problems arise if you try to save big data frames
as ASCII (you may consider my package savetable, at
http://www.obs.ee/~siim/savetable_0.1.0.tar.gz, for doing that).

Best wishes,
Ott

On Wed, 28 Aug 2002, Magnus Lie Hetland wrote:

> I used read.table -- should I have used something else? Is it even
> possible to work with files this large? I assume a memory-mapped
> binary file would have been quite efficient (as opposed to an
> in-memory parsed text file) -- is something like that even possible
> in R?
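To make the scan()/save() suggestion concrete, here is a minimal
sketch. The file name foo.dat, the absence of a header line, and the
column names are all assumptions:

    ## scan() with an explicit `what` list skips the type-guessing
    ## that slows read.table down; each component gives one column.
    cols <- rep(list(numeric(0)), 8)
    names(cols) <- paste("V", 1:8, sep = "")
    x  <- scan("foo.dat", what = cols)   # add skip = 1 if there is a header
    df <- as.data.frame(x)

    ## Save the data frame once as a binary image; reloading it is
    ## much faster than re-parsing the text file.
    save(df, file = "foo.RData")
    load("foo.RData")                    # restores `df`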
On Wed, 28 Aug 2002, Magnus Lie Hetland wrote:> A friend of mine recently mentioned that he had painlessly imported a > data file with 8 columns and 500,000 rows into matlab. When I tried > the same thing in R (both Unix and Windows variants) I had little > success. The Windows version hung for a very long time, until I > eventually more or less ran out of virtual memory; I tried to set the > proper memory allocations for the Unix version, but it never seemed > satisfied :]That's not big: if numeric it is a 32Mb object. People do do that quite often (on machines with 512Mb or more, but memory is cheap). So it is hard to know what the problem is, but ?read.table gives some hints (including using scan()). I've just done an experiment. I generated 4m rnorms, made a matrix, wrote them out. Then. AA <- read.table("foo.dat", nrows=5e5, comment.char="", colClasses=rep("numeric", 8), header=T) worked for me in about 20secs, using less than 150Mb. That was painless, and all the speed-ups are documented in ?read.table.> I used read.table -- should I have used something else? Is it even > possible to work with this large files? I assume a memory-mapped > binary file would have been quite efficient (as opposed to an > in-memory parsed text file) -- is something like that even possible in > R?Certainly possible to read binary files. That's what load/save do, and see ?readBin to read binary files written by other formats. Having a file that size in memory is not a problem. Doing useful analyses may be (especially in Matlab). -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272860 (secr) Oxford OX1 3TG, UK Fax: +44 1865 272595 -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
..perhaps you will have better luck if you use the RODBC package with
MySQL as a data store!?

    channel <- odbcConnect("db_name", "login", "pass")
    data <- sqlFetch(channel, "table_name", errors = TRUE,
                     as = "data frame", nullstring = "sysmis",
                     na.strings = "NA")

..or sqlQuery!

good luck ;-)

regards, christian

Magnus Lie Hetland <magnus at hetland.org> wrote on 28.08.02 07:44:33:

> I used read.table -- should I have used something else? Is it even
> possible to work with files this large?
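As a sketch of the sqlQuery route: the DSN "db_name", the credentials,
and the table name are all assumptions, and with MySQL doing the
parsing the rows can also be pulled in chunks:

    library(RODBC)

    channel <- odbcConnect("db_name", uid = "login", pwd = "pass")

    ## The database parses the data; R receives a ready-made data frame.
    big <- sqlQuery(channel, "SELECT * FROM table_name")

    ## If memory is tight, fetch piecewise via MySQL's LIMIT/OFFSET.
    chunk <- sqlQuery(channel,
                      "SELECT * FROM table_name LIMIT 100000 OFFSET 0")

    odbcClose(channel)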
On Wed, Aug 28, 2002 at 05:38:50AM +0200, Magnus Lie Hetland wrote:

> I used read.table -- should I have used something else? Is it even
> possible to work with files this large?

Try 'scan'.

> I assume a memory-mapped binary file would have been quite efficient
> (as opposed to an in-memory parsed text file) -- is something like
> that even possible in R?

Have a look at the package 'rhdf5' (at least at www.bioconductor.org;
I am not sure it's on CRAN). Not exactly what you describe, but it
could be relevant.

Regards,

L.

--
Laurent Gautier                     CBS, Building 208, DTU
PhD. Student                        DK-2800 Lyngby, Denmark
tel: +45 45 25 24 89                http://www.cbs.dtu.dk/laurent
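A minimal sketch of the HDF5 idea, with a small stand-in matrix. The
function names (h5createFile, h5write, h5read) follow today's
Bioconductor rhdf5 and may differ from the version current at the
time of this thread:

    library(rhdf5)

    ## A small stand-in for the real 500,000 x 8 matrix.
    m <- matrix(rnorm(8000), nrow = 1000, ncol = 8)

    h5createFile("foo.h5")
    h5write(m, "foo.h5", "mydata")

    ## Read back just a slice -- rows 1-100, all columns -- without
    ## touching the rest of the file.
    part <- h5read("foo.h5", "mydata", index = list(1:100, 1:8))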