Dear R users,

We're having a problem reading a largish data file using read.table().
The file consists of 175000 lines of 4 floating-point numbers. Here's
what happens:

  > dat <- read.table('sst.dat')
  Error: memory exhausted (this is line 358 of src/main/memory.c)

Cutting the file down to around 15000 lines allows read.table() to work
OK. I edited the memory limits in Platform.h and re-compiled, and now
read.table() can manage up to around 125000 lines:

  #define R_VSIZE   30000000L  /* 15 times original figure (Defn.h) */
  #define R_NSIZE    1000000L  /*  5 times original figure (Defn.h) */
  #define R_PPSSIZE   100000L  /* 10 times original figure (Defn.h) */

Clearly I can keep upping these values until it works, but this has the
side-effect of making the running R binary pretty big.

What can I do? Is the answer a better memory management system?

Any help appreciated.

Yours,

Ian

--
Ian Thurlbeck                      http://www.stams.strath.ac.uk/
Statistics and Modelling Science, University of Strathclyde
Livingstone Tower, 26 Richmond Street, Glasgow, UK, G1 1XH
Tel: +44 (0)141 548 3667           Fax: +44 (0)141 552 2079
Ian Thurlbeck <ian at stams.strath.ac.uk> writes:

> Dear R users
>
> we're having a problem reading a largish data file using
> read.table(). The file consists of 175000 lines of 4
> floating pt numbers. Here's what happens:
...
> I edited the memory limits in Platform.h and re-compiled
> and now read.table() can manage up to around 125000 lines.
>
> #define R_VSIZE 30000000L /* 15 times original figure (Defn.h) */
> #define R_NSIZE 1000000L /* 5 times original figure (Defn.h) */
> #define R_PPSSIZE 100000L /* 10 times original figure (Defn.h) */

The first two of those are settable via command-line options, e.g.

  R -v 50

should get you a 50M memory heap.

> Clearly I can keep upping these values until it works, but has
> the side-effect of making the running R binary pretty big.
>
> What can I do? Is the answer a better memory management
> system ?

That wouldn't hurt, but... The actual numbers require only about 6M of
storage, so the real trouble is only there during the read.table().
What you could do is to read the data in a large process, save it to a
binary file, and read that into a smaller process.

Another thing you could do is to switch to reading the values with
scan(). read.table() tries to be intelligent about data types and so
forth, which tends to make it inefficient on large data sets.

--
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)
             Ph: (+45) 35327918   FAX: (+45) 35327907
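A minimal sketch of the two-step approach Peter describes, assuming the
file is 'sst.dat' as in the original post; using save() and load() for
the binary step is one plausible choice, not necessarily what he had in
mind:

  ## Step 1: in an R session started with a big heap (e.g. "R -v 50"),
  ## parse the text file once and write the result in binary form.
  dat <- read.table("sst.dat")
  save(dat, file = "sst.RData")

  ## Step 2: in a later, ordinarily-sized R session, load the binary
  ## file directly, skipping the expensive text parsing entirely.
  load("sst.RData")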
Ian Thurlbeck writes:

> Dear R users
>
> we're having a problem reading a largish data file using
> read.table(). The file consists of 175000 lines of 4
> floating pt numbers. Here's what happens:

"read.table" uses vast amounts of memory. First it reads everything as
a string and then converts to numbers. By using "scan" instead you can
cut down your memory demands. If you know you have exactly 175000
observations, then something like

  x <- scan("sst.dat", what = list(0, 0, 0, 0), nmax = 175000)

will cut your memory demands to near the minimum.

> What can I do? Is the answer a better memory management
> system ?

The memory management definitely needs some work (well, actually it
needs a quick bullet between the eyes). When we got started on R we
didn't anticipate that it would see much use outside of teaching here
at Auckland. I certainly have plans to revisit the memory management at
some point, and to look at some of the scalability problems too. But
don't hold your breath...

Ross
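For completeness, a short sketch of how the result of that scan() call
might be used: scan() with a list 'what' returns a list of four numeric
vectors, which can be named and wrapped in a data frame. The column
names below are purely hypothetical, chosen for illustration:

  ## scan() returns one numeric vector per element of 'what';
  ## name the components and convert the list to a data frame.
  x <- scan("sst.dat", what = list(0, 0, 0, 0), nmax = 175000)
  names(x) <- c("V1", "V2", "V3", "V4")  # hypothetical column names
  dat <- as.data.frame(x)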