Hi,

I am mediocre at R (maybe 1000 hours of experience), but I have received an 8 GB dataset and I don't know what to do with it. I have to do extensive analysis on it for my Honours thesis.

I can't even import it. I've tried:
- Splitting it with the free csv-splitter-1.1.zip, which seems to work for everyone else (for me it just outputs a single line).
- Splitting it with Text Splitter, which doesn't work because it has to load the whole file into memory first.
- Importing it with BigMemory's big.matrix(), but my computer just freezes.
- Importing it with ff's read.table.ffdf(), but I get the error message "in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 5 did not have 9 elements".

Thanks for any ideas and assistance.

Can R do this on a computer with 4 GB of memory and a dual-core i5xx?

Isaac
Research Assistant
Quantitative Finance Faculty, UTS

--
View this message in context: http://r.789695.n4.nabble.com/Handling-8GB-txt-file-in-R-tp4500971p4500971.html
Sent from the R help mailing list archive at Nabble.com.
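Before choosing a tool, it may help to look at the file without loading it. The following is a minimal sketch, assuming whitespace-separated text at a hypothetical path `big_file.txt`. It reads only the first `n` lines through a connection and tabulates how many fields each line has, which is exactly what the "line 5 did not have 9 elements" error is complaining about. It also gives a rough row count and memory estimate from the file size. The estimate assumes later lines are about as long as the sampled ones and that every column would be stored as an 8-byte double, so treat it as an order of magnitude only.

```r
# Inspect a large text file using only its first n lines.
# "big_file.txt" in the usage lines is a hypothetical path.
inspect_file <- function(path, n = 1000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  sample_lines <- readLines(con, n = n)   # reads at most n lines

  # Count whitespace-separated fields per line; blank lines count as 0.
  trimmed <- trimws(sample_lines)
  n_fields <- integer(length(trimmed))
  non_blank <- nzchar(trimmed)
  n_fields[non_blank] <- lengths(strsplit(trimmed[non_blank], "[[:space:]]+"))

  # Rough size estimate: bytes per line (+1 for the newline) -> rows -> RAM.
  bytes_per_line <- mean(nchar(sample_lines, type = "bytes") + 1)
  est_rows <- file.size(path) / bytes_per_line
  n_cols <- max(n_fields)
  list(
    field_counts = table(n_fields),       # how many lines have k fields
    est_rows = est_rows,
    est_gb_as_doubles = est_rows * n_cols * 8 / 1024^3
  )
}

# Usage:
# info <- inspect_file("big_file.txt")
# info$field_counts
# info$est_gb_as_doubles
```

If `field_counts` shows more than one non-zero field count, read.table() and anything built on it (such as read.table.ffdf()) will stop at the first line that does not match the others.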
Despair not! Malcolm Gladwell would say you are 1/10 of the way to becoming the next MozaRt!

You need to tell us how your data set is structured. Your problem with ff seems to be that the lines are not of constant length (they don't all have the same number of fields). If they aren't in a consistent CSV format, I wouldn't be surprised if a CSV splitter had problems with them as well.

If you are on a Unix-alike system, the splitting could be done pretty easily with awk/sed/perl, but you need to define your problem much more clearly. If things aren't nicely structured, you will almost certainly benefit from doing a little data preparation with Unix utilities before loading into R.

Michael

On Sat, Mar 24, 2012 at 4:08 AM, iliketurtles <isaacm200 at gmail.com> wrote:
> - Importing using ff's read.table.ffdf(), however I get the error message
> " in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 5 did not have 9 elements"
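For someone who only uses R, the splitting step Michael describes with awk/sed/perl can also be sketched in R itself. The idea is to read the file through a connection a fixed number of lines at a time and write each block to its own file, so only one block is ever in memory. The chunk size and the output names (`chunk_0001.txt`, ...) are illustrative choices, not anything from the thread.

```r
# Split a large text file into smaller files of at most lines_per_chunk lines,
# holding only one chunk in memory at a time.
split_text_file <- function(path, lines_per_chunk = 1000000L,
                            out_dir = tempdir(), prefix = "chunk_") {
  con <- file(path, open = "r")
  on.exit(close(con))
  out_files <- character(0)
  i <- 0L
  repeat {
    block <- readLines(con, n = lines_per_chunk)  # next block from the open connection
    if (length(block) == 0L) break                # end of file reached
    i <- i + 1L
    out <- file.path(out_dir, sprintf("%s%04d.txt", prefix, i))
    writeLines(block, out)
    out_files <- c(out_files, out)
  }
  out_files
}
```

Because the connection stays open, each readLines() call continues where the previous one stopped. Only the first chunk will contain the blank lines and the header at the top of the original file.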
Thanks for all the suggestions.

To the first person who replied: I can't do anything with Unix or Perl. All I know is R.

@KEN: I'm using Windows 7, 64-bit.

@Steve: Here's the readLines() output. As you can see, lines 1-3 are blank, line 5 is empty, and there are also empty elements after line 5!

[1] " "
[2] " "
[3] " "
[4] " PERMNO DATE TICKER PERMCO PRC VOL NUMTRD vwretd ewretd"
[5] ""
[6] " 10000 06/01/1986 7952 . . . -0.000138 0.001926"
[7] " 10000 07/01/1986 OMFGA 7952 -2.56250 1000 . 0.013809 0.011061"
[8] " 10000 08/01/1986 OMFGA 7952 -2.50000 12800 . -0.020744 -0.005117"
[9] " 10000 09/01/1986 OMFGA 7952 -2.50000 1400 . -0.011219 -0.011588"
[10] " 10000 10/01/1986 OMFGA 7952 -2.50000 8500 . 0.000083 0.003651"
[11] " 10000 13/01/1986 OMFGA 7952 -2.62500 5450 . 0.002749 0.002433"

Isaac
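Judging from that output, the file has whitespace-separated columns with a few leading blank lines, a header on line 4 and a blank line 5. Missing values are written as `.`, and some rows have an empty TICKER, which leaves them with 8 fields instead of 9. Below is a minimal sketch of a parser for one block of such lines. It assumes TICKER is the only field that can be completely blank, which is a guess based on the one example row. The archive has collapsed the original spacing, so once the real column widths are known, a fixed-width reader such as `read.fwf()` may be more reliable.

```r
# Column names taken from the header line in the readLines() output.
col_names <- c("PERMNO", "DATE", "TICKER", "PERMCO", "PRC",
               "VOL", "NUMTRD", "vwretd", "ewretd")

# Parse a character vector of raw lines into a data frame.
# ASSUMPTION: a data line with 8 fields is missing only TICKER.
parse_block <- function(lines) {
  trimmed <- trimws(lines)
  # Drop blank lines and the header line.
  keep <- nzchar(trimmed) & !startsWith(trimmed, "PERMNO")
  fields <- strsplit(trimmed[keep], "[[:space:]]+")
  rows <- lapply(fields, function(f) {
    if (length(f) == 8L) f <- c(f[1:2], NA, f[3:8])  # insert missing TICKER
    if (length(f) != 9L) stop("unexpected number of fields: ", length(f))
    f
  })
  m <- do.call(rbind, rows)                          # character matrix, 9 columns
  m[which(m == ".")] <- NA                           # "." marks a missing value
  df <- as.data.frame(m, stringsAsFactors = FALSE)
  names(df) <- col_names
  df$DATE <- as.Date(df$DATE, format = "%d/%m/%Y")  # dates look like dd/mm/yyyy
  num_cols <- c("PERMNO", "PERMCO", "PRC", "VOL", "NUMTRD", "vwretd", "ewretd")
  df[num_cols] <- lapply(df[num_cols], as.numeric)
  df
}
```

For the full 8 GB file, this would be applied to blocks read with readLines(con, n = ...) from an open connection, not to the whole file at once.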
On 24/03/12 09:08, iliketurtles wrote:
> I can't even import it.

1) You should check whether you really need to load the complete dataset. You might be able to load a subset, sample it for the analysis, discard columns, and so on; there are many possibilities.

2) With CSV files of this size, it usually pays off to convert them into a database. SQLite comes to mind as an easy-to-use one, with SQL support for selecting which columns and rows to load. SQLite has a tool to import a CSV file into an SQLite database.

Concerning the general format of the CSV, see the other suggestions.

Cheers,

Rainer

--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)
Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa
Tel : +33 - (0)9 53 10 27 44
Cell: +33 - (0)6 85 62 59 98
Fax : +33 - (0)9 58 10 27 44
Fax (D): +49 - (0)3 21 21 25 22 44
email: Rainer at krugs.de
Skype: RMkrug
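Rainer's two suggestions can be combined. First, stream the file into an SQLite database in blocks. Then pull back only the columns and rows needed for a given analysis. Below is a minimal sketch using the DBI and RSQLite packages, which are assumed to be installed; neither is mentioned in the thread. `parse_fun` is a hypothetical argument standing for whatever function turns a block of raw lines into a data frame. The database path and table name are also placeholders. The SQLite import tool Rainer mentions is presumably the `sqlite3` command-line shell's `.import` command, which avoids R entirely.

```r
library(DBI)      # generic database interface
library(RSQLite)  # SQLite driver for DBI

# Stream a text file into an SQLite table, lines_per_chunk lines at a time.
# parse_fun (hypothetical): function(character vector of lines) -> data.frame
import_to_sqlite <- function(path, db_path, table, parse_fun,
                             lines_per_chunk = 500000L) {
  db <- dbConnect(RSQLite::SQLite(), dbname = db_path)
  on.exit(dbDisconnect(db), add = TRUE)
  con <- file(path, open = "r")
  on.exit(close(con), add = TRUE)
  first <- TRUE
  repeat {
    block <- readLines(con, n = lines_per_chunk)
    if (length(block) == 0L) break                # end of file reached
    df <- parse_fun(block)
    if (nrow(df) == 0L) next                      # e.g. a block of blank lines
    # First block creates (or replaces) the table; later blocks append to it.
    dbWriteTable(db, table, df, overwrite = first, append = !first)
    first <- FALSE
  }
  invisible(db_path)
}

# Pull back only the columns/rows needed for one analysis.
query_subset <- function(db_path, sql) {
  db <- dbConnect(RSQLite::SQLite(), dbname = db_path)
  on.exit(dbDisconnect(db))
  dbGetQuery(db, sql)
}

# Usage (all names hypothetical):
# import_to_sqlite("big_file.txt", "prices.sqlite", "prices", my_parser)
# sub <- query_subset("prices.sqlite",
#                     "SELECT PERMNO, PRC FROM prices WHERE PERMNO = 10000")
```

The import only has to be done once. After that, each analysis loads just the slice it needs, which addresses the 4 GB memory limit directly.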