Hello! A question about reading large CSV files.

I need to analyse several files larger than 3 GB. Each file has more than 10 million rows (and up to 25 million) and 9 columns. Since I don't have much RAM, I think the ff package can really help me. I am trying to use read.csv.ffdf, but I have some questions:

How can I read the files in several chunks, with an automatic way of calculating the number of rows to include in each chunk? (My problem is that the files have different numbers of rows.)

For instance, I have used

read.csv.ffdf(NULL, "file.csv", sep = "|", dec = ".", header = TRUE,
              row.names = NULL,
              colClasses = c(rep("integer", 3), rep("integer", 10), rep("integer", 6)))

But this way I am reading the whole file. I would prefer to read it in chunks, but I don't know how to do that.

I have read the ff documentation, but I am not good with R!

Thanks in advance!
Your question is not completely clear. read.csv.ffdf automatically reads the data in chunks; you don't have to do anything for that. You can specify the size of the chunks using the next.rows argument.

Jan
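For illustration, a minimal sketch of the kind of call Jan describes. The file name, separator, chunk sizes, and colClasses below are placeholders based on the original question, not a tested recipe:

library(ff)

# read.csv.ffdf parses the file chunk by chunk on its own; first.rows and
# next.rows only control how many rows are read per chunk.  The result is
# still the whole file, stored on disk as an ffdf rather than held in RAM.
dat <- read.csv.ffdf(file = "file.csv", sep = "|", dec = ".", header = TRUE,
                     first.rows = 500000, next.rows = 500000,
                     colClasses = rep("integer", 9))

Afterwards 'dat' can be indexed much like a data.frame, with the data living in files on disk instead of in memory.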
Thank you, Jan.

My problem is the following: for instance, I have two files with different numbers of rows (15 million and 8 million rows). I would like to read the first one in chunks of 5 million rows each. However, between the first and the second chunk, I would like to analyse those first 5 million rows, write the analysis to a new CSV, and then proceed to read and analyse the second chunk, and so on up to the third chunk. With the second file I would like to do the same: read the first chunk, analyse it, then read the second chunk and analyse it.

Basically, my problem is that I manage to read the files, but with so many rows I cannot do any analysis (not even filtering the rows) because of the RAM restrictions.

Sorry if this is still not clear. Thank you.
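One way to get this read-analyse-write workflow with ff is to loop over the rows of the ffdf in chunks. A minimal sketch, assuming 'dat' was created with read.csv.ffdf as in the earlier example; the per-chunk analysis (column means) and the output file names are only placeholders:

library(ff)

i <- 0
for (rows in chunk(dat)) {            # chunk() splits the rows of 'dat' into
  i <- i + 1                          # RAM-sized blocks of row indices
  block <- dat[rows, ]                # this chunk as an ordinary data.frame
  res <- as.data.frame(t(colMeans(block)))   # placeholder analysis
  write.csv(res, file = sprintf("analysis_chunk_%02d.csv", i),
            row.names = FALSE)        # write this chunk's result before
}                                     # moving on to the next chunk

chunk() sizes the blocks from the available batch bytes (the BATCHBYTES option), so the same loop should work regardless of how many rows each file has.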