Hello! A question about reading large CSV files.

I need to analyse several files larger than 3 GB. Each file has more than 10 million rows (and up to 25 million) and 9 columns. Since I don't have much RAM, I think the ff package can really help me. I am trying to use read.csv.ffdf, but I have some questions:

How can I read the files in several chunks, with an automatic way of calculating the number of rows to include in each chunk? (My problem is that the files have different numbers of rows.)

For instance, I have used

read.csv.ffdf(NULL, "file.csv", sep = "|", dec = ".", header = TRUE,
              row.names = NULL,
              colClasses = c(rep("integer", 3), rep("integer", 10), rep("integer", 6)))

But this way I am reading the whole file. I would prefer to read it in chunks, but I don't know how to do that.

I have read the ff documentation, but I am not good with R!

Thanks in advance!
Your question is not completely clear. read.csv.ffdf automatically reads the data in chunks; you don't have to do anything for that. You can specify the size of the chunks using the next.rows argument.

Jan
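For illustration, a minimal sketch of the kind of call Jan describes. The file name, separator, chunk sizes, and colClasses below are placeholders based on the original question, not a tested recipe:

library(ff)

# read.csv.ffdf parses the file chunk by chunk on its own; first.rows and
# next.rows only control how many rows are read per chunk.  The result is
# still the whole file, stored on disk as an ffdf rather than held in RAM.
dat <- read.csv.ffdf(file = "file.csv", sep = "|", dec = ".", header = TRUE,
                     first.rows = 500000, next.rows = 500000,
                     colClasses = rep("integer", 9))

Afterwards 'dat' can be indexed much like a data.frame, with the data living in files on disk instead of in memory.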
Thank you, Jan.

My problem is the following: for instance, I have two files with different numbers of rows (15 million and 8 million rows). I would like to read the first one in chunks of 5 million rows each. However, between the first and the second chunk, I would like to analyse those first 5 million rows, write the analysis to a new CSV, and then proceed to read and analyse the second chunk, and so on up to the third chunk. With the second file I would like to do the same: read the first chunk, analyse it, then read the second chunk and analyse it.

Basically, my problem is that I manage to read the files, but with so many rows I cannot do any analysis (not even filtering the rows) because of the RAM restrictions.

Sorry if this is still not clear. Thank you.
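One way to get this read-analyse-write workflow with ff is to loop over the rows of the ffdf in chunks. A minimal sketch, assuming 'dat' was created with read.csv.ffdf as in the earlier example; the per-chunk analysis (column means) and the output file names are only placeholders:

library(ff)

i <- 0
for (rows in chunk(dat)) {            # chunk() splits the rows of 'dat' into
  i <- i + 1                          # RAM-sized blocks of row indices
  block <- dat[rows, ]                # this chunk as an ordinary data.frame
  res <- as.data.frame(t(colMeans(block)))   # placeholder analysis
  write.csv(res, file = sprintf("analysis_chunk_%02d.csv", i),
            row.names = FALSE)        # write this chunk's result before
}                                     # moving on to the next chunk

chunk() sizes the blocks from the available batch bytes (the BATCHBYTES option), so the same loop should work regardless of how many rows each file has.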