Hi all,

I am having problems importing a VERY large dataset into R. I have looked into the ff package, and it seems to suit me, but from all the examples I have seen it either requires manual creation of the database or needs a read.table kind of step. Being survey data, the file is big (roughly 20,000 by 50,000, about 1.2 GB in plain text); the memory I have isn't enough for a read.table and my computer freezes every time :(

So far I have managed to import the required subset of the data by using a "cheat": I used GRETL to read an equivalent Stata file (released by the same source that offered the csv file), manipulate it, and export it in a format that R can read into memory. Easy! But I am wondering, how could this be done entirely in R from scratch?

Thanks
Gabor Grothendieck
2009-Jul-14 19:48 UTC
[R] How to import BIG csv files with separate "map"?
On Tue, Jul 14, 2009 at 1:53 PM, giusto <giusto at uoregon.edu> wrote:
> I am having problems importing a VERY large dataset into R. [...] The
> memory I have isn't enough for a read.table and my computer freezes
> every time :(

Either of the following can be done in one line of code:

Using the nrows and skip arguments to read.table, one can read in a subset of rows. Using the colClasses argument of read.table, the class "NULL" will suppress reading of the corresponding column.

read.csv.sql from the sqldf package will create a database on the fly, read in the data, extract it to R according to whatever SQL statement you give its sql argument, and then destroy the database, so you have all the flexibility of SQL in selecting a portion of the data. See http://sqldf.googlecode.com and the example here: http://code.google.com/p/sqldf/#Example_13._read.csv.sql
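For example, a minimal sketch of both approaches; the file name "survey.csv", the three-column layout, and the variable names are hypothetical stand-ins for the real data:

# Read only the first 1,000 data rows, dropping the second of three
# columns by marking its class as "NULL".
dat <- read.table("survey.csv", sep = ",", header = TRUE, nrows = 1000,
                  colClasses = c("numeric", "NULL", "character"))

# Read rows 1001-2000 instead: skip the header line plus the first
# 1,000 data rows.
dat2 <- read.table("survey.csv", sep = ",", header = FALSE, skip = 1001,
                   nrows = 1000,
                   colClasses = c("numeric", "NULL", "character"))

# Or let sqldf build a temporary database and return only what the SQL
# statement selects (var1 and var3 are made-up column names; the table
# is referred to as "file" in read.csv.sql's sql argument).
library(sqldf)
sub <- read.csv.sql("survey.csv",
                    sql = "select var1, var3 from file where var3 > 0")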
Steve Lianoglou
2009-Jul-14 19:50 UTC
[R] How to import BIG csv files with separate "map"?
Hi,

On Jul 14, 2009, at 1:53 PM, giusto wrote:

> I am having problems importing a VERY large dataset into R. I have looked
> into the package ff, and that seems to suit me, but from all the examples
> I have seen it either requires manual creation of the database or needs a
> read.table kind of step. Being survey data, the file is big (roughly
> 20,000 by 50,000, about 1.2 GB in plain text); the memory I have isn't
> enough for a read.table and my computer freezes every time :(

Look at the documentation near the end of ?read.table:

"""Note that unless colClasses is specified, all columns are read as
character columns and then converted. This means that quotes are
interpreted in all fields and that a column of values like "42" will
result in an integer column."""

So all the data is read in as characters, and then R tries to convert it to
the appropriate data type (which uses a lot of memory). Perhaps specifying
the type of each column in the colClasses parameter can get you where you
need to be.

> So far I have managed to import the required subset of the data by using
> a "cheat": I used GRETL to read an equivalent Stata file (released by the
> same source that offered the csv file), manipulate it, and export it in a
> format that R can read into memory.

I'm not sure if you're suggesting that R can read in the whole data file
when it is stored in the Stata binary format. If so, perhaps the colClasses
trick above might work.

If read.table with colClasses doesn't work (and you know you can load the
entire dataset into R via some binary format), perhaps you'll have to parse
the file line by line: open it with a file(.., 'r') command and use scan
(or readChar?) to run through the file without having to load it all into
memory at once.

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University

Contact Info: http://cbio.mskcc.org/~lianos/contact
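A rough sketch of that line-by-line idea, using readLines in place of scan; the file name, chunk size, and kept-column positions are made up, and quoted fields containing commas are not handled:

con <- file("survey.csv", open = "r")
header <- readLines(con, n = 1)      # consume (and optionally parse) the header row
keep <- c(1, 3, 7)                   # positions of the columns actually needed
chunks <- list()
repeat {
  lines <- readLines(con, n = 5000)  # read 5,000 rows at a time
  if (length(lines) == 0) break
  fields <- strsplit(lines, ",", fixed = TRUE)
  chunks[[length(chunks) + 1L]] <- do.call(rbind, lapply(fields, `[`, keep))
}
close(con)
dat <- as.data.frame(do.call(rbind, chunks), stringsAsFactors = FALSE)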