Hi,

I'm sure that a large fixed-width file, say 300 million rows and 1,000 columns, is too large for R to handle on a PC, but are there ways to deal with it?

For example, is there a way to combine some sampling method with read.fwf so that you can read in a sample of, say, 100,000 records? Something like that might make analysis possible.

Once a model is built, is there a way to read in only x rows at a time, score and save each subset separately, and finally append the results back together?

I haven't seen any information on whether this is possible. Thank you for reading, and sorry if the information was easily available and I simply didn't find it.
Try RSiteSearch("biglm") for some threads that discuss strategy for analyzing big datasets. HTH, Chuck On Fri, 26 Sep 2008, zerfetzen wrote:> > Hi, > I'm sure that a large fixed width file, such as 300 million rows and 1,000 > columns, is too large for R to handle on a PC, but are there ways to deal > with it? > > For example, is there a way to combine some sampling method with read.fwf so > that you can read in a sample of 100,000 records, for example? > > Something like this may make analysis possible. > > Once analyzed, is there a way to, say, read in only x rows at a time, save > and score each subset separately, and finally append them back together? > > I haven't seen any information on this, if it is possible. Thank you for > reading, and sorry if the information was easily available and I simply > didn't find it. > -- > View this message in context: http://www.nabble.com/Dealing-With-Extremely-Large-Files-tp19695311p19695311.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:cberry at tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
You can always set up a "connection" and then read in the number of lines you need for the analysis, write out the results, and then read in the next batch. I have also used 'filehash' to read in portions of a file initially and then write the objects into its database; these are quickly retrieved if I want to make subsequent passes through the data.

A sample of 100,000 rows will probably also tax your machine: if the 1,000 columns are numeric, you will need 800MB to store a single copy of the object, and you will probably need 3-4x that amount (a total of about 4GB of physical memory) if you are doing any processing that might make copies. Hopefully you are running on a 64-bit system with lots of memory.

--
Jim Holtman
Cincinnati, OH

What is the problem that you are trying to solve?
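A sketch of that read/score/write cycle, with filehash used to stash each parsed chunk for later passes; the field positions and the scoring rule are made up for illustration:

library(filehash)

dbCreate("chunks.db")            # on-disk key-value store for parsed chunks
db  <- dbInit("chunks.db")
con <- file("big.fwf", open = "r")
out <- file("scores.csv", open = "w")
writeLines("id,score", out)

i <- 0
repeat {
  lines <- readLines(con, n = 100000)
  if (length(lines) == 0) break
  i <- i + 1
  # parse two hypothetical fixed-width fields by character position
  chunk <- data.frame(id = substr(lines, 1, 10),
                      x  = as.numeric(substr(lines, 11, 18)))
  dbInsert(db, paste("chunk", i, sep = ""), chunk)  # keep for later passes
  chunk$score <- 2.5 * chunk$x - 1                  # hypothetical scoring rule
  writeLines(paste(chunk$id, chunk$score, sep = ","), out)
}
close(con)
close(out)
# later passes: dbFetch(db, "chunk1") retrieves a stored chunk quickly

Only one 100,000-row chunk is in memory at a time, and the scores accumulate in scores.csv, which covers the "score each subset separately and append them back together" part of the question.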
Not sure if it applies to your file or not, but if it does, the sqldf package facilitates reading a large file into an SQLite database. It's a front end to RSQLite, which is a front end to SQLite, and it reads the data straight into the database without going through R, so R does not limit it in any way; R only initiates the process. The code to do this is basically just two lines. You don't have to install database software (it's included with the RSQLite package) and you don't have to set up a database at all; sqldf does that for you automatically.

See example 6e on the home page, which creates a database transparently, reads in the data, and extracts random rows from the database into R:

http://sqldf.googlecode.com
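Roughly what example 6e does, sketched from memory; the file name and the exact file.format options are assumptions, and note that this approach wants a delimited (e.g. CSV) copy of the data, since sqldf reads delimited rather than fixed-width files:

library(sqldf)   # loads RSQLite, which bundles SQLite itself

f <- file("big.csv")   # hypothetical delimited copy of the data
# The file is loaded into a temporary on-disk SQLite database (never into
# R); only the 100,000 sampled rows come back as a data frame.
samp <- sqldf("select * from f order by random() limit 100000",
              dbname = tempfile(),
              file.format = list(header = TRUE, row.names = FALSE))

Check the sqldf home page for the exact, current syntax.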