thr3ads.net - R help - [R] reading large csv data sets efficiently [May 2013]


ivo welch

2013-May-22 16:31 UTC

[R] reading large csv data sets efficiently

I have a couple of large data sets, on the order of 4GB.  They come in .csv
files, with about 50 columns and lots of rows.  A couple have weird NA
values, such as "C" and "B", in numeric columns.

I am wondering how well read.csv() deals with this stuff on the first
pass.

d <- read.csv("t.csv",
              colClasses=c(NA, NA, "NULL", "NULL",
                           "numeric", "numeric", "numeric", "numeric"),
              na.strings=c("C", "B"))

does R first read the entire file and then worry about colClasses and
na.strings, or does it handle this line by line as it goes?

(if it does the former, I can write a perl pre-filter)
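
Before committing to the 4GB file, the call can be tried on a tiny stand-in to confirm that the column skipping and the NA markers behave as intended. The sketch below is only an illustration: the file contents and the column names (id, name, skip1, skip2, v1 to v4) are made up, and it says nothing about how read.csv() buffers the file internally.

```r
# Hypothetical miniature of t.csv: eight columns, with "C" and "B"
# standing in for missing values in the numeric columns.
sample_file <- tempfile(fileext = ".csv")
writeLines(c(
  "id,name,skip1,skip2,v1,v2,v3,v4",
  "1,alpha,x,y,1.5,C,3,4",
  "2,beta,x,y,B,2.5,C,8"
), sample_file)

# Same arguments as in the question: let R guess the first two columns,
# skip the next two, and read the last four as numeric.
d <- read.csv(
  sample_file,
  colClasses = c(NA, NA, "NULL", "NULL",
                 "numeric", "numeric", "numeric", "numeric"),
  na.strings = c("C", "B")
)

str(d)
```

If the small run looks right, the same call can be pointed at the real file. The "Memory usage" notes in ?read.table are worth reading first.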

/iaw

----
Ivo Welch (ivo.welch@gmail.com)


Whit Armstrong

2013-May-22 20:48 UTC



http://cran.r-project.org/web/packages/data.table/index.html
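
For readers following the link, here is a minimal sketch of what the equivalent call might look like with fread() from data.table. It assumes the package is installed, and the sample file and its column names (id, name, skip1, skip2, v1 to v4) are invented for illustration. Whether this is the usage the reply had in mind is a guess.

```r
library(data.table)

# Hypothetical miniature of t.csv, with "C" and "B" as NA markers.
sample_file <- tempfile(fileext = ".csv")
writeLines(c(
  "id,name,skip1,skip2,v1,v2,v3,v4",
  "1,alpha,x,y,1.5,C,3,4",
  "2,beta,x,y,B,2.5,C,8"
), sample_file)

d <- fread(
  sample_file,
  drop = c(3, 4),                        # skip the third and fourth columns
  colClasses = list(numeric = c("v1", "v2", "v3", "v4")),
  na.strings = c("C", "B")               # treat these strings as missing
)

str(d)
```

The result is a data.table rather than a plain data.frame. Passing data.table = FALSE to fread() returns a data.frame instead.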


