The following seems to work:

data = read.csv.ffdf(x=NULL, file="data.csv", nrows=1001, first.rows=500,
    next.rows=1005, sep=",", colClasses=c("integer","factor","logical"))

'character' doesn't work because ff does not support character
vectors. Character vectors need to be stored as factors. The
disadvantage is that the levels are kept in memory, so if the number
of levels is very large (e.g. a column of mostly unique strings) you
might still run into memory problems.
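To see why unique strings are the problem case, here is a small base-R sketch. It does not call ff, and the ID column is made up for illustration. Converting such a column to a factor produces one level per distinct string, and it is that levels vector which stays in RAM:

```r
# Hypothetical column in which every value is a distinct string
ids <- sprintf("id%06d", 1:100000)

# A factor is integer codes plus a character vector of levels;
# with unique strings there is one level per row
f <- factor(ids)
print(nlevels(f))

# Approximate memory taken by the levels vector alone
print(object.size(levels(f)), units = "Kb")
```

With few distinct values (like the 26 letters in your test file) the levels are tiny, so this only matters for high-cardinality columns.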
'integer' doesn't work because read.csv.ffdf passes colClasses on to
read.table, which then tries to convert your second column (letters
such as "J") to integer, which it can't.
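The same failure can be reproduced without ff, since it comes from read.table/scan itself. A minimal base-R sketch, using a made-up three-row CSV in place of data.csv:

```r
# Made-up CSV text with the same three column types as data.csv
csv <- 'Integer,Character,Logical
52011,"J",TRUE
8601,"B",FALSE
1234,"Q",TRUE'

# Asking for "integer" on the letter column makes scan() fail,
# which is what read.csv.ffdf runs into when it hands colClasses on
res <- tryCatch(
  read.csv(text = csv, colClasses = c("integer", "integer", "logical")),
  error = function(e) conditionMessage(e)
)
print(res)

# Declaring the letter column as "factor" parses cleanly
ok <- read.csv(text = csv, colClasses = c("integer", "factor", "logical"))
str(ok)
```

Swapping "integer" for "factor" in the second position is the same change that makes the read.csv.ffdf call work.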
Jan
Nick McClure <nfmcclure at gmail.com> wrote:
> I've spent some time trying to wrap my head around reading in large csv
> files with the ff-package. I think I know how to do it, but am bumping
> into some problems. I've tried to recreate the issues as best as I can
> with a smaller example and maybe someone can help explain the problems.
>
> The following code just creates a csv file with an integer column,
> character column and logical column.
> -------------------------------------------------
> library(ff)
> #Create data
> size = 2000
> fake.data = data.frame("Integer"=round(100000*runif(size)),
>   "Character"=sample(LETTERS,size,replace=TRUE),
>   "Logical"=sample(c(TRUE,FALSE),size,replace=TRUE))
>
> #Write to csv
> write.csv(fake.data,"data.csv",row.names=F)
> -------------------------------------------------
>
> Now to read it in as a 'ffdf' class, I can do the following:
>
> -------------------------------------------------
> data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
>   next.rows = 1005,sep=",")
> -------------------------------------------------
>
> That works. But with my current large data set, read.csv.ffdf is debating
> with me about the classes it's importing. I was also messing around with
> the first.rows/next.rows, but that's a question for another time. So I'll
> try to load the data in, specifying the column types (same exact command,
> except with specifying colClasses):
>
> -------------------------------------------------
>
> > data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
>     next.rows = 1005,sep=",",colClasses = c("integer","integer","logical"))
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>   scan() expected 'an integer', got '"J"'
> > data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
>     next.rows = 1005,sep=",",colClasses = c("integer","character","logical"))
> Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered, :
>   vmode 'character' not implemented
> > data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
>     next.rows = 1005,sep=",",colClasses = rep("character",3))
> Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered, :
>   vmode 'character' not implemented
> > data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
>     next.rows = 1005,sep=",",colClasses = rep("raw",3))
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>   scan() expected 'a raw', got '8601'
>
> -------------------------------------------------
> I just can't find a combination of classes that will result in this reading
> in. I really don't understand why the classes 'character' won't work for
> all of them. Any thoughts as to why? I appreciate the help and time.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.