Hello fellow R users,
I am trying to read a 6.9-million-row text file with 26 space-separated
columns into R using ff. When I specify small values for first.rows,
next.rows and nrows, the file is read with no issue. However, when I specify
a larger next.rows value and omit nrows in order to read the entire file, I
keep getting errors. Please see the code below.
I am trying to do this on an m1.large EC2 machine running R with 14.8 GB of
memory. I haven't been able to read the entire dataset into memory using
the traditional read.table.
Given the error message, I am not sure whether I need to specify further
parameters.
Thank you,
Marck Vaisman
marck@vaisman.us
http://www.linkedin.com/in/marckvaisman
http://twitter.com/#!/wahalulu
> results.five <- read.table("./results/results.txt",
+ header = F, nrows = 5) # read 5 lines for structure
> classes <- sapply(results.five, class) # to specify colClasses
> classes
       V1        V2        V3        V4        V5        V6        V7        V8
"integer"  "factor" "integer" "integer" "integer" "integer" "integer" "numeric"
       V9       V10       V11       V12       V13       V14       V15       V16
"numeric" "numeric" "numeric" "integer" "numeric" "numeric" "numeric" "numeric"
      V17       V18       V19       V20       V21       V22       V23       V24
"integer" "numeric" "numeric" "numeric" "numeric"  "factor" "numeric" "numeric"
      V25       V26
"numeric" "numeric"
> library(ff)
> results.ff <- read.table.ffdf(file = "./results/results.txt",
+ header = F,
+ colClasses = classes,
+ first.rows = 1000,
+ next.rows = 1000,
+ nrows = 10000)
> dim(results.ff)
[1] 10000 26
> results.ff <- read.table.ffdf(file = "./results/results.txt",
+ header = F,
+ colClasses = classes,
+ first.rows = 10000,
+ next.rows = 100000)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  scan() expected 'an integer', got '3e+05'
> rff <- read.table.ffdf(file = "./results/results.txt",
+ header = F,
+ colClasses = classes,
+ first.rows = 10000,
+ next.rows = 100000)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  scan() expected 'an integer', got '3e+05'
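
One possible reading of the error message, offered as an assumption rather
than a confirmed diagnosis: a column that looks like "integer" in the first
five rows may contain a value stored in the file as 3e+05 further down, which
scan() refuses to parse as an integer. The sketch below reproduces the same
message with base read.table on a small temporary file. The file and its
contents are hypothetical and are not taken from results.txt. It then shows
one way around the problem, which is to widen the guessed integer columns to
numeric.

```r
# Hypothetical two-column file: the second row stores 300000 as "3e+05".
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 a", "3e+05 b"), tmp)

# Guess the column classes from the first row only; V1 looks like an integer.
head.rows <- read.table(tmp, header = FALSE, nrows = 1)
classes <- sapply(head.rows, class)

# Reading the whole file with those classes fails inside scan().
res <- tryCatch(read.table(tmp, header = FALSE, colClasses = classes),
                error = function(e) conditionMessage(e))
print(res)

# Widening the guessed integer columns to numeric lets the same file load.
classes[classes == "integer"] <- "numeric"
ok <- read.table(tmp, header = FALSE, colClasses = classes)
str(ok)

unlink(tmp)
```

If this is what is happening in the real file, the same widened classes
vector could be passed as colClasses to read.table.ffdf. Sampling more than
five rows before guessing the classes is another option, but it only helps if
the sample happens to include such a value.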