thr3ads.net - R help - [R] scan() problem [Sep 2003]

Paul Bayer

2003-Sep-10 17:27 UTC

[R] scan() problem

Dear R-helpers,

I have to read some large CSV files into R (30 - 100 MB).
Since reading them with read.csv leads to "memory exhausted", I tried
scan(), skipping unneeded columns with NULL elements in
"what".

When these skipped elements are quoted strings containing commas,
R interprets each quoted comma as an element separator,
leading to wrong records in the rest of the line.

A little test will show what I mean. I have the following "test.csv":

"col.A","col.B","col.C","col.D"
1,"quoted string","again, again again",123
2,"nice quotes, isnt it","you got it",456

First I read all elements:

 > tst <- scan("test.csv", what=list(a=0,b="",c="",d=0), sep=",", skip=1)
Read 2 records
 > tst
$a
[1] 1 2

$b
[1] "quoted string"        "nice quotes, isnt it"

$c
[1] "again, again again" "you got it"

$d
[1] 123 456

Everything is fine. Then I try to skip the 2nd column by giving b=NULL:

 > tst <- scan("test.csv", what=list(a=0,b=NULL,c="",d=0), sep=",", skip=1)
Read 2 records
Warning message:
number of items read is not a multiple of the number of columns
 > tst
$a
[1] 1 2

$b
NULL

$c
[1] "again, again again"            " isnt it,you got it,456\n\n\n"

$d
[1] 123  NA

 >

I got garbage.
Isn't this a bug?
Or did I do something wrong?
Is there a workaround?

Thank you all,

Paul Bayer,
Feldafing, Germany
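
For anyone reproducing this: a minimal sketch of one possible workaround, assuming it is acceptable to read the unwanted column as character and drop it afterwards. It relies only on the first scan() call above, which parses the quoted commas correctly. The temporary file and the `path` variable are illustrative, and the skipped column is still held in memory until it is dropped, so this may not be enough for the largest files.

```r
# Recreate the small test file from the post in a temporary location.
path <- tempfile(fileext = ".csv")
writeLines(c(
  '"col.A","col.B","col.C","col.D"',
  '1,"quoted string","again, again again",123',
  '2,"nice quotes, isnt it","you got it",456'
), path)

# Give every column a real type, so quoted commas are handled as in the
# first (working) call, then discard the unwanted column afterwards.
tst <- scan(path, what = list(a = 0, b = "", c = "", d = 0),
            sep = ",", skip = 1, quiet = TRUE)
tst$b <- NULL

str(tst)
```

The result keeps only a, c and d, with "again, again again" intact in c.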

Gabor Grothendieck

2003-Sep-11 01:24 UTC

[R] scan() problem

If the records are always of the form:

number,"...","...",number 

where ... may contain commas but not double quotes, then here is a
kludgy solution.  Perhaps it's sufficient?

# scan in data using " as the delimiter and keep first and last fields
# ("clipboard" reads the copied test data; substitute the file name as needed)
s <- scan("clipboard", skip=1, what=list("",NULL,NULL,NULL,""), sep="\"")

# remove commas from fields, convert to numeric and reshape into matrix
matrix(as.numeric(sub(",","",unlist(s))), ncol=2)
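
The same idea (keep only the first and last numeric fields) can also be sketched without scan(), under the same assumption that every record has the form number,"...","...",number. This swaps in readLines() plus regular expressions instead of the quote-delimiter trick, so it is an alternative sketch rather than the method above. The temporary file and the names `path`, `first`, `last` and `m` are illustrative.

```r
# Recreate the test file from the original post in a temporary location.
path <- tempfile(fileext = ".csv")
writeLines(c(
  '"col.A","col.B","col.C","col.D"',
  '1,"quoted string","again, again again",123',
  '2,"nice quotes, isnt it","you got it",456'
), path)

# Read the raw lines and drop the header row.
lines <- readLines(path)[-1]

# First field: delete everything from the first comma onwards.
first <- as.numeric(sub(",.*$", "", lines))

# Last field: delete everything up to and including the last comma
# (the leading .* is greedy, so it runs to the final comma).
last <- as.numeric(sub("^.*,", "", lines))

# Combine into a two-column numeric matrix.
m <- cbind(first, last)
m
```

Because only the text before the first comma and after the last comma is used, commas inside the quoted middle fields do not matter here. The whole file is still read as text first, so for very large files the lines would need to be processed in pieces.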


