thr3ads.net - R devel - [Rd] Problem in scan() (PR#4128) [Sep 2003]


Paul.Bayer@gleichsam.de

2003-Sep-11 22:07 UTC

[Rd] Problem in scan() (PR#4128)

Full_Name: Paul Bayer
Version: 1.7.1
OS: Windows + Linux
Submission from: (NULL) (217.235.105.54)


I tried to read some large CSV files (30-100 MB) into R with scan(),
skipping unneeded columns by using NULL elements in "what".

When a skipped field is a quoted string containing commas, R treats
each comma inside the quotes as a field separator, so the remaining
fields on that line are read incorrectly.

A small test shows what I mean. I have the following "test.csv":

"col.A","col.B","col.C","col.D"
1,"quoted string","again, again again",123
2,"nice quotes, isnt it","you got it",456

First I read all elements:

> tst <- scan("test.csv", what=list(a=0,b="",c="",d=0), sep=",", skip=1)
Read 2 records
> tst
$a
[1] 1 2

$b
[1] "quoted string"        "nice quotes, isnt it"

$c
[1] "again, again again" "you got it"

$d
[1] 123 456

Everything is fine. Then I try to skip the 2nd column by giving b=NULL:

> tst <- scan("test.csv", what=list(a=0,b=NULL,c="",d=0), sep=",", skip=1)
Read 2 records
Warning message:
number of items read is not a multiple of the number of columns
> tst
$a
[1] 1 2

$b
NULL

$c
[1] "again, again again"            " isnt it,you got it,456\n\n\n"

$d
[1] 123  NA
>

I got garbage.

Prof Brian Ripley

2003-Sep-11 22:40 UTC


Quotes are only interpreted in character columns (scan.c line 240), and
NULL is not character.  So this was intentional.

If you would like this changed, please supply a patch (which looks to be  
a good exercise).
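Until such a patch exists, one workaround follows directly from the explanation above, offered here as a sketch rather than anything proposed in this thread: since quotes are honoured in character columns, read the unwanted column as character (b = "") and drop it afterwards. The trade-off is that the skipped column is held in memory during the read, which may matter for 30-100 MB files. The use of tempfile() to hold a copy of the report's test data is an assumption made only to keep the example self-contained.

```r
# Recreate the report's "test.csv" in a temporary file (assumption: any path works)
path <- tempfile(fileext = ".csv")
writeLines(c('"col.A","col.B","col.C","col.D"',
             '1,"quoted string","again, again again",123',
             '2,"nice quotes, isnt it","you got it",456'),
           path)

# Read column b as character so that its quoted commas are respected,
# instead of declaring it NULL (NULL fields do not get quote handling)
tst <- scan(path, what = list(a = 0, b = "", c = "", d = 0),
            sep = ",", skip = 1)

# Discard the unwanted column after reading
tst$b <- NULL
str(tst)
```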

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
