When read.table() reads comma-delimited files saved from a spreadsheet
(unfortunately, many of my scientific collaborators give me these), missing
values are detected most of the time. However, when a comma is the first
character on a line, R gets it wrong. For example, reading the file
1,,3
,5,6
,8,9
with
read.table("test.dat", header=F, sep=",")
R gives an error:
row.lens
[1] 3 3 2
Error: all rows must have the same length.
Splus handles this OK and returns
  V1 V2 V3
1  1 NA  3
2 NA  5  6
3 NA  8  9
as expected.
Note that this discrepancy does NOT occur with scan(..., sep=","), which
returns
1 NA 3 NA 5 6 NA 8 9
in both R and Splus.
David Clayton
MRC Biostatistics Unit
Institute of Public Health
University Forvie Site
Robinson Way
Cambridge CB2 2SR
Telephone: (0) 1223 330375
Fax: (0) 1223 330388
e-mail: david.clayton at mrc-bsu.cam.ac.uk
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
David Clayton <david.clayton at mrc-bsu.cam.ac.uk> writes:

> 1,,3
> ,5,6
> ,8,9
>
> with
>
> read.table("test.dat", header=F, sep=",")
>
> R gives an error:
>
> row.lens
> [1] 3 3 2
> Error: all rows must have the same length.

Blimey! How did *that* go unnoticed for so long??

I think I've traced it to the following section in do_countfields

	else if (sepchar) {
	    if (c == sepchar) nfields++;
	    else if (nfields == 0) nfields++;
	}

I don't think we want that 2nd 'else' there...

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
David Clayton <david.clayton at mrc-bsu.cam.ac.uk> writes:

> 1,,3
> ,5,6
> ,8,9
..
> Splus handles this OK and returns
>
>   V1 V2 V3
> 1  1 NA  3
> 2 NA  5  6
> 3 NA  8  9
>
> as expected.

Hum. With my suggested change PLUS interchanging the 2 'if's, I now get:

> read.table('junk',sep=',')
  V1 V2 V3
1  1 NA  3
2     5  6
3 NA  8  9

which on closer thought actually *is* what I'd expect - if you have a
space in a comma-separated field, the column should be of type
character/factor, no?

BTW: "r-testers"?? That address got superseded by the r-help/r-devel
lists something like two years ago....

Arguably, this discussion should have gone to r-devel, not r-help, or
even better to r-bugs, so it would get an official Problem Report
number. (I'm leaving it in r-help for now, since that's where the
previous messages went, but if we need to discuss it more extensively,
we should probably move it to r-devel.)

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
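
The effect of the two changes discussed in this thread can be compared in isolation with a small C sketch. This is a simplified model, not the R source: the function names count_original, count_no_else and count_swapped are hypothetical, each one wraps only the branch quoted from do_countfields in a loop over a single line, and everything else the real routine does (quoting, whitespace handling, end-of-line bookkeeping) is assumed away. It is therefore not expected to reproduce the exact row.lens values R printed.

```c
/* Simplified model of the separator branch quoted from do_countfields.
   Each function counts the fields on ONE line; names are hypothetical. */

/* Branch as quoted in the thread. */
int count_original(const char *line, char sepchar)
{
    int nfields = 0;
    for (const char *p = line; *p != '\0' && *p != '\n'; p++) {
        char c = *p;
        if (c == sepchar) nfields++;
        else if (nfields == 0) nfields++;   /* first field skipped if line starts with sepchar */
    }
    return nfields;
}

/* First suggestion: drop the second 'else', keep the order. */
int count_no_else(const char *line, char sepchar)
{
    int nfields = 0;
    for (const char *p = line; *p != '\0' && *p != '\n'; p++) {
        char c = *p;
        if (c == sepchar) nfields++;
        if (nfields == 0) nfields++;        /* never fires after a leading separator */
    }
    return nfields;
}

/* Second suggestion: drop the 'else' AND interchange the two 'if's. */
int count_swapped(const char *line, char sepchar)
{
    int nfields = 0;
    for (const char *p = line; *p != '\0' && *p != '\n'; p++) {
        char c = *p;
        if (nfields == 0) nfields++;        /* any first character opens field 1 */
        if (c == sepchar) nfields++;        /* each separator opens another field */
    }
    return nfields;
}
```

Under this model, count_original(",5,6", ',') and count_no_else(",5,6", ',') both return 2, while count_swapped(",5,6", ',') returns 3; all three return 3 for "1,,3". That is consistent with the observation above that removing the 'else' alone was not enough and the two 'if's also had to be interchanged.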