When reading comma-delimited files as saved from a spreadsheet
(unfortunately many of my scientific collaborators give me these) in
read.table(), missing values are spotted most of the time. Unfortunately
when comma is the first character on the line it gets it wrong. For
example, reading the file

1,,3
,5,6
,8,9

with

read.table("test.dat", header=F, sep=",")

R gives an error:

row.lens
[1] 3 3 2
Error: all rows must have the same length.

Splus handles this OK and returns

  V1 V2 V3
1  1 NA  3
2 NA  5  6
3 NA  8  9

as expected. Note that this discrepancy does NOT occur with
scan(..., sep=","), which returns

1 NA 3 NA 5 6 NA 8 9

in both R and Splus.

David Clayton
MRC Biostatistics Unit
Institute of Public Health
University Forvie Site
Robinson Way
Cambridge CB2 2SR

Telephone: (0) 1223 330375
Fax: (0) 1223 330388
e-mail: david.clayton at mrc-bsu.cam.ac.uk
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
David Clayton <david.clayton at mrc-bsu.cam.ac.uk> writes:

> 1,,3
> ,5,6
> ,8,9
>
> with
>
> read.table("test.dat", header=F, sep=",")
>
> R gives an error:
> row.lens
> [1] 3 3 2
> Error: all rows must have the same length.

Blimey! How did *that* go unnoticed for so long??

I think I've traced it to the following section in do_countfields

        else if (sepchar) {
            if (c == sepchar) nfields++;
            else if (nfields == 0) nfields++;
        }

I don't think we want that 2nd 'else' there...

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
David Clayton <david.clayton at mrc-bsu.cam.ac.uk> writes:

> 1,,3
> ,5,6
> ,8,9
..
> Splus handles this OK and returns
>
>   V1 V2 V3
> 1  1 NA  3
> 2 NA  5  6
> 3 NA  8  9
>
> as expected.

Hum. With my suggested change PLUS interchanging the 2 'if's, I now get:

> read.table('junk',sep=',')
  V1 V2 V3
1  1 NA  3
2     5  6
3 NA  8  9

which on closer thought actually *is* what I'd expect - if you have a
space in a comma-separated field, the column should be of type
character/factor, no?

BTW: "r-testers"?? That address got superseded by the r-help/r-devel
lists something like two years ago.... Arguably, this discussion should
have gone to r-devel, not r-help, or even better to r-bugs, so it would
get an official Problem Report number. (I'm leaving it in r-help for
now, since that's where the previous messages went, but if we need to
discuss it more extensively, we should probably move it to r-devel.)

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
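
For anyone following along, the change discussed in this thread (dropping
the second 'else' and interchanging the two tests) can be sketched roughly
as below. This is a standalone illustration only, not the R source: the
function names count_fields_old and count_fields_new are made up for this
sketch, the loop over a single string is an assumption about how the
branch is driven, and the real do_countfields also deals with quoting and
whitespace handling, which are left out here.

```c
#include <stddef.h>

/* Hypothetical stand-in for the branch as quoted in the thread:
   a separator bumps the count, and a non-separator bumps it only
   when no field has been seen yet. */
int count_fields_old(const char *line, char sepchar)
{
    int nfields = 0;
    for (size_t i = 0; line[i] != '\0' && line[i] != '\n'; i++) {
        char c = line[i];
        if (c == sepchar) nfields++;
        else if (nfields == 0) nfields++;
    }
    return nfields;
}

/* Hypothetical version with the 'else' dropped and the two tests
   interchanged: the first character of any kind opens field 1,
   and every separator then opens one more field. */
int count_fields_new(const char *line, char sepchar)
{
    int nfields = 0;
    for (size_t i = 0; line[i] != '\0' && line[i] != '\n'; i++) {
        char c = line[i];
        if (nfields == 0) nfields++;
        if (c == sepchar) nfields++;
    }
    return nfields;
}
```

With these definitions, a line that starts with the separator, such as
",8,9", is counted as 2 fields by the old order and 3 by the new one,
while "1,,3" is counted as 3 by both.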