joehl@gmx.de
2003-Nov-05  16:13 UTC
[Rd] read.table leaves out data when reading multiple-line records (PR#4955)
Dear all, I discovered that read.table (RW1.8.0) leaves out data when reading multiple-line records. Replication code at the end Best regards Jens Oehlschlägel> filename <- "c:/tmp/c2.csv" > > data <- data.frame(a=c("c", "e\nnewline"), b=c("d", '"quotedsimpleline"'))> > #look at the data > write.table(data, sep=",", row.names=FALSE)"a","b" "c","d" "e newline","\"quoted simpleline\""> > # write it out > write.table(data, sep=",", row.names=FALSE, file=filename) > > # reading it in a line is missing > read.csv(filename)a b 1 e\nnewline \\quoted simpleline\\> > fc <- file(filename, open="r") > > # the problem seems to be > # readTableHead erroneously counts 3 lines as 4 > lines <- .Internal(readTableHead(fc, 4, "", TRUE)) > lines[1] "\"a\",\"b\"" "\"c\",\"d\"" "\"e" [4] "newline\",\"\\\"quoted simpleline\\\"\""> > # double pushback is fine > pushBack(c(lines,lines), fc) > > # but nlines tells us we had 4 lines, which in fact are only 3 > nlines <- length(lines) > nlines[1] 4> > # and the first scan eats up more than the first pushback > scan(fc, what="string", sep=",", nlines=nlines)Read 8 items [1] "a" "b" "c" "d" "e\nnewline" [6] "\\quoted simpleline\\" "a" "b"> > # thus the real scan misses data > scan(fc, what="string", sep=",")Read 4 items [1] "c" "d" "e\nnewline" "\\quoted simpleline\\"> > close(fc) > > version_ platform i386-pc-mingw32 arch i386 os mingw32 system i386, mingw32 status major 1 minor 8.0 year 2003 month 10 day 08 language R filename <- "c:/tmp/c2.csv" data <- data.frame(a=c("c", "e\nnewline"), b=c("d", '"quoted simpleline"')) #look at the data write.table(data, sep=",", row.names=FALSE) # write it out write.table(data, sep=",", row.names=FALSE, file=filename) # reading it in a line is missing read.csv(filename) fc <- file(filename, open="r") # the problem seems to be # readTableHead erroneously counts 3 lines as 4 lines <- .Internal(readTableHead(fc, 4, "", TRUE)) lines # double pushback is fine pushBack(c(lines,lines), fc) # but nlines tells us we had 4 lines, which in fact are only 3 nlines <- length(lines) nlines # and the first scan eats up more than the first pushback scan(fc, what="string", sep=",", nlines=nlines) # thus the real scan misses data scan(fc, what="string", sep=",") close(fc) version --
Prof Brian Ripley
2003-Nov-11  08:57 UTC
(PR#4955) [Rd] read.table leaves out data when reading multiple-line records
That isn't a file in table format, and so is not supported. Use scan if you want multi-line records. On Wed, 5 Nov 2003 joehl@gmx.de wrote:> > > Dear all, > > I discovered that read.table (RW1.8.0) leaves out data when reading > multiple-line records. > > Replication code at the end > > Best regards > > > Jens Oehlschl?gel > > > > filename <- "c:/tmp/c2.csv" > > > > data <- data.frame(a=c("c", "e\nnewline"), b=c("d", '"quoted > simpleline"')) > > > > #look at the data > > write.table(data, sep=",", row.names=FALSE) > "a","b" > "c","d" > "e > newline","\"quoted simpleline\"" > > > > # write it out > > write.table(data, sep=",", row.names=FALSE, file=filename) > > > > # reading it in a line is missing > > read.csv(filename) > a b > 1 e\nnewline \\quoted simpleline\\ > > > > fc <- file(filename, open="r") > > > > # the problem seems to be > > # readTableHead erroneously counts 3 lines as 4 > > lines <- .Internal(readTableHead(fc, 4, "", TRUE)) > > lines > [1] "\"a\",\"b\"" "\"c\",\"d\"" > "\"e" > [4] "newline\",\"\\\"quoted simpleline\\\"\"" > > > > # double pushback is fine > > pushBack(c(lines,lines), fc) > > > > # but nlines tells us we had 4 lines, which in fact are only 3 > > nlines <- length(lines) > > nlines > [1] 4 > > > > # and the first scan eats up more than the first pushback > > scan(fc, what="string", sep=",", nlines=nlines) > Read 8 items > [1] "a" "b" "c" > "d" "e\nnewline" > [6] "\\quoted simpleline\\" "a" "b" > > > > # thus the real scan misses data > > scan(fc, what="string", sep=",") > Read 4 items > [1] "c" "d" "e\nnewline" > "\\quoted simpleline\\" > > > > close(fc) > > > > version > _ > platform i386-pc-mingw32 > arch i386 > os mingw32 > system i386, mingw32 > status > major 1 > minor 8.0 > year 2003 > month 10 > day 08 > language R > > > > > filename <- "c:/tmp/c2.csv" > > data <- data.frame(a=c("c", "e\nnewline"), b=c("d", '"quoted simpleline"')) > > #look at the data > write.table(data, sep=",", row.names=FALSE) > > # write it out > write.table(data, sep=",", row.names=FALSE, file=filename) > > # reading it in a line is missing > read.csv(filename) > > fc <- file(filename, open="r") > > # the problem seems to be > # readTableHead erroneously counts 3 lines as 4 > lines <- .Internal(readTableHead(fc, 4, "", TRUE)) > lines > > # double pushback is fine > pushBack(c(lines,lines), fc) > > # but nlines tells us we had 4 lines, which in fact are only 3 > nlines <- length(lines) > nlines > > # and the first scan eats up more than the first pushback > scan(fc, what="string", sep=",", nlines=nlines) > > # thus the real scan misses data > scan(fc, what="string", sep=",") > > close(fc) > > version > > > -- > > ______________________________________________ > R-devel@stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-devel > >-- Brian D. Ripley, ripley@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595