John Muschelli
2016-Nov-14 15:32 UTC
[Rd] Read.dcf with no newline ending: gzfile drops last line
I don't know if this is a bug per se, but it is an undesired behavior in read.dcf. read.dcf takes a file argument and passes it to gzfile if it's a character string:

    if (is.character(file)) {
        file <- gzfile(file)
        on.exit(close(file))
    }

This gzfile connection is passed to readLines (line #39):

    lines <- readLines(file)

If there is no newline at the end of the file, readLines doesn't give a warning (I think appropriate behavior). If a DESCRIPTION file doesn't happen to have a newline at the end of it (odd, but it may happen), then the last tag is dropped:

> x = "Package: test
+ Type: Package"
>
> ######################################
> # No Newline in file
> ######################################
> fname = tempfile()
> writeLines(x, fname, sep = "")
>
> ### readlines with character - warning but all fields
> readLines(fname)
[1] "Package: test" "Type: Package"
Warning message:
In readLines(fname) :
  incomplete final line found on '/var/folders/1s/wrtqcpxn685_zk570bnx9_rr0000gr/T//Rtmpz95dsT/file180a65a6b745'
> ### readlines with file connection - warning but all fields
> file_con <- file(fname)
> readLines(file_con)
[1] "Package: test" "Type: Package"
Warning message:
In readLines(file_con) :
  incomplete final line found on '/var/folders/1s/wrtqcpxn685_zk570bnx9_rr0000gr/T//Rtmpz95dsT/file180a65a6b745'
>
> ### readlines with gzfile connection
> ## no warning and drops last field
> gz_con = gzfile(fname)
> readLines(gz_con) # ONLY 1 line!
[1] "Package: test"
>
> ######################################
> # Newline in file - fine
> ######################################
> ### readlines with gzfile connection
> ## no warning and all fields read
> writeLines(x, fname, sep = "\n")
> gz_con = gzfile(fname)
> readLines(gz_con)
[1] "Package: test" "Type: Package"

Currently I call file(fname) before read.dcf, so that a warning occurs but all fields are read. I didn't see anything in the read.dcf help about this.
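For reference, here is a minimal sketch of the file() workaround. The helper name read_dcf_via_file is hypothetical (it is not part of base R), and the sketch assumes the DCF file is plain, uncompressed text, since it bypasses gzfile entirely:

```r
# Hypothetical helper (not part of base R): read a DCF file through an
# explicit file() connection instead of passing the path as a string,
# so that read.dcf never wraps the path in gzfile().
read_dcf_via_file <- function(path, ...) {
  con <- file(path, open = "r")  # plain text connection, opened for reading
  on.exit(close(con))            # close the connection when the function exits
  read.dcf(con, ...)
}

# A two-field DCF file written without a trailing newline
fname <- tempfile()
writeLines("Package: test\nType: Package", fname, sep = "")

dcf <- read_dcf_via_file(fname)
colnames(dcf)
```

The trade-off is that this helper would not transparently read a gzip-compressed DCF file, which is presumably why read.dcf uses gzfile for character input in the first place.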
The readLines help states clearly: "If the final line is incomplete (no final EOL marker) the behaviour depends on whether the connection is blocking or not", but it's not 100% clear that read.dcf uses gzfile even if the file is not compressed.

Thanks,
John