ripley at stats.ox.ac.uk
2007-Jul-18 07:43 UTC
[Rd] (PR#9796) write.dcf/read.dcf cycle converts missing entry
Bill,

Thanks. I am seeing some problems here, for example when all the fields
are missing, or all the fields in a row are missing. I've fixes for
those, and will commit to R-devel shortly.

On Tue, 17 Jul 2007, bill at insightful.com wrote:

> Full_Name: Bill Dunlap
> Version: 2.5.0
> OS: Red Hat Enterprise Linux WS release 3 (Taroon Update 6)
> Submission from: (NULL) (24.17.60.30)
>
>
> If you read a dcf file with read.dcf(file,fields=c("Field",...))
> and the file does not contain the desired field "Field",
> read.dcf puts a character NA for that entry in its output
> matrix. If you then call write.dcf, passing it the output
> of read.dcf(), it will write the entry "Field: NA". A subsequent
> read.dcf() on write.dcf's output file will then have a "NA",
> not a character NA, in the entry for "Field". I think that
> write.dcf() should not write lines in the output file where
> the input matrix contains a character NA.
>
> Here is a test function to demonstrate the problem. It returns
> TRUE when a write.dcf/read.dcf cycle does not change the data.
>
> test.write.dcf <- function () {
>     origFile <- tempfile()
>     copyFile <- tempfile()
>     on.exit(unlink(c(origFile, copyFile)))
>     writeLines(c("Package: testA", "Version: 0.1-1", "Depends:", "",
>                  "Package: testB", "Version: 2.1" , "Suggests: testA", "",
>                  "Package: testC", "Version: 1.3.1", ""),
>                origFile)
>     orig <- read.dcf(origFile,
>                      fields=c("Package","Version","Depends","Suggests"))
>     write.dcf(orig, copyFile, width = 72)
>     copy <- read.dcf(copyFile,
>                      fields=c("Package","Version","Depends","Suggests"))
>     value <- all.equal(orig, copy)
>     if (!identical(value, TRUE)) {
>         attr(value, "orig") <- orig
>         attr(value, "copy") <- copy
>     }
>     value
> }
>
> Currently we get
>
> > test.write.dcf()
> [1] "'is.NA' value mismatch: 0 in current 4 in target"
> attr(,"orig")
>      Package Version Depends Suggests
> [1,] "testA" "0.1-1" ""      NA
> [2,] "testB" "2.1"   NA      "testA"
> [3,] "testC" "1.3.1" NA      NA
> attr(,"copy")
>      Package Version Depends Suggests
> [1,] "testA" "0.1-1" ""      "NA"
> [2,] "testB" "2.1"   "NA"    "testA"
> [3,] "testC" "1.3.1" "NA"    "NA"
>
> With the attached write.dcf() it returns TRUE.
>
> The diff would be
> 19,22c19,24
> <     eor <- character(nr * nc)
> <     eor[seq.int(1, nr - 1) * nc] <- "\n"
> <     writeLines(paste(formatDL(rep.int(colnames(x), nr), c(t(x)),
> <         style = "list", width = width, indent = indent), eor,
> ---
>>     tx <- t(x)
>>     not.na <- c(!is.na(tx))
>>     eor <- character(sum(not.na))
>>     eor[ c(diff(c(col(tx))[not.na]),0)==1 ] <- "\n"
>>     writeLines(paste(formatDL(rep.int(colnames(x), nr), c(tx),
>>         style = "list", width = width, indent = indent)[not.na], eor,
>
> and the entire function would be
>
> `write.dcf` <-
> function (x, file = "", append = FALSE, indent = 0.1 * getOption("width"),
>     width = 0.9 * getOption("width"))
> {
>     if (!is.data.frame(x))
>         x <- data.frame(x)
>     x <- as.matrix(x)
>     mode(x) <- "character"
>     if (file == "")
>         file <- stdout()
>     else if (is.character(file)) {
>         file <- file(file, ifelse(append, "a", "w"))
>         on.exit(close(file))
>     }
>     if (!inherits(file, "connection"))
>         stop("'file' must be a character string or connection")
>     nr <- nrow(x)
>     nc <- ncol(x)
>     tx <- t(x)
>     not.na <- c(!is.na(tx))
>     eor <- character(sum(not.na))
>     eor[ c(diff(c(col(tx))[not.na]),0)==1 ] <- "\n"
>     writeLines(paste(formatDL(rep.int(colnames(x), nr), c(tx),
>         style = "list", width = width, indent = indent)[not.na], eor,
>         sep = ""), file)
> }
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
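The conversion described in the report can be reproduced without any file I/O. The following is only a minimal sketch, not the code of write.dcf() or read.dcf(): the record `rec` and the use of paste0() and sub() as stand-ins for writing and reading are assumptions made for illustration.

```r
# One record in which the "Suggests" field is missing (character NA).
rec <- c(Package = "testA", Version = "0.1-1", Suggests = NA_character_)

# Naive formatting: paste0() turns the missing value into the text "NA".
naive <- paste0(names(rec), ": ", rec)

# Parsing the lines back yields the string "NA", so the missingness is lost.
parsed <- sub("^[^:]+: ", "", naive)
names(parsed) <- sub(":.*$", "", naive)

# Dropping NA entries before writing avoids emitting the spurious line.
filtered <- paste0(names(rec), ": ", rec)[!is.na(rec)]

print(naive)
print(is.na(parsed))
print(filtered)
```

Filtering on is.na() before the lines are written is the same idea as the `not.na` index in the proposed patch, applied here to a single record.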
Bill Dunlap
2007-Jul-18 23:32 UTC
[Rd] (PR#9796) write.dcf/read.dcf cycle converts missing entry
On Wed, 18 Jul 2007 ripley at stats.ox.ac.uk wrote:

> I am seeing some problems here, for example when all the fields are
> missing, or all the fields in a row are missing. I've fixes for those,
> and will commit to R-devel shortly.

write.dcf() is also used by print.packageDescription() so this change
affects how missing fields are printed there. In 2.5 we got

   > packageDescription("base", fields=c("Version","NoSuchField"))
   Version: 2.5.0
   NoSuchField: NA
   -- File: /dept/devel/sw/R/R.linux.2.5.0/lib/R/library/base/DESCRIPTION
   -- Fields read: Version, NoSuchField

and now it doesn't print anything about the nonexistent field

   > packageDescription("base", fields=c("Version","NoSuchField"))
   Version: 2.6.0
   -- File: /homes/bill/R-svn/r-devel/R/library/base/DESCRIPTION
   -- Fields read: Version, NoSuchField

If this isn't acceptable then write.dcf will need a new argument
to control the printing of missing lines.

The missing trailing blank line is an inadvertent change (although
it makes appending fields to a single-record dcf file possible).

> > If you read a dcf file with read.dcf(file,fields=c("Field",...))
> > and the file does not contain the desired field "Field",
> > read.dcf puts a character NA for that entry in its output
> > matrix. If you then call write.dcf, passing it the output
> > of read.dcf(), it will write the entry "Field: NA". A subsequent
> > read.dcf() on write.dcf's output file will then have a "NA",
> > not a character NA, in the entry for "Field". I think that
> > write.dcf() should not write lines in the output file where
> > the input matrix contains a character NA.
> > ...
> >
> > The diff would be
> > 19,22c19,24
> > <     eor <- character(nr * nc)
> > <     eor[seq.int(1, nr - 1) * nc] <- "\n"
> > <     writeLines(paste(formatDL(rep.int(colnames(x), nr), c(t(x)),
> > <         style = "list", width = width, indent = indent), eor,
> > ---
> >>     tx <- t(x)
> >>     not.na <- c(!is.na(tx))
> >>     eor <- character(sum(not.na))
> >>     eor[ c(diff(c(col(tx))[not.na]),0)==1 ] <- "\n"
> >>     writeLines(paste(formatDL(rep.int(colnames(x), nr), c(tx),
> >>         style = "list", width = width, indent = indent)[not.na], eor,

[ The ==1 should be >=1 ]
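Why ==1 is too strict can be seen on a small matrix in which one whole record is missing. This is a sketch under assumed data: the matrix `x` and the names `rec`, `step`, `eor_eq` and `eor_ge` are made up for the example, while the index expressions are the ones from the diff.

```r
# Three records; the second has every field missing.
x <- matrix(c("a", "1",
              NA,  NA,
              "c", "3"),
            ncol = 2, byrow = TRUE,
            dimnames = list(NULL, c("Package", "Version")))

tx <- t(x)                     # fields in rows, records in columns
not.na <- c(!is.na(tx))        # entries that will actually be written
rec <- c(col(tx))[not.na]      # record number of each written entry
step <- c(diff(rec), 0)        # > 0 where the next entry starts a new record

# Record separators as computed with ==1 and with >=1.
eor_eq <- character(sum(not.na)); eor_eq[step == 1] <- "\n"
eor_ge <- character(sum(not.na)); eor_ge[step >= 1] <- "\n"

print(rec)
print(step)
print(eor_eq)
print(eor_ge)
```

Because the all-NA record contributes no entries, the record number jumps from 1 to 3, so the step is 2. With ==1 no separator is set and the first and third records would run together; with >=1 the separator after the first record is kept.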