bill at insightful.com
2007-Jul-17 16:58 UTC
[Rd] write.dcf/read.dcf cycle converts missing entry to "NA" (PR#9796)
Full_Name: Bill Dunlap
Version: 2.5.0
OS: Red Hat Enterprise Linux WS release 3 (Taroon Update 6)
Submission from: (NULL) (24.17.60.30)

If you read a dcf file with read.dcf(file,fields=c("Field",...))
and the file does not contain the desired field "Field", read.dcf
puts a character NA for that entry in its output matrix. If you
then call write.dcf, passing it the output of read.dcf(), it will
write the entry "Field: NA". A subsequent read.dcf() on write.dcf's
output file will then have a "NA", not a character NA, in the entry
for "Field".

I think that write.dcf() should not write lines in the output file
where the input matrix contains a character NA.

Here is a test function to demonstrate the problem. It returns TRUE
when a write.dcf/read.dcf cycle does not change the data.

test.write.dcf <- function () {
    origFile <- tempfile()
    copyFile <- tempfile()
    on.exit(unlink(c(origFile, copyFile)))
    writeLines(c("Package: testA", "Version: 0.1-1", "Depends:", "",
                 "Package: testB", "Version: 2.1" , "Suggests: testA", "",
                 "Package: testC", "Version: 1.3.1", ""),
               origFile)
    orig <- read.dcf(origFile, fields=c("Package","Version","Depends","Suggests"))
    write.dcf(orig, copyFile, width = 72)
    copy <- read.dcf(copyFile, fields=c("Package","Version","Depends","Suggests"))
    value <- all.equal(orig, copy)
    if (!identical(value, TRUE)) {
        attr(value, "orig") <- orig
        attr(value, "copy") <- copy
    }
    value
}

Currently we get

> test.write.dcf()
[1] "'is.NA' value mismatch: 0 in current 4 in target"
attr(,"orig")
     Package Version Depends Suggests
[1,] "testA" "0.1-1" ""      NA
[2,] "testB" "2.1"   NA      "testA"
[3,] "testC" "1.3.1" NA      NA
attr(,"copy")
     Package Version Depends Suggests
[1,] "testA" "0.1-1" ""      "NA"
[2,] "testB" "2.1"   "NA"    "testA"
[3,] "testC" "1.3.1" "NA"    "NA"

With the attached write.dcf() it returns TRUE.
The diff would be

19,22c19,24
<     eor <- character(nr * nc)
<     eor[seq.int(1, nr - 1) * nc] <- "\n"
<     writeLines(paste(formatDL(rep.int(colnames(x), nr), c(t(x)),
<         style = "list", width = width, indent = indent), eor,
---
>     tx <- t(x)
>     not.na <- c(!is.na(tx))
>     eor <- character(sum(not.na))
>     eor[ c(diff(c(col(tx))[not.na]),0)==1 ] <- "\n"
>     writeLines(paste(formatDL(rep.int(colnames(x), nr), c(tx),
>         style = "list", width = width, indent = indent)[not.na], eor,

and the entire function would be

`write.dcf` <- function (x, file = "", append = FALSE,
    indent = 0.1 * getOption("width"),
    width = 0.9 * getOption("width"))
{
    if (!is.data.frame(x))
        x <- data.frame(x)
    x <- as.matrix(x)
    mode(x) <- "character"
    if (file == "")
        file <- stdout()
    else if (is.character(file)) {
        file <- file(file, ifelse(append, "a", "w"))
        on.exit(close(file))
    }
    if (!inherits(file, "connection"))
        stop("'file' must be a character string or connection")
    nr <- nrow(x)
    nc <- ncol(x)
    tx <- t(x)
    not.na <- c(!is.na(tx))
    eor <- character(sum(not.na))
    eor[ c(diff(c(col(tx))[not.na]),0)==1 ] <- "\n"
    writeLines(paste(formatDL(rep.int(colnames(x), nr), c(tx),
        style = "list", width = width, indent = indent)[not.na], eor,
        sep = ""), file)
}
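
To make the proposed separator logic easier to follow, the index
computation from the patch can be run by hand on the matrix from the
example. This is only an illustrative sketch: the matrix x below is
typed in manually to match the "orig" output shown in the report
(an assumption, it is not read from a DCF file), and the data.frame
at the end is added purely for display. Only the four lines computing
tx, not.na and eor are taken from the proposed write.dcf().

```r
# Hand-built copy of the 'orig' matrix from the report (assumption:
# entered manually rather than produced by read.dcf()).
x <- matrix(c("testA", "0.1-1", "",  NA,
              "testB", "2.1",   NA,  "testA",
              "testC", "1.3.1", NA,  NA),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL,
                            c("Package", "Version", "Depends", "Suggests")))

# Lines copied from the proposed write.dcf().
tx <- t(x)                     # fields in rows, one column per record
not.na <- c(!is.na(tx))        # TRUE for each entry that will be written
eor <- character(sum(not.na))  # one (initially empty) suffix per written entry
# col(tx) is the record number of each entry; a step of 1 between
# consecutive written entries marks the last field of a record.
eor[c(diff(c(col(tx))[not.na]), 0) == 1] <- "\n"

# Display only: which fields survive, and which are followed by a blank line.
fields <- rep.int(colnames(x), nrow(x))[not.na]
print(data.frame(field = fields,
                 value = c(tx)[not.na],
                 record.ends = eor == "\n"))
```

For this input, eight of the twelve entries are written, and the "\n"
suffix is attached to the third and sixth of them, that is, to the
last non-missing field of the first and second records. The four
missing entries produce no output line at all.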