Duncan Murdoch
2017-Sep-19 21:45 UTC
[Rd] what do you think about write.table(... qmethod = "excel")?
On 19/09/2017 4:10 PM, Ista Zahn wrote:> On Tue, Sep 19, 2017 at 1:04 PM, Paul Johnson <pauljohn32 at gmail.com> wrote: >> Last week one of our clients reported trouble with a csv file I >> generated with write.table. He said that columns with quotes for >> character variables were rejected by their data importer, which was >> revised to match the way Microsoft Excel uses quotation marks in >> character variables. I explained to them that quoted character >> variables are virtuous and wise, of course, but they say Microsoft >> Excel CSV export no longer quotes characters unless they include >> commas in the values. >> >> They showed me a CSV file from Excel that looked like this >> >> x1,x2,x3,x4 5 6 >> fred,barney,betty,x >> bambam,"fred,wilma",pebbles,y >> >> Note how the quotes only happen on row 2 column 2. I was surprised it >> did that, but now I have some pressure to write a csv maker that has >> that structure. > > I think you should resist that pressure.That depends on whether this is a paying client or not. > It really makes no sense to> write a .csv parser that _only_ supports .csv files created by Excel.That's true, but if that's what they want to do, and they're willing to pay to be able to write files that imitate Excel, then why not do what they ask? On the other hand, if they aren't willing to pay for the work, then you should lecture them on how silly their request is. In any case, base R functions should not include nonsense, so this is not something that should go into R. Duncan Murdoch> If you're going to use Excel as a model, a more sensible approach > would be to write a csv parser that supports all the formats that > Excel itself supports; Excel of course has no problem importing > > "x1","x2","x3","x4" > "fred","barney","betty","x" > "bambam","fred,wilma","pebbles","y" > > So, seriously, tell them to just fix their csv parser. Since they seem > hung up on Excel, it may help to point out that it does in fact import > csv produced by write.csv without complaint. > > Best, > Ista > > Its weird, even when there are spaces in values there >> are no quotation marks. >> >> Has anybody done this and verified that it matches CSV from MS Excel? >> If I succeed will you consider a patch? >> >> pj >> -- >> Paul E. Johnson http://pj.freefaculty.org >> Director, Center for Research Methods and Data Analysis http://crmda.ku.edu >> >> To write to me directly, please address me at pauljohn at ku.edu. >> >> ______________________________________________ >> R-devel at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >
Paul Johnson
2017-Sep-20 16:53 UTC
[Rd] what do you think about write.table(... qmethod = "excel")?
On Tue, Sep 19, 2017 at 4:45 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:> > > That's true, but if that's what they want to do, and they're willing to pay > to be able to write files that imitate Excel, then why not do what they ask? > > On the other hand, if they aren't willing to pay for the work, then you > should lecture them on how silly their request is. > > In any case, base R functions should not include nonsense, so this is not > something that should go into R. > > Duncan Murdoch >I understand. This is a paying client, I'm going where the money goes. Here's my current working example of a function that writes a CSV exactly as Excel does. I've posted this into StackOverflow (https://stackoverflow.com/questions/25743018/how-to-conditionally-remove-quotes-in-write-csv). It is buried under a thread, maybe will not get much attention and bug-checking. Oh, well, I can hope. Processes variables column-by-column, I don't know how to do it differently. ##' Write CSV files with quotes same as MS Excel 2013 or newer ##' ##' R's write.csv inserts quotes around all elements in a character ##' vector (if quote = TRUE). In contrast, MS Excel CSV export no ##' longer inserts quotation marks on all elements in character ##' variables, except when the cells include commas or quotation ##' marks. This function generates CSV files that are, so far as we ##' know, exactly the same "quoted style" as MS Excel CSV export ##' files. ##' ##' This works by manually inserting quotation marks where necessary and ##' turning FALSE R's own method to insert quotation marks. ##' @param x a data frame ##' @param file character string for file name ##' @param row.names Default FALSE for row.names ##' @importFrom utils write.table ##' @return the return from write.table, using revised quotes ##' @export ##' @author Paul Johnson ##' @examples ##' set.seed(234) ##' x1 <- data.frame(x1 = c("a", "b,c", "b", "The \"Washington, DC\""), ##' x2 = rnorm(4), stringsAsFactors = FALSE) ##' x1 ##' fn <- tempfile(pattern = "testcsv", fileext = ".csv") ##' writeCSV(x1, file = fn) ##' readLines(fn) ##' x2 <- read.table(fn, sep = ",", header = TRUE, stringsAsFactors = FALSE) ##' all.equal(x1,x2) writeCSV <- function(x, file, row.names = FALSE){ xischar <- colnames(x)[sapply(x, is.character)] for(jj in xischar){ x[ , jj] <- gsub('"', '""', x[ , jj], fixed = TRUE) needsquotes <- grep('[\",]', x[ ,jj]) x[needsquotes, jj] <- paste0("\"", x[needsquotes, jj], "\"") } write.table(x, file = file, sep = ",", quote = FALSE, row.names = row.names) } Output:> set.seed(234) > x1 <- data.frame(x1 = c("a", "b,c", "b", "The \"Washington, DC\""),+ x2 = rnorm(4), stringsAsFactors = FALSE)> x1x1 x2 1 a 0.6607697 2 b,c -2.0529830 3 b -1.4992061 4 The "Washington, DC" 1.4712331> fn <- tempfile(pattern = "testcsv", fileext = ".csv") > writeCSV(x1, file = fn) > readLines(fn)[1] "x1,x2" [2] "a,0.660769736644892" [3] "\"b,c\",-2.052983003941" [4] "b,-1.49920605110092" [5] "\"The \"\"Washington, DC\"\"\",1.4712331168047"> x2 <- read.table(fn, sep = ",", header = TRUE, stringsAsFactors = FALSE) > all.equal(x1,x2)[1] TRUE I already see one problem, that I've got no special arrangement for column names with commas or quotes. People who want column names with those things are even more wrong than the people write a parser that can't understand quotes on character variables. -- Paul E. Johnson http://pj.freefaculty.org Director, Center for Research Methods and Data Analysis http://crmda.ku.edu To write to me directly, please address me at pauljohn at ku.edu.
David Winsemius
2017-Sep-20 17:39 UTC
[Rd] what do you think about write.table(... qmethod = "excel")?
> On Sep 20, 2017, at 9:53 AM, Paul Johnson <pauljohn32 at gmail.com> wrote: > > On Tue, Sep 19, 2017 at 4:45 PM, Duncan Murdoch > <murdoch.duncan at gmail.com> wrote: >> >> >> That's true, but if that's what they want to do, and they're willing to pay >> to be able to write files that imitate Excel, then why not do what they ask? >> >> On the other hand, if they aren't willing to pay for the work, then you >> should lecture them on how silly their request is. >> >> In any case, base R functions should not include nonsense, so this is not >> something that should go into R. >> >> Duncan Murdoch >> > > I understand. This is a paying client, I'm going where the money goes. > > Here's my current working example of a function that writes a CSV > exactly as Excel does. I've posted this into StackOverflow > (https://stackoverflow.com/questions/25743018/how-to-conditionally-remove-quotes-in-write-csv). > It is buried under a thread, maybe will not get much attention and > bug-checking. Oh, well, I can hope. Processes variables > column-by-column, I don't know how to do it differently. > > > > ##' Write CSV files with quotes same as MS Excel 2013 or newer > ##' > ##' R's write.csv inserts quotes around all elements in a character > ##' vector (if quote = TRUE). In contrast, MS Excel CSV export no > ##' longer inserts quotation marks on all elements in character > ##' variables, except when the cells include commas or quotation > ##' marks. This function generates CSV files that are, so far as we > ##' know, exactly the same "quoted style" as MS Excel CSV export > ##' files. > ##' > ##' This works by manually inserting quotation marks where necessary and > ##' turning FALSE R's own method to insert quotation marks. > ##' @param x a data frame > ##' @param file character string for file name > ##' @param row.names Default FALSE for row.names > ##' @importFrom utils write.table > ##' @return the return from write.table, using revised quotes > ##' @export > ##' @author Paul Johnson > ##' @examples > ##' set.seed(234) > ##' x1 <- data.frame(x1 = c("a", "b,c", "b", "The \"Washington, DC\""), > ##' x2 = rnorm(4), stringsAsFactors = FALSE) > ##' x1 > ##' fn <- tempfile(pattern = "testcsv", fileext = ".csv") > ##' writeCSV(x1, file = fn) > ##' readLines(fn) > ##' x2 <- read.table(fn, sep = ",", header = TRUE, stringsAsFactors = FALSE) > ##' all.equal(x1,x2) > writeCSV <- function(x, file, row.names = FALSE){ > xischar <- colnames(x)[sapply(x, is.character)]? Should that be: xIsCharOrFactor <- colnames(x)[sapply(x, function(z) is.character(z) | is.factor(z) )] Just asking. No testing done. __ David.> for(jj in xischar){ > x[ , jj] <- gsub('"', '""', x[ , jj], fixed = TRUE) > needsquotes <- grep('[\",]', x[ ,jj]) > x[needsquotes, jj] <- paste0("\"", x[needsquotes, jj], "\"") > } > write.table(x, file = file, sep = ",", quote = FALSE, > row.names = row.names) > } > > Output: > >> set.seed(234) >> x1 <- data.frame(x1 = c("a", "b,c", "b", "The \"Washington, DC\""), > + x2 = rnorm(4), stringsAsFactors = FALSE) >> x1 > x1 x2 > 1 a 0.6607697 > 2 b,c -2.0529830 > 3 b -1.4992061 > 4 The "Washington, DC" 1.4712331 >> fn <- tempfile(pattern = "testcsv", fileext = ".csv") >> writeCSV(x1, file = fn) >> readLines(fn) > [1] "x1,x2" > [2] "a,0.660769736644892" > [3] "\"b,c\",-2.052983003941" > [4] "b,-1.49920605110092" > [5] "\"The \"\"Washington, DC\"\"\",1.4712331168047" >> x2 <- read.table(fn, sep = ",", header = TRUE, stringsAsFactors = FALSE) >> all.equal(x1,x2) > [1] TRUE > > I already see one problem, that I've got no special arrangement for > column names with commas or quotes. People who want column names with > those things are even more wrong than the people write a parser that > can't understand quotes on character variables. > > -- > Paul E. Johnson http://pj.freefaculty.org > Director, Center for Research Methods and Data Analysis http://crmda.ku.edu > > To write to me directly, please address me at pauljohn at ku.edu. > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-develDavid Winsemius Alameda, CA, USA 'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law
Possibly Parallel Threads
- what do you think about write.table(... qmethod = "excel")?
- what do you think about write.table(... qmethod = "excel")?
- Using read.table to read file created with read.table and qmethod = "escape"
- what do you think about write.table(... qmethod = "excel")?
- [Patch suggestion] Adding 3rd arg to tempfile() to set extension