Jens Oehlschlägel
2003-Nov-06 19:41 UTC
Summary: [R] How to represent pure linefeeds chr(10) under R for Windows
Thanks to all who have responded.

My concern was to be able to write a csv file that can have line feeds, chr(10), in string columns. Why? Excel allows line feeds chr(10) within cells and line breaks chr(13)+chr(10) at line endings, but the Windows version of R automatically replaces \n by \r\n when writing and \r\n by \n when reading (text mode).

The clues for a solution came from Brian Ripley and Thomas Lumley: we need to use "binary" connection mode (which will not replace \n by \r\n) and explicitly specify the line ending as "\r\n".

Testing with these gave the following results:

## write.table / read.table: a bit inconsistent: need text connection to read and binary connection to write
## writeLines / readLines: readLines lacks a sep= parameter to properly read in such data
## writeChar / readChar: OK

Thanks again and
Best regards

Jens Oehlschlägel


## Details

filename <- "c:/tmp/c2.csv"

## write.table / read.table: a bit inconsistent: need text connection to read and binary connection to write
data <- data.frame(a='c\nd', b='"äöüÄÖÜß"')
# writing in text mode replaces \n by \r\n
file <- file(filename, "w")
write.table(data, row.names=FALSE, file=file, sep=";", qmethod="double")
close(file)
# writing in binary mode does not replace \n, however the real line endings are also \n instead of \r\n
file <- file(filename, "wb")
write.table(data, row.names=FALSE, file=file, sep=";", qmethod="double")
close(file)
# using the eol parameter we can create the desired csv format (which can be read by Excel)
file <- file(filename, "wb")
write.table(data, row.names=FALSE, file=file, sep=";", qmethod="double", eol="\r\n")
close(file)

# for the read test write a dataset that avoids a reported bug in read.table()
data <- data.frame(a=c(rep("x", 5), "c\nd"), b=c(rep("y", 5), '"äöüÄÖÜß"'))
file <- file(filename, "wb")
write.table(data, row.names=FALSE, file=file, sep=";", qmethod="double", eol="\r\n")
close(file)
# reading, astonishingly, works on a text mode connection
file <- file(filename, "r")
read.csv2(file)
close(file)
# and doesn't work on a binary connection
file <- file(filename, "rb")
read.csv2(file)
close(file)

## writeLines / readLines: readLines lacks a sep= parameter to properly read in such data
data <- c('a;b', 'c\nd;"äöüÄÖÜß"')
# text mode substitutes \n -> \r\n like in write.table
file <- file(filename, "w")
writeLines(data, file, sep="\n")
close(file)
# we can write out the desired format using binary mode and sep="\r\n"
file <- file(filename, "wb")
writeLines(data, file, sep="\r\n")
close(file)
# However, we cannot read this back in binary mode: readLines lacks a sep parameter
file <- file(filename, "rb")
readLines(file)
close(file)
# text mode replaces as expected
file <- file(filename, "r")
readLines(file)
close(file)

## writeChar / readChar: OK
data <- c('a;b\r\nc\nd;"äöüÄÖÜß"')
# writing in text mode substitutes as expected
file <- file(filename, "w")
writeChar(data, file, eos=NULL)
close(file)
# writing in binary mode works
file <- file(filename, "wb")
writeChar(data, file, eos=NULL)
close(file)
# reading in binary mode works
file <- file(filename, "rb")
readChar(file, nchar(data))
close(file)
# reading in text mode substitutes as expected
file <- file(filename, "r")
readChar(file, nchar(data))
close(file)
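
A minimal sketch, not part of the tests above, that checks the resulting bytes directly instead of relying on how the file prints. It assumes a temporary file rather than c:/tmp/c2.csv and plain ASCII data; write_excel_csv2 is a hypothetical helper name, not an existing R function.

```r
# Hypothetical helper: write a csv whose cells may contain bare LF
# while every row ends in CRLF.
write_excel_csv2 <- function(df, path) {
  con <- file(path, "wb")   # binary mode: no \n -> \r\n translation
  on.exit(close(con))
  write.table(df, file = con, sep = ";", row.names = FALSE,
              qmethod = "double", eol = "\r\n")
}

path <- tempfile(fileext = ".csv")   # assumed location; any writable path works
df <- data.frame(a = "c\nd", b = "x", stringsAsFactors = FALSE)
write_excel_csv2(df, path)

# Look at the raw bytes: CRLF (0d 0a) should occur only at row ends,
# and the LF inside the cell should not be preceded by 0d.
bytes <- readBin(path, what = "raw", n = file.size(path))
lf <- which(bytes == as.raw(0x0a))
crlf <- lf[lf > 1 & bytes[pmax(lf - 1, 1)] == as.raw(0x0d)]
bare_lf <- setdiff(lf, crlf)

# Reading back without readLines: take the whole file as one string
# and split on CRLF only, so the embedded LF stays inside its cell.
txt <- readChar(path, nchars = file.size(path), useBytes = TRUE)
rows <- strsplit(txt, "\r\n", fixed = TRUE)[[1]]
```

Here crlf holds the positions of the row-ending line feeds and bare_lf the positions of line feeds inside cells, so the two counts can be compared against the number of rows and the number of embedded line feeds one expects.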
Gabor Grothendieck
2003-Nov-06 20:33 UTC
Summary: [R] How to represent pure linefeeds chr(10) under R for Windows
It's also possible to avoid these intricacies by not using an intermediate text representation, i.e. csv, in the first place.

The following R code uses the free dataload utility (Google search for Baird dataload utility) to create an .xls file from a data frame, x:

save(x,file="x.rda")
system("dataload x.rda x.xls/u")

At this point you can read x.xls into Excel.
Gabor Grothendieck
2003-Nov-07 13:04 UTC
Summary: [R] How to represent pure linefeeds chr(10) under R for Windows
While I don't disagree with what you say, the purpose of this is to interface to Excel, which is even less free (you have to pay for Excel but not for dataload), so perhaps the status of the glue used between R and Excel is not as important.

From an expediency viewpoint, I found that dataload solves a wide variety of interfacing problems easily, typically in a single line of code, using a single tool and consistent syntax. I can translate easily among .rda, .xls, .csv, .txt and numerous other formats.

---
Date: Fri, 7 Nov 2003 10:32:44 +0100
From: Martin Maechler <maechler at stat.math.ethz.ch>
To: <ggrothendieck at myway.com>
Cc: <joehl at gmx.de>, <r-help at stat.math.ethz.ch>
Subject: Re: Summary: [R] How to represent pure linefeeds chr(10) under R for Windows

>>>>> "Gabor" == Gabor Grothendieck <ggrothendieck at myway.com>
>>>>> on Thu, 6 Nov 2003 15:33:04 -0500 (EST) writes:

Gabor> Its also possible to avoid these intricacies by not
Gabor> using an intermediate text representation, i.e. csv,
Gabor> in the first place.

Gabor> The following R code uses the free dataload utility
Gabor> (Google search for Baird dataload utility) to create
Gabor> an .xls file from data frame, x:

Gabor> save(x,file="x.rda")
Gabor> system("dataload x.rda x.xls/u")

Gabor> At this point you can read x.xls into Excel.

Note that this has two "problems" IMO, which Jens' R-only solution does not have:

1) dataload is *not* free software in the sense of the Free Software Foundation (which has existed for a much longer time than MS windows!): It's only "free" as in "free beer", not "free" as in "free speech". For more, read the "Free as in Freedom" main link on http://www.fsf.org/

2) dataload is only available as *binary* on *some* platforms, as opposed to R which is available to everyone working with it :-)

Martin Maechler <maechler at stat.math.ethz.ch> http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum LEO C16 Leonhardstr. 27
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-3408 fax: ...-1228 <><