All,

I have some csv files I am trying to import. I am finding that quotes inside strings are escaped in a way R doesn't expect for csv files. The problem only seems to rear its ugly head when there is an odd number of internal quotes. I'll try to recreate the problem:

# set up a matrix, using escape-quote as the internal double quote mark.

x <- data.frame(matrix(data=c("1", "string one", "another string", "2", "quotes escaped 10' 20\" 5' 30\" \"test string", "final string", "3","third row","last \" col"),ncol = 3, byrow=TRUE))

> write.csv(x, "test.csv")

# NOTE that write.csv correctly wrote the three internal quotes ' " ' as doubled quotes ' "" '.
# here's what got written:

"","X1","X2","X3"
"1","1","string one","another string"
"2","2","quotes escaped 10' 20"" 5' 30"" ""test string","final string"
"3","3","third row","last "" col"

# Importing test.csv works fine.

> read.csv("test.csv")
  X X1 X2 X3
1 1 1 string one another string
2 2 2 quotes escaped 10' 20" 5' 30" "test string final string
3 3 3 third row last " col

# this looks good.
# now, please open "test.csv" with a text editor and replace all the doubled quotes '""' with the
# escaped quote ' \" ', as found in my data set. Like this:

"","X1","X2","X3"
"1","1","string one","another string"
"2","2","quotes escaped 10' 20\" 5' 30\" \"test string","final string"
"3","3","third row","last \" col"

# this breaks read.csv:

> read.csv("test.csv")
  X X1 X2 X3
1 1 1 string one another string
2 2 2 quotes escaped 10' 20\\ 5' 30\\ \\test string,final string\n3,3,third row,last \\ col

# we now have only two rows, with all the remaining data captured in column X2 of row 2

Any suggestions on how to fix this behavior? I've tried fiddling with quote="\"" to no avail, obviously. Interestingly, an even number of escaped quotes within a field is loaded correctly, which certainly threw me for a while!

Thank you in advance,
Tim
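The odd/even behaviour described above can be checked line by line. With a non-default sep, read.csv() (via scan()) recognises only a doubled quote as an embedded quote, so a backslash does not stop the following " from opening or closing a quoted field. A rough diagnostic under that simplified model is to count the quote characters on each line: an odd count means the parser is still inside a quoted field at the end of the line and carries on into the next one. The helper ends_inside_quotes is hypothetical, and scan() itself does more than count quotes.

```r
# Hypothetical helper. Simplified model (an assumption): every " character
# toggles in/out of a quoted field, because a backslash is not an escape
# in csv mode. An odd count means the line ends inside a quoted field.
ends_inside_quotes <- function(line) {
  n_quotes <- nchar(gsub('[^"]', '', line))  # keep only the " characters
  n_quotes %% 2 == 1
}

# The four lines of test.csv after replacing "" with \" in an editor.
edited <- c('"","X1","X2","X3"',
            '"1","1","string one","another string"',
            '"2","2","quotes escaped 10\' 20\\" 5\' 30\\" \\"test string","final string"',
            '"3","3","third row","last \\" col"')

ends_inside_quotes(edited)
```

For the edited file this returns FALSE, FALSE, TRUE, TRUE: the field opened on the third line is only closed by the odd quote on the last line, which matches the single run-on field in row 2 of the broken output. Every "" written by write.csv() adds two quotes, and so does any pair of \" escapes, which is why a field with an even number of escaped quotes still splits into the right columns.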
Drat, I forgot to tell you what system I am on:

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base
On Jan 25, 2013, at 10:42 AM, Tim Howard wrote:

> # now, please go and open "test.csv" with a text editor and replace all the double quotes '""' with the
> # quote escaped ' \" ' as is found in my data set. Like this:
>
> "","X1","X2","X3"
> "1","1","string one","another string"
> "2","2","quotes escaped 10' 20\" 5' 30\" \"test string","final string"
> "3","3","third row","last \" col"

Use quote="":

> read.csv(text='"","X1","X2","X3"
+ "1","1","string one","another string"
+ "2","2","quotes escaped 10\' 20"" 5\' 30"" ""test string","final string"
+ "3","3","third row","last "" col"', sep=",", quote="")

Not ...., quote="\""

  X.. X.X1. X.X2. X.X3.
1 "1" "1" "string one" "another string"
2 "2" "2" "quotes escaped 10' 20"" 5' 30"" ""test string" "final string"
3 "3" "3" "third row" "last "" col"

You will then be depending entirely on commas to separate.

(Needed to use escaped single quotes to illustrate from a command line.)

David Winsemius
Alameda, CA, USA
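The quote="" route can be finished off in R: read every field verbatim, then strip the enclosing quotes and turn \" into " afterwards. A minimal sketch, assuming that no field contains a comma (which, as the follow-up messages point out, does not hold for the real files); the helper unquote and the inline sample lines are illustrative, not taken from the thread.

```r
# Sample lines in the problem format: internal quotes escaped as \"
raw <- c('"","X1","X2","X3"',
         '"1","1","string one","another string"',
         '"2","2","quotes escaped 10\' 20\\" 5\' 30\\" \\"test string","final string"',
         '"3","3","third row","last \\" col"')

# quote = "" switches quote handling off: fields are split on commas only
# and keep their surrounding " characters.
df <- read.csv(text = raw, header = FALSE, quote = "", colClasses = "character")

# Hypothetical helper: drop one leading and one trailing quote, then turn \" into "
unquote <- function(v) {
  v <- sub('^"', '', v)
  v <- sub('"$', '', v)
  gsub('\\"', '"', v, fixed = TRUE)
}
df[] <- lapply(df, unquote)

# The first line held the column names; promote it to the header.
names(df) <- c("X", unname(unlist(df[1, -1])))
df <- df[-1, ]
rownames(df) <- NULL
df
```

Because quote = "" disables quote handling entirely, a comma inside a field would still split it into two columns; that is exactly the limitation raised later in the thread.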
Following David's suggestion, you might want to have a look at https://confluence.clazzes.org/display/CSVEDIT/CSVEdit+Home . I have not used it but it seems to get good reviews from people I know.

John Kane
Kingston ON Canada

> -----Original Message-----
> From: dwinsemius at comcast.net
> Sent: Fri, 25 Jan 2013 13:42:25 -0800
> To: tghoward at gw.dec.state.ny.us
> Subject: Re: [R] read.csv quotes within fields
>
> On Jan 25, 2013, at 1:37 PM, Tim Howard wrote:
>
>> David,
>> Thank you again for the reply. I'll try to make readLines() and
>> strsplit() work. What bugs me is that I think it would import fine if
>> the folks who created the csv had used double quotes "" rather than an
>> escaped quote \" for those pesky internal quotes. Since that's the case,
>> I'd think there would be a solution within read.csv() ... or perhaps
>> scan()?, I just can't figure it out.
>
> Can you pre-process with an editor? Replace all the ", " hits with
> something like '|'.
>
> --
> David.
>
>> best,
>> Tim
>>
>>>>> David Winsemius <dwinsemius at comcast.net> 1/25/2013 4:16 PM >>>
>>
>> On Jan 25, 2013, at 11:35 AM, Tim Howard wrote:
>>
>>> Great point, your fix (quote="") works for the example I gave.
>>> Unfortunately, these text strings have commas in them as well(!).
>>> Throw a few commas in any of the text strings and it breaks again.
>>> Sorry about not including those in the example.
>>>
>>> So, I need to incorporate commas *and* quotes with the escape character
>>> within a single string.
>>
>> Well you need to have _some_ delimiter. At the moment it sounds as
>> though you might end up using readLines() and strsplit( . ,
>> split="\\'\\,\\s\\").
>>
>> --
>> david.
>>
>>> Tim
>
> David Winsemius
> Alameda, CA, USA
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
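The observation that the file would import cleanly had the internal quotes been written as "" suggests a pre-processing step inside R rather than in an editor: read the raw lines, rewrite every \" as "", and pass the result to read.csv() through its text argument. A minimal sketch, assuming a backslash directly before a quote is always an escape (a field that legitimately ends in a backslash, such as a Windows path, would be mangled, and an escaped backslash \\ is not handled); the sample lines, including the commas, are made up for illustration.

```r
# Sample lines in the problem format: internal quotes escaped as \"
# and commas inside some fields (illustrative data, not from the thread).
raw <- c('"","X1","X2","X3"',
         '"1","1","string one","another string"',
         '"2","2","quotes escaped 10\' 20\\" 5\' 30\\", \\"test string","final, string"',
         '"3","3","third row","last \\" col"')

# Rewrite each backslash-quote pair as the doubled quote read.csv() expects.
# fixed = TRUE matches the two characters \" literally, not as a regex.
fixed_lines <- gsub('\\"', '""', raw, fixed = TRUE)

# Ordinary csv quoting now applies, so commas inside quoted fields are safe.
df <- read.csv(text = fixed_lines, stringsAsFactors = FALSE)
df
```

For a file on disk the same idea would be lines <- readLines("test.csv") followed by read.csv(text = gsub('\\"', '""', lines, fixed = TRUE)). Once the quotes are doubled, standard csv quoting handles embedded commas, so no substitute delimiter is needed.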
Possibly Parallel Threads
- Strange csv parsing problem
- How Can I insert another column data into the CSV file when I use FasterCSV?
- what do you think about write.table(... qmethod = "excel")?
- read.csv and field containing single quotes
- Issue with read.csv treatment of numerics enclosed in quotes (and a confession)