I need to read in csv files, created by a 3rd party, with fields containing single (i.e. not doubled) quote characters, as shown below.

"header1","header2","header3","header4"
"field1r1","field2r1","field3r1","field4r1"
"field1r2","field2r2","field3r2PartA), field3r2PartB Very" Long","field4r2"
"field1r3","field2r3","field3r3","field4r3"

read.csv(filename, quote="\"'", header=TRUE) won't read the file represented above, unless the 3rd line has Very"" (double quotes) instead of Very" (single quotes)... and this is documented (scan() man page).

Assuming that the creation of such csv files is something I'm not in a position to interfere with, are there (preferably, "all in R") suggestions on how to handle such a task?

For the moment, I'm using my poor man's solution (below), but any tricks that would simplify this task would be great.

Thank you very much,

benilton

parser <- function(fname, header=TRUE, stringsAsFactors=FALSE){
    txt <- readLines(fname)
    ## drop the quote that opens and closes each line
    txt <- gsub("^\"|\"$", "", txt)
    ## split on the "," sequence between fields
    txt <- strsplit(txt, "\",\"")
    ## double any quote left inside a field, then bind rows into a matrix
    txt <- do.call(rbind, lapply(txt, function(x) gsub("\"", "\"\"", x)))
    if (header){
        nms <- txt[1,]
        txt <- txt[-1,]
    }
    txt <- as.data.frame(txt, stringsAsFactors=stringsAsFactors)
    if (header) names(txt) <- nms
    txt
}
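As a rough illustration of the problem described above, the following sketch doubles any stray quote inside a field and then hands the repaired lines to read.csv(), so that the usual type conversion still applies. The function name read_csv_stray_quotes is hypothetical (it is not from the thread). The sketch assumes that every field is quoted, that the sequence "," never occurs inside a field, and that the last field of a line is never empty.

```r
# Hypothetical helper, not part of base R or of the original post.
# Assumes: every field is wrapped in double quotes, and the three-character
# sequence "," only ever appears between fields.
read_csv_stray_quotes <- function(fname, ...) {
  txt <- readLines(fname)
  # Remove the quote that opens and the quote that closes each line.
  txt <- gsub('^"|"$', "", txt)
  # Split each line on the "," sequence separating the fields.
  fields <- strsplit(txt, '","', fixed = TRUE)
  # Double every quote left inside a field, re-wrap the field in quotes,
  # and join the fields of each line back together with commas.
  fixed <- vapply(fields, function(x) {
    paste0('"', gsub('"', '""', x, fixed = TRUE), '"', collapse = ",")
  }, character(1))
  # The doubled quotes are now the escaped form that read.csv() accepts.
  read.csv(text = fixed, quote = '"', ...)
}

# Usage on a temporary copy of the sample data from the post.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  '"header1","header2","header3","header4"',
  '"field1r1","field2r1","field3r1","field4r1"',
  '"field1r2","field2r2","field3r2PartA), field3r2PartB Very" Long","field4r2"',
  '"field1r3","field2r3","field3r3","field4r3"'
), tmp)
dat <- read_csv_stray_quotes(tmp, stringsAsFactors = FALSE)
```

Passing the repaired text to read.csv() rather than building the data frame by hand is a design choice of this sketch; it keeps the embedded quote as a single character in the result.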
Henrique Dallazuanna
2012-Mar-27 01:35 UTC
[R] read.csv and field containing single quotes
Benilton,

Try this:

read.table(textConnection(gsub('","', "','",
                               gsub('^\"|\"$', "'", readLines('../teste.csv')))),
           sep = ',', quote = "'", header = TRUE)

On Mon, Mar 26, 2012 at 8:09 PM, Benilton Carvalho <beniltoncarvalho at gmail.com> wrote:
> I need to read in csv files, created by 3rd party, with fields
> containing single quotes (as shown below).
> [...]

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
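The one-liner above swaps the delimiting double quotes for apostrophes, so it presumably only works when the data contain no apostrophe of their own. A minimal sketch of the same idea with that check made explicit follows. The name read_with_swapped_quotes is hypothetical, and setting comment.char = "" is an addition of this sketch (read.table() otherwise treats # as a comment character).

```r
# Hypothetical wrapper around the quote-swapping idea; the name and the
# apostrophe check are illustrative assumptions, not from the thread.
read_with_swapped_quotes <- function(fname, header = TRUE, ...) {
  txt <- readLines(fname)
  # The swap is only safe if no apostrophe is already present in the file.
  if (any(grepl("'", txt, fixed = TRUE))) {
    stop("file already contains a single quote (')")
  }
  # Turn the quote at the start and end of each line into an apostrophe.
  txt <- gsub('^"|"$', "'", txt)
  # Turn each "," field boundary into ','.
  txt <- gsub('","', "','", txt, fixed = TRUE)
  # Any remaining double quote is now ordinary data.
  read.table(text = txt, sep = ",", quote = "'", header = header,
             comment.char = "", ...)
}

# Usage on a temporary copy of the sample data from the thread.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  '"header1","header2","header3","header4"',
  '"field1r1","field2r1","field3r1","field4r1"',
  '"field1r2","field2r2","field3r2PartA), field3r2PartB Very" Long","field4r2"',
  '"field1r3","field2r3","field3r3","field4r3"'
), tmp)
dat <- read_with_swapped_quotes(tmp, stringsAsFactors = FALSE)
```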
On 27/03/12 01:09, Benilton Carvalho wrote:
> I need to read in csv files, created by 3rd party, with fields
> containing single quotes (as shown below).
>
> "header1","header2","header3","header4"
> "field1r1","field2r1","field3r1","field4r1"
> "field1r2","field2r2","field3r2PartA), field3r2PartB Very" Long","field4r2"
> "field1r3","field2r3","field3r3","field4r3"

You could try, under your OS, to

1) replace the field-delimiting " with ' (i.e. "," becomes ',' and the " at the start and end of each line becomes '), assuming that the csv does not contain any '
2) read into R with quote="'"

If the file is huge, an in-OS solution would be best.

Cheers,

Rainer

-- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)
Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa
Tel : +33 - (0)9 53 10 27 44
Cell: +33 - (0)6 85 62 59 98
Fax : +33 - (0)9 58 10 27 44
Fax (D): +49 - (0)3 21 21 25 22 44
email: Rainer at krugs.de
Skype: RMkrug
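For a huge file, the two steps above could also be approximated without leaving R by converting the file in chunks, so that it is never held in memory as a whole before the final read. This is only a sketch of that variant, not the in-OS solution suggested above. The function name convert_quotes_chunked and the chunk_size parameter are hypothetical, and the same assumption applies: the file contains no apostrophe to begin with.

```r
# Hypothetical chunked converter; the name and chunk_size are illustrative.
# Assumes the input contains no apostrophe, as in step 1 above.
convert_quotes_chunked <- function(infile, outfile, chunk_size = 10000L) {
  con_in <- file(infile, open = "r")
  on.exit(close(con_in), add = TRUE)
  con_out <- file(outfile, open = "w")
  on.exit(close(con_out), add = TRUE)
  repeat {
    # Read at most chunk_size lines; an empty result means end of file.
    txt <- readLines(con_in, n = chunk_size)
    if (length(txt) == 0L) break
    if (any(grepl("'", txt, fixed = TRUE))) {
      stop("input already contains a single quote (')")
    }
    # Swap the delimiting double quotes for apostrophes.
    txt <- gsub('^"|"$', "'", txt)
    txt <- gsub('","', "','", txt, fixed = TRUE)
    writeLines(txt, con_out)
  }
  invisible(outfile)
}

# Usage: convert a temporary copy of the sample data, then read it.
src <- tempfile(fileext = ".csv")
dst <- tempfile(fileext = ".csv")
writeLines(c(
  '"header1","header2","header3","header4"',
  '"field1r1","field2r1","field3r1","field4r1"',
  '"field1r2","field2r2","field3r2PartA), field3r2PartB Very" Long","field4r2"',
  '"field1r3","field2r3","field3r3","field4r3"'
), src)
convert_quotes_chunked(src, dst, chunk_size = 2L)
dat <- read.table(dst, sep = ",", quote = "'", header = TRUE,
                  comment.char = "", stringsAsFactors = FALSE)
```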