Michael Friendly
2012-Aug-10 16:14 UTC
[R] translating HTML character entities to accented characters
I've imported a .csv file where character strings that contained accented characters were written as HTML character entities. Is there a function that works on a vector to translate them back to accented (latin1) characters?

Some examples:

> grep("&", author$lname, value=TRUE)
[1] "Frère de Montizon" "Lumière"
[3] "Lumière"           "Niépce"
[5] "Süssmilch"         "Schüpbach"
> grep("&", author$birthplace, value=TRUE)
[1] "Marbach, Württemberg"
[2] "Côte-d'Or"
[3] "Chalon-sur-Saône, Saône-et-Loire"
[4] "Groß Särchen, Germany"
> apropos("HTML")

thx,
-Michael

--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street    Web: http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA
David L Carlson
2012-Aug-10 16:40 UTC
[R] translating HTML character entities to accented characters
It's not quite an R solution, but I just pasted your examples into a script window in R and saved it as chars.html. Then I opened it in Firefox and pasted the results here (with returns inserted to match your original).

> grep("&", author$lname, value=TRUE)
[1] "Frère de Montizon" "Lumière"
[3] "Lumière"           "Niépce"
[5] "Süssmilch"         "Schüpbach"
> grep("&", author$birthplace, value=TRUE)
[1] "Marbach, Württemberg"
[2] "Côte-d'Or"
[3] "Chalon-sur-Saône, Saône-et-Loire"
[4] "Groß Särchen, Germany"
> apropos("HTML")

For a CSV file you would want to preserve the lines by adding <br> to the end of each line first.

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
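
For readers who want to stay inside R, the translation asked about above can be sketched with a lookup table and gsub(). This is only a minimal sketch, not something proposed in the thread: the helper name decode_entities and its entity table are assumptions made for illustration, and the table covers only the handful of entities needed for names like the ones shown.

```r
# Hypothetical helper, not from the thread: decode a few HTML entities
# in a character vector using base R only.
decode_entities <- function(x) {
  # Assumed lookup table; it covers only the entities needed for names
  # like those above and would have to be extended for other data.
  named <- c(
    "&egrave;" = "\u00e8", "&eacute;" = "\u00e9", "&ocirc;" = "\u00f4",
    "&auml;"   = "\u00e4", "&uuml;"   = "\u00fc", "&szlig;" = "\u00df"
  )
  for (ent in names(named)) {
    x <- gsub(ent, named[[ent]], x, fixed = TRUE)
  }

  # Decimal numeric references such as &#233; are converted by code point.
  m <- gregexpr("&#[0-9]+;", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(refs) {
    codes <- as.integer(gsub("[^0-9]", "", refs))
    vapply(codes, intToUtf8, character(1))
  })

  # Decode &amp; last so that "&amp;eacute;" yields the literal "&eacute;".
  gsub("&amp;", "&", x, fixed = TRUE)
}

# Example call on strings written with entities (made-up input values)
decode_entities(c("Ni&eacute;pce", "Gro&szlig; S&auml;rchen, Germany"))
```

The result is UTF-8 text. If latin1 is really required, iconv(x, "UTF-8", "latin1") should convert it afterwards. Entities missing from the table are left untouched, so a leftover "&" found with grep() shows what still needs adding.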