Pedro Emmanuel Alvarenga Americano do Brasil
2010-Sep-17 19:27 UTC
[R] odfWeave UTF-8 error and latin characters
Hello R masters,

I have sent this same message to other lists and so far no one has been able to shed any light on it. I was trying to use odfWeave to generate a report from R and I'm getting an error that I think is related to Latin characters. I looked around and found some material on the same problem in Sweave (labmoluscos.wordpress.com/2010/02/18/sweave-latex-character-encoding), but so far I have not found a way to fix it for odfWeave. Perhaps someone could suggest a workaround.

I think my problem is that I have a table with characters such as 'ç', 'ó' and 'ã' that odfWeave is not recognizing properly. The error follows below.

Just to make it clear: Windows Vista (default language: Brazilian Portuguese), R 2.11.1, odfWeave 0.7.11, OpenOffice 3.0.1.

In my odt file ...

    <<tabela2, echo = FALSE, results = xml>>=
    odfTable(tabela2,useRowNames=T,name ='Tabela 2')
    @

In the R console ...

    > library(odfWeave)
    > imageDefs <- getImageDefs()
    > imageDefs$type <- 'bmp'
    > imageDefs$device <- 'bmp'
    > setImageDefs(imageDefs)
    > options(SweaveSyntax="SweaveSyntaxNoweb")
    > odfWeave('teste.odt','figura1.odt')
    Copying teste.odt
    Setting wd to C:\Users\PEDROE~1\AppData\Local\Temp\Rtmpfv32oJ/odfWeave0215405313
    Unzipping ODF file using unzip -o "teste.odt"
    Archive: teste.odt
     extracting: mimetype
       creating: Configurations2/statusbar/
      inflating: Configurations2/accelerator/current.xml
       creating: Configurations2/floater/
       creating: Configurations2/popupmenu/
       creating: Configurations2/progressbar/
       creating: Configurations2/menubar/
       creating: Configurations2/toolbar/
       creating: Configurations2/images/Bitmaps/
      inflating: content.xml
      inflating: styles.xml
     extracting: meta.xml
      inflating: Thumbnails/thumbnail.png
      inflating: settings.xml
      inflating: META-INF/manifest.xml
    Removing teste.odt
    Creating a Pictures directory
    Pre-processing the contents
    Sweaving content.Rnw
    Writing to file content_1.xml
    Processing code chunks ...
     1 : term verbatim(label=fluxograma)
    Loading required package: shape
    Loading required package: shape
     2 : term xml(label=tabela2)
    'content_1.xml' has been Sweaved
    Removing content.xml
    Post-processing the contents
    Input is not proper UTF-8, indicate encoding !
    Bytes: 0xE2 0x6E 0x63 0x69
    Erro: 1: Input is not proper UTF-8, indicate encoding !
    Bytes: 0xE2 0x6E 0x63 0x69

    > tabela2[1:5,] # a piece of table 2
                                        Concordância observada  Kappa p valor
    Sexo:                                               1.0000 1.0000   0e+00
    Referenciamento para diagnóstico:                   0.6863 0.5081   4e-03
    Reteste na doação de sangue:                        0.9379 0.7874   0e+00
    Resultado do reteste da doação:                     0.9317 0.6607   2e-04
    Indicação médica para investigação:                 0.6957 0.5556   1e-04

Following some suggestions from other lists, I tried to encode the table using enc2utf8 and descr::toUTF8, such as

    <<tabela2, echo = FALSE, results = xml>>=
    odfTable(enc2utf8(tabela2),useRowNames=T,name ='Tabela 2')
    @

OR

    <<tabela2, echo = FALSE, results = xml>>=
    enc2utf8(odfTable(tabela2,useRowNames=T,name ='Tabela 2'))
    @

OR

    <<tabela2, echo = FALSE, results = xml>>=
    toUTF8(odfTable(tabela2,useRowNames=T,name ='Tabela 2'))
    @

But all of them gave the same error. However, if I build the table without the row names, such as:

    <<tabela2, echo = FALSE, results = xml>>=
    odfTable(tabela2,useRowNames=F,name ='Tabela 2')
    @

it works fine... but the row names are not there. I tried to bind the row names as a column, but the error comes back.

After a couple of days banging my head against this, I'm about to fall back on my old friend "copy and paste". Any suggestion is most welcome.

Kind regards to all and thanks in advance,

A big hug, and may the force be with you,

Dr. Pedro Emmanuel A. A. do Brasil
Instituto de Pesquisa Clínica Evandro Chagas
Fundação Oswaldo Cruz
Rio de Janeiro - Brasil
Av. Brasil 4365
Tel 55 21 3865-9648
email: pedro.brasil@ipec.fiocruz.br
email: emmanuel.brasil@gmail.com

---
Support for free software:
zotero.org - reference management.
broffice.org or openoffice.org - text documents, spreadsheets or presentations.
epidata.dk - data entry.
r-project.org - data analysis.
ubuntu.com - operating system
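The bytes quoted in the error message can be inspected directly. What follows is only a minimal sketch in base R, assuming a newer R than the 2.11.1 used above (validUTF8() does not exist in that version). It shows that 0xE2 0x6E 0x63 0x69 is not valid UTF-8 but reads as "ânci" in Latin-1, a sequence that occurs in the column name "Concordância observada". The helper to_utf8_names() is hypothetical and not part of odfWeave; it re-encodes row and column names explicitly, since enc2utf8() operates on character vectors rather than on a whole data frame. Whether this actually makes the odfWeave error go away is not confirmed anywhere in this thread.

```r
# Bytes reported by the XML parser in the error message
bad <- rawToChar(as.raw(c(0xE2, 0x6E, 0x63, 0x69)))

# 0xE2 announces a 3-byte UTF-8 sequence, but 0x6E is not a continuation byte
is_utf8 <- validUTF8(bad)          # FALSE

# Interpreted as Latin-1 and converted to UTF-8, the bytes are readable text
as_latin1 <- iconv(bad, from = "latin1", to = "UTF-8")   # "ânci"

# Hypothetical helper (not an odfWeave function): convert the row and column
# names of a matrix or data frame from an assumed source encoding to UTF-8.
to_utf8_names <- function(x, from = "latin1") {
  if (!is.null(rownames(x))) {
    rownames(x) <- iconv(rownames(x), from = from, to = "UTF-8")
  }
  if (!is.null(colnames(x))) {
    colnames(x) <- iconv(colnames(x), from = from, to = "UTF-8")
  }
  x
}
```

A possible use, under the assumption that the names really are stored as Latin-1 bytes, would be odfTable(to_utf8_names(tabela2), useRowNames = TRUE). If the names are already UTF-8, the same call would double-encode them, so the source encoding should be checked first.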
Hello,

I am using R and LibreOffice on Ubuntu 11.10 (64-bit) and have been experiencing similar problems with character encoding (Swedish, UTF-8) in odfWeave. Here is an example of what it looks like:

    Should be:  "H?r ?rland d?ligt?"
    Appears as: "H??r ??rland d??ligt?"

I found a (pretty clumsy) solution, which I post below. Has anyone been able to solve this in a more elegant way?

Setup:

    > sessionInfo()
    R version 2.13.1 (2011-07-08)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
    [1] LC_CTYPE=sv_SE.UTF-8 LC_NUMERIC=C

    other attached packages:
    [1] odfWeave_0.7.17 XML_3.2-0 lattice_0.19-30

Problem:

I have some R syntax for tables in the file "in.odt":

    <<vl5, echo=FALSE, results=xml>>=
    irre <- xtabs(~Species, data=iris)
    irre <- data.frame(irre)
    colnames(irre) <- c("växt", "antal")
    row.names(irre) <- c("?", "?", "?")
    odfTable(irre)
    odfTableCaption("Tabell ???")
    @

Running odfWeave on this with

    odfWeave("in.odt", "out.odt")

yields lots of output, ending with this:

    Warning message:
    'content.Rnw' has unknown encoding: assuming Latin-1.

On opening the output file (out.odt), the Swedish characters appear jumbled.

I had a look at the content.Rnw file, which was correctly encoded as UTF-8. The same was true of the content.xml file in the odt source (which had to be unzipped first). I then tried downgrading to XML 3.2, as suggested elsewhere. This didn't help. I then looked for tools for converting an odt file from one encoding to another, again to no avail.

Solution:

Save the odt file in flat XML format (LibreOffice > Save As > second-to-last option). Convert the resulting .fodt file FROM UTF-8 TO Latin-1 (aka ISO_8859-1) with iconv from a bash terminal:

    iconv -t ISO_8859-1 -f UTF-8 -o converted.fodt out.fodt

This produces a correctly encoded file!

--
View this message in context: r.789695.n4.nabble.com/odfWeave-UTF-8-error-and-latin-characters-tp2544333p4285335.html
Sent from the R help mailing list archive at Nabble.com.
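One plausible reading of the symptom above (two garbled characters where one is expected) is double encoding: text that is already UTF-8 gets treated as Latin-1 and converted to UTF-8 a second time. This matches the "assuming Latin-1" warning, but it is an interpretation, not something confirmed in this thread. The sketch below reproduces the effect in base R for a single "ä" and shows why a UTF-8 to Latin-1 conversion, like the iconv command above, can undo it. The variable names are purely illustrative.

```r
# "ä" (U+00E4), built from its code point so the example does not depend on
# the encoding of this script itself
a_uml <- intToUtf8(0xE4)
utf8_bytes <- charToRaw(a_uml)              # c3 a4: two bytes in UTF-8

# Misreading those UTF-8 bytes as Latin-1 and converting again turns one
# character into two
garbled <- iconv(a_uml, from = "latin1", to = "UTF-8")
n_before <- nchar(a_uml, type = "chars")    # 1
n_after  <- nchar(garbled, type = "chars")  # 2

# Converting from UTF-8 back to Latin-1 restores the original UTF-8 bytes
restored <- iconv(garbled, from = "UTF-8", to = "latin1")
same_bytes <- identical(charToRaw(restored), utf8_bytes)   # TRUE
```

The restored string carries the right bytes but is marked as Latin-1 by iconv(); what matters for a file on disk, as in the .fodt workaround, is the bytes themselves.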