Mikko Korpela
2012-Jul-04 21:01 UTC
[Rd] Suggestion / patch to support more Unicode characters in R CMD Rd2pdf
Hi list,

When using R CMD Rd2pdf, it is possible to set environment variable
RD2PDF_INPUTENC to value "inputenx" and enjoy better support for UTF-8
characters (see ?Rd2pdf). This enables LaTeX package "inputenx" instead
of "inputenc".

Even better support for UTF-8 encoded characters can be had by better
using the facilities provided by inputenx and making R CMD Rd2pdf insert
a line to its temporary .tex file: "\input{ix-utf8enc.dfu}". The
instructions are found in section 1.2 "Unicode" of the inputenx manual:
http://mirror.ctan.org/macros/latex/contrib/oberdiek/inputenx.pdf

I suggest that R CMD Rd2pdf automatically insert
"\input{ix-utf8enc.dfu}" to its temporary .tex file when a combination
of inputenx and UTF-8 is detected. The attached small patch does that.

A demo package is also attached (tarball built manually, not R CMD
build). It uses some UTF-8 characters not supported without the patch: R
CMD Rd2pdf gives an error, propagated from LaTeX. With the patch
installed, R CMD Rd2pdf works OK when RD2PDF_INPUTENC=inputenx is set.
For testing, unpack tarball and run R CMD Rd2pdf on the resulting
directory. Tested on R development version r59731 running on Ubuntu
10.10 64 bit.

-- 
Mikko Korpela
Aalto University School of Science
Department of Information and Computer Science
-------------- next part --------------
A non-text attachment was scrubbed...
Name: encTest3.tar.gz
Type: application/x-gzip
Size: 2429 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20120705/5e61ce90/attachment.gz>
-------------- next part --------------
Index: src/library/tools/R/Rd2pdf.R
===================================================================
--- src/library/tools/R/Rd2pdf.R (revision 59731)
+++ src/library/tools/R/Rd2pdf.R (working copy)
@@ -466,12 +466,17 @@
     inputenc <- Sys.getenv("RD2PDF_INPUTENC", "inputenc")
     ## this needs to be canonical, e.g. 'utf8'
     ## trailer is for detection if we want to edit it later.
+    latex_outputEncoding <- latex_canonical_encoding(outputEncoding)
     setEncoding <-
         paste("\\usepackage[",
-              latex_canonical_encoding(outputEncoding), "]{",
+              latex_outputEncoding, "]{",
               inputenc, "} % @SET ENCODING@", sep="")
     useGraphicx <- "% \\usepackage{graphicx} % @USE GRAPHICX@"
     writeLines(c(setEncoding,
+                 if (inputenc == "inputenx" &&
+                     latex_outputEncoding == "utf8") {
+                     "\\input{ix-utf8enc.dfu}"
+                 },
                  useGraphicx,
                  if (index) "\\makeindex{}",
                  "\\begin{document}"), out)
@@ -545,21 +550,28 @@
     latexEncodings <- unique(latexEncodings)
     latexEncodings <- latexEncodings[!is.na(latexEncodings)]
     cyrillic <- if (nzchar(Sys.getenv("_R_CYRILLIC_TEX_"))) "utf8" %in% latexEncodings else FALSE
-    latex_outputEncoding <- latex_canonical_encoding(outputEncoding)
     encs <- latexEncodings[latexEncodings != latex_outputEncoding]
     if (length(encs) || hasFigures || cyrillic) {
         lines <- readLines(outfile)
+        moreUnicode <- inputenc == "inputenx" && "utf8" %in% encs
         encs <- paste(encs, latex_outputEncoding, collapse=",", sep=",")
         if (!cyrillic) {
-            lines[lines == setEncoding] <-
+            setEncoding2 <-
                 paste0("\\usepackage[", encs, "]{", inputenc, "}")
         } else {
-            lines[lines == setEncoding] <-
+            setEncoding2 <-
                 paste(
 "\\usepackage[", encs, "]{", inputenc, "}
 \\IfFileExists{t2aenc.def}{\\usepackage[T2A]{fontenc}}{}", sep = "")
         }
+        if (moreUnicode) {
+            setEncoding2 <-
+                paste0(
+setEncoding2, "
+\\input{ix-utf8enc.dfu}")
+        }
+        lines[lines == setEncoding] <- setEncoding2
         if (hasFigures)
             lines[lines == useGraphicx] <- "\\usepackage{graphicx}\\setkeys{Gin}{width=0.7\\textwidth}"
         writeLines(lines, outfile)
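
For readers without the R sources at hand, the effect of the first hunk
on the generated preamble can be sketched as a small standalone R
function. This is an illustration only, under stated assumptions:
preamble_lines() and its arguments are hypothetical names that do not
exist in the tools package, and the encoding is assumed to be in
canonical LaTeX form already (e.g. "utf8").

```r
## Hypothetical helper (not part of R's tools package): builds the
## preamble lines in the way the first hunk of the patch does.
## 'encoding' is assumed to be canonical already, e.g. "utf8".
preamble_lines <- function(inputenc = "inputenc", encoding = "utf8") {
    setEncoding <- paste0("\\usepackage[", encoding, "]{", inputenc,
                          "} % @SET ENCODING@")
    useGraphicx <- "% \\usepackage{graphicx} % @USE GRAPHICX@"
    ## An 'if' without 'else' yields NULL when the condition is FALSE,
    ## and c() drops NULL, so the extra line appears only when wanted.
    c(setEncoding,
      if (inputenc == "inputenx" && encoding == "utf8")
          "\\input{ix-utf8enc.dfu}",
      useGraphicx)
}

## Show the lines that would be written for inputenx + UTF-8.
writeLines(preamble_lines("inputenx", "utf8"))
```

With the default inputenc package, or with a non-UTF-8 encoding, the
same call returns only the two original lines.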
Prof Brian Ripley
2012-Jul-11 15:15 UTC
[Rd] Suggestion / patch to support more Unicode characters in R CMD Rd2pdf
On 04/07/2012 22:01, Mikko Korpela wrote:
> Hi list,
>
> When using R CMD Rd2pdf, it is possible to set environment variable
> RD2PDF_INPUTENC to value "inputenx" and enjoy better support for UTF-8
> characters (see ?Rd2pdf). This enables LaTeX package "inputenx" instead
> of "inputenc".
>
> Even better support for UTF-8 encoded characters can be had by better
> using the facilities provided by inputenx and making R CMD Rd2pdf insert
> a line to its temporary .tex file: "\input{ix-utf8enc.dfu}". The
> instructions are found in section 1.2 "Unicode" of the inputenx manual:
> http://mirror.ctan.org/macros/latex/contrib/oberdiek/inputenx.pdf
>
> I suggest that R CMD Rd2pdf automatically insert
> "\input{ix-utf8enc.dfu}" to its temporary .tex file when a combination
> of inputenx and UTF-8 is detected. The attached small patch does that.
>
> A demo package is also attached (tarball built manually, not R CMD
> build). It uses some UTF-8 characters not supported without the patch: R
> CMD Rd2pdf gives an error, propagated from LaTeX. With the patch
> installed, R CMD Rd2pdf works OK when RD2PDF_INPUTENC=inputenx is set.
> For testing, unpack tarball and run R CMD Rd2pdf on the resulting
> directory. Tested on R development version r59731 running on Ubuntu
> 10.10 64 bit.
>

Thank you for the suggestion. My concern is that an installation could
have inputenx but not ix-utf8enc.dfu. You can check for that at the
LaTeX level with \IfFileExists, as we already do for t2aenc.def. Could
you please modify your patch to do so?

It is also easiest if this is filed on bugs.r-project.org as a
'Wishlist' item: see the R FAQ.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
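
One way the check requested in this reply might look is sketched below
in R. It follows the shape of the existing t2aenc.def line in the patch
context, but it is only an assumption about how the guard could be
written and is not taken from any revision of the patch; guarded_input()
is a hypothetical name.

```r
## Hypothetical helper: wrap a LaTeX \input in \IfFileExists so that
## LaTeX silently skips the file when it is not installed.
guarded_input <- function(file) {
    ## Only '%' is special in a sprintf() format; the doubled
    ## backslashes are ordinary R string escapes for a single '\'.
    sprintf("\\IfFileExists{%s}{\\input{%s}}{}", file, file)
}

## The line that could replace the bare "\\input{ix-utf8enc.dfu}".
writeLines(guarded_input("ix-utf8enc.dfu"))
```

The empty third argument of \IfFileExists means that nothing is inserted
when the file is missing, so an installation that has inputenx but lacks
ix-utf8enc.dfu would fall back to the current behaviour.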