Martin Becker
2009-Oct-13 11:46 UTC
[Rd] Sweave output encoding in R-2.10.0beta on Windows (Rgui <-> Rterm)
Dear developers,

I have come across a somewhat strange change in the encoding of Sweave output between R-2.9.2pat and R-2.10.0beta on Windows installations, apparently specific to Rgui. The NEWS file lists quite a few encoding-related changes, but I was not able to locate an entry that explains the observed behaviour. I am not very familiar with encodings/locales/codepages, but I will try to explain my observations as best I can.

In R-2.9.2pat, when R is invoked via Rgui --vanilla (output of sessionInfo() below), the Sweave output for .rnw files containing latin1-encoded German umlauts is again latin1-encoded (the resulting .tex file compiles with \usepackage[latin1]{inputenc} and \usepackage[german]{babel}).

In R-2.10.0beta, however, when R is invoked via Rgui --vanilla (output of sessionInfo() below), part of Sweave's output is UTF-8 encoded: Soutput environments containing German umlauts are UTF-8, with some extra characters at the start and the end that could be BOMs, whereas Sinput environments with German umlauts are still latin1. Surprisingly, when R is invoked from the Windows command line (R --vanilla or Rterm --vanilla), the output is entirely latin1 again, as in R-2.9.2pat. So the change to UTF-8 for parts of Sweave's output seems to be specific to Rgui.

Of course, I can work around this problem by using Rterm instead of Rgui when running Sweave, but I am not sure whether the current behaviour of R via Rgui is intended. I will try to attach the .rnw file as well as the resulting .tex files (and hope that the attachments pass through).

Best wishes,
Martin

sessionInfo() for R-2.9.2pat (same for Rgui, R, Rterm):
R version 2.9.2 Patched (2009-09-24 r50041)
i386-pc-mingw32

locale:
LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

sessionInfo() for R-2.10.0beta (same for Rgui, R, Rterm):
R version 2.10.0 beta (2009-10-11 r50037)
i386-pc-mingw32

locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

--
Dr. Martin Becker
Statistics and Econometrics
Saarland University
Campus C3 1, Room 206
66123 Saarbruecken
Germany

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sweavetest.rnw
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20091013/8d0f0bae/attachment.pl>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sweavetest-R-2.10.0beta-Rterm.tex
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20091013/8d0f0bae/attachment-0001.pl>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sweavetest-R-2.9.2pat-Rterm.tex
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20091013/8d0f0bae/attachment-0002.pl>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sweavetest-R-2.9.2pat-Rgui.tex
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20091013/8d0f0bae/attachment-0003.pl>
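
The latin1 versus UTF-8 distinction described in the report above can be checked at the byte level. The following is a minimal sketch, not taken from the original message: the helper names `classify_bytes` and `has_utf8_bom` are hypothetical, and `validUTF8()` assumes a much newer R (3.3.0 or later) than the versions discussed in this thread. It only illustrates how one might tell the two encodings apart and test the guess that the extra characters are BOMs.

```r
# Build the same umlaut (a-umlaut) in both encodings from raw bytes, so the
# example does not depend on the encoding of this script or on the locale.
a_latin1 <- rawToChar(as.raw(0xe4))
Encoding(a_latin1) <- "latin1"
a_utf8 <- iconv(a_latin1, from = "latin1", to = "UTF-8")

charToRaw(a_latin1)  # one byte:  e4
charToRaw(a_utf8)    # two bytes: c3 a4

# Hypothetical helper: classify a single string by looking at its bytes.
classify_bytes <- function(s) {
  bytes <- as.integer(charToRaw(s))
  if (all(bytes < 128L)) {
    "ascii"
  } else if (validUTF8(s)) {
    "utf8"
  } else {
    "other (possibly latin1)"
  }
}

# Hypothetical helper: does the string start with the UTF-8 BOM (EF BB BF)?
has_utf8_bom <- function(s) {
  bytes <- charToRaw(s)
  length(bytes) >= 3L && identical(bytes[1:3], as.raw(c(0xef, 0xbb, 0xbf)))
}
```

Applying `classify_bytes()` to each line of a generated .tex file (read in a way that preserves the raw bytes) would show which lines, such as those inside Soutput, differ from the rest.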
Martin Becker
2009-Oct-13 11:57 UTC
[Rd] Sweave output encoding in R-2.10.0beta on Windows (Rgui <-> Rterm)
Ok, apparently the most interesting attachment did not pass the filters (probably because of the UTF-8 encoding...). Maybe it is still helpful to copy and paste the Sweave output (with the UTF-8 part) for R-2.10.0beta invoked via Rgui --vanilla:

\documentclass{article}
\usepackage{c:/Programme/R/R-2.10.0beta/share/texmf/Sweave}
\usepackage[latin1]{inputenc}
%\usepackage[utf8x]{inputenc}
\usepackage[german]{babel}
\begin{document}
\begin{Schunk}
\begin{Sinput}
> Umlaute <- c("?", "?", "?", "?")
> Umlaute
\end{Sinput}
\begin{Soutput}
??[1] "??" "??" "??" "??"??
\end{Soutput}
\end{Schunk}
\end{document}

Best wishes,
Martin

--
Dr. Martin Becker
Statistics and Econometrics
Saarland University
Campus C3 1, Room 206
66123 Saarbruecken
Germany
Martin Becker
2009-Oct-19 12:09 UTC
[Rd] Sweave output encoding in R-2.10.0beta on Windows (Rgui <-> Rterm)
Dear developers,

I am not really sure what causes the difference in the encoding of Sweave Soutput environments between Rgui.exe and R.exe/Rterm.exe in R-2.10.0beta (now R-2.10.0rc), but I suppose that the different behaviour of R-2.9.2pat and R-2.10.0rc is caused by the changes concerning regular expressions documented in NEWS (RweaveLatexRuncode uses sub() in some places). AFAICS, sub() in R-2.10.0rc may now convert its input to UTF-8, and a (conditional) back-conversion after the sub() calls seems to resolve the encoding problems, as well as the different behaviour of Rgui and Rterm in R-2.10.0rc.

It would be great if someone more involved in Sweave could take a look at (and maybe commit) the attached (untested!) patch (to r50160). Many thanks in advance!

Best wishes,
Martin

--
Dr. Martin Becker
Statistics and Econometrics
Saarland University
Campus C3 1, Room 206
66123 Saarbruecken
Germany

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sweave-patch.txt
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20091019/6b93e60c/attachment.txt>
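
The idea of a conditional back-conversion mentioned above might look roughly like the sketch below. This is not the scrubbed patch, whose contents are not available here: `back_convert` is a hypothetical helper, the target encoding "latin1" is an assumption matching the German_Germany.1252 setup in this thread, and the UTF-8 input is constructed by hand to emulate a line that came back from sub() marked as UTF-8.

```r
# Hypothetical helper (an assumption, not the actual patch): convert only
# those elements that are marked as UTF-8 back to a target encoding and
# leave all other elements untouched.
back_convert <- function(x, to = "latin1") {
  is_utf8 <- Encoding(x) == "UTF-8"
  x[is_utf8] <- iconv(x[is_utf8], from = "UTF-8", to = to)
  x
}

# Emulate an output line that was returned marked as UTF-8.
# intToUtf8(228L) is a-umlaut as a UTF-8 string.
line <- paste0("[1] \"", intToUtf8(228L), "\"")
Encoding(line)

fixed <- back_convert(line)
Encoding(fixed)
charToRaw(fixed)   # the umlaut is now a single byte again

# Pure ASCII lines are not marked as UTF-8 and pass through unchanged.
back_convert("[1] 42")
```

Making the conversion conditional on the encoding mark means lines that sub() left in the native encoding are not converted a second time.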