Martin Becker
2009-Oct-13 11:46 UTC
[Rd] Sweave output encoding in R-2.10.0beta on Windows (Rgui <-> Rterm)
Dear developers,
I have come across a (somewhat strange) change in the encoding of Sweave
output from R-2.9.2pat to R-2.10.0beta (apparently specific to Rgui) on
Windows installations. Of course, the NEWS file contains quite a few
changes concerning encoding, but I was not able to locate an entry which
explains the observed behaviour. I am not very familiar with
encodings/locales/codepages, but I will try to explain my observations
as best I can.
In R-2.9.2pat, when invoking R via Rgui --vanilla (output of
sessionInfo() below), the Sweave output for .rnw files containing
German umlauts (latin1-encoded) is again latin1-encoded (the resulting
.tex file compiles with \usepackage[latin1]{inputenc} and
\usepackage[german]{babel}).
In R-2.10.0beta, however, when invoking R via Rgui --vanilla (output of
sessionInfo() below), part of Sweave's output is UTF-8 encoded: Soutput
environments containing German umlauts come out as UTF-8 (with some
extra characters at the start and the end, which could be BOMs), while
Sinput environments with German umlauts are still latin1. Surprisingly,
when R is invoked from the (Windows) command line (R --vanilla or
Rterm --vanilla), the encoding is completely latin1 again (as in
R-2.9.2pat). So the change to UTF-8 encoding for parts of Sweave's
output seems to be specific to Rgui.
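One way to confirm which parts of a generated .tex file are latin1 and which are UTF-8 is to look at the bytes line by line. The following is only a minimal sketch: the helper name check_line_encoding is hypothetical (it is not part of R or Sweave), and validUTF8() requires a much newer R (3.3.0 or later) than the versions discussed in this thread. A line with non-ASCII bytes that is not valid UTF-8 is most likely latin1/CP1252; one that is valid UTF-8 is most likely UTF-8.

```r
# Hypothetical helper (not part of R or Sweave): list the lines of a file
# that contain non-ASCII bytes and report whether each is valid UTF-8.
check_line_encoding <- function(path) {
  # readLines() keeps the bytes as they are; no re-encoding is requested
  lines <- readLines(path, warn = FALSE)
  non_ascii <- vapply(
    lines,
    function(l) any(as.integer(charToRaw(l)) > 127L),
    logical(1),
    USE.NAMES = FALSE
  )
  data.frame(
    line       = seq_along(lines),
    non_ascii  = non_ascii,
    valid_utf8 = validUTF8(lines)   # needs R >= 3.3.0
  )[non_ascii, ]
}

# Small self-made test file: line 1 holds a latin1 a-umlaut (0xE4),
# line 2 holds the same character encoded as UTF-8 (0xC3 0xA4).
tmp <- tempfile(fileext = ".tex")
writeBin(
  c(charToRaw("a = \""), as.raw(0xe4), charToRaw("\"\n"),
    charToRaw("b = \""), as.raw(c(0xc3, 0xa4)), charToRaw("\"\n")),
  tmp
)
res <- check_line_encoding(tmp)
print(res)
```

Applied to the Rgui output described above, the Soutput lines would be expected to show up as valid UTF-8 and the Sinput lines as not valid UTF-8, but that is an expectation based on the description, not a verified result.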
Of course, I can work around this problem by using Rterm instead of Rgui
when Sweaving, but I am not sure whether the current behaviour of R via
Rgui is as intended.
I will try to attach the .rnw file as well as the resulting .tex
files (and hope that the attachments pass through).
Best wishes,
Martin
sessionInfo() for R-2.9.2pat (same for Rgui, R, Rterm):
R version 2.9.2 Patched (2009-09-24 r50041)
i386-pc-mingw32
locale:
LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
sessionInfo() for R-2.10.0beta (same for Rgui, R, Rterm):
R version 2.10.0 beta (2009-10-11 r50037)
i386-pc-mingw32
locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
--
Dr. Martin Becker
Statistics and Econometrics
Saarland University
Campus C3 1, Room 206
66123 Saarbruecken
Germany
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sweavetest.rnw
URL:
<https://stat.ethz.ch/pipermail/r-devel/attachments/20091013/8d0f0bae/attachment.pl>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sweavetest-R-2.10.0beta-Rterm.tex
URL:
<https://stat.ethz.ch/pipermail/r-devel/attachments/20091013/8d0f0bae/attachment-0001.pl>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sweavetest-R-2.9.2pat-Rterm.tex
URL:
<https://stat.ethz.ch/pipermail/r-devel/attachments/20091013/8d0f0bae/attachment-0002.pl>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sweavetest-R-2.9.2pat-Rgui.tex
URL:
<https://stat.ethz.ch/pipermail/r-devel/attachments/20091013/8d0f0bae/attachment-0003.pl>
Martin Becker
2009-Oct-13 11:57 UTC
[Rd] Sweave output encoding in R-2.10.0beta on Windows (Rgui <-> Rterm)
Ok,
apparently the most interesting attachment did not pass the filters
(probably because of the UTF-8 encoding...).
Maybe it is still helpful to copy & paste the Sweave output (with the
UTF-8 part) for R-2.10.0beta invoked via Rgui --vanilla:
\documentclass{article}
\usepackage{c:/Programme/R/R-2.10.0beta/share/texmf/Sweave}
\usepackage[latin1]{inputenc}
%\usepackage[utf8x]{inputenc}
\usepackage[german]{babel}
\begin{document}
\begin{Schunk}
\begin{Sinput}
> Umlaute <- c("?", "?", "?",
"?")
> Umlaute
\end{Sinput}
\begin{Soutput}
??[1] "??" "??" "??" "??"??
\end{Soutput}
\end{Schunk}
\end{document}
Best wishes,
Martin
Martin Becker
2009-Oct-19 12:09 UTC
[Rd] Sweave output encoding in R-2.10.0beta on Windows (Rgui <-> Rterm)
Dear developers,
I am not really sure what causes the difference in the encoding of Sweave
Soutput environments between Rgui.exe and R.exe/Rterm.exe in R-2.10.0beta
(now R-2.10.0rc), but I suppose that the different behaviour of R-2.9.2pat
and R-2.10.0rc is caused by changes concerning regular expressions
(RweaveLatexRuncode uses sub() in some places) as documented in NEWS.
AFAICS, sub() now (R-2.10.0rc) possibly converts its input to UTF-8, and a
(conditional) back-conversion after the sub() commands seems to resolve
the encoding problems (as well as the different behaviour of Rgui and
Rterm in R-2.10.0rc).
It would be great if someone more involved in Sweave could take a look at
(and maybe commit) the attached (untested!) patch (to r50160).
Many thanks in advance!
Best wishes,
Martin
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sweave-patch.txt
URL:
<https://stat.ethz.ch/pipermail/r-devel/attachments/20091019/6b93e60c/attachment.txt>
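The idea of a conditional back-conversion after sub() can be sketched roughly as follows. This is not the content of sweave-patch.txt, which is not reproduced in the archive; the wrapper name sub_native is hypothetical, and whether sub() actually returns a UTF-8-marked string depends on the R version and the locale. The sketch only shows the general pattern: run sub(), check whether the result has become marked as UTF-8 although the input was not, and in that case convert it back to the native encoding with iconv().

```r
# Hypothetical wrapper, not part of Sweave and not the attached patch.
# Runs sub() and converts the result back to the native encoding if it
# came out marked as UTF-8 while the input was not marked as UTF-8.
sub_native <- function(pattern, replacement, x, ...) {
  out <- sub(pattern, replacement, x, ...)
  changed <- Encoding(out) == "UTF-8" & Encoding(x) != "UTF-8"
  if (any(changed)) {
    # to = "" means the encoding of the current locale
    # (e.g. CP1252 for German_Germany.1252).
    # iconv() returns NA for strings it cannot convert.
    out[changed] <- iconv(out[changed], from = "UTF-8", to = "")
  }
  out
}

# Example input: latin1 bytes for a-umlaut and o-umlaut, marked as latin1
x <- "[1] \"\xe4\" \"\xf6\""
Encoding(x) <- "latin1"
y <- sub_native("^\\[1\\]", "[1]", x)
Encoding(y)   # result depends on R version and locale
```

The conversion is kept conditional so that strings sub() did not re-encode, as well as input that was UTF-8 to begin with, pass through unchanged.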