On 09/12/2014, 10:36 PM, Yihui Xie wrote:
> I took a look at the R source and I realized that the encoding was
> actually never passed to the vignette engine:
> https://github.com/wch/r-source/blob/e721ef5f4/src/library/tools/R/Vignettes.R#L507
> Apparently only the file and quiet arguments are passed to the
> vignette engine. Did I miss anything?

I think it's actually a little messier than that: sometimes the
encoding is passed (e.g. by tools:::.run_one_vignette, used in R CMD
check), but not always. Here's what I think should happen instead:

When building a vignette in a package, R knows the encoding declared for
the package, so it should assume this as the default for the vignette.
If nothing is declared, it should assume "native.enc", i.e. whatever is
the native encoding on the machine it's running on.

For each vignette, at the same time as it determines the vignette
engine, it should see whether there is a declared encoding within the
vignette.

When it calls the engine, it should pass an encoding (and it should be a
legal one, e.g. UTF-8, not utf8).

Unless I notice something missing when I do this, or someone else tells
me something that's missing, I'll try to make the changes above in
R-devel and R-patched sometime before 3.1.3 is released.

In the meantime, unless declaring a dependence on R >= 3.1.3, vignette
engines should determine the encoding themselves whenever they are
called without an "encoding" argument.

Duncan Murdoch

> To Thierry: I explicitly asked for library(rmarkdown);sessionInfo(),
> but you only told me the version of rmarkdown, which is not the only
> thing I was asking for. It is extremely important in this case to know
> the versions of other packages as well as your system locale
> information.
>
> Regards,
> Yihui
> --
> Yihui Xie <xieyihui at gmail.com>
> Web: http://yihui.name
>
>
> On Tue, Dec 9, 2014 at 3:42 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>> On 09/12/2014, 4:38 PM, ONKELINX, Thierry wrote:
>>> Dear Yihui,
>>>
>>> I have created a reproducible example at https://github.com/ThierryO/utf8vignette
>>>
>>> The \usepackage{} line is needed, otherwise R CMD check --as-cran will give a warning.
>>> %\VignetteEncoding{UTF-8} did not solve the problem.
>>
>> I've just taken a look at the sources, and that's only in R-devel, it
>> never got backported to R-patched so it isn't in the release R. You
>> would need to use
>>
>> %\SweaveUTF8
>>
>> instead. (It was introduced in 3.1.0, and should be kept until at least
>> 3.2.0, but \VignetteEncoding will be preferred in the long run. It
>> should make it into 3.1.3 unless we drop the ball again.)
>>
>> Duncan Murdoch
>>
>>>
>>> I use rmarkdown_0.3.11
>>>
>>> HTML vignette is not an option as the vignette demonstrates the use of a custom beamer output format.
>>>
>>> Best regards,
>>>
>>> Thierry
>>> ________________________________________
>>> From: xieyihui at gmail.com [xieyihui at gmail.com] on behalf of Yihui Xie [xie at yihui.name]
>>> Sent: Tuesday, 9 December 2014 17:13
>>> To: ONKELINX, Thierry
>>> CC: r-devel at r-project.org; Duncan Murdoch; Kurt Hornik
>>> Subject: Re: [Rd] UTF8 markdown vignette
>>>
>>>
>>> Lastly, the most important piece of information is missing in this
>>> post: library(rmarkdown); sessionInfo(). There is not a minimal
>>> reproducible example, either. Without this information, I can only
>>> guess blindly.
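
Duncan's proposed resolution order (package default, then a declaration inside the vignette, then a legal encoding name handed to the engine) could be sketched roughly as below. This is only an illustration of the logic described in the message: resolve_vignette_encoding() and its arguments are hypothetical names, not functions in R's tools package, and the normalisation step covers only the utf8 spelling mentioned in the thread.

```r
# Hypothetical helper illustrating the proposed order; NOT part of R itself.
resolve_vignette_encoding <- function(vignette_lines,
                                      package_encoding = NA_character_) {
  # Step 1: start from the package's declared encoding, else "native.enc".
  enc <- if (is.na(package_encoding) || !nzchar(package_encoding)) {
    "native.enc"
  } else {
    package_encoding
  }

  # Step 2: a declaration inside the vignette overrides the package default.
  pattern <- "^\\s*%\\s*\\\\VignetteEncoding\\{([^}]*)\\}.*$"
  hits <- grep(pattern, vignette_lines, value = TRUE, perl = TRUE)
  if (length(hits) > 0L) {
    enc <- trimws(sub(pattern, "\\1", hits[1L], perl = TRUE))
  } else if (any(grepl("^\\s*%\\s*\\\\SweaveUTF8", vignette_lines, perl = TRUE))) {
    enc <- "UTF-8"
  }

  # Step 3: pass a legal name to the engine, e.g. UTF-8 rather than utf8.
  if (tolower(enc) %in% c("utf8", "utf-8")) enc <- "UTF-8"
  enc
}
```

Called on a vignette whose header contains %\VignetteEncoding{utf8}, this sketch returns "UTF-8"; with no declaration at all it falls back to the package encoding, or to "native.enc" if the package declares none.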
For the record, I saw a change had been made in R-devel:
https://github.com/wch/r-source/commit/d53b098 (Thanks, Duncan)

Meanwhile, I also made a change in knitr to assume UTF-8 unless R
passes an encoding to the vignette engine:
https://github.com/yihui/knitr/commit/23c6c8e2

Both will solve the original problem, but apparently the former one is
the ideal fix.

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Web: http://yihui.name
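
On the engine side, the fallback for calls that arrive without an encoding might look something like the sketch below. It is not the code from the knitr commit: read_vignette_source() is a hypothetical helper, and the UTF-8 default simply mirrors the assumption described in this thread.

```r
# Hypothetical helper for a vignette engine's weave step; the name and the
# UTF-8 default are illustrative assumptions, not knitr's actual code.
read_vignette_source <- function(file, encoding = NULL) {
  # Older R versions may call the engine without an encoding: assume UTF-8.
  if (is.null(encoding) || !nzchar(encoding)) encoding <- "UTF-8"
  # iconv() uses "" to mean the current locale's native encoding.
  from <- if (identical(encoding, "native.enc")) "" else encoding
  lines <- readLines(file, warn = FALSE)
  # Convert to UTF-8 so later processing sees one known encoding.
  iconv(lines, from = from, to = "UTF-8")
}
```

A weave function could call this helper with whatever encoding R supplies, so the same code path works both before and after the R-devel change.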
On 18/12/2014, 12:17 AM, Yihui Xie wrote:
> For the record, I saw a change had been made in R-devel:
> https://github.com/wch/r-source/commit/d53b098 (Thanks, Duncan)
> Meanwhile, I also made a change in knitr to assume UTF-8 unless R
> passes an encoding to the vignette engine:
> https://github.com/yihui/knitr/commit/23c6c8e2 Both will solve the
> original problem, but apparently the former one is the ideal fix.

The Windows builds of R-devel were stalled for a few days, but I've
given them a kick now, so this should appear in the Windows binaries on
CRAN soon.

Duncan Murdoch