Dear Duncan,

The UTF-8 characters aren't properly rendered in the PDF version of the vignette.

$?? ????? ?????? ?????? ????? ?? ??
is rendered as
$????? ?????????? ???????????? ????? ?????? ? ???????? ????????

The same problem occurs when I use render("vignette.md", output_format = "mypackage::mystyle") instead of render("vignette.md", output_format = "mypackage::mystyle", encoding = "UTF-8"). The default value of the encoding argument of rmarkdown::render() is encoding = getOption("encoding"), which is "native.enc" on my system.

I'll post the question on an RStudio forum as well.

Best regards,

Thierry

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-----Original message-----
From: Duncan Murdoch [mailto:murdoch.duncan at gmail.com]
Sent: Tuesday 9 December 2014 11:04
To: ONKELINX, Thierry; r-devel at r-project.org
Subject: Re: [Rd] UTF8 markdown vignette

On 09/12/2014, 4:48 AM, ONKELINX, Thierry wrote:
> Dear all,
>
> I'm trying to use a Markdown vignette with UTF-8 encoding. It compiles well when knitting the vignette in RStudio, but it fails to recognize the UTF-8 settings when building the source package. Can someone point out what I'm doing wrong? I tried to put the relevant information below.

You don't describe the symptoms of "failing to recognize", but from the look of it, this is a problem with the knitr::rmarkdown engine or with the devtools packaging, so you should probably ask on an RStudio forum.

Duncan Murdoch

> Best regards,
>
> Thierry
>
> Details:
>
> Using 64-bit R 3.1.2 with encoding = "native.enc" on Windows 7 with RStudio 0.97.1091.
>
> The source package is built using the devtools package. The build command is
> R --vanilla CMD build "myPackage" --no-manual --no-resave-data
>
> The DESCRIPTION file has
>
> VignetteBuilder: knitr
> Suggests: knitr
> Imports: rmarkdown
>
> The markdown vignette YAML contains
>
> vignette: >
>   %\VignetteEngine{knitr::rmarkdown}
>   %\VignetteIndexEntry{The title}
>   \usepackage[utf8]{inputenc}
>
> The custom output style converts the markdown to beamer with the --latex-engine = xelatex flag.
>
> The vignette in tar.gz passes R --vanilla CMD check --timings --as-cran
>
> * checking files in 'vignettes' ... OK
> * checking for unstated dependencies in vignettes ... OK
> * checking package vignettes in 'inst/doc' ... OK
> * checking running R code from vignettes ...
>    'markdown_intro.Rmd' using 'UTF-8' ... OK
>  OK
> * checking re-building of vignette outputs ... [22s] OK
______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
On 09/12/2014, 5:19 AM, ONKELINX, Thierry wrote:
> Dear Duncan,
>
> The UTF-8 characters aren't properly rendered in the PDF version of the vignette.
> $?? ????? ?????? ?????? ????? ?? ?? is rendered as $????? ?????????? ???????????? ????? ?????? ? ???????? ????????

That looks as though the UTF-8 characters are being interpreted as Latin1 characters (or whatever your native encoding is on Windows) when read from the file.

It is quite tricky to work with UTF-8 in R on Windows. I think Sweave does it properly, though there may be exceptions. The issue is that many character input routines assume characters start out in the native encoding. (There's also a translation that happens by default on output, but I don't think that's your problem.) So the way to debug this is to follow all of the I/O and see where the misinterpretation happens. For vignettes, things are complicated, because R reads the file to determine which vignette engine to use, then the vignette engine reads it (perhaps more than once).

> The same problem occurs when I use render("vignette.md", output_format = "mypackage::mystyle") instead of render("vignette.md", output_format = "mypackage::mystyle", encoding = "UTF-8"). The default value of the encoding argument of rmarkdown::render() is encoding = getOption("encoding"), which is "native.enc" on my system.

It sounds as though the render function needs a way to determine the encoding from the file itself. Recent Sweave versions support the declaration

%\VignetteEncoding{utf8}

as well as the older

\usepackage[utf8]{inputenc}

that you used. You might want to try that line as well. (You need to keep the \usepackage line to tell LaTeX what encoding you're using.)

Duncan Murdoch
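The misinterpretation described above (UTF-8 bytes taken for Latin-1 when read) can be reproduced in memory. The following is only a minimal sketch, assuming a single test character (e-acute) and a temporary file; it does not show what knitr or rmarkdown do internally, and the variable names are made up for the illustration.

```r
# Illustrative sketch: reproduce "UTF-8 bytes read as Latin-1" with base R only.
x <- enc2utf8("\u00e9")        # e-acute, forced to UTF-8
utf8_bytes <- charToRaw(x)     # its two UTF-8 bytes: c3 a9

# Misread: each UTF-8 byte is treated as one Latin-1 character and
# then converted to UTF-8, which yields two characters instead of one.
garbled <- iconv(rawToChar(utf8_bytes), from = "latin1", to = "UTF-8")

# Correct read: declare that the bytes already are UTF-8.
ok <- rawToChar(utf8_bytes)
Encoding(ok) <- "UTF-8"

# The same idea when reading from a file: write the raw bytes, then tell
# readLines() how the file is encoded instead of relying on the native encoding.
path <- tempfile(fileext = ".md")
writeBin(utf8_bytes, path)
from_file <- readLines(path, encoding = "UTF-8", warn = FALSE)

print(charToRaw(garbled))      # four bytes: the doubled-up encoding
print(charToRaw(from_file))    # the original two bytes
```

Comparing charToRaw() output at each read and write step is one way to follow the I/O and find where the bytes change.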
A few things to clarify:

1. You do not necessarily have to keep the \usepackage{} line if you use %\VignetteEncoding{UTF-8}, because Pandoc will use UTF-8 anyway in its LaTeX template.

2. Perhaps the vignette engine in R has done something clever to convert utf8 to UTF-8, but I'd recommend %\VignetteEncoding{UTF-8} instead of %\VignetteEncoding{utf8} to make sure it is a valid encoding name, e.g.

> 'utf8' %in% iconvlist()
[1] FALSE
> 'UTF-8' %in% iconvlist()
[1] TRUE
> 'UTF8' %in% iconvlist()
[1] TRUE

BTW, %\VignetteEncoding is not documented anywhere (Cc'ing Kurt), and I think it needs to be documented, since the old approach \usepackage[enc]{inputenc} was basically a hack, which looks really odd in non-LaTeX vignettes (e.g. HTML vignettes).

3. The default `encoding` argument of rmarkdown::render() is not relevant here, even if its value is native.enc. When R builds a vignette, it tries to detect the vignette's encoding and passes it to the vignette engine, so the encoding that render() actually receives may not be native.enc.

Lastly, the most important piece of information is missing in this post: library(rmarkdown); sessionInfo(). There is no minimal reproducible example, either. Without this information, I can only guess.

BTW, you may also try HTML vignettes instead, which are much easier to get right than LaTeX in terms of character encodings.

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Web: http://yihui.name
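For reference, the vignette metadata from the original post with the encoding declaration recommended above would look roughly like the fragment below. This is a sketch only: the index entry and engine are taken from the post, the \usepackage line is left out on the assumption stated in point 1 that Pandoc's LaTeX template already uses UTF-8, and whether this resolves the reported problem is not confirmed anywhere in the thread.

```yaml
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{The title}
  %\VignetteEncoding{UTF-8}
```

The name is spelled UTF-8 rather than utf8 so that it matches an entry in iconvlist().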