Marc Girondot
2021-Sep-16 16:05 UTC
[R] Good practice for database with utf-8 string in package
Hello everyone,

I am a little stuck on how to include a database with UTF-8 strings in a package. When I submit it to CRAN, the check reports NOTEs on several Unix systems, and I am trying to find a way (if one exists) to avoid these NOTEs.

The database contains references, and some author names have non-ASCII characters.

* First, I don't agree at all with the advice given here:

https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Encoding-issues

"First, consider carefully if you really need non-ASCII text."

If a language has non-ASCII characters, it is not just to make the writing nicer or more complex; it is because they change the pronunciation.

* Second, I tried to find a way to avoid these NOTEs.

For example, here is a reference with UTF-8 characters:

> DatabaseTSD$Reference[211]
[1] Hernández-Montoya, V., Páez, V.P. & Ceballos, C.P. (2017) Effects of temperature on sex determination and embryonic development in the red-footed tortoise, Chelonoidis carbonarius. Chelonian Conservation and Biology 16, 164-171.

When I convert the non-ASCII characters into Unicode escapes, I do get only ASCII characters. Perfect.

> iconv(DatabaseTSD$Reference[211], "UTF-8", "ASCII", "Unicode")
[1] "Hern<U+00E1>ndez-Montoya, V., P<U+00E1>ez, V.P. & Ceballos, C.P. (2017) Effects of temperature on sex determination and embryonic development in the red-footed tortoise, Chelonoidis carbonarius. Chelonian Conservation and Biology 16, 164-171."

With this, I get no NOTEs when I check the package with the database on Unix... but how can I print the reference back with the original characters?

Thanks a lot for pointing me to best practices for including databases with non-ASCII characters without getting NOTEs when submitting a package to CRAN.

Marc
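On the question of printing the reference back with the original characters, here is a minimal sketch, assuming the strings are stored with the <U+XXXX> escapes shown in the iconv() output above. The helper name unescape_unicode is hypothetical (it is not from the original post or from any package) and it relies only on base R functions (gregexpr, regmatches, strtoi, intToUtf8). One caveat: it would also convert any literal text in the data that happens to look like <U+XXXX>.

```r
# Hypothetical helper (illustration only): turn "<U+XXXX>" escapes back
# into the UTF-8 characters they stand for, using base R only.
unescape_unicode <- function(x) {
  x <- as.character(x)                    # also accepts factors
  pattern <- "<U\\+[0-9A-Fa-f]{4,6}>"     # e.g. "<U+00E1>"

  one <- function(s) {
    if (is.na(s)) return(NA_character_)
    m <- gregexpr(pattern, s)
    codes <- regmatches(s, m)[[1]]
    if (length(codes) == 0L) return(s)    # nothing to restore
    # Drop the leading "<U+" and the trailing ">" to keep the hex digits.
    hex <- substr(codes, 4L, nchar(codes) - 1L)
    # Hex digits -> integer code point -> UTF-8 character.
    chars <- vapply(strtoi(hex, base = 16L), intToUtf8, character(1))
    regmatches(s, m) <- list(chars)
    s
  }

  vapply(x, one, character(1), USE.NAMES = FALSE)
}

# Usage with the escaped (ASCII-only) form of the reference from the post.
ref <- "Hern<U+00E1>ndez-Montoya, V., P<U+00E1>ez, V.P. & Ceballos, C.P. (2017)"
restored <- unescape_unicode(ref)
print(restored)
```

With something like this, the data shipped in the package could stay ASCII-only and be converted only when displayed. Whether that is preferable to keeping marked UTF-8 strings in the data is a design choice this sketch does not settle.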
Bert Gunter
2021-Sep-16 16:17 UTC
[R] Good practice for database with utf-8 string in package
This should not be posted here. Post on the R-package-devel list instead.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)