Hi,

We found a (to our eyes) strange behaviour that might be a bug. First, a little bit of context. The 'units' package allows us to set the unit using either SE or NSE. E.g., these both work in the same way:

units::set_units(1:10, "µm")
#> Units: [µm]
#> [1] 1 2 3 4 5 6 7 8 9 10

units::set_units(1:10, µm)
#> Units: [µm]
#> [1] 1 2 3 4 5 6 7 8 9 10

That's micrometers, and it works fine if the session charset is UTF-8. Now the funny part comes with Windows. The first version, with quotes, works fine, but the second one fails. This is easy to demonstrate from Linux:

LC_CTYPE=en_US.iso88591 Rscript -e 'units::set_units(1:10, "µm")'
#> Units: [µm]
#> [1] 1 2 3 4 5 6 7 8 9 10

LC_CTYPE=en_US.iso88591 Rscript -e 'units::set_units(1:10, µm)'
#> Error: unexpected input in "units::set_units(1:10, ?"
#> Execution halted

However, if you use the first version, with quotes, in an example, and the package is checked on Windows, it fails too (see https://ci.appveyor.com/project/edzer/units/builds/22440023#L747). The package declares UTF-8 encoding, so none of these errors should, in principle, happen. Am I wrong?

Thanks in advance, regards,
Iñaki
From "Writing R Extensions":

"Only ASCII characters (and the control characters tab, formfeed, LF and CR) should be used in code files."

So I am afraid you cannot use µm.

Gabor

On Mon, Feb 18, 2019 at 3:36 PM Iñaki Ucar <iucar at fedoraproject.org> wrote:
> [...]
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
On Mon, 18 Feb 2019 at 17:27, Gábor Csárdi <csardi.gabor at gmail.com> wrote:
> From "Writing R Extensions":
>
> "Only ASCII characters (and the control characters tab, formfeed, LF
> and CR) should be used in code files."
>
> So I am afraid you cannot use µm.

Thanks, Gábor, I missed that bit. Then, is an .Rd file considered a "code file"? Our surprise comes from the fact that the quoted version works fine in a test file, but not in an example.

Anyway, if non-ASCII characters cause such documented trouble, it seems that the safest option is to avoid them in the first place.

Iñaki
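As an illustration of the ASCII-only rule quoted from "Writing R Extensions", here is a minimal sketch of how one might flag non-ASCII characters in lines of R code before they reach a package check. The `src` vector and the `has_non_ascii()` helper are hypothetical names made up for this example; they are not part of 'units' or base R. The micro sign is written as a `\u00b5` escape so that the snippet itself stays ASCII. For files on disk, the 'tools' package also offers `showNonASCIIfile()`.

```r
# Hypothetical source lines, held as plain strings (nothing here is evaluated,
# so the 'units' package is not needed to run this sketch).
src <- c(
  'units::set_units(1:10, "um")',
  'units::set_units(1:10, "\u00b5m")'  # contains a real micro sign, U+00B5
)

# Hypothetical helper: TRUE for every line holding a code point above 127.
has_non_ascii <- function(lines) {
  vapply(
    lines,
    function(l) any(utf8ToInt(enc2utf8(l)) > 127L),
    logical(1),
    USE.NAMES = FALSE
  )
}

flagged <- has_non_ascii(src)
print(flagged)
```

Only the second line is flagged, which is the one that would need an ASCII spelling or an escape.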
On 2/18/19 4:36 PM, Iñaki Ucar wrote:
> [...]

Hi Iñaki,

if you want to report a bug against R, please try to provide a minimal reproducible example that only uses base packages (not units), and please also see WRE sections 1.3 and 1.6.3, including:

"There is a portable way to have arbitrary text in character strings (only) in your R code, which is to supply them in Unicode as '\uxxxx' escapes."

"If your package specifies an encoding in its DESCRIPTION file, you should run these tools in a locale which makes use of that encoding" (includes R CMD check)

Even though there are portable ways to have a string constant literal in source code in UTF-8, not representable in the current native encoding (e.g. using \u escapes), it does not mean that such a string can be freely used in R.
Many operations require conversion to the current native encoding, which will cause an error or an unexpected result. Such conversions can happen at any time (except when they are documented not to happen). Implementing an API that works with such strings in a package would be hard to get right, but not impossible. NSE will not work (non-representable strings that are not string constant literals are not supported). One can save a lot of headaches by using only ASCII in function APIs.

Best
Tomas
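The '\uxxxx' escape and the conversion problem described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the variable names are made up for the example, and the conversion to ASCII merely stands in for a native encoding that cannot represent the micro sign.

```r
# The micro sign written as a Unicode escape, so this source stays pure ASCII.
unit <- "\u00b5m"

# Two characters: U+00B5 (micro sign, code point 181) followed by "m" (109).
code_points <- utf8ToInt(enc2utf8(unit))
print(code_points)
print(nchar(unit, type = "chars"))

# Stand-in for a conversion to a native encoding that lacks the character:
# with sub = "byte", iconv() marks the bytes it could not convert.
print(iconv(enc2utf8(unit), from = "UTF-8", to = "ASCII", sub = "byte"))

# An ASCII-only spelling (hypothetical alternative) avoids the conversion issue.
ascii_unit <- "um"
print(identical(iconv(ascii_unit, from = "UTF-8", to = "ASCII"), ascii_unit))
```

The escape only helps inside a string constant; an unquoted symbol such as the NSE form has no equivalent portable spelling.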