Bjørn-Helge Mevik
2014-Dec-12  09:12 UTC
[Rd] SUGGESTION: Force install.packages() to use ASCII encoding when parse():ing code?
Duncan Murdoch <murdoch.duncan at gmail.com> writes:

> users of other languages may want to have messages and variable names
> in their native language, and ASCII might not be enough for that.

Allowing for messages in non-ASCII encodings would probably be a good idea, but I think allowing non-ASCII variable names is dangerous.

--
Regards,
Bjørn-Helge Mevik
Duncan Murdoch
2014-Dec-12  11:01 UTC
[Rd] SUGGESTION: Force install.packages() to use ASCII encoding when parse():ing code?
On 12/12/2014, 4:12 AM, Bjørn-Helge Mevik wrote:

> Duncan Murdoch <murdoch.duncan at gmail.com> writes:
>
>> users of other languages may want to have messages and variable names
>> in their native language, and ASCII might not be enough for that.
>
> Allowing for messages in non-ASCII encodings would probably be a good
> idea, but I think allowing non-ASCII variable names is dangerous.

Dangerous in what way?

I agree that CRAN probably shouldn't accept packages like that, at least for exported symbols: packages there should run anywhere. But I suspect that the majority of R packages are for private use, and will never be sent to CRAN. Do you know any reason that non-ASCII names would be dangerous for those?

Duncan Murdoch
Jan Kim
2014-Dec-12  12:34 UTC
[Rd] SUGGESTION: Force install.packages() to use ASCII encoding when parse():ing code?
On Fri, Dec 12, 2014 at 06:01:22AM -0500, Duncan Murdoch wrote:

> On 12/12/2014, 4:12 AM, Bjørn-Helge Mevik wrote:
> > Duncan Murdoch <murdoch.duncan at gmail.com> writes:
> >
> >> users of other languages may want to have messages and variable names
> >> in their native language, and ASCII might not be enough for that.
> >
> > Allowing for messages in non-ASCII encodings would probably be a good
> > idea, but I think allowing non-ASCII variable names is dangerous.
>
> Dangerous in what way?
>
> I agree that CRAN probably shouldn't accept packages like that, at least
> for exported symbols: packages there should run anywhere. But I
> suspect that the majority of R packages are for private use, and will
> never be sent to CRAN. Do you know any reason that non-ASCII names
> would be dangerous for those?
>
> Duncan Murdoch

I would perhaps not go as far as calling them dangerous, but non-ASCII characters in code are a mixed blessing which, on balance, I'd personally opt not to have.

Being German, I can understand that people may want umlauted characters in their variable names, but where this catches on, it's just a matter of time before people get characters into their code that are different but indistinguishable in the font they use (I've seen this with \H{o} rather than a \"{o}), and mega-personmonths are wasted tracking down these problems.

While many packages are used in-house, at least initially, making a package is a step towards releasing it, so I'd anticipate that an option to help weed out potentially troublesome identifiers could do some good.

Best regards, Jan

--
 +- Jan T. Kim -------------------------------------------------------+
 |             email: jttkim at gmail.com                                |
 |             WWW:   http://www.jtkim.dreamhosters.com/              |
 *-----=<  hierarchical systems are for files, not for humans  >=-----*
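[Editor's sketch] The kind of check Jan has in mind, an option that helps weed out troublesome identifiers, can be approximated with a plain text scan. The following is a minimal illustration in Python rather than R, and it is not anything proposed in the thread: the function name `find_non_ascii` and the sample snippet are hypothetical, and the scan looks at raw text only, so it also reports non-ASCII characters in strings and comments. R's own tools package offers showNonASCII() for a comparable check on character vectors.

```python
import unicodedata


def find_non_ascii(source: str):
    """Return (line, column, character, Unicode name) for each non-ASCII character.

    Line and column numbers are 1-based. This scans raw text only; it does not
    parse R, so it also reports characters inside strings and comments.
    """
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col_no, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits


# Hypothetical R snippet: two identifiers that differ only in the accent on "o".
example = "h\u00f6he <- 1\nh\u0151he <- 2\nbreite <- 3\n"

for hit in find_non_ascii(example):
    print(hit)
```

Printing the Unicode name alongside the position is the point of the sketch: the two accented letters may look alike in a given font, but their names (diaeresis versus double acute) do not.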
Bjørn-Helge Mevik
2014-Dec-12  14:35 UTC
[Rd] SUGGESTION: Force install.packages() to use ASCII encoding when parse():ing code?
Duncan Murdoch <murdoch.duncan at gmail.com> writes:

> On 12/12/2014, 4:12 AM, Bjørn-Helge Mevik wrote:
>> Duncan Murdoch <murdoch.duncan at gmail.com> writes:
>>
>>> users of other languages may want to have messages and variable names
>>> in their native language, and ASCII might not be enough for that.
>>
>> Allowing for messages in non-ASCII encodings would probably be a good
>> idea, but I think allowing non-ASCII variable names is dangerous.
>
> Dangerous in what way?

Perhaps "dangerous" is a little too strong, but it opens up possibilities for problems with sharing code or running it on other systems. Also, judging by the many files I've seen (and created myself :) with a mixture of iso8859-1 and utf8, or with "double-encoded" utf8, it is surprisingly easy to make encoding mistakes when editing or processing files. And as Jan Kim wrote, you could get things that look similar but are different.

--
Regards,
Bjørn-Helge Mevik
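[Editor's sketch] The "double-encoded" utf8 mistake mentioned above arises when UTF-8 bytes are misread as iso8859-1 and then saved as UTF-8 again. A minimal Python illustration follows; the function names `double_encode` and `looks_double_encoded` are hypothetical, and the detection is only a heuristic, since text that legitimately contains such character sequences would be misreported.

```python
def double_encode(text: str) -> str:
    """Simulate the mistake: UTF-8 bytes misread as ISO-8859-1 (Latin-1)."""
    return text.encode("utf-8").decode("latin-1")


def looks_double_encoded(text: str) -> bool:
    """Heuristic: does reversing the mistake yield different, valid UTF-8 text?"""
    try:
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1, or the resulting
        # bytes are not valid UTF-8: this was not produced by the mistake.
        return False
    return repaired != text


name = "Bj\u00f8rn"            # correctly decoded text
garbled = double_encode(name)  # "ø" becomes the two characters "Ã¸"

print(garbled)
print(looks_double_encoded(garbled), looks_double_encoded(name))
```

Pure ASCII text survives the round trip unchanged, which is why such mistakes tend to go unnoticed until the first non-ASCII character appears in a file.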
Barry Rowlingson
2014-Dec-12  16:58 UTC
[Rd] SUGGESTION: Force install.packages() to use ASCII encoding when parse():ing code?
On Fri, Dec 12, 2014 at 12:34 PM, Jan Kim <jttkim at googlemail.com> wrote:

> it's just a matter of time that people get characters into their code that
> are different but indistinguishable in the font they use (I've seen this
> with \H{o} rather than a \"{o}), and mega-personmonths are wasted puzzling
> over tracking down these problems.

Then R should ban variable names from having 'l', 'i', '1', '0' and 'O' in them!

Barry
Jan Kim
2014-Dec-12  19:39 UTC
[Rd] SUGGESTION: Force install.packages() to use ASCII encoding when parse():ing code?
On Fri, Dec 12, 2014 at 04:58:52PM +0000, Barry Rowlingson wrote:

> On Fri, Dec 12, 2014 at 12:34 PM, Jan Kim <jttkim at googlemail.com> wrote:
>
> > it's just a matter of time that people get characters into their code that
> > are different but indistinguishable in the font they use (I've seen this
> > with \H{o} rather than a \"{o}), and mega-personmonths are wasted puzzling
> > over tracking down these problems.
>
> Then R should ban variable names from having 'l', 'i', '1', '0' and
> 'O' in them!

well -- I can live with 'i', but if I came across code using variable names i, \'{\i}, \`{\i} and also \i, \u{\i}, \r{\i}, \d{\i} etc. I'd consider that dangerous to my sanity (especially if they're all used in the same piece of code)... ;-)

More seriously, as I (literally) see it, the problems of confusing l / I / 1 or O / 0 etc. are reasonably solvable by using a decent font (e.g. Deja Vu, Source Code Pro), but ensuring distinctness of glyphs in the same way won't scale to character sets the size of Unicode.

Best regards, Jan
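[Editor's sketch] Since distinct glyphs cannot be guaranteed by the font alone, look-alike identifiers can instead be flagged mechanically. The Python sketch below is a crude approximation under stated assumptions, not a full confusables check: it decomposes each name (Unicode NFD), drops combining accents, and reports names that collapse to the same "skeleton". The function names `skeleton` and `find_lookalikes` and the sample identifiers are hypothetical. Characters without a canonical decomposition, such as the dotless i, are not caught by this approach.

```python
import unicodedata
from collections import defaultdict


def skeleton(name: str) -> str:
    """Decompose (NFD) and drop combining marks, so accented letters lose their accents."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def find_lookalikes(names):
    """Group distinct names that share a skeleton; return only colliding groups."""
    groups = defaultdict(list)
    for name in names:
        key = skeleton(name)
        if name not in groups[key]:
            groups[key].append(name)
    return {key: group for key, group in groups.items() if len(group) > 1}


# Hypothetical identifiers: o with diaeresis versus o with double acute.
identifiers = ["h\u00f6he", "h\u0151he", "breite"]
print(find_lookalikes(identifiers))
```

A check of this kind reports collisions regardless of the font in use, at the cost of also flagging pairs that a reader would never confuse.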