thr3ads.net - freebsd stable - Uppercase RE matching problems in FreeBSD 11 [Nov 2016]

Charles Swiger

2016-Nov-07 21:13 UTC

Uppercase RE matching problems in FreeBSD 11

On Nov 6, 2016, at 1:49 PM, Stefan Bethke <stb at lassitu.de> wrote:
> On 06.11.2016 at 22:27, Baptiste Daroussin <bapt at FreeBSD.org> wrote:
>> That works for POSIX locale aka C aka ASCII only world
>
> So what do I set my LANG and LC variables to?  I do want UTF-8, but I do
> also want my scripts to continue to work.  Clearly, en_US.UTF-8 is not
> what I want.  Is it C.UTF-8?  Or do I set LANG=en_US.UTF-8 and
> LC_COLLATE=C?

If you want to use a UTF8 locale, then you must start using character classes
like '[:upper:]' and '[:lower:]' because those will-- or at least "should",
modulo bugs-- properly handle the collation issues including for languages
which do not possess a 1-1 mapping between upper and lower case letters.
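
A minimal sketch of the difference, using grep as a stand-in for whatever
tool a script actually calls; the sample words and variable names are made
up for illustration. Note that a character class has to sit inside a bracket
expression, so the pattern is written '[[:upper:]]'.

```shell
# Hypothetical sample input: one lowercase word, one capitalized word.
words='alpha
Bravo'

# POSIX character class inside a bracket expression: selects lines that
# contain an uppercase letter, without relying on the collation order.
upper_by_class=$(printf '%s\n' "$words" | grep '[[:upper:]]')

# The traditional range, pinned to the C locale for this one command,
# where A-Z is defined as the ASCII uppercase letters.
upper_by_range=$(printf '%s\n' "$words" | LC_ALL=C grep '[A-Z]')

echo "$upper_by_class"
echo "$upper_by_range"
```

Setting LC_ALL=C on a single command keeps the rest of the session in its
UTF-8 locale, at the cost of that command treating its input as plain bytes.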

Someone with a German email address is presumably familiar with ß / Eszett...?  :-)

Regards,
-- 
-Chuck

Stefan Ehmann

2016-Nov-08 19:54 UTC

On 07.11.2016 22:13, Charles Swiger wrote:
> On Nov 6, 2016, at 1:49 PM, Stefan Bethke <stb at lassitu.de> wrote:
>> On 06.11.2016 at 22:27, Baptiste Daroussin <bapt at FreeBSD.org> wrote:
>>> That works for POSIX locale aka C aka ASCII only world
>>
>> So what do I set my LANG and LC variables to?  I do want UTF-8, but
>> I do also want my scripts to continue to work.  Clearly,
>> en_US.UTF-8 is not what I want.  Is it C.UTF-8?  Or do I set
>> LANG=en_US.UTF-8 and LC_COLLATE=C?
>
> If you want to use a UTF8 locale, then you must start using character
> classes like '[:upper:]' and '[:lower:]' because those will-- or at
> least "should", modulo bugs-- properly handle the collation issues
> including for languages which do not possess a 1-1 mapping between
> upper and lower case letters.
>
> Someone with a German email address is presumably familiar with ß /
> Eszett...?  :-)

Character classes work fine for [a-z], but I don't know of a simple way
to match a range like [a-k].
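
One locale-independent workaround, sketched here as an assumption rather than
anything proposed in the thread, is to list the characters explicitly instead
of writing a range. The sample words and the variable name are hypothetical.

```shell
# Hypothetical sample input.
words='apple
kiwi
Banana
mango'

# Spell out a through k explicitly; with no range in the bracket
# expression, collation order cannot change which characters it contains.
starts_a_to_k=$(printf '%s\n' "$words" | grep '^[abcdefghijk]')

echo "$starts_a_to_k"
```

This is verbose, and it only covers the unaccented ASCII letters that are
listed.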

Personally, I prefer the "Rational Range Interpretation" because it
doesn't break backward compatibility and is still standards-compliant.
