thr3ads.net - openssh unix dev - AcceptEnv LANG LC_* vs available locales [Apr 2022]


Demi Marie Obenour

2022-Apr-29 00:29 UTC

AcceptEnv LANG LC_* vs available locales

On 4/27/22 05:40, Ingo Schwarze wrote:
> Hi Demi,
> 
> Demi Marie Obenour wrote on Tue, Apr 26, 2022 at 09:12:07PM -0400:
>> On 4/25/22 08:23, Ingo Schwarze wrote:
> 
>>> As discussed in the above writeup, the only way to make ssh(1)
>>> connections safe is to manually make sure, *before connecting*,
>>> that the same locale is set on both sides - ideally UTF-8.
> 
>> It is also safe for the locale to be different, so long as the
>> character encodings match.  For instance, all UTF-8 locales are
>> compatible.
> 
> Yes, that is what i meant.  In OpenBSD, we are used to the deliberate
> decision that the C library ignores all aspects of the locale except
> the character encoding, so the locale and the character encoding are
> one and the same and your statement is obvious for us.  Of course,
> your statement is also true on arbitrary other operating systems, even
> if they do take other parts of the locale into account.

Off-topic: Why did OpenBSD make this decision?  In particular,
LC_MESSAGES seems to be essential to internationalization support,
without being very problematic otherwise.

Also, is it safe if the server uses the C locale (LC_ALL=C) and the
client uses UTF-8?

> Thanks for making this aspect explicit.  You are right that it might
> not be obvious for users of other systems.

You're welcome.

> That said, on non-OpenBSD systems, if the locale used by a program does
> not match what the user thinks, the *semantics* of the program may still
> screw up horribly, even if the character encoding matches.  For example,
> consider user input of floating point numbers with LC_NUMERIC set to a
> cultural convention the user isn't aware of.  But such issues are
> only loosely related to ssh(1) and to terminal security.

When it comes to terminal security, another approach is to use
a transient tmux(1) pane or terminal window that is closed once
the session is complete.  This assumes that the mismatch cannot be
exploited for code execution, but I would be highly surprised if it
could be, especially with the client in UTF-8 mode.

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)

Christoph Anton Mitterer

2022-Apr-29 01:57 UTC

AcceptEnv LANG LC_* vs available locales

On Thu, 2022-04-28 at 20:29 -0400, Demi Marie Obenour wrote:
> > That said, on non-OpenBSD systems, if the locale used by a program
> > does not match what the user thinks, the *semantics* of the program
> > may still screw up horribly, even if the character encoding matches.
> > For example, consider user input of floating point numbers with
> > LC_NUMERIC set to a cultural convention the user isn't aware of.
> > But such issues are only loosely related to ssh(1) and to terminal
> > security.
> 
> When it comes to terminal security, another approach is to use
> a transient tmux(1) pane or terminal window that is closed once
> the session is complete.  This assumes that the mismatch cannot be
> exploited for code execution, but I would be highly surprised if it
> could be, especially with the client in UTF-8 mode.

Maybe it's too late in the night and I just miss the obvious point,...

... but what exactly is the security problem here (if one sends
LC_*/LANG ... or with locales in general)?

With or without any locale/character encoding differences, the (possibly
evil) remote side can send any arbitrary bytes to the terminal.

But how could it use this for code execution on the local machine?

The only attack vector I see would be:
A remote side tricking a user into believing that he left SSH (but is
still on the remote side)... and then tricking him into e.g. entering a
password. But that should be independent of the locale/character
encoding.

What should be the attack with e.g. LC_NUMERIC? A remote side tricking
a user into using 3,14 instead of 3.14 and that having some attacking
effect? But if the remote side can mess with the locale (on its own
side)... it can anyway already do its attack there?
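
To make the 3,14 versus 3.14 point concrete, here is a minimal sketch of
how the same input string can yield different numbers under different
numeric conventions. It is not taken from the thread: parse_decimal is a
hypothetical helper that takes the separators as explicit arguments
instead of reading LC_NUMERIC, so it does not depend on which locales
are installed.

```python
def parse_decimal(text, decimal_point=".", thousands_sep=""):
    """Parse a decimal string using explicitly given separators.

    Hypothetical helper for illustration only; it mimics what a
    locale-aware parser does with LC_NUMERIC settings.
    """
    if thousands_sep:
        text = text.replace(thousands_sep, "")
    if decimal_point != ".":
        text = text.replace(decimal_point, ".")
    return float(text)


# C locale conventions: "." is the decimal point, no digit grouping.
c_value = parse_decimal("1.234")

# Conventions typical of a German locale: "," is the decimal point
# and "." groups thousands, so the same string is a different number.
de_value = parse_decimal("1.234", decimal_point=",", thousands_sep=".")

print(c_value, de_value)
```

The input is identical in both calls; only the assumed convention
differs, which is the kind of semantic surprise described above.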

Similarly, what if an evil remote side could swap yesexpr and noexpr in
LC_MESSAGES?
So what?
Tricking a user into running rm -ri / and then answering 'n', which would
then mean 'y'?
If the remote side can do this, it can again just delete those (remote)
files?

Cheers,
Chris.

Ingo Schwarze

2022-Apr-29 11:29 UTC

AcceptEnv LANG LC_* vs available locales

Hi,

Demi Marie Obenour wrote on Thu, Apr 28, 2022 at 08:29:24PM -0400:
> On 4/27/22 05:40, Ingo Schwarze wrote:
>> Demi Marie Obenour wrote on Tue, Apr 26, 2022 at 09:12:07PM -0400:
>>> On 4/25/22 08:23, Ingo Schwarze wrote:
>> In OpenBSD, we are used to the deliberate
>> decision that the C library ignores all aspects of the locale except
>> the character encoding, [...]
> Off-topic: Why did OpenBSD make this decision?  In particular,
> LC_MESSAGES seems to be essential to internationalization support,
> without being very problematic otherwise.

I think having libc and POSIX utility programs always reliably print
diagnostics in the same way, and always in US-ASCII rather than sometimes
in UTF-8, is more valuable than internationalization of operating
system diagnostics, both from the user perspective (predictability and
comprehensibility) and from the OS maintainer perspective (code simplicity
and hence better chance for correctness and reliability).  Even as a
native German speaker, i regularly get confused when seeing German
error messages because they usually feel quite incomprehensible.

Besides, LC_CTYPE is essential for important functionality, but picking
individual features from all the rest of LC_* for implementation isn't
going to help.  It will increase code complexity without really
achieving internationalization (even full LC_* support is not really
sufficient for complete internationalization...).  So better ditch
it outright than attempt some piece-meal approach.

Besides, even LC_MESSAGES has features that are prone to causing
trouble, for example changing the meaning of "yes" and "no".

> Also, is it safe if the server uses the C locale (LC_ALL=C) and the
> client uses UTF-8?

Yes, because US-ASCII is a subset of UTF-8, so what a well-behaved
server sends in the C locale is supposed to be a subset of what it
might send in a UTF-8 locale.
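
The subset relation can be checked directly. The following is a small
sketch, not part of the original message; is_ascii_safe is a
hypothetical helper name.

```python
# All 128 US-ASCII code points, as raw bytes.
ascii_bytes = bytes(range(128))

# Decoding them as US-ASCII or as UTF-8 yields exactly the same text.
same_decoding = ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")

# Every ASCII character encodes to one identical byte in UTF-8.
same_encoding = all(chr(i).encode("utf-8") == bytes([i]) for i in range(128))


def is_ascii_safe(data):
    """Return True if data contains only 7-bit (US-ASCII) bytes."""
    return all(b < 0x80 for b in data)


print(same_decoding, same_encoding)
```

The converse does not hold: any non-ASCII character encodes in UTF-8 to
bytes with the high bit set, which is why the argument only covers a
C-locale server talking to a UTF-8 client and not the other direction.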

Of course, whether it is safe when both the server and the client use
a UTF-8 locale obviously depends on the terminal or terminal emulator,
but at least xterm(1) in UTF-8 mode [but not in the traditional 8-bit
mode that may still be the default on some operating systems] is safe
when the server runs either the C locale or a UTF-8 locale.
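
The two combinations named in this thread (matching character encodings,
or a C-locale server with a UTF-8 client) could be checked before
connecting along the following lines. This is only a sketch under
stated assumptions: codeset and server_output_is_compatible are
hypothetical helpers, locale names are assumed to follow the common
language_TERRITORY.CODESET@modifier shape, and "C" and "POSIX" are
assumed to mean US-ASCII. It says nothing about whether a given terminal
handles the encoding safely.

```python
def codeset(locale_name):
    """Extract a normalized codeset from a name like 'de_DE.UTF-8@euro'.

    Returns None when the name carries no codeset, because the
    effective encoding then depends on system defaults.
    """
    name = locale_name.split("@", 1)[0]
    if name in ("C", "POSIX"):
        return "ASCII"  # assumption: the C locale emits US-ASCII only
    if "." not in name:
        return None
    return name.split(".", 1)[1].upper().replace("-", "").replace("_", "")


def server_output_is_compatible(server, client):
    """True only for the two cases discussed in the thread."""
    s, c = codeset(server), codeset(client)
    if s is None or c is None:
        return False  # unknown encoding: do not guess
    return s == c or (s == "ASCII" and c == "UTF8")


print(server_output_is_compatible("C", "en_US.UTF-8"))
```

Returning False for names without a codeset is a deliberately
conservative choice: a bare "de_DE" may map to different encodings on
different systems.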

[...]
>> That said, on non-OpenBSD systems, if the locale used by a program does
>> not match what the user thinks, the *semantics* of the program may still
>> screw up horribly, even if the character encoding matches.  For example,
>> consider user input of floating point numbers with LC_NUMERIC set to a
>> cultural convention the user isn't aware of.  But such issues are
>> only loosely related to ssh(1) and to terminal security.

> When it comes to terminal security, another approach is to use
> a transient tmux(1) pane or terminal window that is closed once
> the session is complete.

Frankly, i don't know anything about tmux(1) and simply don't know
whether it can or cannot help with the topic at hand.

> This assumes that the mismatch cannot be
> exploited for code execution, but I would be highly surprised if it
> could be, especially with the client in UTF-8 mode.

xterm(1) in UTF-8 mode is quite good because it never interprets
multibyte characters as in-band terminal control codes.  Your
mileage might vary with other terminals or emulators.
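
As an illustration of why the terminal's mode matters, the sketch below
shows that the UTF-8 encoding of an ordinary letter can contain a byte
which, read on its own in an 8-bit mode, falls in the C1 control range
(0x9B is the 8-bit Control Sequence Introducer). This is not from the
thread, c1_bytes is a hypothetical helper, and the sketch makes no claim
about how any particular terminal other than the behaviour described
above actually reacts.

```python
CSI_8BIT = 0x9B  # C1 "Control Sequence Introducer" as a single byte

# U+00DB (LATIN CAPITAL LETTER U WITH CIRCUMFLEX) is a harmless letter,
# but its UTF-8 encoding is the two bytes 0xC3 0x9B.
encoded = "\u00db".encode("utf-8")


def c1_bytes(data):
    """Return the bytes of data that lie in the C1 range 0x80-0x9F."""
    return [b for b in data if 0x80 <= b <= 0x9F]


# Read byte by byte, the sequence contains the 8-bit CSI value.
suspicious = c1_bytes(encoded)

# Decoded as UTF-8, the same two bytes are one printable character.
decoded = encoded.decode("utf-8")

print(suspicious == [CSI_8BIT], decoded)
```

A decoder that consumes both bytes as one multibyte character never
sees 0x9B as a separate control byte, which matches the property
attributed to xterm(1) in UTF-8 mode above.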

Yours,
  Ingo
