Sent from my iPhone

> On 17 Aug 2018, at 17:15, Ingo Schwarze <schwarze at usta.de> wrote:
>
> Hi Darren,
>
> Darren Tucker wrote on Fri, Aug 17, 2018 at 07:16:03AM -0700:
>> On 13 August 2018 at 15:06, Val Baranov <val.baranov at duke.edu> wrote:
>
>>> test_utf8: ........................
>>> regress/unittests/utf8/tests.c:48 test #25 "c_esc"
>>> ASSERT_INT_EQ(len, wantlen) failed:
>>> len = -1
>>> wantlen = 5
>
>> This boils down to meaning OpenSSH's snmprintf call failed for the
>> string "\033x" instead of returning the expected escaped version
>> "\\033x". The code is in utf8.c but I am not sure why it failed.
>
> Actually, it is *supposed* to fail unless the locale is either
> UTF-8 or the POSIX (ASCII) locale, because '\033' is not a
> printable character and attempting to escape invalid stuff
> is unsafe in arbitrary locales.
>
>> What's your locale set to?
>

IIRC the default on AIX is iso-18559-15 (hope I have all the numbers right). In any case it is not UTF-8. AIX 7.2 may be different. In any case, on AIX 6.1 the test logic automatically sets the utf8 test to no.

> It doesn't matter on OpenBSD, but maybe you should consider setting
> LC_CTYPE=en_US.UTF-8 by default in TEST_ENV in the portable version
> of the test suite? Of course, it would do no harm on OpenBSD either.
>
> If you worry that some target system might not have a en_US.UTF-8
> locale installed, you can look at
>
> http://mandoc.bsd.lv/cgi-bin/cvsweb/configure?rev=HEAD
>
> for a way to autodetect a suitable UTF-8 locale - look for UTF8_LOCALE
> in that script.
>
> But that may be overkill for OpenSSH. Just recklessly forcing
> LC_CTYPE=en_US.UTF-8 may be good enough for OpenSSH's purposes.
> If the target system doesn't provide it, setlocale(3) will fall
> back to POSIX, which should be good enough for the tests.
>
> Yours,
> Ingo
> _______________________________________________
> openssh-unix-dev mailing list
> openssh-unix-dev at mindrot.org
> https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
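Ingo's "recklessly forcing" suggestion works because setlocale(3) returns NULL and leaves the current locale unchanged when the requested locale is not installed, so a test program that starts in the POSIX locale simply stays there. A minimal C sketch of that fallback, trying a short list of candidate names in the spirit of the UTF8_LOCALE autodetection mentioned above; `pick_ctype_locale` and the candidate names are hypothetical and not taken from OpenSSH or from mandoc's configure script:

```c
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper: try each candidate LC_CTYPE locale in order and
 * return the name setlocale(3) reports for the first one it accepts.
 * If none is installed, select the POSIX ("C") locale explicitly.
 * The returned string belongs to the C library and may be overwritten
 * by the next setlocale() call.
 */
const char *
pick_ctype_locale(const char *const candidates[], size_t n)
{
	const char *name;
	size_t i;

	for (i = 0; i < n; i++) {
		/* NULL means "not available"; the locale is left unchanged. */
		name = setlocale(LC_CTYPE, candidates[i]);
		if (name != NULL)
			return name;
	}
	/* Nothing matched: fall back to POSIX, which the tests accept. */
	return setlocale(LC_CTYPE, "C");
}
```

A test driver could call it with `{ "en_US.UTF-8", "C.UTF-8" }` before running the escaping tests; on a system with neither locale installed it ends up in the POSIX locale, which, per the thread, should be good enough for the tests.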
On Sat, 18 Aug 2018, Michael Felt (aixtools) wrote:

> IIRC the default on AIX is iso-18559-15 (hope I have all the numbers
> right). In any case not UTF-8.

The numbers are ... not right:

https://www.iso.org/obp/ui/#iso:std:iso:18559:dis:ed-1:v1:en
ISO 18559 - Dentistry - Extraoral spatulas
for mixing dental cements

Maybe ISO 8859-1?

If it's basically ASCII then we could add it to the check in
utf8.c:dangerous_locale() and the test will probably pass.

-d
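Damien's proposal amounts to adding one more exact-match entry to a list of CODESET strings. A minimal sketch of that kind of check, assuming a dangerous_locale()-style function that compares nl_langinfo(CODESET) against a fixed whitelist; the function names and the list below are illustrative (only "646" and "" are confirmed by this thread), not a copy of OpenSSH's utf8.c:

```c
#include <langinfo.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustrative whitelist of nl_langinfo(CODESET) strings treated as
 * stateless and ASCII-compatible.  Assumed entries, not copied from
 * utf8.c; only "646" and "" are mentioned in this thread.
 */
static const char *const safe_codesets[] = {
	"UTF-8",
	"US-ASCII",		/* assumed spelling on some BSDs */
	"ANSI_X3.4-1968",	/* assumed glibc spelling of ASCII */
	"646",
	"",
};

/* Hypothetical: 1 if codeset exactly matches a whitelisted string. */
int
codeset_is_safe(const char *codeset)
{
	size_t i;

	for (i = 0; i < sizeof(safe_codesets) / sizeof(safe_codesets[0]); i++)
		if (strcmp(codeset, safe_codesets[i]) == 0)
			return 1;
	return 0;
}

/* Hypothetical dangerous_locale()-style check of the current LC_CTYPE. */
int
locale_is_dangerous(void)
{
	return !codeset_is_safe(nl_langinfo(CODESET));
}
```

Because the comparison is an exact strcmp(), a Latin-9 locale reporting something like "ISO8859-15" would be treated as dangerous unless that exact spelling were added to the list.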
Hi,

Damien Miller wrote on Sun, Aug 19, 2018 at 06:30:47PM +1000:
> On Sat, 18 Aug 2018, Michael Felt (aixtools) wrote:

>> IIRC the default on AIX is iso-18559-15 (hope I have all the numbers
>> right). In any case not UTF-8.

> The numbers are ... not right:
>
> https://www.iso.org/obp/ui/#iso:std:iso:18559:dis:ed-1:v1:en
> ISO 18559 - Dentistry - Extraoral spatulas
> for mixing dental cements
>
> Maybe ISO 8859-1?

More likely ISO 8859-15, but whatever.

> If it's basically ASCII then we could add it to the check
> in utf8.c:dangerous_locale() and the test will probably pass.

I dislike that idea.

Sure, in theory, it would be possible and safe to add a long list of all character encodings to that function which satisfy both of the following conditions:

 (1) they do not carry internal state and
 (2) they contain ASCII as a subset.

But I don't think it's worth the effort because such character encodings (with the exception of UTF-8) have been moribund for years.

Also, such a list would not only be ugly, but also hard to maintain, because it would require the OpenSSH maintainers to judge the properties of unfamiliar character sets. That hardship would make maintenance error-prone.

On top of that, all we can check is the CODESET string returned from nl_langinfo(3), and those strings are not specified by any standard. The more strings you whitelist here, the higher the risk that some string that means a safe encoding on one given system accidentally means a different, unsafe encoding on some other system you never thought (or even heard) about.

I think the current whitelist is probably more or less safe, even though the strcmp(loc, "646") looks a bit dubious (are we really sure that there is no system out there using that for an unsafe encoding?). And thinking about it again, I'm no longer all that happy with the strcmp(loc, "") either: a broken (or even merely unusual!) nl_langinfo(3) implementation could easily return that, tricking OpenSSH into unsafe encoding behaviour.
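Ingo's two conditions are what make byte-wise escaping meaningful: in a stateless encoding that contains ASCII, the byte 0x1b is simply the ESC control character rather than, for example, the start of an ISO 2022 shift sequence, so rewriting it as a backslash and three octal digits does not change how the surrounding bytes are interpreted. A minimal sketch, assuming a single-byte ASCII-compatible locale such as POSIX; `escape_ascii` is hypothetical and much simpler than the real code in utf8.c:

```c
#include <stdio.h>

/*
 * Hypothetical helper, not OpenSSH's snmprintf(): copy src into dst,
 * replacing every byte outside printable ASCII (0x20..0x7e) with a
 * backslash and three octal digits.  Only sensible in a single-byte,
 * ASCII-compatible locale.  Returns the length of the escaped string,
 * or -1 if dst (of size dstsz) is too small; on failure the contents
 * of dst are unspecified.
 */
int
escape_ascii(char *dst, size_t dstsz, const char *src)
{
	const unsigned char *p;
	size_t len = 0;

	if (dstsz == 0)
		return -1;
	for (p = (const unsigned char *)src; *p != '\0'; p++) {
		if (*p >= 0x20 && *p <= 0x7e) {
			/* Printable: 1 byte plus room for the final NUL. */
			if (len + 1 >= dstsz)
				return -1;
			dst[len++] = (char)*p;
		} else {
			/* Non-printable: "\ooo" is 4 bytes plus the NUL. */
			if (len + 4 >= dstsz)
				return -1;
			snprintf(dst + len, dstsz - len, "\\%03o", (unsigned int)*p);
			len += 4;
		}
	}
	dst[len] = '\0';
	return (int)len;
}
```

With a large enough buffer, escaping "\033x" yields "\\033x" (in C source notation), five characters long, which matches the wantlen = 5 in the failing test output quoted at the top of the thread.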
In any case, I don't like the idea of adding yet more strings to the list.

Yours,
  Ingo