Hi, Michael Felt wrote on Mon, Aug 20, 2018 at 05:00:17PM +0200:> ./nl_langinfo > setlocale -> "C" > nl_langinfo -> "ISO8859-1"Thanks, that is helpful. So i think i was wrong and Damien was right. This means that OpenSSH returns truncated messages when non-ASCII bytes occur in them, even when the user requests LC_CTYPE=POSIX. That's not good. While there is no need to cater for any potential locale that users might wilfully select, we should probably try to show complete messages to users who specifically select the POSIX locale. Admittedly, AIX is weird in calling ASCII "ISO8859-1", which is probably going to mean something different elsewhere. But given that it is very unlikely that anything another system calls ISO8859-1 is an unsafe (ASCII-incompatible or state-dependent) encoding, i'm proposing the following patch. I suggest adding some comments because otherwise, we will eventually forget where all these strings came from. OK?> There is a program - /usr/lib/nls/lsmle (just learned about it!)That's non-standard. The standard program for similar purposes is locale(1), though that usually won't report CODESET, but only LC_CTYPE. Yours, Ingo Index: utf8.c ==================================================================RCS file: /cvs/src/usr.bin/ssh/utf8.c,v retrieving revision 1.7 diff -u -p -r1.7 utf8.c --- utf8.c 31 May 2017 09:15:42 -0000 1.7 +++ utf8.c 20 Aug 2018 17:11:33 -0000 @@ -51,9 +51,18 @@ dangerous_locale(void) { char *loc; loc = nl_langinfo(CODESET); - return strcmp(loc, "US-ASCII") != 0 && strcmp(loc, "UTF-8") != 0 && - strcmp(loc, "ANSI_X3.4-1968") != 0 && strcmp(loc, "646") != 0 && - strcmp(loc, "") != 0; + return strcmp(loc, "UTF-8") != 0 && + strcmp(loc, "US-ASCII") != 0 && + + /* + * What nl_langinfo(CODESET) returns for US-ASCII + * on various operating systems: + */ + + strcmp(loc, "ANSI_X3.4-1968") != 0 && /* Linux */ + strcmp(loc, "ISO8859-1") != 0 && /* AIX */ + strcmp(loc, "646") != 0 && /* Solaris, NetBSD */ + strcmp(loc, "") != 0; /* Solaris 6 */ } static int
On 20/08/2018 19:14, Ingo Schwarze wrote:> Index: utf8.c > ==================================================================> RCS file: /cvs/src/usr.bin/ssh/utf8.c,v > retrieving revision 1.7 > diff -u -p -r1.7 utf8.c > --- utf8.c 31 May 2017 09:15:42 -0000 1.7 > +++ utf8.c 20 Aug 2018 17:11:33 -0000 > @@ -51,9 +51,18 @@ dangerous_locale(void) { > char *loc; > > loc = nl_langinfo(CODESET); > - return strcmp(loc, "US-ASCII") != 0 && strcmp(loc, "UTF-8") != 0 && > - strcmp(loc, "ANSI_X3.4-1968") != 0 && strcmp(loc, "646") != 0 && > - strcmp(loc, "") != 0; > + return strcmp(loc, "UTF-8") != 0 && > + strcmp(loc, "US-ASCII") != 0 && > + > + /* > + * What nl_langinfo(CODESET) returns for US-ASCII > + * on various operating systems: > + */ > + > + strcmp(loc, "ANSI_X3.4-1968") != 0 && /* Linux */ > + strcmp(loc, "ISO8859-1") != 0 && /* AIX */ > + strcmp(loc, "646") != 0 && /* Solaris, NetBSD */ > + strcmp(loc, "") != 0; /* Solaris 6 */ > } > > static intAfter applying the patch: root at x064:[/data/prj/openbsd/mindrot/openssh-7.8.0.20]/data/prj/openbsd/mindrot/openssh-7.8.0.20/regress/unittests/utf8/test_utf8 test_utf8: .................................. 34 tests ok August 20th snapshot patched, not the -r 1.7 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: OpenPGP digital signature URL: <http://lists.mindrot.org/pipermail/openssh-unix-dev/attachments/20180820/1fa51f48/attachment-0001.asc>
On 20/08/2018 19:14, Ingo Schwarze wrote:>> There is a program - /usr/lib/nls/lsmle (just learned about it!) > That's non-standard. The standard program for similar purposes is > locale(1), though that usually won't report CODESET, but only > LC_CTYPE.p.s. aka FYI: AIX also has the program locale(1), but that gives a lot less information, imho. And, maybe also a bit misleading - at least for the ignorant. I would have expected UTF-8, but instead, it uses EN_US and a symbolic link. root at x064:[/data/prj/openbsd/mindrot/openssh-7.8.0.20]locale LANG=EN_US LC_COLLATE="EN_US" LC_CTYPE="EN_US" LC_MONETARY="EN_US" LC_NUMERIC="EN_US" LC_TIME="EN_US" LC_MESSAGES="EN_US" LC_ALL While the default language locale(1) is: root at x066:[/data/prj/python/python3-3.8]locale LANG=en_US LC_COLLATE="en_US" LC_CTYPE="en_US" LC_MONETARY="en_US" LC_NUMERIC="en_US" LC_TIME="en_US" LC_MESSAGES="en_US" LC_ALL -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: OpenPGP digital signature URL: <http://lists.mindrot.org/pipermail/openssh-unix-dev/attachments/20180820/444a3f21/attachment.asc>
Hi Michael, Michael wrote on Mon, Aug 20, 2018 at 08:32:03PM +0200:> p.s. aka FYI: AIX also has the program locale(1), but that gives > a lot less information, imho.Sure it has, it's POSIX, and sure it provides less info than you have shown from the non-standard tool. But by definition, it is unclear what the output of a non-standard tool may mean.> And, maybe also a bit misleading - at least for the ignorant. I would > have expected UTF-8, but instead, it uses EN_US and a symbolic link. > > root at x064:[/data/prj/openbsd/mindrot/openssh-7.8.0.20]locale > LANG=EN_US > LC_COLLATE="EN_US" > LC_CTYPE="EN_US" > LC_MONETARY="EN_US" > LC_NUMERIC="EN_US" > LC_TIME="EN_US" > LC_MESSAGES="EN_US" > LC_ALL> > While the default language locale(1) is: > > root at x066:[/data/prj/python/python3-3.8]locale > LANG=en_US > LC_COLLATE="en_US" > LC_CTYPE="en_US" > LC_MONETARY="en_US" > LC_NUMERIC="en_US" > LC_TIME="en_US" > LC_MESSAGES="en_US" > LC_ALLSeems quite unusual indeed and hard to understand - but AIX is free to call their locales however they want. Neither locale names nor the return value from nl_langinfo(CODESET) are standardized. According to POSIX, they need contain elements relating to languages, territories, encodings, or even filenames, and the return value from nl_langinfo(CODESET) need not match the LC_CTYPE variable. AIX would be free to ask their users to specify "apples" or "oranges" in LC_CTYPE and yet report back "bananas" from nl_langinfo(CODESET). ;-) Yours, Ingo
Ingo Schwarze wrote on Aug 20, 2018 at 1:15 PM +0500;> While there is no need to cater for any potential locale that users might wilfully select, we should probably try to show complete messages to > users who specifically select the POSIX locale. > > Admittedly, AIX is weird in calling ASCII "ISO8859-1", which is probably going to mean something different elsewhere. But given that it is very > unlikely that anything another system calls ISO8859-1 is an unsafe (ASCII-incompatible or state-dependent) encoding, i'm proposing the > following patch.The patch has been applied - no error for UTF-8. Completed with all tests "make tests" - no errors on AIX 7.1.
On Mon, 20 Aug 2018, Ingo Schwarze wrote:> Hi, > > Michael Felt wrote on Mon, Aug 20, 2018 at 05:00:17PM +0200: > > > ./nl_langinfo > > setlocale -> "C" > > nl_langinfo -> "ISO8859-1" > > Thanks, that is helpful. > > So i think i was wrong and Damien was right. This means that > OpenSSH returns truncated messages when non-ASCII bytes occur > in them, even when the user requests LC_CTYPE=POSIX. > That's not good. > > While there is no need to cater for any potential locale that users > might wilfully select, we should probably try to show complete > messages to users who specifically select the POSIX locale. > > Admittedly, AIX is weird in calling ASCII "ISO8859-1", which is > probably going to mean something different elsewhere. But given > that it is very unlikely that anything another system calls ISO8859-1 > is an unsafe (ASCII-incompatible or state-dependent) encoding, i'm > proposing the following patch. > > I suggest adding some comments because otherwise, we will > eventually forget where all these strings came from. > > OK?ok, djm@ (I'd prefer the comment before the return statement, but up to you)> Index: utf8.c > ==================================================================> RCS file: /cvs/src/usr.bin/ssh/utf8.c,v > retrieving revision 1.7 > diff -u -p -r1.7 utf8.c > --- utf8.c 31 May 2017 09:15:42 -0000 1.7 > +++ utf8.c 20 Aug 2018 17:11:33 -0000 > @@ -51,9 +51,18 @@ dangerous_locale(void) { > char *loc; > > loc = nl_langinfo(CODESET); > - return strcmp(loc, "US-ASCII") != 0 && strcmp(loc, "UTF-8") != 0 && > - strcmp(loc, "ANSI_X3.4-1968") != 0 && strcmp(loc, "646") != 0 && > - strcmp(loc, "") != 0; > + return strcmp(loc, "UTF-8") != 0 && > + strcmp(loc, "US-ASCII") != 0 && > + > + /* > + * What nl_langinfo(CODESET) returns for US-ASCII > + * on various operating systems: > + */ > + > + strcmp(loc, "ANSI_X3.4-1968") != 0 && /* Linux */ > + strcmp(loc, "ISO8859-1") != 0 && /* AIX */ > + strcmp(loc, "646") != 0 && /* Solaris, NetBSD */ > + strcmp(loc, "") != 0; /* Solaris 6 */ > } > > static int > _______________________________________________ > openssh-unix-dev mailing list > openssh-unix-dev at mindrot.org > https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev >
Hi Damien, Damien Miller wrote on Tue, Aug 21, 2018 at 12:04:41PM +1000:> ok, djm@Thanks for checking, and thanks to Val and Michael for testing. I just committed the patch to OpenBSD, others will likely take care of merging it to -portable.> (I'd prefer the comment before the return statement, but up to you)Immediately before the return statement, it looked really confusing, like: /* ASCII on various systems ... blabla */ return UTF-8 ... blabla So i moved it up a few more lines and appended it to the existing comment, keeping the code concise and yet clearly explaining it. Yours, Ingo