Hi,

Michael Felt wrote on Mon, Aug 20, 2018 at 11:28:26AM +0200:
> On 20/08/2018 10:33, Michael Felt wrote:
>> On 17/08/2018 17:15, Ingo Schwarze wrote:
>>> Darren Tucker wrote on Fri, Aug 17, 2018 at 07:16:03AM -0700:
>>>> On 13 August 2018 at 15:06, Val Baranov <val.baranov at duke.edu> wrote:

>>>>> test_utf8: ........................
>>>>> regress/unittests/utf8/tests.c:48 test #25 "c_esc"
>>>>> ASSERT_INT_EQ(len, wantlen) failed:
>>>>> len = -1
>>>>> wantlen = 5

>>>> This boils down to meaning OpenSSH's smnprintf call failed for the
>>>> string "\033x" instead of returning the expected escaped version
>>>> "\\033x". The code is in utf8.c but I am not sure why it failed.

>>> Actually, it is *supposed* to fail unless the locale is either
>>> UTF-8 or the POSIX (ASCII) locale, because '\033' is not a
>>> printable character and attempting to escape invalid stuff
>>> is unsafe in arbitrary locales.

Sorry, I spoke too soon; I didn't correctly remember how the tests work.

It is completely irrelevant what the user sets their locale to, as it
should be in a test suite.  The tests themselves make sure the locale
is set correctly for the tests.  The "c_esc" in the test output above
means that it is testing the "C" locale at that point.

So the problem is somewhere else, likely in what your nl_langinfo(3)
function does in the POSIX locale.

Could you please run the following simple test program on your system
and show us the output, for further diagnosis?

OpenBSD:
  $ make nl_langinfo
  cc -O2 -pipe -o nl_langinfo nl_langinfo.c
  $ ./nl_langinfo
  setlocale -> "C"
  nl_langinfo -> "US-ASCII"

Linux:
  $ make nl_langinfo
  cc nl_langinfo.c -o nl_langinfo
  $ ./nl_langinfo
  setlocale -> "C"
  nl_langinfo -> "ANSI_X3.4-1968"

AIX:
  ?
Thank you,
  Ingo


$ cat nl_langinfo.c
#include <err.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

int
main(void)
{
	char	*res;

	res = setlocale(LC_CTYPE, "C");
	if (res == NULL)
		err(1, "setlocale");
	printf("setlocale -> \"%s\"\n", res);

	res = nl_langinfo(CODESET);
	if (res == NULL)
		err(1, "nl_langinfo");
	printf("nl_langinfo -> \"%s\"\n", res);

	return 0;
}
On 20/08/2018 16:00, Ingo Schwarze wrote:
> AIX:
>   ?

Had to modify it, just a bit (errno.h is probably not needed):

michael at x071:[/data/prj/openbsd/mindrot]cat *.c
#include <errno.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

static void err(int exitcode, const char *msg)
{
        fflush(stdout);
        fprintf(stderr, "%s\n", msg);
        exit(exitcode);
}

int
main(void)
{
        char *res;

        res = setlocale(LC_CTYPE, "C");
        if (res == NULL)
                err(1, "setlocale");
        printf("setlocale -> \"%s\"\n", res);

        res = nl_langinfo(CODESET);
        if (res == NULL)
                err(1, "nl_langinfo");
        printf("nl_langinfo -> \"%s\"\n", res);

        return 0;
}

cc nl_langinfo.c -o nl_langinfo -lc
./nl_langinfo
setlocale -> "C"
nl_langinfo -> "ISO8859-1"

There is a program - /usr/lib/nls/lsmle (just learned about it!)

FYI: First stanza is:

CC:
        locale:         "C"
        text_string:    "C (POSIX)"
        text_string_id: 100
        codeset:        "ISO8859-1"
        messages:       "C"
        keyboards:      "C"
        package:        ""
        variables:      "LANG=C"
        bosinst_menu:   "y"
        menu:           "101 102"
        Keyboard Descriptions (1):
        (1) locale:         "C"
        (1) keyboard_map:   "C"
        (1) text_string:    "C (POSIX)"
        (1) text_string_id: 100
        (1) codeset:        "ISO8859-1"
        (1) package:        ""
        (1) variables:      ""
        (1) keyboard_cmd:   "/usr/bin/chkbd /usr/lib/nls/loc/C.lftkeymap"
        (1) key_text:       "English(POSIX) KBD ID 103P"
        (1) key_text_id:    200
        Message Descriptions (1):
        (1) message Lvalue: "C"
        (1) message string: "C (POSIX)"
        (1) codeset:        "ISO8859-1"
        (1) package:        ""
        (1) variables:      ""
Hi,

Michael Felt wrote on Mon, Aug 20, 2018 at 05:00:17PM +0200:

> ./nl_langinfo
> setlocale -> "C"
> nl_langinfo -> "ISO8859-1"

Thanks, that is helpful.

So I think I was wrong and Damien was right.

This means that OpenSSH returns truncated messages when non-ASCII
bytes occur in them, even when the user requests LC_CTYPE=POSIX.
That's not good.  While there is no need to cater for every potential
locale that users might wilfully select, we should probably try to
show complete messages to users who specifically select the POSIX
locale.

Admittedly, AIX is weird in calling ASCII "ISO8859-1", which probably
means something different elsewhere.  But it is very unlikely that
anything another system calls ISO8859-1 is an unsafe (ASCII-incompatible
or state-dependent) encoding.  So I'm proposing the following patch.

I suggest adding some comments, because otherwise we will eventually
forget where all these strings came from.

OK?

> There is a program - /usr/lib/nls/lsmle (just learned about it!)

That's non-standard.  The standard program for similar purposes is
locale(1), though it usually reports only the LC_CTYPE locale name,
not the CODESET.
Yours,
  Ingo


Index: utf8.c
===================================================================
RCS file: /cvs/src/usr.bin/ssh/utf8.c,v
retrieving revision 1.7
diff -u -p -r1.7 utf8.c
--- utf8.c	31 May 2017 09:15:42 -0000	1.7
+++ utf8.c	20 Aug 2018 17:11:33 -0000
@@ -51,9 +51,18 @@ dangerous_locale(void) {
 	char	*loc;
 
 	loc = nl_langinfo(CODESET);
-	return strcmp(loc, "US-ASCII") != 0 && strcmp(loc, "UTF-8") != 0 &&
-	    strcmp(loc, "ANSI_X3.4-1968") != 0 && strcmp(loc, "646") != 0 &&
-	    strcmp(loc, "") != 0;
+	return strcmp(loc, "UTF-8") != 0 &&
+	    strcmp(loc, "US-ASCII") != 0 &&
+
+	    /*
+	     * What nl_langinfo(CODESET) returns for US-ASCII
+	     * on various operating systems:
+	     */
+
+	    strcmp(loc, "ANSI_X3.4-1968") != 0 &&	/* Linux */
+	    strcmp(loc, "ISO8859-1") != 0 &&		/* AIX */
+	    strcmp(loc, "646") != 0 &&			/* Solaris, NetBSD */
+	    strcmp(loc, "") != 0;			/* Solaris 6 */
 }
 
 static int