Christoph Anton Mitterer
2022-Jan-25 05:08 UTC
[klibc] klibc sh doesn't support switching LC_* during a script/session
Hey.

I've noticed that klibc sh (like dash, but unlike bash or busybox ash) doesn't seem to support the following: switching LC_*/LANG variables during a script/session and having that change take effect in the very same script/session.

Consider the following check for this:

    # check whether LC_* switch works in one script
    # in UTF-8 the following is U+220B CONTAINS AS MEMBER
    bs="$(printf '\342\210\213')"
    export LC_ALL=C.UTF-8
    len1=${#bs}
    export LC_ALL=C
    len2=${#bs}
    printf '%s %s\n' "${len1}" "${len2}"
    # should print "1 3" (one character in UTF-8, three bytes in C)

Now I'm not really sure whether or not POSIX mandates that this should work or whether it's unspecified. I'd kinda vaguely guess that it actually must be supported because of:
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_05_03

Quoting:
"The following variables shall affect the execution of the shell: [...] LANG [...] LC_ALL [...]"

And it really says: the following variables (and the chapter is "Shell Variables") and "shall affect the execution of the shell". It does not say "environment variables" (in the sense that it would have been just set once when the shell was invoked).

Even if POSIX didn't mandate this behaviour, I'd kindly ask to support it, given that it's a requirement for the (apparently) only really portable way of implementing/hacking command substitution that keeps trailing newlines (see e.g. https://unix.stackexchange.com/a/383390/474076).

Thanks,
Chris.
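The check above can be wrapped into a reusable test. What follows is only a minimal sketch: the function name lc_switch_works is made up for illustration, and it assumes that a locale called C.UTF-8 is installed (another UTF-8 locale name can be passed as the first argument). The body runs in a subshell so the caller's LC_ALL is left alone.

```shell
# Hypothetical helper, not part of any shell: returns 0 if changing LC_ALL
# inside the running shell changes how ${#var} counts characters.
# Assumption: the locale C.UTF-8 exists unless another name is given as $1.
lc_switch_works() (
	utf8_locale=${1:-C.UTF-8}
	bs=$(printf '\342\210\213')     # U+220B, three bytes in UTF-8
	LC_ALL=$utf8_locale; export LC_ALL
	len_utf8=${#bs}                 # 1 if the shell decodes UTF-8 now
	LC_ALL=C; export LC_ALL
	len_c=${#bs}                    # 3 if the shell counts bytes now
	[ "$len_utf8" -eq 1 ] && [ "$len_c" -eq 3 ]
)

if lc_switch_works; then
	echo "locale switch takes effect within the script"
else
	echo "locale switch is ignored (or the UTF-8 locale is missing)"
fi
```

A failure does not tell the two causes apart; a shell that never looks at LC_* and a system without the assumed UTF-8 locale both end up counting three bytes each time.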
Thorsten Glaser
2022-Jan-25 06:05 UTC
[klibc] klibc sh doesn't support switching LC_* during a script/session
Christoph Anton Mitterer dixit:

>Now I'm not really sure whether or not POSIX mandates that this should

It does, but the change must not affect the way the script is parsed (interesting requirement?), but it only requires support for the C locale; klibc can just declare to support only that. (mksh does; for POSIX conformance enquiries, it only supports the C locale, and its UTF-8 mode is an extension.) klibc, AFAICT, doesn't do locales anyway, as it's a very limited execution environment.

>Consider the following check for this:

Right now, you can get that with /usr/lib/klibc/bin/mksh (bullseye+) because I've started implementing "locale tracking", basically setting the utf8-mode flag to on/off whenever the LANG or LC_* variables change. It's not 100% done yet, but will be.

This will, however, give you only the C locale and mksh's UTF-8 mode, which is like C in every respect other than multibyte decoding in many places (not ifs0 nor the PS1 \r magic), not affecting e.g. the character classes.

For pre-bullseye or the current upstream release, you've got to set -U (and +U to disable it again); you can use this:

    case ${KSH_VERSION:-} in
    *MIRBSD\ KSH*|*LEGACY\ KSH*)
    	case ${LC_ALL:-${LC_CTYPE:-${LANG:-}}} in
    	*[Uu][Tt][Ff]8*|*[Uu][Tt][Ff]-8*) set -U ;;
    	*) set +U ;;
    	esac ;;
    esac

Perhaps this helps. It's certainly sufficient for any script I'm writing.

bye,
//mirabilos
--
(gnutls can also be used, but if you are compiling lynx for your own use, there is no reason to consider using that package)
	-- Thomas E. Dickey on the Lynx mailing list, about OpenSSL
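For scripts that switch locale more than once, the snippet above could be folded into a small setter. This is a sketch only: set_lc_all is a made-up name, not something mksh or klibc provides. It is written in the POSIX name() form on the assumption, taken from my reading of the mksh manual, that option changes made in such functions are not reset on return.

```shell
# Hypothetical helper: set LC_ALL, then re-sync mksh's UTF-8 mode by hand.
# In shells other than mksh the outer case never matches, so only the
# variable is changed and the shell's own locale handling (if any) applies.
set_lc_all() {
	LC_ALL=$1
	export LC_ALL
	case ${KSH_VERSION:-} in
	*MIRBSD\ KSH*|*LEGACY\ KSH*)
		# LC_ALL overrides LC_CTYPE and LANG, so it alone decides here.
		case $LC_ALL in
		*[Uu][Tt][Ff]8*|*[Uu][Tt][Ff]-8*) set -U ;;
		*) set +U ;;
		esac ;;
	esac
}

# Usage: switch to byte semantics for a while.
set_lc_all C
```

Calling set_lc_all with a UTF-8 locale name afterwards turns the UTF-8 mode back on in mksh versions that lack locale tracking.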
Thorsten Glaser
2022-Jan-25 06:19 UTC
[klibc] klibc sh doesn't support switching LC_* during a script/session
Christoph Anton Mitterer dixit:

>e.g. https://unix.stackexchange.com/a/383390/474076).

Just add a trailing period or slash; POSIX requires them to be identically encoded across all supported locales:

  * The encoded values associated with <period>, <slash>, <newline>, and <carriage-return> shall be invariant across all locales supported by the implementation.

And:

  [...] Likewise, the byte values used to encode <period>, <slash>, <newline>, and <carriage-return> shall not occur as part of any other character in any locale.

I've been using an x myself, but will probably switch to... hmm... a period (more flexible to use).

    x=$(somecommand; echo .); x=${x%.}

(Note the "." is not a regular expression there.)

This will result in exactly the output of "somecommand", truncated at the first NUL, if any, with trailing newlines preserved. This should do in klibc dash, too.

(Incidentally, a method using Unicode characters won't, as POSIX puts requirements on the application as well; mixing encodings is not permitted.)

bye,
//mirabilos
--
FWIW, I'm quite impressed with mksh interactively. I thought it was much *much* more bare bones. But it turns out it beats the living hell out of ksh93 in that respect. I'd even consider it for my daily use if I hadn't wasted half my life on my zsh setup. :-)
	-- Frank Terbeck in #!/bin/mksh
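The one-liner above discards the exit status of the command, since the status of the substitution is that of "echo .". A minimal sketch of a wrapper that keeps both the trailing newlines and the status follows. The name capture and the _cap_* variables are made up for illustration, and the first argument is assumed to be a valid variable name because it is passed to eval.

```shell
# Hypothetical helper: capture VAR CMD [ARG...]
# Stores the standard output of CMD in the variable named VAR, trailing
# newlines included, and returns CMD's exit status.
capture() {
	_cap_var=$1
	shift
	_cap_rc=0
	# The sentinel period means the newlines are no longer trailing, so the
	# substitution keeps them; the subshell exits with CMD's status.
	_cap_out=$(rc=0; "$@" || rc=$?; echo .; exit "$rc") || _cap_rc=$?
	# Strip the sentinel; ${x%.} matches a literal period, not a regex.
	eval "$_cap_var=\${_cap_out%.}"
	return "$_cap_rc"
}

capture out printf 'a\n\n'
printf '%s\n' "${#out}"    # prints 3: "a" plus two newlines
```

No locale switching is involved, so this should behave the same in shells that ignore LC_* changes.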