Eric Blake
2023-Mar-21 17:28 UTC
[Libguestfs] [libnbd PATCH v4 0/2] lib/utils: introduce async-signal-safe execvpe()
On Tue, Mar 21, 2023 at 03:56:22PM +0100, Laszlo Ersek wrote:> > $ podman build -f ci/containers/alpine-edge.Dockerfile -t libnbd-alpine-edge > > $ podman run -it --rm --userns=keep-id -v .:/repo:z -w /repo libnbd-alpine-edge bash > > $ ./configure > > $ make check > > $ grep tmpd= lib/test-suite.log > > + tmpd=/tmp/tmp.EMgKeF > > $ /tmp/tmp.EMgKeF/bin/f 1 + 1 > > f: applet not found > > 0b748c9fe495:~$ > > > > So it looks like we need some way to work around busybox' insistance > > that argv[0] determines which applet to run. > > I couldn't come up with a reproducer like yours. I couldn't figure out > how to *quickly* get an interactive Alpine Linux environment with the > test failing, and I also couldn't figure out how to trigger "verbose" > test runs on gitlab, without polluting the master branch with debug > patches. (I tried forking the project in my own space on gitlab, and > pushed a debug patch with just "set -x" onto a non-master branch *there* > -- but CI didn't start in response to that.)ci/README.rst is informative on that regards; I'm impressed at what lcitool (from the libvirt-ci) project gives us for local reproducibility of repeating various CI build failures.> > So, I only installed a new Alpine Linux virtual machine -- what a pain > *that* was --, and investigated what /usr/bin/expr was. When I found it > was a symlink to /bin/busybox, I started looking for Alpine Linux > specific tweaks that could replace busybox (in this role) with a real > binary executable "expr" utility. > > I was relieved to find the following wiki article: > > https://wiki.alpinelinux.org/wiki/How_to_get_regular_stuff_working > > which promised -- I thought anyways -- a real coreutils package. > > Imagine my dismay when I found that, after installing coreutils with apk > in the Alpine Linux VM, the symlink stayed in place, only its target > binary changed from "/bin/busybox" to "coreutils". Well done, Alpine > Linux, well done.Yeah, I remember the debates back when GNU coreutils was offered patches to choose to build a multiname binary - it fits with Alpine's goals and matches busybox's minimalist footprint, but it violates GNU Coding Standards (which state that in general, an application's behavior should not depend on argv[0], precisely so that renaming it doesn't change its behavior).> > So, no, this mess (= Alpine Linux) is not salvageable. The > "lib/test-fork-safe-execvpe.sh" script depends on "expr" being > functional under the name "f". And yes I want "f" to be a > single-character filename; otherwise the nice tabular formatting of the > script falls apart (or produces overlong lines). > > As last step, I learned about ci/skipped_tests, and used it to disable > the test on alpine linux.That works, but is not always my favorite: it fixes CI, but does not help any developer on a similar platform. I probably would have tried this in the test itself (untested here): cp /bin/expr f cleanup_fn rm f requires ./f 1 + 1 # We insist on expr behavior independent of argv[0] which will then skip the test (exit status 77) if a renamed /bin/expr doesn't behave the same as the original. But since we haven't heard complaints from anyone using Alpine as their primary development box (or any other non-Alpine but otherwise-busybox setup), we can make that change when such a complaint arises, rather than adding even more churn to your latest series of patches. Another thought - instead of skipping the test when /bin/expr is underwhelming when renamed, we could rewrite the test to avoid that dependency altogether. After all, we are building a test-fork-safe-execvpe binary that does exactly what we say it should; we could hack up test-fork-safe-execvpe.c to depend on argv[0] (I know - using the very thing I don't like about busybox) so that when run by its usual name it performs the unit test, but when run by the basename of f, it does just enough for us to prove that we indeed executed our desired binary (whether that be emulating a subset of /bin/expr, or something else). But again, that's more effort to code up than just skipping the test on a platform that's less likely to be a primary development station in the first place.> > The latest pipeline passed: > <https://gitlab.com/nbdkit/libnbd/-/pipelines/813280321>. > > Either way, I wanted to highlight the following commits on the list: > > 1 b29ff42e5d00 lib: account for realpath deficiency on some platformsI didn't even pay attention to this one before you pointed it out; but it is indeed a bug in busybox now that POSIX is moving towards standardizing realpath, so I've filed it: https://bugs.busybox.net/show_bug.cgi?id=15466> 2 65631e5dfff5 lib/utils: try to placate attribute placement restriction from gcc > 3 4cae20ccefaf Revert "lib: account for realpath deficiency on some platforms" > 4 f5a065aa3a9c ci: skip "lib/test-fork-safe-execvpe.sh" on Alpine Linux > > Thanks for investigating! > Laszlo >-- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Laszlo Ersek
2023-Mar-22 06:41 UTC
[Libguestfs] [libnbd PATCH v4 0/2] lib/utils: introduce async-signal-safe execvpe()
Hi Eric, lots of great stuff in your email, thanks! Let me respond: On 3/21/23 18:28, Eric Blake wrote:> On Tue, Mar 21, 2023 at 03:56:22PM +0100, Laszlo Ersek wrote: > >>> $ podman build -f ci/containers/alpine-edge.Dockerfile -t libnbd-alpine-edge >>> $ podman run -it --rm --userns=keep-id -v .:/repo:z -w /repo libnbd-alpine-edge bash >>> $ ./configure >>> $ make check >>> $ grep tmpd= lib/test-suite.log >>> + tmpd=/tmp/tmp.EMgKeF >>> $ /tmp/tmp.EMgKeF/bin/f 1 + 1 >>> f: applet not found >>> 0b748c9fe495:~$ >>> >>> So it looks like we need some way to work around busybox' insistance >>> that argv[0] determines which applet to run. >> >> I couldn't come up with a reproducer like yours. I couldn't figure >> out how to *quickly* get an interactive Alpine Linux environment with >> the test failing, and I also couldn't figure out how to trigger >> "verbose" test runs on gitlab, without polluting the master branch >> with debug patches. (I tried forking the project in my own space on >> gitlab, and pushed a debug patch with just "set -x" onto a non-master >> branch *there* -- but CI didn't start in response to that.) > > ci/README.rst is informative on that regards; I'm impressed at what > lcitool (from the libvirt-ci) project gives us for local > reproducibility of repeating various CI build failures.Indeed very informative. I was hoping I could skimp on reading it, but now I've bitten the bullet. It does offer precisely such podman commands (two of them) that I need for seamless testing in an Alpine Linux container. Unfortunately, podman does not work for me on RHEL-9.1 as a mere user! podman build -f ci/containers/alpine-edge.Dockerfile -t libnbd-alpine-edge it prints: Error: error creating tmpdir: mkdir /run/user/500: permission denied I've searched the web -- I've found nothing conclusive (for example, I don't have a .config/containers/storage.conf file, so its contents cannot be blamed for the misbehavior). However, at least I've determined that Giuseppe Scrivano is an expert in this area, so I've emailed him now. We'll see where that goes.>> So, I only installed a new Alpine Linux virtual machine -- what a >> pain *that* was --, and investigated what /usr/bin/expr was. When I >> found it was a symlink to /bin/busybox, I started looking for Alpine >> Linux specific tweaks that could replace busybox (in this role) with >> a real binary executable "expr" utility. >> >> I was relieved to find the following wiki article: >> >> https://wiki.alpinelinux.org/wiki/How_to_get_regular_stuff_working >> >> which promised -- I thought anyways -- a real coreutils package. >> >> Imagine my dismay when I found that, after installing coreutils with >> apk in the Alpine Linux VM, the symlink stayed in place, only its >> target binary changed from "/bin/busybox" to "coreutils". Well done, >> Alpine Linux, well done. > > Yeah, I remember the debates back when GNU coreutils was offered > patches to choose to build a multiname binary - it fits with Alpine's > goals and matches busybox's minimalist footprint, but it violates GNU > Coding Standards (which state that in general, an application's > behavior should not depend on argv[0], precisely so that renaming it > doesn't change its behavior).Haha, I didn't expect this. Background that's good to know!> >> >> So, no, this mess (= Alpine Linux) is not salvageable. The >> "lib/test-fork-safe-execvpe.sh" script depends on "expr" being >> functional under the name "f". And yes I want "f" to be a >> single-character filename; otherwise the nice tabular formatting of >> the script falls apart (or produces overlong lines). >> >> As last step, I learned about ci/skipped_tests, and used it to >> disable the test on alpine linux. > > That works, but is not always my favorite: it fixes CI, but does not > help any developer on a similar platform. I probably would have tried > this in the test itself (untested here): > > cp /bin/expr f > cleanup_fn rm f > requires ./f 1 + 1 # We insist on expr behavior independent of argv[0] > > which will then skip the test (exit status 77) if a renamed /bin/expr > doesn't behave the same as the original. But since we haven't heard > complaints from anyone using Alpine as their primary development box > (or any other non-Alpine but otherwise-busybox setup), we can make > that change when such a complaint arises, rather than adding even more > churn to your latest series of patches. > > Another thought - instead of skipping the test when /bin/expr is > underwhelming when renamed, we could rewrite the test to avoid that > dependency altogether. After all, we are building a > test-fork-safe-execvpe binary that does exactly what we say it should; > we could hack up test-fork-safe-execvpe.c to depend on argv[0] (I know > - using the very thing I don't like about busybox) so that when run by > its usual name it performs the unit test, but when run by the basename > of f, it does just enough for us to prove that we indeed executed our > desired binary (whether that be emulating a subset of /bin/expr, or > something else). But again, that's more effort to code up than just > skipping the test on a platform that's less likely to be a primary > development station in the first place.This morning I have thought of two other approaches: 1. Simply write another test program in C that prints "hello world" to the standard output. Totally deterministic behavior, a binary executable, and does not depend on any system-provided utility. (The original reason for using "expr" was to avoid the need for such a C source file.) 2. I've recalled that "test-fork-safe-execvpe.c" actually takes argv[0] for the program to execute, as another explicit argument. Correspondingly, the "run" function in "test-fork-safe-execvpe.sh" passes its $2 positional parameter *twice* to the "test-fork-safe-execvpe" binary: # $2 and onward are passed to $execvpe; note that $2 becomes *both* # "program-to-exec" for the helper *and* argv[0] for the program executed by the # helper. I *might* be able to add a "run0" function that does not perform this duplication, i.e., it lets the caller pass in argv[0] separately from "program-to-exec". Then (by my count) I should have to modify only 4 invocations of the form "f 1 + 1"; and in advance I hope that such a change wouldn't destroy the formatting of the script.> >> >> The latest pipeline passed: >> <https://gitlab.com/nbdkit/libnbd/-/pipelines/813280321>. >> >> Either way, I wanted to highlight the following commits on the list: >> >> 1 b29ff42e5d00 lib: account for realpath deficiency on some platforms > > I didn't even pay attention to this one before you pointed it out; but > it is indeed a bug in busybox now that POSIX is moving towards > standardizing realpath, so I've filed it: > https://bugs.busybox.net/show_bug.cgi?id=15466Thanks! I remember Austin Group bug #1457 from your earlier messages. Back then I noticed that said ticket called for a utility named "readlink", not "realpath". So it's not exactly that "realpath" (which I believe will remain non-standard?) should accept "--", but that a "readlink" utility, accepting "--", should exist. ... Ah wait, no, that's not right. Upon reading the ticket more carefully now, there is a late comment in the ticket that *does* mandate "realpath": <https://www.austingroupbugs.net/view.php?id=1457#c5898>. OK then! :) (The "Summary" field on Austin Group bug #1457 might be worth updating!) Thanks! Laszlo> >> 2 65631e5dfff5 lib/utils: try to placate attribute placement restriction from gcc >> 3 4cae20ccefaf Revert "lib: account for realpath deficiency on some platforms" >> 4 f5a065aa3a9c ci: skip "lib/test-fork-safe-execvpe.sh" on Alpine Linux >> >> Thanks for investigating! >> Laszlo >> >
Laszlo Ersek
2023-Mar-22 14:45 UTC
[Libguestfs] [libnbd PATCH v4 0/2] lib/utils: introduce async-signal-safe execvpe()
On 3/21/23 18:28, Eric Blake wrote:> it is indeed a bug in busybox now that POSIX is moving towards > standardizing realpath, so I've filed it: > https://bugs.busybox.net/show_bug.cgi?id=15466I've found another busybox bug. The "/bin/sh" utility is provided by busybox as well (via the usual symlinking). Per POSIX, if execvp(file, { argv[0], argv[1], ..., NULL }) were to fail with -1/ENOEXEC, then execvp() must retry "as if" with execv(<shell path>, { argv[0], file, argv[1], ..., NULL }) In other words, if direct execution of "file" failed because "file" "has the appropriate access permission but has an unrecognized format", then execvp() is required to try executing "file" as a shell script. For that, <shell path> is left unspecified by POSIX, but the arguments of the shell are specified: - Argv[0] remains the same. That is, what we wanted "file" to know itself as, is what we now want *the shell executable* to know itself as. - argv[1] becomes "file" -- this is the script that the shell is supposed to run. - argv[2] and onwards become positional parameters $1, $2, ... for the shell script. And the argv[0] specification is what's violated by busybox, because if argv[0] is anything other than "sh", then the busybox binary doesn't recognize itself as the shell! The simplest way to demonstrate the bug is this: bash-5.2$ ( exec -a foobar /bin/sh <<< "echo hello" ) foobar: applet not found And then, another way to demonstrate the same busybox issue... lets us, in fact, discover a musl bug in turn!!! Consider the following C program (called "test-execvp.c"; the binary is called "test-execvp"): ------------- #include <stdio.h> #include <unistd.h> int main(void) { char *args[] = { "foobar", NULL }; execvp("hello.sh", args); perror("execvp"); return 1; } ------------- The file "hello.sh" resides in the current directory (same directory where "test-execvp" resides). Furthermore it has execute permission, and the following contents: ------- echo hello ------- Now consider the following command (from bash): $ PATH=.:$PATH test-execvp What is supposed to happen is this: (1) bash shall find test-execvp in the current directory per PATH, (2) execvp() shall find "hello.sh" in the current directory per PATH, (3) execvp() shall hit an internal failure -1/ENOEXEC, (4) execvp() shall then invoke the shell (under an unspecified pathname), (5) the shell shall get "foobar" for its argv[0], and "hello.sh" for its argv[1] (6) we shall see "hello" on the standard output. That's exactly what happens on Linux/glibc. (Note: this result has absolutely nothing to do with my execvpe() implementation, or libnbd in the first place.) Now, according to my above description of the busybox bug, we're tempted to believe that step (6) fails on Alpine Linux (using musl + busybox). We expect the busybox binary to be launched, via the /bin/sh symlink, in step (4), and we expect it to fail after step (5), due to it not recognizing "foobar" as an "applet name". It turns out however that step (4) does not happen. musl does not handle ENOEXEC:> bash-5.2$ PATH=.:$PATH test-execvp > execvp: Exec format errorexecvp() is not permitted by POSIX to fail with ENOEXEC. (Not considering the virtually impossible situation when executing the system shell binary *itself* fails with ENOEXEC.) I've checked the musl source too, at commit 7d756e1c04de ("dns: prefer monotonic clock for timeouts", 2023-02-12). The execvp() implementation: https://git.musl-libc.org/cgit/musl/tree/src/process/execvp.c does not handle ENOEXEC; what's more, the entire tree only sets errno=ENOEXEC in "ldso/dynlink.c". Laszlo
Possibly Parallel Threads
- [libnbd PATCH v4 0/2] lib/utils: introduce async-signal-safe execvpe()
- [libnbd PATCH v4 0/2] lib/utils: introduce async-signal-safe execvpe()
- [libnbd PATCH v4 0/2] lib/utils: introduce async-signal-safe execvpe()
- [libnbd PATCH 0/3] reenable execvpe unit testing in Alpine Linux containers
- [libnbd PATCH v4 0/2] lib/utils: introduce async-signal-safe execvpe()