Richard W.M. Jones
2023-Oct-09 11:12 UTC
[Libguestfs] [nbdkit] Two POSIX questions for you ...
On Mon, Oct 09, 2023 at 12:57:14PM +0200, Laszlo Ersek wrote:> On 10/9/23 09:46, Richard W.M. Jones wrote: > > Hi Eric, a couple of POSIX questions for you from nbdkit. > > > > The first question is from an AUR comment on nbdkit: > > > > https://aur.archlinux.org/packages/nbdkit#comment-937381 > > > > I think there's a bash-ism in the logscript parameter in this test: > > > > https://gitlab.com/nbdkit/nbdkit/-/blame/master/tests/test-log-script-info.sh#L51 > > > > I believe it is happening in the $(( .. )) expression. How do we > > write that so it'll work in a posix shell? > > (I'm not Eric, but curious! :) ) > > Here I think we should just explicitly insist on bash.Unfortunately we can't do that, not easily anyway. We use bash for writing the test scripts because it's the sane choice for shell programming, and we declare /bin/bash (or something with 'env') at the top of each of those scripts. This introduces a *build* dependency from nbdkit to /bin/bash. That's all fine. However when nbdkit is installed in production & runs an external script, it uses system(3) which uses /bin/sh. That might not be bash, and indeed bash might not even be installed on the same machine as nbdkit. [Note: I think the whole Debian dash-for-/bin/sh is the stupidest idea I ever heard, but (a) it's a thing and (b) there's BSD and Macs where they don't want to use bash for licensing reasons.] Arguably for the tests we could have some way to cause nbdkit to use /bin/bash instead of /bin/sh when running external scripts, but it's major complexity to implement. So we gotta use a lowest common denominator for these external scripts, even in our tests. Note only this one test is affected. [Aside: Windows is a whole separate kettle of fish. Currently the Windows port of nbdkit doesn't support --run for other reasons, but if it did it would probably run some Windows-ish thing (COMMAND.COM? Power Shell?). I'm not sure how Windows behaves for the other external commands we try to run like logscript.]> Shell arrays are > a bash-specific feature, and the extent array is deeply ingrained. See > especially commit df63b23b6280 ("log: Use strict shell quoting for every > parameter displayed in the log file.", 2021-01-04). In particular the > assignment > > extents=(0x0 0x8000 "hole,zero" 0x8000 0x8000 "") > > turns "extents" into an array variable. > > In the bug tracker, comment > <https://aur.archlinux.org/packages/nbdkit#comment-937375> says, "Thus > this is an upstream issue; their scripts are calling sh when they should > be calling bash". I think that's correct; for logscript=..., we should > require /bin/bash in the manual, and execute the script with /bin/bash > explicitly, not just system().This would create a runtime dependency from nbdkit to /bin/bash which I'd like to avoid.> > Secondly while looking into this I was trying variations on: > > > > $ POSIXLY_CORRECT=1 ./nbdkit -fv data '1 2 3' --run 'nbdinfo $uri' > > > > This doesn't actually cause bash to emulate a posix shell, but it does > > uncover a different bug: > > > > nbdkit: error: raw|base64|data parameter must be specified exactly once > > > > This seems to be happening because getopt_long in wrapper.c behaves > > somehow differently parsing when POSIXLY_CORRECT is set. However I > > couldn't work out exactly why. > > > > https://gitlab.com/nbdkit/nbdkit/-/blob/master/wrapper.c?ref_type=heads#L278 > > > > I guess the wrapper ought to work if POSIXLY_CORRECT is set (?) > > IIRC, POSIXLY_CORRECT makes getopt() stop parsing options when the first > non-option argument (i.e., first operand) is reached. So "-f" and "-v" > are taken as options, then the two arguments "data" and '1 2 3' are > taken as operands, and then "--run" and the rest are taken as operands > as well. > > I think the following loop is relevant: > > /* Are there any non-option arguments? */ > if (optind < argc) { > /* Ensure any further parameters can never be parsed as options by > * real nbdkit. > */ > passthru ("--"); > > /* The first non-option argument is the plugin name. If it is a > * short name then rewrite it. > */ > if (is_short_name (argv[optind])) { > const char *language; > > /* Plugins written in scripting languages. */ > if (is_script_plugin (argv[optind], &language)) { > passthru_format ("%s/plugins/%s/.libs/nbdkit-%s-plugin." SOEXT, > builddir, language, language); > passthru_format ("%s/plugins/%s/nbdkit-%s-plugin", > builddir, argv[optind], argv[optind]); > } > /* Otherwise normal plugins written in C or other languages that > * compile to .so files. > */ > else { > passthru_format ("%s/plugins/%s/.libs/nbdkit-%s-plugin." SOEXT, > builddir, argv[optind], argv[optind]); > } > ++optind; > } > > /* Everything else is passed through without rewriting. */ > while (optind < argc) { > passthru (argv[optind]); > ++optind; > } > } > > With POSIXLY_CORRECT set, the "Everything else is passed through without > rewriting" logic extends to "--run" etc. I'm not sure how that would > lead to the specific error message, though.That's annoying :-( Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com nbdkit - Flexible, fast NBD server with plugins https://gitlab.com/nbdkit/nbdkit
On 10/9/23 13:12, Richard W.M. Jones wrote:> On Mon, Oct 09, 2023 at 12:57:14PM +0200, Laszlo Ersek wrote: >> On 10/9/23 09:46, Richard W.M. Jones wrote: >>> Hi Eric, a couple of POSIX questions for you from nbdkit. >>> >>> The first question is from an AUR comment on nbdkit: >>> >>> https://aur.archlinux.org/packages/nbdkit#comment-937381 >>> >>> I think there's a bash-ism in the logscript parameter in this test: >>> >>> https://gitlab.com/nbdkit/nbdkit/-/blame/master/tests/test-log-script-info.sh#L51 >>> >>> I believe it is happening in the $(( .. )) expression. How do we >>> write that so it'll work in a posix shell? >> >> (I'm not Eric, but curious! :) ) >> >> Here I think we should just explicitly insist on bash. > > Unfortunately we can't do that, not easily anyway. > > We use bash for writing the test scripts because it's the sane choice > for shell programming, and we declare /bin/bash (or something with > 'env') at the top of each of those scripts. This introduces a *build* > dependency from nbdkit to /bin/bash. That's all fine. > > However when nbdkit is installed in production & runs an external > script, it uses system(3) which uses /bin/sh. That might not be bash, > and indeed bash might not even be installed on the same machine as > nbdkit. > > [Note: I think the whole Debian dash-for-/bin/sh is the stupidest idea > I ever heard, but (a) it's a thing and (b) there's BSD and Macs where > they don't want to use bash for licensing reasons.] > > Arguably for the tests we could have some way to cause nbdkit to use > /bin/bash instead of /bin/sh when running external scripts, but it's > major complexity to implement. > > So we gotta use a lowest common denominator for these external > scripts, even in our tests. Note only this one test is affected. > > [Aside: Windows is a whole separate kettle of fish. Currently the > Windows port of nbdkit doesn't support --run for other reasons, but if > it did it would probably run some Windows-ish thing (COMMAND.COM? > Power Shell?). I'm not sure how Windows behaves for the other > external commands we try to run like logscript.] > >> Shell arrays are >> a bash-specific feature, and the extent array is deeply ingrained. See >> especially commit df63b23b6280 ("log: Use strict shell quoting for every >> parameter displayed in the log file.", 2021-01-04). In particular the >> assignment >> >> extents=(0x0 0x8000 "hole,zero" 0x8000 0x8000 "") >> >> turns "extents" into an array variable. >> >> In the bug tracker, comment >> <https://aur.archlinux.org/packages/nbdkit#comment-937375> says, "Thus >> this is an upstream issue; their scripts are calling sh when they should >> be calling bash". I think that's correct; for logscript=..., we should >> require /bin/bash in the manual, and execute the script with /bin/bash >> explicitly, not just system(). > > This would create a runtime dependency from nbdkit to /bin/bash which > I'd like to avoid.The runtime dependency is already there in our logscript interface; the name=(a b c ... z) syntax is already bash-only, for defining a shell array. So the question is basically how to best emulate an array in the POSIX shell. Some (rough) options that occur to me: - Use named variables such as name_0, name_1, name_2, ... and so on. Requires eval tricks, and if the array is large, it creates many variables, which some shells (?) may have issues with. - Generate a shell function with a huge case statement; like "get_name 0" should print "a", "get_name 1" should print "b", etc. The caller would then do element=$(get_name $idx) - write the elements of the array to a text file (one, quoted, element per line), and then use a combination of "tail" and "head" for fetching the right line. Incredibly slow, of course. ... I'm sure stackoverflow has further / better ideas for emulating arrays in the POSIX shell. And then, because keeping the current (fast, but nonportable) solution would be nice, we should probably add a new argument for the log filter, "compat" or "posix" or some such, which would select the more restricted interface. Laszlo
Richard W.M. Jones
2023-Oct-09 14:51 UTC
[Libguestfs] [nbdkit] Two POSIX questions for you ...
So one thing we could do for this test is to require (for the test) that /bin/sh is bash, something like the patch below. I was only able to test this in the positive case. diff --git a/tests/test-log-script-info.sh b/tests/test-log-script-info.sh index fa9b2ed32..d65f6415d 100755 --- a/tests/test-log-script-info.sh +++ b/tests/test-log-script-info.sh @@ -38,6 +38,10 @@ requires_run requires_nbdinfo requires_filter log +# This test implicitly assumes /bin/sh is bash, see: +# https://listman.redhat.com/archives/libguestfs/2023-October/032767.html +BASH_VERSION=no requires /bin/sh -c 'test "x$BASH_VERSION" != "xno"' + if ! nbdinfo --help | grep -- --map ; then echo "$0: nbdinfo --map option required to run this test" exit 77 -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html
On Mon, Oct 09, 2023 at 12:12:02PM +0100, Richard W.M. Jones wrote:> On Mon, Oct 09, 2023 at 12:57:14PM +0200, Laszlo Ersek wrote: > > On 10/9/23 09:46, Richard W.M. Jones wrote: > > > Hi Eric, a couple of POSIX questions for you from nbdkit. > > > > > > The first question is from an AUR comment on nbdkit: > > > > > > https://aur.archlinux.org/packages/nbdkit#comment-937381 > > > > > > I think there's a bash-ism in the logscript parameter in this test: > > > > > > https://gitlab.com/nbdkit/nbdkit/-/blame/master/tests/test-log-script-info.sh#L51 > > > > > > I believe it is happening in the $(( .. )) expression. How do we > > > write that so it'll work in a posix shell? > > > > (I'm not Eric, but curious! :) ) > > > > Here I think we should just explicitly insist on bash. > > Unfortunately we can't do that, not easily anyway. > > We use bash for writing the test scripts because it's the sane choice > for shell programming, and we declare /bin/bash (or something with > 'env') at the top of each of those scripts. This introduces a *build* > dependency from nbdkit to /bin/bash. That's all fine. > > However when nbdkit is installed in production & runs an external > script, it uses system(3) which uses /bin/sh. That might not be bash, > and indeed bash might not even be installed on the same machine as > nbdkit.Correct.> > [Note: I think the whole Debian dash-for-/bin/sh is the stupidest idea > I ever heard, but (a) it's a thing and (b) there's BSD and Macs where > they don't want to use bash for licensing reasons.]It's not just Debian's dash, but alpine's choice of busybox (a variant of ash with distinct differences from dash) that can also bite you when using non-POSIX constructs. Debian pioneered a shellcheck app (also available in the ShellCheck package in Fedora) to try and detect scripts that depend on bashisms; but I'm not sure it would help the case of where you are embedding constructs that will be run down the road under system(3) in a much larger file that is not constrained by a limited shell.> > Arguably for the tests we could have some way to cause nbdkit to use > /bin/bash instead of /bin/sh when running external scripts, but it's > major complexity to implement.Anywhere you call system(3), you can pass "/path/to/known/shell -c '" + shell_quote(original script) + "'" as the arguments to system to control which shell syntax you support in the original script. But I agree that it is not always trivial.> > So we gotta use a lowest common denominator for these external > scripts, even in our tests. Note only this one test is affected. > > [Aside: Windows is a whole separate kettle of fish. Currently the > Windows port of nbdkit doesn't support --run for other reasons, but if > it did it would probably run some Windows-ish thing (COMMAND.COM? > Power Shell?). I'm not sure how Windows behaves for the other > external commands we try to run like logscript.]Yeah, it's not trivial to decide how to support --filter=log logscript=... under Windows - we could require the user to have something like Cygwin or mingw installed so that there is a 'sh.exe' somewhere, but that's not reliable; and native Windows scripting is so different from anything POSIX that it's probably easier to just declare logscript= unsupported on that platform (if the rest of the log filter can still be used).> > > Shell arrays are > > a bash-specific feature, and the extent array is deeply ingrained. See > > especially commit df63b23b6280 ("log: Use strict shell quoting for every > > parameter displayed in the log file.", 2021-01-04). In particular the > > assignment > > > > extents=(0x0 0x8000 "hole,zero" 0x8000 0x8000 "") > > > > turns "extents" into an array variable.Yep, that's a definite bashism in the log filter. Several ideas: maybe we change to call out to bash (directly via exec*(), or indirectly via the system("/path/to/bash -c 'original script'") trick). Maybe we come up with a way for the command-line client of the log filter to specify which syntax they expect to be handed to their log script (if they don't specify a syntax, they get output that is parseable only by bash). I'm open to other ideas.> > > > In the bug tracker, comment > > <https://aur.archlinux.org/packages/nbdkit#comment-937375> says, "Thus > > this is an upstream issue; their scripts are calling sh when they should > > be calling bash". I think that's correct; for logscript=..., we should > > require /bin/bash in the manual, and execute the script with /bin/bash > > explicitly, not just system(). > > This would create a runtime dependency from nbdkit to /bin/bash which > I'd like to avoid.Only for that one filter, not for nbdkit in general. But yeah, it's still a rather strong dependency, and if we could avoid it, that would be nicer.> > > > Secondly while looking into this I was trying variations on: > > > > > > $ POSIXLY_CORRECT=1 ./nbdkit -fv data '1 2 3' --run 'nbdinfo $uri' > > > > > > This doesn't actually cause bash to emulate a posix shell, but it does > > > uncover a different bug: > > > > > > nbdkit: error: raw|base64|data parameter must be specified exactly once > > > > > > This seems to be happening because getopt_long in wrapper.c behaves > > > somehow differently parsing when POSIXLY_CORRECT is set. However I > > > couldn't work out exactly why.glibc's getopt_long() is indeed sensitive to POSIXLY_CORRECT - if set, it refuses to allow options after non-options. You can force that behavior even when POSIXLY_CORRECT is not set by passing leading '+' in the string handed to getopt_long(); but the manual does not list a way to explicitly request option reordering as an extension to override POSIXLY_CORRECT.> > > > > > https://gitlab.com/nbdkit/nbdkit/-/blob/master/wrapper.c?ref_type=heads#L278 > > > > > > I guess the wrapper ought to work if POSIXLY_CORRECT is set (?) > > > > IIRC, POSIXLY_CORRECT makes getopt() stop parsing options when the first > > non-option argument (i.e., first operand) is reached. So "-f" and "-v" > > are taken as options, then the two arguments "data" and '1 2 3' are > > taken as operands, and then "--run" and the rest are taken as operands > > as well.Yes, that's why we're seeing the difference in behavior. If you were to write: POSIXLY_CORRECT=1 ./nbdkit -vf --run 'nbdinfo $uri' data '1 2 3' you'd get the desired behavior, but it doesn't read as nicely. So we really DO like the late-option extension to be usable, which somewhat implies that we should be (temporarily?) unsetting POSIXLY_CORRECT from an inherited environment around the part where we construct the passthrough arguments. But even if we temporarily unset POSIXLY_CORRECT in the wrapper, that still won't help that nbdkit itself will also treat '--run' as a non-option if it occurs after the plugin name. So if we rely on temporarily unsetting the variable, we'd have to do it in both the wrapper and in nbdkit proper. Otherwise, we'll have to audit every unit test that uses --run to rewrite them to pass options like --run first before the plugin name.> > > > I think the following loop is relevant: > > > > /* Are there any non-option arguments? */ > > if (optind < argc) { > > /* Ensure any further parameters can never be parsed as options by > > * real nbdkit. > > */ > > passthru ("--"); > > > > /* The first non-option argument is the plugin name. If it is a > > * short name then rewrite it. > > */ > > if (is_short_name (argv[optind])) { > > const char *language; > > > > /* Plugins written in scripting languages. */ > > if (is_script_plugin (argv[optind], &language)) { > > passthru_format ("%s/plugins/%s/.libs/nbdkit-%s-plugin." SOEXT, > > builddir, language, language); > > passthru_format ("%s/plugins/%s/nbdkit-%s-plugin", > > builddir, argv[optind], argv[optind]); > > } > > /* Otherwise normal plugins written in C or other languages that > > * compile to .so files. > > */ > > else { > > passthru_format ("%s/plugins/%s/.libs/nbdkit-%s-plugin." SOEXT, > > builddir, argv[optind], argv[optind]); > > } > > ++optind; > > } > > > > /* Everything else is passed through without rewriting. */ > > while (optind < argc) { > > passthru (argv[optind]); > > ++optind; > > } > > } > > > > With POSIXLY_CORRECT set, the "Everything else is passed through without > > rewriting" logic extends to "--run" etc. I'm not sure how that would > > lead to the specific error message, though.It's not just that loop in wrapper.c, but also the loop in server/main.c that is seeing --run as a non-option, at which point the plugin complains that the config callback is being reached too many times on key=value with the magic "data=" key. The error is the same as if you explicitly write: ./nbdkit -vf data data='1 2 3' data='--run' data='nbdinfo $uri' which is not subject to option reordering regardless of POSIXLY_CORRECT.> > That's annoying :-(Yes, if we want option reordering in spite of POSIXLY_CORRECT possibly leaking through the environment, we have quite a bit of work to do. Even if we don't want option reordering, we probably have quite a few test cases that need to be rewritten to list --run before the plugin name. I'd actually lean towards allowing option reordering in spite of POSIXLY_CORRECT (it makes for nicer command lines, and we already limit the set of non-options we accept because we will then hand them off to the .config plugin callback as name=value pairs, possibly with the magic name). But I'm not sure how invasive it will be to write such a patch. On the other hand, it should be easy enough to tweak our CI engine to have at least one platform with POSIXLY_CORRECT=1 set in the environment to make sure we still pass, if that's what we want. -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org