Richard W.M. Jones
2020-Feb-27 15:12 UTC
[Libguestfs] [PATCH nbdkit] server: When using --run, wait for captive nbdkit to exit.
I'd like to propose we backport this to 1.18 and some earlier stable branches too. Rich.
Richard W.M. Jones
2020-Feb-27 15:12 UTC
[Libguestfs] [PATCH nbdkit] server: When using --run, wait for captive nbdkit to exit.
In tests/test-eval.sh we have a test which looks something like this: nbdkit eval close='echo closed > file' --run 'qemu-img info $nbd' if ! grep 'closed' file; then fail; fi However there was a race condition here. nbdkit exits when the --run command exits without waiting for the captive nbdkit subprocess. Thus we couldn't be sure that the final 'closed' message got written to the file. It worked most of the time, but on slow machines the test failed. This indicates that we ought to wait for the captive nbdkit to exit. One reason is so that plugin cleanup is done before we continue after the captive nbdkit. That should make shell scripts using captive nbdkit + any plugin that does significant cleanup on exit more predictable. This adds the appropriate call to waitpid but we still ignore the exit status of the captive nbdkit subprocess (in fact it is most likely to fail since it will sent a SIGTERM signal). --- server/captive.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/server/captive.c b/server/captive.c index 102a77ea..0a243d2b 100644 --- a/server/captive.c +++ b/server/captive.c @@ -171,12 +171,14 @@ run_command (void) r = EXIT_FAILURE; break; case 0: - /* Captive nbdkit still running; kill it, but no need to wait - * for it, as the --run program's exit status is good enough (if - * the captive nbdkit fails to exit after SIGTERM, we have a - * bigger bug to fix). + /* Captive nbdkit is still running; kill it. We want to wait + * for nbdkit to exit since that ensures all cleanup is done in + * the plugin before we return. However we don't care if nbdkit + * returns an error, the exit code we return always comes from + * the --run command. */ kill (pid, SIGTERM); + waitpid (pid, NULL, 0); break; default: /* Captive nbdkit exited unexpectedly; update the exit status. */ -- 2.25.0
Eric Blake
2020-Feb-27 15:39 UTC
Re: [Libguestfs] [PATCH nbdkit] server: When using --run, wait for captive nbdkit to exit.
On 2/27/20 9:12 AM, Richard W.M. Jones wrote:> In tests/test-eval.sh we have a test which looks something like this: > > nbdkit eval close='echo closed > file' --run 'qemu-img info $nbd' > if ! grep 'closed' file; then fail; fi > > However there was a race condition here. nbdkit exits when the --run > command exits without waiting for the captive nbdkit subprocess. Thus > we couldn't be sure that the final 'closed' message got written to the > file. It worked most of the time, but on slow machines the test > failed. > > This indicates that we ought to wait for the captive nbdkit to exit. > One reason is so that plugin cleanup is done before we continue after > the captive nbdkit. That should make shell scripts using captive > nbdkit + any plugin that does significant cleanup on exit more > predictable. > > This adds the appropriate call to waitpid but we still ignore the exit > status of the captive nbdkit subprocess (in fact it is most likely to > fail since it will sent a SIGTERM signal). > --- > server/captive.c | 10 ++++++---- > 1 file changed, 6 insertions(+), 4 deletions(-)ACK. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Richard W.M. Jones
2020-Mar-07 11:18 UTC
Re: [Libguestfs] [PATCH nbdkit] server: When using --run, wait for captive nbdkit to exit.
On Thu, Feb 27, 2020 at 03:12:11PM +0000, Richard W.M. Jones wrote:> In tests/test-eval.sh we have a test which looks something like this: > > nbdkit eval close='echo closed > file' --run 'qemu-img info $nbd' > if ! grep 'closed' file; then fail; fi > > However there was a race condition here. nbdkit exits when the --run > command exits without waiting for the captive nbdkit subprocess. Thus > we couldn't be sure that the final 'closed' message got written to the > file. It worked most of the time, but on slow machines the test > failed. > > This indicates that we ought to wait for the captive nbdkit to exit. > One reason is so that plugin cleanup is done before we continue after > the captive nbdkit. That should make shell scripts using captive > nbdkit + any plugin that does significant cleanup on exit more > predictable. > > This adds the appropriate call to waitpid but we still ignore the exit > status of the captive nbdkit subprocess (in fact it is most likely to > fail since it will sent a SIGTERM signal). > --- > server/captive.c | 10 ++++++---- > 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git a/server/captive.c b/server/captive.c > index 102a77ea..0a243d2b 100644 > --- a/server/captive.c > +++ b/server/captive.c > @@ -171,12 +171,14 @@ run_command (void) > r = EXIT_FAILURE; > break; > case 0: > - /* Captive nbdkit still running; kill it, but no need to wait > - * for it, as the --run program's exit status is good enough (if > - * the captive nbdkit fails to exit after SIGTERM, we have a > - * bigger bug to fix). > + /* Captive nbdkit is still running; kill it. We want to wait > + * for nbdkit to exit since that ensures all cleanup is done in > + * the plugin before we return. However we don't care if nbdkit > + * returns an error, the exit code we return always comes from > + * the --run command. > */ > kill (pid, SIGTERM); > + waitpid (pid, NULL, 0); > break; > default: > /* Captive nbdkit exited unexpectedly; update the exit status. */I *thought* we'd fixed this, but it failed on a recent ppc64le build (of 1.19.2, so including this patch). At the moment I can't think of a plausible reason for why this is still failing. https://github.com/libguestfs/nbdkit/blob/be44e94a7d0494ba562cb4c5c08bc5825d612881/tests/test-eval.sh Rich. FAIL: test-eval.sh =================+ requires qemu-img --version + files='eval.out eval.missing' + rm -f eval.out eval.missing + cleanup_fn rm -f eval.out eval.missing + _cleanup_hook[${#_cleanup_hook[@]}]='rm -f eval.out eval.missing' + nbdkit -U - eval 'get_size=echo 64M' 'pread=dd if=/dev/zero count=$3 iflag=count_bytes' 'missing=echo "in missing: $@" >> eval.missing; exit 2' unload= --run 'qemu-img info $nbd' + cat eval.out image: nbd+unix://?socket=/tmp/nbdkitExyrix/socket file format: raw virtual size: 64 MiB (67108864 bytes) disk size: unavailable + grep '67108864 bytes' eval.out virtual size: 64 MiB (67108864 bytes) + cat eval.missing in missing: config_complete in missing: thread_model in missing: get_ready in missing: thread_model in missing: preconnect false in missing: open false in missing: can_write in missing: can_flush in missing: is_rotational in missing: can_multi_conn in missing: can_cache in missing: can_extents + grep 'in missing' eval.missing in missing: config_complete in missing: thread_model in missing: get_ready in missing: thread_model in missing: preconnect false in missing: open false in missing: can_write in missing: can_flush in missing: is_rotational in missing: can_multi_conn in missing: can_cache in missing: can_extents + grep 'in missing: config_complete' eval.missing in missing: config_complete + grep 'in missing: thread_model' eval.missing in missing: thread_model in missing: thread_model + grep 'in missing: can_write' eval.missing in missing: can_write + grep 'in missing: is_rotational' eval.missing in missing: is_rotational + grep 'in missing: get_ready' eval.missing in missing: get_ready + grep 'in missing: preconnect' eval.missing in missing: preconnect false + grep 'in missing: open' eval.missing in missing: open false + grep 'in missing: close' eval.missing ++ _run_cleanup_hooks ++ status=1 ++ set +e ++ trap '' INT QUIT TERM EXIT ERR ++ echo ./test-eval.sh: run cleanup hooks: exit code 1 ./test-eval.sh: run cleanup hooks: exit code 1 ++ (( i = 0 )) ++ (( i < 1 )) ++ rm -f eval.out eval.missing ++ (( ++i )) ++ (( i < 1 )) ++ exit 1 FAIL test-eval.sh (exit status: 1) -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org
Apparently Analagous Threads
- [PATCH nbdkit] server: When using --run, wait for captive nbdkit to exit.
- [nbdkit PATCH v2 1/6] server: Propagate unexpected nbdkit failure with --run
- [nbdkit PATCH 1/2] server: Propagate unexpected nbdkit failure with --run
- [PATCH 6/7] tests: Add tests using a captive daemon process.
- [PATCH v2 6/9] tests: Add tests using a captive daemon process.