thr3ads.net - Libguestfs - [Libguestfs] [PATCH nbdkit] server: When using --run, wait for captive nbdkit to exit. [Feb 2020]

If this information is useful, please help other people find it:
Share via:

Richard W.M. Jones

2020-Feb-27 15:12 UTC

[Libguestfs] [PATCH nbdkit] server: When using --run, wait for captive nbdkit to exit.

I'd like to propose we backport this to 1.18 and some earlier stable
branches too.

Rich.

Richard W.M. Jones

2020-Feb-27 15:12 UTC

head link

[Libguestfs] [PATCH nbdkit] server: When using --run, wait for captive nbdkit to exit.

In tests/test-eval.sh we have a test which looks something like this:

  nbdkit eval close='echo closed > file' --run 'qemu-img info
$nbd'
  if ! grep 'closed' file; then fail; fi

However there was a race condition here.  nbdkit exits when the --run
command exits without waiting for the captive nbdkit subprocess.  Thus
we couldn't be sure that the final 'closed' message got written to
the
file.  It worked most of the time, but on slow machines the test
failed.

This indicates that we ought to wait for the captive nbdkit to exit.
One reason is so that plugin cleanup is done before we continue after
the captive nbdkit.  That should make shell scripts using captive
nbdkit + any plugin that does significant cleanup on exit more
predictable.

This adds the appropriate call to waitpid but we still ignore the exit
status of the captive nbdkit subprocess (in fact it is most likely to
fail since it will sent a SIGTERM signal).
---
 server/captive.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/server/captive.c b/server/captive.c
index 102a77ea..0a243d2b 100644
--- a/server/captive.c
+++ b/server/captive.c
@@ -171,12 +171,14 @@ run_command (void)
       r = EXIT_FAILURE;
       break;
     case 0:
-      /* Captive nbdkit still running; kill it, but no need to wait
-       * for it, as the --run program's exit status is good enough (if
-       * the captive nbdkit fails to exit after SIGTERM, we have a
-       * bigger bug to fix).
+      /* Captive nbdkit is still running; kill it.  We want to wait
+       * for nbdkit to exit since that ensures all cleanup is done in
+       * the plugin before we return.  However we don't care if nbdkit
+       * returns an error, the exit code we return always comes from
+       * the --run command.
        */
       kill (pid, SIGTERM);
+      waitpid (pid, NULL, 0);
       break;
     default:
       /* Captive nbdkit exited unexpectedly; update the exit status. */
-- 
2.25.0

Eric Blake

2020-Feb-27 15:39 UTC

head link

Re: [Libguestfs] [PATCH nbdkit] server: When using --run, wait for captive nbdkit to exit.

On 2/27/20 9:12 AM, Richard W.M. Jones wrote:> In tests/test-eval.sh we have a test which looks something like this:
> 
>    nbdkit eval close='echo closed > file' --run 'qemu-img
info $nbd'
>    if ! grep 'closed' file; then fail; fi
> 
> However there was a race condition here.  nbdkit exits when the --run
> command exits without waiting for the captive nbdkit subprocess.  Thus
> we couldn't be sure that the final 'closed' message got written
to the
> file.  It worked most of the time, but on slow machines the test
> failed.
> 
> This indicates that we ought to wait for the captive nbdkit to exit.
> One reason is so that plugin cleanup is done before we continue after
> the captive nbdkit.  That should make shell scripts using captive
> nbdkit + any plugin that does significant cleanup on exit more
> predictable.
> 
> This adds the appropriate call to waitpid but we still ignore the exit
> status of the captive nbdkit subprocess (in fact it is most likely to
> fail since it will sent a SIGTERM signal).
> ---
>   server/captive.c | 10 ++++++----
>   1 file changed, 6 insertions(+), 4 deletions(-)
ACK.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Richard W.M. Jones

2020-Mar-07 11:18 UTC

head link

Re: [Libguestfs] [PATCH nbdkit] server: When using --run, wait for captive nbdkit to exit.

On Thu, Feb 27, 2020 at 03:12:11PM +0000, Richard W.M. Jones
wrote:> In tests/test-eval.sh we have a test which looks something like this:
> 
>   nbdkit eval close='echo closed > file' --run 'qemu-img
info $nbd'
>   if ! grep 'closed' file; then fail; fi
> 
> However there was a race condition here.  nbdkit exits when the --run
> command exits without waiting for the captive nbdkit subprocess.  Thus
> we couldn't be sure that the final 'closed' message got written
to the
> file.  It worked most of the time, but on slow machines the test
> failed.
> 
> This indicates that we ought to wait for the captive nbdkit to exit.
> One reason is so that plugin cleanup is done before we continue after
> the captive nbdkit.  That should make shell scripts using captive
> nbdkit + any plugin that does significant cleanup on exit more
> predictable.
> 
> This adds the appropriate call to waitpid but we still ignore the exit
> status of the captive nbdkit subprocess (in fact it is most likely to
> fail since it will sent a SIGTERM signal).
> ---
>  server/captive.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/server/captive.c b/server/captive.c
> index 102a77ea..0a243d2b 100644
> --- a/server/captive.c
> +++ b/server/captive.c
> @@ -171,12 +171,14 @@ run_command (void)
>        r = EXIT_FAILURE;
>        break;
>      case 0:
> -      /* Captive nbdkit still running; kill it, but no need to wait
> -       * for it, as the --run program's exit status is good enough (if
> -       * the captive nbdkit fails to exit after SIGTERM, we have a
> -       * bigger bug to fix).
> +      /* Captive nbdkit is still running; kill it.  We want to wait
> +       * for nbdkit to exit since that ensures all cleanup is done in
> +       * the plugin before we return.  However we don't care if nbdkit
> +       * returns an error, the exit code we return always comes from
> +       * the --run command.
>         */
>        kill (pid, SIGTERM);
> +      waitpid (pid, NULL, 0);
>        break;
>      default:
>        /* Captive nbdkit exited unexpectedly; update the exit status. */
I *thought* we'd fixed this, but it failed on a recent ppc64le build
(of 1.19.2, so including this patch).  At the moment I can't think of
a plausible reason for why this is still failing.

https://github.com/libguestfs/nbdkit/blob/be44e94a7d0494ba562cb4c5c08bc5825d612881/tests/test-eval.sh

Rich.

FAIL: test-eval.sh
=================+ requires qemu-img --version
+ files='eval.out eval.missing'
+ rm -f eval.out eval.missing
+ cleanup_fn rm -f eval.out eval.missing
+ _cleanup_hook[${#_cleanup_hook[@]}]='rm -f eval.out eval.missing'
+ nbdkit -U - eval 'get_size=echo 64M' 'pread=dd if=/dev/zero
count=$3 iflag=count_bytes' 'missing=echo "in missing: $@"
>> eval.missing; exit 2' unload= --run 'qemu-img info $nbd'
+ cat eval.out
image: nbd+unix://?socket=/tmp/nbdkitExyrix/socket
file format: raw
virtual size: 64 MiB (67108864 bytes)
disk size: unavailable
+ grep '67108864 bytes' eval.out
virtual size: 64 MiB (67108864 bytes)
+ cat eval.missing
in missing: config_complete
in missing: thread_model
in missing: get_ready
in missing: thread_model
in missing: preconnect false 
in missing: open false 
in missing: can_write 
in missing: can_flush 
in missing: is_rotational 
in missing: can_multi_conn 
in missing: can_cache 
in missing: can_extents 
+ grep 'in missing' eval.missing
in missing: config_complete
in missing: thread_model
in missing: get_ready
in missing: thread_model
in missing: preconnect false 
in missing: open false 
in missing: can_write 
in missing: can_flush 
in missing: is_rotational 
in missing: can_multi_conn 
in missing: can_cache 
in missing: can_extents 
+ grep 'in missing: config_complete' eval.missing
in missing: config_complete
+ grep 'in missing: thread_model' eval.missing
in missing: thread_model
in missing: thread_model
+ grep 'in missing: can_write' eval.missing
in missing: can_write 
+ grep 'in missing: is_rotational' eval.missing
in missing: is_rotational 
+ grep 'in missing: get_ready' eval.missing
in missing: get_ready
+ grep 'in missing: preconnect' eval.missing
in missing: preconnect false 
+ grep 'in missing: open' eval.missing
in missing: open false 
+ grep 'in missing: close' eval.missing
++ _run_cleanup_hooks
++ status=1
++ set +e
++ trap '' INT QUIT TERM EXIT ERR
++ echo ./test-eval.sh: run cleanup hooks: exit code 1
./test-eval.sh: run cleanup hooks: exit code 1
++ (( i = 0 ))
++ (( i < 1 ))
++ rm -f eval.out eval.missing
++ (( ++i  ))
++ (( i < 1 ))
++ exit 1
FAIL test-eval.sh (exit status: 1)

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org

Reasonably Related Threads

Search for more possibly parallel threads

Libguestfs - Feb 2020 - [PATCH nbdkit] server: When using --run, wait for captive nbdkit to exit.

[Libguestfs] [PATCH nbdkit] server: When using --run, wait for captive nbdkit to exit.

[Libguestfs] [PATCH nbdkit] server: When using --run, wait for captive nbdkit to exit.

Re: [Libguestfs] [PATCH nbdkit] server: When using --run, wait for captive nbdkit to exit.

Re: [Libguestfs] [PATCH nbdkit] server: When using --run, wait for captive nbdkit to exit.

Reasonably Related Threads