Eric Blake
2020-Mar-20 19:47 UTC
[Libguestfs] [nbdkit PATCH] tests: Don't strand hung nbdkit processes
We've recently been hitting a transient hung rpm build when using make 4.3, due to a bug in test-nbd-tls-psk.sh. We're still trying to isolate the correct fix for that bug (it might be in the nbd plugin proper, but more likely is an issue in libnbd's tls handling of connection close), but in the meantime, this patch should at least cause a graceful fail rather than make hanging due to an nbdkit process that has hung. Signed-off-by: Eric Blake <eblake@redhat.com> --- tests/functions.sh.in | 31 ++++++++++++++++++++++++++++--- 1 file changed, 28 insertions(+), 3 deletions(-) diff --git a/tests/functions.sh.in b/tests/functions.sh.in index 8406fcf9..e483505e 100644 --- a/tests/functions.sh.in +++ b/tests/functions.sh.in @@ -85,8 +85,8 @@ requires_ipv6_loopback () # instead. It's very unlikely that port 1 is open. if LANG=C qemu-img info "nbd:[::1]:1" |& \ grep -sq "Address family for hostname not supported"; then - echo "$0: IPv6 loopback is not available, skipping this test" - exit 77 + echo "$0: IPv6 loopback is not available, skipping this test" + exit 77 fi } @@ -139,7 +139,32 @@ start_nbdkit () fi # Kill nbdkit on exit. - cleanup_fn kill "$(cat "$pidfile")" + cleanup_fn kill_nbdkit "$(cat "$pidfile")" +} + +# kill_nbdkit pid +# +# End the nbkdit process that created pidfile. Exit this script with an +# error if nbdkit does not gracefully shutdown in a timely manner. +kill_nbdkit () +{ + local pid=$1 i + + # Start with SIGTERM, and wait for graceful exit + kill $pid + for i in {1..60}; do + if ! kill -0 $pid 2>/dev/null; then + break + fi + sleep 1 + done + # If nbdkit has not exited, try SIGKILL and fail the test + if test $i = 60; then + echo "error: nbdkit pid $pid failed to respond to SIGTERM" + kill -9 $pid + # Append our failure after other cleanups + cleanup_fn exit 1 + fi } # foreach_plugin f [args] -- 2.25.1
Eric Blake
2020-Mar-20 19:56 UTC
Re: [Libguestfs] [nbdkit PATCH] tests: Don't strand hung nbdkit processes
On 3/20/20 2:47 PM, Eric Blake wrote:> We've recently been hitting a transient hung rpm build when using make > 4.3, due to a bug in test-nbd-tls-psk.sh. We're still trying to > isolate the correct fix for that bug (it might be in the nbd plugin > proper, but more likely is an issue in libnbd's tls handling of > connection close), but in the meantime, this patch should at least > cause a graceful fail rather than make hanging due to an nbdkit > process that has hung. > > Signed-off-by: Eric Blake <eblake@redhat.com> > --- > tests/functions.sh.in | 31 ++++++++++++++++++++++++++++--- > 1 file changed, 28 insertions(+), 3 deletions(-)To see the timeout in action, try applying this temporary patch, then running make -C tests check TESTS=test-nbd-extents.sh diff --git i/plugins/data/data.c w/plugins/data/data.c index cfbebc00..8c201ec0 100644 --- i/plugins/data/data.c +++ w/plugins/data/data.c @@ -81,10 +81,12 @@ data_load (void) } /* On unload, free the sparse array. */ +#include <unistd.h> static void data_unload (void) { free_sparse_array (sa); + sleep(15); } /* Parse the base64 parameter. */ diff --git i/tests/functions.sh.in w/tests/functions.sh.in index e483505e..22840c86 100644 --- i/tests/functions.sh.in +++ w/tests/functions.sh.in @@ -152,14 +152,14 @@ kill_nbdkit () # Start with SIGTERM, and wait for graceful exit kill $pid - for i in {1..60}; do + for i in {1..10}; do if ! kill -0 $pid 2>/dev/null; then break fi sleep 1 done # If nbdkit has not exited, try SIGKILL and fail the test - if test $i = 60; then + if test $i = 10; then echo "error: nbdkit pid $pid failed to respond to SIGTERM" kill -9 $pid # Append our failure after other cleanups -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Richard W.M. Jones
2020-Mar-20 21:59 UTC
Re: [Libguestfs] [nbdkit PATCH] tests: Don't strand hung nbdkit processes
On Fri, Mar 20, 2020 at 02:47:12PM -0500, Eric Blake wrote:> We've recently been hitting a transient hung rpm build when using make > 4.3, due to a bug in test-nbd-tls-psk.sh. We're still trying to > isolate the correct fix for that bug (it might be in the nbd plugin > proper, but more likely is an issue in libnbd's tls handling of > connection close), but in the meantime, this patch should at least > cause a graceful fail rather than make hanging due to an nbdkit > process that has hung. > > Signed-off-by: Eric Blake <eblake@redhat.com> > --- > tests/functions.sh.in | 31 ++++++++++++++++++++++++++++--- > 1 file changed, 28 insertions(+), 3 deletions(-) > > diff --git a/tests/functions.sh.in b/tests/functions.sh.in > index 8406fcf9..e483505e 100644 > --- a/tests/functions.sh.in > +++ b/tests/functions.sh.in > @@ -85,8 +85,8 @@ requires_ipv6_loopback () > # instead. It's very unlikely that port 1 is open. > if LANG=C qemu-img info "nbd:[::1]:1" |& \ > grep -sq "Address family for hostname not supported"; then > - echo "$0: IPv6 loopback is not available, skipping this test" > - exit 77 > + echo "$0: IPv6 loopback is not available, skipping this test" > + exit 77 > fi > } > > @@ -139,7 +139,32 @@ start_nbdkit () > fi > > # Kill nbdkit on exit. > - cleanup_fn kill "$(cat "$pidfile")" > + cleanup_fn kill_nbdkit "$(cat "$pidfile")" > +} > + > +# kill_nbdkit pid > +# > +# End the nbkdit process that created pidfile. Exit this script with an > +# error if nbdkit does not gracefully shutdown in a timely manner. > +kill_nbdkit () > +{ > + local pid=$1 i > + > + # Start with SIGTERM, and wait for graceful exit > + kill $pid > + for i in {1..60}; do > + if ! kill -0 $pid 2>/dev/null; then > + break > + fi > + sleep 1 > + done > + # If nbdkit has not exited, try SIGKILL and fail the test > + if test $i = 60; then > + echo "error: nbdkit pid $pid failed to respond to SIGTERM" > + kill -9 $pid > + # Append our failure after other cleanups > + cleanup_fn exit 1Interesting, but OK. I wonder if we need a more general case for this? But ...> + fi > } > > # foreach_plugin f [args]... but this all looks good to me thanks: ACK I think I'll actually do another small release once you've pushed this because I will reenable Fedora tests that I disabled yesterday. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html
Maybe Matching Threads
- Re: Anyone seen build hangs (esp armv7, s390x) in Fedora?
- [PATCH v2 nbdkit 0/5] tests: Move common functions into tests/functions.sh
- [PATCH nbdkit 0/9] Create libnbdkit.so
- [PATCH v2 nbdkit 4/5] tests: Use a generic cleanup mechanism instead of explicit trap.
- Re: [PATCH nbdinfo v2] info: Add a --map option for displaying allocation metadata.