Eric Blake
2019-Aug-06 15:46 UTC
[Libguestfs] [nbdkit PATCH] tests: Test for faster shutdown
The test relies on the timeout program. Also, since the nbdkit_nanosleep code relies on the Linux extension POLLRDHUP to detect early client closure, we may have to relax that part of the test when porting to platforms that lack ppoll/POLLRDHUP. (That is, while we should still be able to let a signal to the server shut down nbdkit quickly, it's harder to let a client close()ing its end cause nbdkit to shut down a single connection quickly when limited to a poll that can't identify EOF as a way to break a poll). Signed-off-by: Eric Blake <eblake@redhat.com> --- I'm pushing this along with the nbdkit_nanosleep patches, now that I'm reliably getting the test to fail in isolation and to pass with the sleep patches applied. However, I suspect that we will definitely need to fix things for other platforms, and/or skip this test if we can't easily make other platforms match what Linux can do with POLLRDHUP. tests/Makefile.am | 4 ++- tests/test-shutdown.sh | 76 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 79 insertions(+), 1 deletion(-) create mode 100755 tests/test-shutdown.sh diff --git a/tests/Makefile.am b/tests/Makefile.am index 3d78e7a2..bc308c5d 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -110,6 +110,7 @@ EXTRA_DIST = \ test-rate-dynamic.sh \ test.rb \ test-readahead-copy.sh \ + test-shutdown.sh \ test-ssh.sh \ test.tcl \ test-shebang-perl.sh \ @@ -955,7 +956,8 @@ if HAVE_GUESTFISH TESTS += test-cow.sh endif HAVE_GUESTFISH -# delay filter test. +# delay filter tests. +TESTS += test-shutdown.sh LIBGUESTFS_TESTS += test-delay test_delay_SOURCES = test-delay.c test.h diff --git a/tests/test-shutdown.sh b/tests/test-shutdown.sh new file mode 100755 index 00000000..0ac9c1eb --- /dev/null +++ b/tests/test-shutdown.sh @@ -0,0 +1,76 @@ +#!/usr/bin/env bash +# nbdkit +# Copyright (C) 2019 Red Hat Inc. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# * Neither the name of Red Hat nor the names of its contributors may be +# used to endorse or promote products derived from this software without +# specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE. + +source ./functions.sh +set -x + +requires qemu-io --version +requires timeout --version + +sock=`mktemp -u` +files="shutdown.pid $sock" +cleanup_fn rm -f $files +fail=0 + +# Create a server that delays reads and forces only one connection at a time. +# This tests that the delay filter's use of nbdkit_nanosleep is able to +# react to both connection death and server shutdown without finishing +# the entire delay duration. +start_nbdkit -P shutdown.pid -U $sock --filter=noparallel --filter=delay \ + null 1M serialize=connections rdelay=10 + +# Early client death should not stall connection of second client. +trap '' ERR +timeout 1s qemu-io -f raw "nbd+unix:///?socket=$sock" -c 'r 0 512' </dev/null +test $? = 124 || { + echo "Unexpected status; qemu-io should have been killed for timing out" + fail=1 +} +timeout 1s qemu-io -f raw "nbd+unix:///?socket=$sock" -c 'quit' </dev/null +test $? = 0 || { + echo "Unexpected status; nbdkit was not responsive to allow second qemu-io" + fail=1 +} + +# The server's response to shutdown signals should not stall on delays +qemu-io -f raw "nbd+unix:///?socket=$sock" -c 'r 0 512' </dev/null & +pid=$! +sleep 1 +kill -s INT "$(cat "$pidfile")" +sleep 1 +kill -s 0 "$(cat "$pidfile")" && { + echo "Unexpected status; nbdkit didn't react fast enough to signal" + fail=1 +} +wait $pid + +exit $fail -- 2.20.1
Eric Blake
2019-Aug-23 18:18 UTC
Re: [Libguestfs] [nbdkit PATCH] tests: Test for faster shutdown
On 8/6/19 10:46 AM, Eric Blake wrote:> The test relies on the timeout program. Also, since the > nbdkit_nanosleep code relies on the Linux extension POLLRDHUP to > detect early client closure, we may have to relax that part of the > test when porting to platforms that lack ppoll/POLLRDHUP. (That is, > while we should still be able to let a signal to the server shut down > nbdkit quickly, it's harder to let a client close()ing its end cause > nbdkit to shut down a single connection quickly when limited to a poll > that can't identify EOF as a way to break a poll). > > Signed-off-by: Eric Blake <eblake@redhat.com> > --- > > I'm pushing this along with the nbdkit_nanosleep patches, now that I'm > reliably getting the test to fail in isolation and to pass with the > sleep patches applied. However, I suspect that we will definitely need > to fix things for other platforms, and/or skip this test if we can't > easily make other platforms match what Linux can do with POLLRDHUP.> +# The server's response to shutdown signals should not stall on delays > +qemu-io -f raw "nbd+unix:///?socket=$sock" -c 'r 0 512' </dev/null & > +pid=$! > +sleep 1 > +kill -s INT "$(cat "$pidfile")" > +sleep 1 > +kill -s 0 "$(cat "$pidfile")" && { > + echo "Unexpected status; nbdkit didn't react fast enough to signal" > + fail=1Rich already hit this failure on a heavily-loaded slow machine ('make -j16), so I'm pushing this followup to turn a hard-coded one-second sleep into a loop (5 seconds for now, but we could go longer) to let things have longer to settle under load but to not tie up the test if it is still fast: From 05ccea5f9717c5117785c6cc6125596088ced328 Mon Sep 17 00:00:00 2001 From: Eric Blake <eblake@redhat.com> Date: Fri, 23 Aug 2019 11:59:57 -0500 Subject: [nbdkit PATCH] tests: Increase timeouts on test-shutdown.sh Testing that shutdown happens in a timely manner is inherently a racy aspect (it's hard to guarantee that a slow heavily-loaded machine will react as fast as desired. Rich hit a test failure when running 'make -j16', complaining "Unexpected status; nbdkit didn't react fast enough to signal". Improve the test by lengthening the delay of the server (so we have more time to react to other timing events even uner load), lengthening the timeout of the 'quit' command that normally completes quickly without load, and rewriting the 'kill -0' test for a completed server into a longer timeout via a loop (to let the test remain responsive and complete quickly on a machine without load, but to allow longer before failing the test on a machine with load). On my machine, the test still completes in under 3 seconds when run in isolation (most of that time spent idling), but this should make the test friendlier to a loaded slow machine. Fixes: 782ad5b2 Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> --- tests/test-shutdown.sh | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/tests/test-shutdown.sh b/tests/test-shutdown.sh index 0ac9c1eb..7d17045a 100755 --- a/tests/test-shutdown.sh +++ b/tests/test-shutdown.sh @@ -46,7 +46,7 @@ fail=0 # react to both connection death and server shutdown without finishing # the entire delay duration. start_nbdkit -P shutdown.pid -U $sock --filter=noparallel --filter=delay \ - null 1M serialize=connections rdelay=10 + null 1M serialize=connections rdelay=60 # Early client death should not stall connection of second client. trap '' ERR @@ -55,7 +55,7 @@ test $? = 124 || { echo "Unexpected status; qemu-io should have been killed for timing out" fail=1 } -timeout 1s qemu-io -f raw "nbd+unix:///?socket=$sock" -c 'quit' </dev/null +timeout 5s qemu-io -f raw "nbd+unix:///?socket=$sock" -c 'quit' </dev/null test $? = 0 || { echo "Unexpected status; nbdkit was not responsive to allow second qemu-io" fail=1 @@ -66,11 +66,14 @@ qemu-io -f raw "nbd+unix:///?socket=$sock" -c 'r 0 512' </dev/null & pid=$! sleep 1 kill -s INT "$(cat "$pidfile")" -sleep 1 -kill -s 0 "$(cat "$pidfile")" && { +for (( i=0; i < 5; i++ )); do + kill -s 0 "$(cat "$pidfile")" || break + sleep 1 +done +if [ $i = 5 ]; then echo "Unexpected status; nbdkit didn't react fast enough to signal" fail=1 -} +fi wait $pid exit $fail -- 2.21.0 -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Apparently Analagous Threads
- [PATCH nbdkit] freebsd: In nbdkit_nanosleep, fallback to calling nanosleep(2).
- [nbdkit PATCH 0/3] More responsive shutdown
- [PATCH nbdkit 1/3] server: Add GET_CONN macro, alias for threadlocal_get_conn ().
- [nbdkit PATCH 0/4] Fix testsuite hang with nbd-stadalone
- Re: [PATCH nbdkit] freebsd: In nbdkit_nanosleep, fallback to calling nanosleep(2).