Eric Blake
2020-Mar-26 19:34 UTC
[Libguestfs] [nbdkit PATCH] tests: Swap nbdkit process order in test-nbd-tls-psk.sh
We're still seeing sporadic failures of 'nbdkit nbd tls=', and I'm still trying to come up with a root cause fix (it may involve smarter use of gnutls_bye() in libnbd). In the meantime, here's what we know: when the hang/failure happens, the 'nbdkit nbd tls=' client process is stuck in a poll() waiting to see EOF from the server, while the 'nbdkit example1' server process is stuck in a read waiting to see if the client will do a clean shutdown of the gnutls session. Sending SIGTERM to the client is not going to break the poll, but if we instead kill the server, that will cause the client to respond (perhaps with an error message that we ignore, but better than hanging). So, by rearranging the order in which we call our start_nbdkit function, we change the order in which we send SIGTERM to the two processes. And in turn, this becomes the first testsuite coverage of the 'nbd retry=' parameter, added back in commit 0bb76bc7. Signed-off-by: Eric Blake <eblake@redhat.com> --- My current setup does not seem to be hitting the testsuite hang/failure as frequently as Rich's setup, so for now I'm posting this in the hopes that we can see if it reduces the rate of testsuite failures. tests/test-nbd-tls-psk.sh | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/tests/test-nbd-tls-psk.sh b/tests/test-nbd-tls-psk.sh index 7a477da9..547064ab 100755 --- a/tests/test-nbd-tls-psk.sh +++ b/tests/test-nbd-tls-psk.sh @@ -1,6 +1,6 @@ #!/usr/bin/env bash # nbdkit -# Copyright (C) 2019 Red Hat Inc. +# Copyright (C) 2019-2020 Red Hat Inc. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are @@ -64,14 +64,14 @@ files="$sock1 $sock2 $pid1 $pid2 nbd-tls-psk.out" rm -f $files cleanup_fn rm -f $files +# Run nbd plugin as intermediary; also test our retry code +start_nbdkit -P "$pid2" -U "$sock2" --tls=off nbd retry=5 \ + tls=require tls-psk=keys.psk tls-username=qemu socket="$sock1" + # Run encrypted server start_nbdkit -P "$pid1" -U "$sock1" \ --tls=require --tls-psk=keys.psk example1 -# Run nbd plugin as intermediary -start_nbdkit -P "$pid2" -U "$sock2" --tls=off \ - nbd tls=require tls-psk=keys.psk tls-username=qemu socket="$sock1" - # Run unencrypted client qemu-img info --output=json -f raw "nbd+unix:///?socket=$sock2" > nbd-tls-psk.out -- 2.26.0.rc2
Richard W.M. Jones
2020-Mar-26 20:15 UTC
Re: [Libguestfs] [nbdkit PATCH] tests: Swap nbdkit process order in test-nbd-tls-psk.sh
On Thu, Mar 26, 2020 at 02:34:41PM -0500, Eric Blake wrote:> We're still seeing sporadic failures of 'nbdkit nbd tls=', and I'm > still trying to come up with a root cause fix (it may involve smarter > use of gnutls_bye() in libnbd). In the meantime, here's what we know: > when the hang/failure happens, the 'nbdkit nbd tls=' client process is > stuck in a poll() waiting to see EOF from the server, while the > 'nbdkit example1' server process is stuck in a read waiting to see if > the client will do a clean shutdown of the gnutls session. Sending > SIGTERM to the client is not going to break the poll, but if we > instead kill the server, that will cause the client to respond > (perhaps with an error message that we ignore, but better than > hanging). > > So, by rearranging the order in which we call our start_nbdkit > function, we change the order in which we send SIGTERM to the two > processes. And in turn, this becomes the first testsuite coverage of > the 'nbd retry=' parameter, added back in commit 0bb76bc7. > > Signed-off-by: Eric Blake <eblake@redhat.com> > --- > > My current setup does not seem to be hitting the testsuite > hang/failure as frequently as Rich's setup, so for now I'm posting > this in the hopes that we can see if it reduces the rate of testsuite > failures. > > tests/test-nbd-tls-psk.sh | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/tests/test-nbd-tls-psk.sh b/tests/test-nbd-tls-psk.sh > index 7a477da9..547064ab 100755 > --- a/tests/test-nbd-tls-psk.sh > +++ b/tests/test-nbd-tls-psk.sh > @@ -1,6 +1,6 @@ > #!/usr/bin/env bash > # nbdkit > -# Copyright (C) 2019 Red Hat Inc. > +# Copyright (C) 2019-2020 Red Hat Inc. > # > # Redistribution and use in source and binary forms, with or without > # modification, are permitted provided that the following conditions are > @@ -64,14 +64,14 @@ files="$sock1 $sock2 $pid1 $pid2 nbd-tls-psk.out" > rm -f $files > cleanup_fn rm -f $files > > +# Run nbd plugin as intermediary; also test our retry code > +start_nbdkit -P "$pid2" -U "$sock2" --tls=off nbd retry=5 \ > + tls=require tls-psk=keys.psk tls-username=qemu socket="$sock1" > + > # Run encrypted server > start_nbdkit -P "$pid1" -U "$sock1" \ > --tls=require --tls-psk=keys.psk example1 > > -# Run nbd plugin as intermediary > -start_nbdkit -P "$pid2" -U "$sock2" --tls=off \ > - nbd tls=require tls-psk=keys.psk tls-username=qemu socket="$sock1" > - > # Run unencrypted client > qemu-img info --output=json -f raw "nbd+unix:///?socket=$sock2" > nbd-tls-psk.out > > --It's worth a try, so ACK. It might be nice to add a comment into the test so we know why it's running the nbdkit instances in an unexpected order. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW
Eric Blake
2020-Mar-26 21:07 UTC
Re: [Libguestfs] [nbdkit PATCH] tests: Swap nbdkit process order in test-nbd-tls-psk.sh
On 3/26/20 3:15 PM, Richard W.M. Jones wrote:>> +# Run nbd plugin as intermediary; also test our retry code >> +start_nbdkit -P "$pid2" -U "$sock2" --tls=off nbd retry=5 \ >> + tls=require tls-psk=keys.psk tls-username=qemu socket="$sock1" >> + >> # Run encrypted server >> start_nbdkit -P "$pid1" -U "$sock1" \ >> --tls=require --tls-psk=keys.psk example1 >> >> -# Run nbd plugin as intermediary >> -start_nbdkit -P "$pid2" -U "$sock2" --tls=off \ >> - nbd tls=require tls-psk=keys.psk tls-username=qemu socket="$sock1" >> - >> # Run unencrypted client >> qemu-img info --output=json -f raw "nbd+unix:///?socket=$sock2" > nbd-tls-psk.out >> >> -- > > It's worth a try, so ACK. > > It might be nice to add a comment into the test so we know why it's > running the nbdkit instances in an unexpected order.I did a bit more tweaking (hoisting the qemu-img info process to start prior to the encrypted example1 server, to better prove that retry is letting clients connect to our bridge before our bridge can get through to the server), and pushed. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Maybe Matching Threads
- [nbdkit PATCH] tests: Swap nbdkit process order in test-nbd-tls-psk.sh
- [nbdkit PATCH 3/3] nbd: Implement .list_exports
- [nbdkit PATCH v2] tests: Accommodate qemu-img 4.1 output change
- [nbdkit PATCH v2 3/3] server: More tests of stdin/out handling
- [PATCH v2 nbdkit 5/5] tests: Add a helper function which waits for nbdkit to start up.