thr3ads.net - Libguestfs - [Libguestfs] sscanf/stroul (was: Re: [PATCH nbdkit v3 2/3] Add new retry filter.) [Sep 2019]

If this information is useful, please help other people find it:
Share via:

Eric Blake

2019-Sep-19 16:32 UTC

Re: [Libguestfs] [PATCH nbdkit v3 2/3] Add new retry filter.

On 9/19/19 10:26 AM, Richard W.M. Jones wrote:> This filter can be used to transparently reopen/retry when a plugin
> fails.  The connection is closed and reopened which for most plugins
> causes them to attempt to reconnect to their source.
> 
> For example if doing a long or slow SSH copy:
> 
>   nbdkit -U - ssh host=remote /var/tmp/test.iso \
>     --run 'qemu-img convert -p -f raw $nbd -O qcow2 test.qcow2'
> 
> if the SSH connection or network goes down in the middle then the
> whole operation will fail.
> 
> By adding the retry filter:
> 
>   nbdkit -U - ssh --filter=retry host=remote /var/tmp/test.iso \
>     --run 'qemu-img convert -p -f raw $nbd -O qcow2 test.qcow2'
> 
> this operation can recover from temporary failures in at least some
> circumstances.  The NBD connection (a local Unix domain socket in the
> example above) is not interrupted during retries, so NBD clients don't
> need to be taught how to retry - everything is handled internally by
> nbdkit.
> ---
Pretty cool how fast we've gone from idea to mostly-working code.
nbdkit is awesome for fast prototyping.
> +++ b/filters/retry/nbdkit-retry-filter.pod
> @@ -0,0 +1,103 @@
> +=head1 NAME
> +
> +nbdkit-retry-filter - reopen connection on error
> +
> +=head1 SYNOPSIS
> +
> + nbdkit --filter=retry PLUGIN
Worth adding [retry-parameter]... here?
> +
> +=head1 DESCRIPTION
> +
> +C<nbdkit-retry-filter> is a filter that transparently reopens the
> +plugin connection when an error is encountered.  It can be used to
> +make long-running copy operations reliable in the presence of
> +temporary network failures, without requiring any changes to the
> +plugin.
> +
> +A number of parameters are available to control:
> +
> +=over 4
> +
> +=item *
> +
> +How many times we retry.
> +
> +=item *
> +
> +The delay between retries, and whether we wait longer each time (known
> +as "exponential back-off").
> +
> +=item *
> +
> +If we reopen the plugin in read-only mode after the first failure.
> +
> +=back
> +
> +The default (no parameters) is designed to offer a happy medium
> +between recovering from short temporary failures but not doing
> +anything too bad when permanent or unrecoverable failures happen: We
> +retry 5 times with exponential back-off, waiting in total about 1
> +minute before we give up.
> +
> +=head1 EXAMPLE
> +
> +In this example we copy and convert a large file using
> +L<nbdkit-ssh-plugin(1)>, L<qemu-img(1)> and
L<nbdkit-captive(1)>.
> +
> + nbdkit -U - ssh --filter=retry host=remote /var/tmp/test.iso \
> +   --run 'qemu-img convert -p -f raw $nbd -O qcow2 test.qcow2'
> +
> +Without I<--filter=retry> a temporary failure would cause the copy
to
> +fail (for example, the remote host’s firewall is restarted causing the
> +SSH connection to be closed).  Adding this filter means that it may be
> +possible to transparently recover.
> +
> +=head1 PARAMETERS
EXAMPLE before PARAMETERS is unusual.  Not necessarily wrong, though.
> +
> +=over 4
> +
> +=item B<retries=>N
> +
> +The number of times any single operation will be retried before we
> +give up and fail the operation.  The default is 5.
> +
> +=item B<retry-delay=>N
> +
> +The number of seconds to wait before retrying.  The default is 2
> +seconds.
Worth allowing sub-second times?
> +
> +=item B<retry-exponential=yes>
> +
> +Use exponential back-off.  The retry delay is doubled between each
> +retry.  This is the default.
> +
> +=item B<retry-exponential=no>
> +
> +Do not use exponential back-off.  The retry delay is the same between
> +each retry.
> +
> +=item B<retry-readonly=yes>
> +
> +As soon as a failure occurs and we retry, switch the underlying plugin
> +permanently to read-only mode.
> +
> +=item B<retry-readonly=no>
> +
> +Do not change the read-write/read-only mode of the plugin when
> +retrying.  This is the default.
Worth an intermediate mode that only forces readonly if a write
operation fails, but tries to preserve read/write if a read operation fails?

> +
> +static int retries = 5;         /* 0 = filter is disabled */
> +static int initial_delay = 2;
Would need a different unit here if you allow sub-second retry.
> +static bool exponential_backoff = true;
> +static bool force_readonly = false;
Initializing a static variable to 0 is sometimes worth avoiding (as it
can force .data instead of .bss for a slightly larger binary), but here
it makes sense.
> +
> +static int
> +retry_config (nbdkit_next_config *next, void *nxdata,
> +              const char *key, const char *value)
> +{
> +  int r;
> +
> +  if (strcmp (key, "retries") == 0) {
> +    if (sscanf (value, "%d", &retries) != 1 || retries <
0) {
> +      nbdkit_error ("cannot parse retries: %s", value);
> +      return -1;
> +    }
> +  }
> +  else if (strcmp (key, "retry-delay") == 0) {
> +    if (sscanf (value, "%d", &initial_delay) != 1 ||
initial_delay <= 0) {
sscanf("%d") cannot detect overflow; should this use strtol with errno
checking instead?

> +#define retry_config_help \
> +  "retries=<N>              Number of retries (default:
5).\n" \
> +  "retry-delay=<N>          Second to wait before retry
(default: 2).\n" \
Seconds
> +  "retry-exponential=yes|no Exponential back-off (default:
yes).\n" \
> +  "retry-readonly=yes|no    Force read-only on failure (default:
no).\n"
> +
> +struct retry_handle {
> +  int readonly;                 /* Save original readonly setting. */
> +};
> +
> +static void *
> +retry_open (nbdkit_next_open *next, void *nxdata, int readonly)

Out of review time today, I'll get back to the rest of this file later...



-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Richard W.M. Jones

2019-Sep-19 16:43 UTC

head link

Re: [Libguestfs] [PATCH nbdkit v3 2/3] Add new retry filter.

On Thu, Sep 19, 2019 at 11:32:47AM -0500, Eric Blake
wrote:> On 9/19/19 10:26 AM, Richard W.M. Jones wrote:
> > +  else if (strcmp (key, "retry-delay") == 0) {
> > +    if (sscanf (value, "%d", &initial_delay) != 1 ||
initial_delay <= 0) {
> 
> sscanf("%d") cannot detect overflow; should this use strtol with
errno
> checking instead?
Can we not fix POSIX :-?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v

Richard W.M. Jones

2019-Sep-19 17:30 UTC

head link

Re: [Libguestfs] [PATCH nbdkit v3 2/3] Add new retry filter.

Thanks for reviewing.  Since I'm not around tonight I pushed my latest
version here:

  https://github.com/rwmjones/nbdkit/commits/2019-retry-filter

But I need to go through your review comments in detail again to check
that I've covered everything.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v

Richard W.M. Jones

2019-Sep-20 12:44 UTC

head link

[Libguestfs] sscanf/stroul (was: Re: [PATCH nbdkit v3 2/3] Add new retry filter.)

On Thu, Sep 19, 2019 at 11:32:47AM -0500, Eric Blake
wrote:> > +static int
> > +retry_config (nbdkit_next_config *next, void *nxdata,
> > +              const char *key, const char *value)
> > +{
> > +  int r;
> > +
> > +  if (strcmp (key, "retries") == 0) {
> > +    if (sscanf (value, "%d", &retries) != 1 || retries
< 0) {
> > +      nbdkit_error ("cannot parse retries: %s", value);
> > +      return -1;
> > +    }
> > +  }
> > +  else if (strcmp (key, "retry-delay") == 0) {
> > +    if (sscanf (value, "%d", &initial_delay) != 1 ||
initial_delay <= 0) {
> 
> sscanf("%d") cannot detect overflow; should this use strtol with
errno
> checking instead?
AIUI glibc does detect overflow, but it's the C17/POSIX standards
which don't mandate it?

I had a longer email here where I was going to suggest replacing all
use of sscanf/strtoul with new nbdkit_parse_* functions, but we would
need a lot of them and they still wouldn't cover all the clever uses
we make of sscanf, eg:

 
https://github.com/libguestfs/nbdkit/blob/cfbbe17ee21a9a47a96a5ac7f4321d20ed90cad9/filters/error/error.c#L134

So I still think the right way to solve this is to fix C2x / POSIX so
it mandates what is possibly already the glibc behaviour.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v

Eric Blake

2019-Sep-20 13:31 UTC

head link

Re: [Libguestfs] sscanf/stroul (was: Re: [PATCH nbdkit v3 2/3] Add new retry filter.)

On 9/20/19 7:44 AM, Richard W.M. Jones wrote:
>> sscanf("%d") cannot detect overflow; should this use strtol
with errno
>> checking instead?
> 
> AIUI glibc does detect overflow, but it's the C17/POSIX standards
> which don't mandate it?
Proof that glibc does not detect overflow:
$ cat foo.c
#include <stdio.h>
#include <string.h>

void examine(const char *in)
{
  int i, r, n = -1;
  r = sscanf (in, "%i%n", &i, &n);
  printf ("parsed '%s', result %d, got '%d' after %d
characters\n",
          in, r, i, n);
  if (n >= 0 && n < strlen (in))
    printf (" but did not consume %s\n", in + n);
}

int main(void)
{
  examine ("");
  examine ("  ");
  examine ("0");
  examine ("-0");
  examine ("1x");
  examine ("2147483647");
  examine ("2147483648");
  examine ("-2147483647");
  examine ("-2147483648");
  examine ("-2147483649");
  examine ("4294967295");
  examine ("4294967296");
  examine ("9999999999");
  examine ("10000000000");
}
$ gcc -o foo -Wall foo.c
$ ./foo
parsed '', result -1, got '0' after 0 characters
parsed '  ', result -1, got '0' after 0 characters
parsed '0', result 1, got '0' after 1 characters
parsed '-0', result 1, got '0' after 2 characters
parsed '1x', result 1, got '1' after 1 characters
 but did not consume x
parsed '2147483647', result 1, got '2147483647' after 10
characters
parsed '2147483648', result 1, got '-2147483648' after 10
characters
parsed '-2147483647', result 1, got '-2147483647' after 11
characters
parsed '-2147483648', result 1, got '-2147483648' after 11
characters
parsed '-2147483649', result 1, got '2147483647' after 11
characters
parsed '4294967295', result 1, got '-1' after 10 characters
parsed '4294967296', result 1, got '0' after 10 characters
parsed '9999999999', result 1, got '1410065407' after 10
characters
parsed '10000000000', result 1, got '1410065408' after 11
characters

glibc happily performs 2's-complement overflow, rather than flagging it
in any discernable way.  And since C17 does not require integer overflow
detection, and no existing libc provides it, it is unlikely that POSIX
would mandate it either.

Yes, parsing integers from untrusted sources is a bear.  I overlook uses
of sscanf on kernel /proc/... files (if that would ever overflow, either
you've been attacked where someone is overriding the mount such that you
are not reading actual kernel procfs files but an attacker's overlay, or
the kernel is hosed - but either way, if the attacker can mount an
attack like that, you're hosed anyways).  But for parsing other strings,
if overflow detection is important, then *scanf() is unusable.

Parsing fixed-width (or max-width) strings using sscanf is fine.  And
even when parsing integers, you can use tricks such as "%5i" when
parsing a 16-bit value to guarantee that you don't overflow an int (but
you still have to check whether you overflowed a uint16_t, and it is
unspecified whether any leading whitespace or '-' counted against the 5
byte limit or was silently skipped during the scanf parse).
> 
> I had a longer email here where I was going to suggest replacing all
> use of sscanf/strtoul with new nbdkit_parse_* functions, but we would
> need a lot of them and they still wouldn't cover all the clever uses
> we make of sscanf, eg:
> 
>  
https://github.com/libguestfs/nbdkit/blob/cfbbe17ee21a9a47a96a5ac7f4321d20ed90cad9/filters/error/error.c#L134
> 
Unfortunately, such a scrub is the only way I know to be safe, if we
care about detecting integer overflow.
> So I still think the right way to solve this is to fix C2x / POSIX so
> it mandates what is possibly already the glibc behaviour.
POSIX won't change unless glibc actually implements overflow detection
first; but glibc probably has no incentive to change as it might slow
*scanf() down and you already have fine-grained control over integer
parsing via strtol().

And even if POSIX were to change, it would be years before all the
various libc will catch up.  So in the meantime, your best bet is to
avoid sscanf() when detecting input errors is important, on the grounds
that it is more portable that way than to rely on whether libraries have
been changed to make detection via sscanf() feasible.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Eric Blake

2019-Sep-30 16:20 UTC

head link

Re: [Libguestfs] [PATCH nbdkit v3 2/3] Add new retry filter.

On 9/19/19 11:32 AM, Eric Blake wrote:> On 9/19/19 10:26 AM, Richard W.M. Jones wrote:
>> This filter can be used to transparently reopen/retry when a plugin
>> fails.  The connection is closed and reopened which for most plugins
>> causes them to attempt to reconnect to their source.
>>
>> +struct retry_handle {
>> +  int readonly;                 /* Save original readonly setting. */
>> +};
>> +
>> +static void *
>> +retry_open (nbdkit_next_open *next, void *nxdata, int readonly)
> 
> 
> Out of review time today, I'll get back to the rest of this file
later...
Resuming the review, I've found several issues, and have been pushing 
trivial patches as well as preparing more complex patches that I will 
post for review soon.  Found so far:

test-retry.sh used ((++i)) in the 'sh -' scriptlet, but failed to 
request that the scriptlet be executed in bash (failure if /bin/sh 
doesn't recognize that bashism)

retry failed to implement .flush

nbdkit --run '...' exits with $? set to 0 even if nbdkit died from a 
coredump such as an assertion failure (oops). I've got a solution for 
that which involves using waitpid() (I was afraid I'd have to go with 
something more complex like a non-blocking pipe that disappears when the 
child exits, or even use the new Linux pidfd, but waitid() is reliable 
enough when spawning our own child process).

retrying anything other than .pread fails with an assertion (which was 
too easy to overlook due to the --run bug above...), because h->can_FOO 
was not consulted even though backend.c claimed it would. This is 
particularly noticeable with the retry-readonly=1 flag (where 
h->can_write changes from 1 to 0).

retrying .extents fails with an assertion if the first attempt made any 
progress before failure

we aren't calling .prepare/.finalize correctly during a reopen.  Doesn't
matter when retry is the filter closest to the plugin, but will matter 
if another filter can live in-between

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Possibly Parallel Threads

Search for more apparently analagous threads

Libguestfs - Sep 2019 - sscanf/stroul (was: Re: [PATCH nbdkit v3 2/3] Add new retry filter.)

Re: [Libguestfs] [PATCH nbdkit v3 2/3] Add new retry filter.

Re: [Libguestfs] [PATCH nbdkit v3 2/3] Add new retry filter.

Re: [Libguestfs] [PATCH nbdkit v3 2/3] Add new retry filter.

[Libguestfs] sscanf/stroul (was: Re: [PATCH nbdkit v3 2/3] Add new retry filter.)

Re: [Libguestfs] sscanf/stroul (was: Re: [PATCH nbdkit v3 2/3] Add new retry filter.)

Re: [Libguestfs] [PATCH nbdkit v3 2/3] Add new retry filter.

Possibly Parallel Threads