thr3ads.net - Libguestfs - [Libguestfs] nbdkit test-checkwrite.sh failing occasionally on ppc64le [Jul 2022]

If this information is useful, please help other people find it:
Share via:

Richard W.M. Jones

2022-Jul-19 14:17 UTC

[Libguestfs] nbdkit test-checkwrite.sh failing occasionally on ppc64le

On Tue, Jul 19, 2022 at 03:21:05PM +0200, Laszlo Ersek
wrote:> On 07/19/22 11:45, Richard W.M. Jones wrote:
> >
> > Fixed in:
> >
https://gitlab.com/nbdkit/nbdkit/-/commit/16a13fa9e1f3c4cbc0fae89e364f832e1a41fb40
> >
https://gitlab.com/nbdkit/nbdkit/-/commit/2c9a557f2ebe5a8647cc44dbdd7c157356783306
> >
https://gitlab.com/nbdkit/nbdkit/-/commit/aaf5484462cd06505b7a1a8103a899d3551c451b
> >
> 
> I think the new assert from commit aaf5484462cd is wrong;
[linking to the assert in the context of the function:
https://gitlab.com/nbdkit/nbdkit/-/blob/master/filters/checkwrite/checkwrite.c#L163-L221

> from the
> filter's perspective, it is OK if the trim/zero request extends beyond
> the currently allocated disk size. The checkwrite_trim_zero() function
> starts with this comment:
> 
> > /* Trim and zero are effectively the same operation for this plugin.
> >  * We have to check that the underlying plugin contains all zeroes.
> >  *
> >  * Note we don't check that the extents exactly match since a
valid
> >  * copying operation is to either add sparseness (qemu-img convert -S)
> >  * or create a fully allocated target (nbdcopy --allocated).
> 
> The note states that the trim request/zero request (i.e.,
"count") may
> extend beyond the extent list that is available from the underlying
> plugin.
> 
> IOW, I think nbdkit_extents_full() does not guarantee that the final
> extent ends exactly where "count" does. Here's the function:
> 
> > /* This is a convenient wrapper around next_ops->extents which can
be
> >  * used from filters where you want to get a complete set of extents
> >  * covering the region [offset..offset+count-1].
> >  */
> > NBDKIT_DLL_PUBLIC struct nbdkit_extents *
> > nbdkit_extents_full (struct context *next_c,
> >                      uint32_t count, uint64_t offset, uint32_t flags,
> >                      int *err)
> > {
> >   struct nbdkit_next_ops *next = &next_c->next;
> >   struct nbdkit_extents *ret;
> >
> >   /* Clear REQ_ONE to ask the plugin for as much information as it is
> >    * willing to return (the plugin may still truncate if it is too
> >    * costly to provide everything).
> >    */
> >   flags &= ~NBDKIT_FLAG_REQ_ONE;
> >
> >   ret = nbdkit_extents_new (offset, offset+count);
> 
> So this sets ret->start=offset and ret->end=offset+count, and
> ret->extents to the empty vector.
> 
> >   if (ret == NULL) goto error0;
> >
> >   while (count > 0) {
> >     const uint64_t old_offset = offset;
> >     size_t i;
> >
> >     CLEANUP_EXTENTS_FREE struct nbdkit_extents *t
> >       = nbdkit_extents_new (offset, offset+count);
> >     if (t == NULL) goto error1;
> >
> >     if (next->extents (next_c, count, offset, flags, t, err) == -1)
> >       goto error0;
> >
> >     for (i = 0; i < nbdkit_extents_count (t); ++i) {
> >       const struct nbdkit_extent e = nbdkit_get_extent (t, i);
> >       if (nbdkit_add_extent (ret, e.offset, e.length, e.type) == -1)
> >         goto error1;
> 
> So here we add the underlying plugin's extents one by one, trimming
each
> in nbdkit_add_extent() if necessary. However, if the source extent list
> does not extend up to offset+count, then our output ("ret")
extent list
> will also not extend up to that point.
> 
> One example: what if the input file is completely empty (zero bytes in
> size)? I think that would even mean an extent list that has 0 entries.
My understanding of the contract of nbdkit_extents_full is that it
should always return information to the end of offset+count-1.
Moreover a plugin which advertises can_extents == true should always
return at least one extent for any request within the range of the
disk, this is the "a client can always make progress" requirement in
the spec:

https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md#request-types
(I cannot link to it directly, but if you search for those
words you'll find it.)

In fact you can observe this if you try to create an nbdkit plugin
which doesn't return a full set of extents covering the disk, nbdkit
prints an error and returns an EINVAL to the client:

$ nbdkit -U - eval \
       get_size=' echo 1048576 ' \
       pread=' dd if=/dev/zero count=$3 iflag=count_bytes ' \
       extents='
         echo 0 262144 3
       ' \
       --run 'nbdinfo --map "$uri"'
nbdkit: eval[1]: error: extents: plugin must return at least one extent
nbdinfo: nbd_block_status: block-status: command failed: Invalid argument

(I used the same plugin with checkwrite and it gave the same error
from nbdkit_extents_full).

Anyway if that's true then I believe I have a different and simpler
argument for keeping the current assertion:

- When we call checkwrite_trim_zero, the client is asking to trim /
  zero the range [offset..offset+count-1]

- To check this operation is valid, checkwrite must inspect every byte
  in the range, whether that is by using extents or pread or whatever.

- At each place in the code, count is decremented (and offset
  incremented) by exactly the amount of bytes that checkwrite has
  inspected.

- Therefore at the end of this, count must be zero when we leave the
  function.  (We could validly move the assert later, but it's kind of
  obvious already for the else-clause.)

Note I'm not claiming the assertion will not fail!  Just that the
assertion is correct, and if it fails we've got a bug.

Rich.
> Thus, I think the properties of the extent list produced by
> nbdkit_extents_full() are crucial. They are:
> 
> - There may be 0 extents in the extent list.
> 
> - The first extent starts precisely at "offset".
> 
> - The (exclusive) end of the last extent is *less than or equal to*
>   "offset+count".
> 
> Based on these properties (especially the last two), I think we should /
> could modify the checkwrite_trim_zero() function as follows:
> 
> > diff --git a/filters/checkwrite/checkwrite.c
b/filters/checkwrite/checkwrite.c
> > index ce8fa137aa80..9abe64ca7aa0 100644
> > --- a/filters/checkwrite/checkwrite.c
> > +++ b/filters/checkwrite/checkwrite.c
> > @@ -164,91 +164,84 @@ static int
> >  checkwrite_trim_zero (nbdkit_next *next,
> >                        void *handle, uint32_t count, uint64_t offset,
> >                        uint32_t flags, int *err)
> >  {
> >    /* If the plugin supports extents, speed this up by using them. */
> >    if (next->can_extents (next)) {
> >      size_t i, n;
> >      CLEANUP_EXTENTS_FREE struct nbdkit_extents *exts > >       
nbdkit_extents_full (next, count, offset, 0, err);
> >      if (exts == NULL)
> >        return -1;
> >
> >      n = nbdkit_extents_count (exts);
> > -    for (i = 0; i < n && count > 0; ++i) {
> > +    for (i = 0; i < n; ++i) {
> >        const struct nbdkit_extent e = nbdkit_get_extent (exts, i);
> > -      const uint64_t next_extent_offset = e.offset + e.length;
> > +      uint64_t left_in_extent = e.length;
> > +
> > +      assert (count >= left_in_extent);
> >
> >        /* Anything that reads back as zero is good. */
> >        if ((e.type & NBDKIT_EXTENT_ZERO) != 0) {
> > -        const uint64_t zerolen = MIN (count, next_extent_offset -
offset);
> > -
> > -        offset += zerolen;
> > -        count -= zerolen;
> > +        offset += left_in_extent;
> > +        count -= left_in_extent;
> >          continue;
> >        }
> >
> >        /* Otherwise we have to read the underlying data and check. */
> >        if (flags & NBDKIT_FLAG_FAST_ZERO) {
> >          *err = ENOTSUP;
> >          return -1;
> >        }
> > -      while (offset < next_extent_offset && count > 0)
{
> > -        size_t buflen = MIN (MAX_REQUEST_SIZE, count);
> > -        buflen = MIN (buflen, next_extent_offset - offset);
> > +      while (left_in_extent > 0) {
> > +        size_t buflen = MIN (MAX_REQUEST_SIZE, left_in_extent);
> >
> >          CLEANUP_FREE char *buf = malloc (buflen);
> >          if (buf == NULL) {
> >            *err = errno;
> >            nbdkit_error ("malloc: %m");
> >            return -1;
> >          }
> >
> >          if (next->pread (next, buf, buflen, offset, 0, err) == -1)
> >            return -1;
> >          if (! is_zero (buf, buflen))
> >            return data_does_not_match (err);
> >
> >          count -= buflen;
> >          offset += buflen;
> > +        left_in_extent -= buflen;
> >        }
> >      } /* for extent */
> > -
> > -    /* Assert that the loop above has actually checked the whole
> > -     * region.  If this fires then it could be because
> > -     * nbdkit_extents_full isn't returning a full range of
extents for
> > -     * the whole region ... or maybe the loop above is wrong.
> > -     */
> > -    assert (count == 0);
> >    }
> >
> >    /* Otherwise the plugin does not support extents, so do this the
> >     * slow way.
> >     */
> >    else {
> >      ...
> >    }
> >
> >    return 0;
> >  }
> 
> Note that, with this change, the *only* things we use "count" for
are:
> 
> - the initial nbdkit_extents_full() call,
> 
> - an assertion that we're still "inside count".
> 
> This makes sense, based on the properties of nbdkit_extents_full().
> nbdkit_extents_full() gives us an extent list that starts precisely at
> "offset", and extends as far as possible towards
"offset+count". Once we
> have those extents, it's enough to just check the extents'
contents; we
> can ignore "count" altogether.
> 
> If the current extent is a zero extent, we can skip it in full; if it is
> a data extent, then we need to chunk it with MAX_REQUEST_SIZE. Either
> way, "count" will never run out.
> 
> (NB: the "assert (count == 0)" is an independent revert of
> aaf5484462cd.)
> 
> A more aggressive simplification would be (for the same end goal) the
> following. This eliminates the usage of the "count" variable
altogether,
> except in the initial nbdkit_extents_full() call:
> 
> > diff --git a/filters/checkwrite/checkwrite.c
b/filters/checkwrite/checkwrite.c
> > index ce8fa137aa80..9dac501904eb 100644
> > --- a/filters/checkwrite/checkwrite.c
> > +++ b/filters/checkwrite/checkwrite.c
> > @@ -164,91 +164,80 @@ static int
> >  checkwrite_trim_zero (nbdkit_next *next,
> >                        void *handle, uint32_t count, uint64_t offset,
> >                        uint32_t flags, int *err)
> >  {
> >    /* If the plugin supports extents, speed this up by using them. */
> >    if (next->can_extents (next)) {
> >      size_t i, n;
> >      CLEANUP_EXTENTS_FREE struct nbdkit_extents *exts > >       
nbdkit_extents_full (next, count, offset, 0, err);
> >      if (exts == NULL)
> >        return -1;
> >
> >      n = nbdkit_extents_count (exts);
> > -    for (i = 0; i < n && count > 0; ++i) {
> > +    for (i = 0; i < n; ++i) {
> >        const struct nbdkit_extent e = nbdkit_get_extent (exts, i);
> > -      const uint64_t next_extent_offset = e.offset + e.length;
> > +      uint64_t left_in_extent = e.length;
> >
> >        /* Anything that reads back as zero is good. */
> >        if ((e.type & NBDKIT_EXTENT_ZERO) != 0) {
> > -        const uint64_t zerolen = MIN (count, next_extent_offset -
offset);
> > -
> > -        offset += zerolen;
> > -        count -= zerolen;
> > +        offset += left_in_extent;
> >          continue;
> >        }
> >
> >        /* Otherwise we have to read the underlying data and check. */
> >        if (flags & NBDKIT_FLAG_FAST_ZERO) {
> >          *err = ENOTSUP;
> >          return -1;
> >        }
> > -      while (offset < next_extent_offset && count > 0)
{
> > -        size_t buflen = MIN (MAX_REQUEST_SIZE, count);
> > -        buflen = MIN (buflen, next_extent_offset - offset);
> > +      while (left_in_extent > 0) {
> > +        size_t buflen = MIN (MAX_REQUEST_SIZE, left_in_extent);
> >
> >          CLEANUP_FREE char *buf = malloc (buflen);
> >          if (buf == NULL) {
> >            *err = errno;
> >            nbdkit_error ("malloc: %m");
> >            return -1;
> >          }
> >
> >          if (next->pread (next, buf, buflen, offset, 0, err) == -1)
> >            return -1;
> >          if (! is_zero (buf, buflen))
> >            return data_does_not_match (err);
> >
> > -        count -= buflen;
> >          offset += buflen;
> > +        left_in_extent -= buflen;
> >        }
> >      } /* for extent */
> > -
> > -    /* Assert that the loop above has actually checked the whole
> > -     * region.  If this fires then it could be because
> > -     * nbdkit_extents_full isn't returning a full range of
extents for
> > -     * the whole region ... or maybe the loop above is wrong.
> > -     */
> > -    assert (count == 0);
> >    }
> >
> >    /* Otherwise the plugin does not support extents, so do this the
> >     * slow way.
> >     */
> >    else {
> >      ...
> >    }
> >
> >    return 0;
> >  }
> >
> 
> Laszlo
-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v

Laszlo Ersek

2022-Jul-19 14:41 UTC

head link

[Libguestfs] nbdkit test-checkwrite.sh failing occasionally on ppc64le

On 07/19/22 16:17, Richard W.M. Jones wrote:
> Note I'm not claiming the assertion will not fail!  Just that the
> assertion is correct, and if it fails we've got a bug.
Fair enough, but how do we explain the second paragraph of the leading comment
on the function then?

 * Note we don't check that the extents exactly match since a valid
 * copying operation is to either add sparseness (qemu-img convert -S)
 * or create a fully allocated target (nbdcopy --allocated).

I think it directly contradicts the assertion. That, or I don't understand
the comment at all.

(It's from commit 2d0954dd6431, "New filter: checkwrite",
2020-12-22.)

Laszlo

Libguestfs - Jul 2022 - nbdkit test-checkwrite.sh failing occasionally on ppc64le

[Libguestfs] nbdkit test-checkwrite.sh failing occasionally on ppc64le

[Libguestfs] nbdkit test-checkwrite.sh failing occasionally on ppc64le