thr3ads.net - Libguestfs - [Libguestfs] libnbd: When are callbacks freed [Jul 2023]

Tage Johansson

2023-Jul-12 15:19 UTC

[Libguestfs] libnbd: When are callbacks freed

Hello,


While writing some tests for the Rust bindings, I discovered a memory
leak with Valgrind due to a completion callback not being freed. More
specifically, the completion callback of `aio_opt_info()` doesn't seem to
be freed if `aio_opt_info()` fails. In this case, `aio_opt_info()` was called
in the connected state, so it should indeed fail, but it should perhaps
also call the free function associated with the callback.


According to libnbd(3) <https://libguestfs.org/libnbd.3.html>:
> Callback lifetimes
> You can associate an optional free function with callbacks. Libnbd 
> will call this function when the callback will not be called again by 
> libnbd, including in the case where the API fails.

After studying the code in lib/opt.c, I think that the situation with
synchronous callbacks like `nbd_opt_list()` is even worse. Here the list
callback will not be freed if the internal call to
`nbd_unlocked_aio_opt_list()` fails, but it will be freed if the
completion callback got an error which is returned by `nbd_opt_list()`.

> /* Issue NBD_OPT_LIST and wait for the reply. */
> int
> nbd_unlocked_opt_list (struct nbd_handle *h, nbd_list_callback *list)
> {
>   struct list_helper s = { .list = *list };
>   nbd_list_callback l = { .callback = list_visitor, .user_data = &s };
>   nbd_completion_callback c = { .callback = list_complete, .user_data = &s };
>
>   if (nbd_unlocked_aio_opt_list (h, &l, &c) == -1)

Here the list callback has not been freed.

>     return -1;
>
>   SET_CALLBACK_TO_NULL (*list);
>   if (wait_for_option (h) == -1)
>     return -1;
>   if (s.err) {

But here I think that the callback has been freed.

>     set_error (s.err, "server replied with error to list request");
>     return -1;
>   }
>   return s.count;
> }

The problem is that if `nbd_opt_list()` returns an error, I as a user
cannot know whether I should free the data associated with the list
callback myself or if that has been done already.


Maybe I am just misunderstanding the code or the API, but if not then I 
wonder how I should know when I need to free the user data associated 
with a callback myself and when that is done by libnbd?


Best regards,

Tage

Richard W.M. Jones

2023-Jul-12 16:56 UTC

On Wed, Jul 12, 2023 at 03:19:59PM +0000, Tage Johansson wrote:
> Hello,
> 
> While writing some tests for the Rust bindings, I discovered a
> memory leak with Valgrind due to a completion callback not being
> freed. More specifically, the completion callback of `aio_opt_info()`
> doesn't seem to be freed if `aio_opt_info()` fails. In this case,
> `aio_opt_info()` was called in the connected state, so it should
> indeed fail, but it should perhaps also call the free function
> associated with the callback.
> 
> According to libnbd(3):
> 
>     Callback lifetimes
>     You can associate an optional free function with callbacks. Libnbd will
>     call this function when the callback will not be called again by libnbd,
>     including in the case where the API fails.

I wanted to see / remember exactly how this was supposed to work, so I
picked nbd_aio_pread which has a callback.  Does this free the
callback in all basic error cases?  The answer is yes:

  int64_t
  nbd_aio_pread (
    struct nbd_handle *h, void *buf, size_t count, uint64_t offset,
    nbd_completion_callback completion_callback, uint32_t flags
  )
  {
  ...
    p = aio_pread_in_permitted_state (h);
    if (unlikely (!p)) {
      ret = -1;
      goto out;
    }
  ...
  out:
  ...
    FREE_CALLBACK (completion_callback);
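
The single-exit pattern in the excerpt above can be sketched in isolation. This is a minimal model, not libnbd code: `completion_cb`, `demo_aio_call`, `count_free` and `frees_for_call` are hypothetical names, and the `FREE_CALLBACK` macro here is a simplified stand-in written for this sketch. It only illustrates that a failure in the state check still reaches the free function because every return goes through `out`. In this toy version nothing is queued, so the successful path frees straight away as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in, loosely modelled on a libnbd callback struct. */
typedef struct {
  int (*callback) (void *user_data, int *error);
  void *user_data;
  void (*free) (void *user_data);
} completion_cb;

/* Simplified stand-in: run the free function at most once, then clear. */
#define FREE_CALLBACK(cb)                 \
  do {                                    \
    if ((cb).free != NULL)                \
      (cb).free ((cb).user_data);         \
    (cb).callback = NULL;                 \
    (cb).free = NULL;                     \
  } while (0)

/* Hypothetical wrapper: every return path goes through "out". */
static int
demo_aio_call (bool in_permitted_state, completion_cb completion)
{
  int ret;

  if (!in_permitted_state) {
    ret = -1;                 /* wrong state: fail before doing any work */
    goto out;
  }
  ret = 0;                    /* the real work would happen here */

 out:
  FREE_CALLBACK (completion);
  assert (completion.free == NULL);   /* cleared, so it cannot run twice */
  return ret;
}

/* Free function that only counts how often it was called. */
static void
count_free (void *user_data)
{
  ++*(int *) user_data;
}

/* Returns how many times the free function ran for one call. */
int
frees_for_call (bool in_permitted_state)
{
  int count = 0;
  completion_cb cb = { .callback = NULL, .user_data = &count,
                       .free = count_free };

  (void) demo_aio_call (in_permitted_state, cb);
  return count;
}
```

Calling `frees_for_call (false)` exercises the wrong-state failure and `frees_for_call (true)` the normal path; in both cases the free function is expected to run exactly once.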

> After studying the code in lib/opt.c, I think that the situation with
> synchronous callbacks like `nbd_opt_list()` is even worse. Here the list
> callback will not be freed if the internal call to
> `nbd_unlocked_aio_opt_list()` fails, but it will be freed if the completion
> callback got an error which is returned by `nbd_opt_list()`.
> 
>     /* Issue NBD_OPT_LIST and wait for the reply. */
>     int
>     nbd_unlocked_opt_list (struct nbd_handle *h, nbd_list_callback *list)
>     {
>       struct list_helper s = { .list = *list };
>       nbd_list_callback l = { .callback = list_visitor, .user_data = &s };
>       nbd_completion_callback c = { .callback = list_complete, .user_data = &s };
> 
>       if (nbd_unlocked_aio_opt_list (h, &l, &c) == -1)
> 
> Here the list callback has not been freed.

Yes, I believe this is just a bug in the implementation of
nbd_unlocked_opt_list which should be fixed (to use FREE_CALLBACK
along this path).

>         return -1;
> 
>       SET_CALLBACK_TO_NULL (*list);
>       if (wait_for_option (h) == -1)
>         return -1;
>       if (s.err) {
> 
> But here I think that the callback has been freed.

It took me a while to figure it out, but it seems so.  The callbacks
are copied to h->opt_cb.fn.list and opt_cb.fn.context and then freed
in the state machine.

>         set_error (s.err, "server replied with error to list request");
>         return -1;
>       }
>       return s.count;
>     }
> 
> The problem is that if `nbd_opt_list()` returns an error, I as a
> user cannot know whether I should free the data associated with the
> list callback myself or if that has been done already.

It seems like it's just a missing call to FREE_CALLBACK.

> Maybe I am just misunderstanding the code or the API, but if not
> then I wonder how I should know when I need to free the user data
> associated with a callback myself and when that is done by libnbd?

No, it should work as in the documentation, else it's a bug.

Rich.
> 
> Best regards,
> 
> Tage
> 
> 
-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html

Eric Blake

2023-Jul-12 20:33 UTC

On Wed, Jul 12, 2023 at 03:19:59PM +0000, Tage Johansson wrote:
> Hello,
> 
> 
> While writing some tests for the Rust bindings, I discovered a memory leak
> with Valgrind due to a completion callback not being freed. More specifically,
> the completion callback of `aio_opt_info()` doesn't seem to be freed if
> `aio_opt_info()` fails. In this case, `aio_opt_info()` was called in the
> connected state, so it should indeed fail, but it should perhaps also call
> the free function associated with the callback.

Can you paste the valgrind output showing the leak?

It is important to view the generated lib/api.c.  Note that
nbd_aio_opt_info() unconditionally calls any remaining free callback
prior to returning -1.  If you called nbd_aio_opt_info() from the
wrong state, we never even get to the nbd_unlocked_aio_opt_info() code
in lib/opt.c.  So the only thing we really have to worry about is
whether any exit path has claimed ownership of the callback by
transferring it elsewhere (that is, we called SET_CALLBACK_TO_NULL
because we have set up h->opt_cb) but then returns -1 without leaving
a reachable path that still calls the callback at the right time.

> According to libnbd(3) <https://libguestfs.org/libnbd.3.html>:
> 
> > Callback lifetimes
> > You can associate an optional free function with callbacks. Libnbd will
> > call this function when the callback will not be called again by libnbd,
> > including in the case where the API fails.
> 
> After studying the code in lib/opt.c, I think that the situation with
> synchronous callbacks like `nbd_opt_list()` is even worse. Here the list
> callback will not be freed if the internal call to
> `nbd_unlocked_aio_opt_list()` fails, but it will be freed if the completion
> callback got an error which is returned by `nbd_opt_list()`.
> 
> > /* Issue NBD_OPT_LIST and wait for the reply. */
> > int
> > nbd_unlocked_opt_list (struct nbd_handle *h, nbd_list_callback *list)
> > {
> >   struct list_helper s = { .list = *list };
> >   nbd_list_callback l = { .callback = list_visitor, .user_data = &s };
> >   nbd_completion_callback c = { .callback = list_complete, .user_data = &s };
> > 
> >   if (nbd_unlocked_aio_opt_list (h, &l, &c) == -1)
> 
> Here the list callback has not been freed.
> 
> >     return -1;

True at this point, but it also has not been NULL'd.  But
nbd_unlocked_aio_opt_list() can ONLY return -1 if it fails prior to the
SET_CALLBACK_TO_NULL lines, so we also know *list still contains the
free callback, and the wrapper nbd_opt_list() in lib/api.c WILL call
the callback one function up in the call stack, so there is no leak
here.
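
The ownership argument above can be modelled in a few lines. This is only a sketch under stated assumptions, not the libnbd source: `struct handle`, `inner_aio_list`, `finish_option`, `outer_list` and `frees_after_list` are hypothetical names, `pending` plays the role of h->opt_cb, and the two macros are simplified stand-ins written for this sketch. The inner function returns -1 only before it takes ownership, so the outer wrapper can free whatever is left without risking a double free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in, loosely modelled on a libnbd list callback. */
typedef struct {
  int (*callback) (void *user_data, const char *name);
  void *user_data;
  void (*free) (void *user_data);
} list_cb;

/* Simplified stand-ins for the macros discussed in this thread. */
#define SET_CALLBACK_TO_NULL(cb) ((cb).callback = NULL, (cb).free = NULL)
#define FREE_CALLBACK(cb)                 \
  do {                                    \
    if ((cb).free != NULL)                \
      (cb).free ((cb).user_data);         \
    SET_CALLBACK_TO_NULL (cb);            \
  } while (0)

/* Hypothetical handle: "pending" plays the role of h->opt_cb. */
struct handle {
  bool negotiating;
  list_cb pending;
};

/* Inner call: may return -1 only before taking ownership. */
static int
inner_aio_list (struct handle *h, list_cb *list)
{
  if (!h->negotiating)
    return -1;                    /* *list untouched: caller still owns it */
  h->pending = *list;             /* transfer ownership to the handle */
  SET_CALLBACK_TO_NULL (*list);
  return 0;
}

/* Stand-in for the state machine finishing the option. */
static void
finish_option (struct handle *h)
{
  FREE_CALLBACK (h->pending);
}

/* Outer wrapper: frees whatever the inner call did not take. */
static int
outer_list (struct handle *h, list_cb list)
{
  int ret = inner_aio_list (h, &list);

  if (ret == 0) {
    assert (list.free == NULL);   /* the inner call took ownership */
    finish_option (h);
  }
  FREE_CALLBACK (list);           /* no-op if ownership was transferred */
  return ret;
}

static void
count_free (void *user_data)
{
  ++*(int *) user_data;
}

/* Returns how many times the free function ran for one outer call. */
int
frees_after_list (bool negotiating)
{
  int count = 0;
  struct handle h = { .negotiating = negotiating };
  list_cb cb = { .callback = NULL, .user_data = &count, .free = count_free };

  (void) outer_list (&h, cb);
  return count;
}
```

In this model the free function runs exactly once whether the inner call fails early (`frees_after_list (false)`) or succeeds and hands the callback to the handle (`frees_after_list (true)`). A leak would correspond to a path that clears `*list` without ever reaching `finish_option`.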

[meta-comment: I find that when replying inline, which is common
practice in patch reviews, it helps if I include a blank line before
and after anything I inject, so my reply text doesn't get overlooked
when scanning the wall of quoted text]

> > 
> >   SET_CALLBACK_TO_NULL (*list);
> >   if (wait_for_option (h) == -1)
> >     return -1;

But here, we've explicitly taken ownership of *list (that is,
successful nbd_unlocked_aio_opt_list() transferred the callback over
to h->opt_cb.fn.list), so if wait_for_option() does not free the
callback, then we must do so or we indeed have a bug.  But to actually
figure that out, I have to inspect the state machine, because
wait_for_option() itself does not currently do anything with the
callbacks, on success or on failure.

/me goes and digs further...

I _do_ see that nbd_internal_free_option() calls the appropriate
callbacks.  So the question is: if we can guarantee that h->opt_cb was
set, do we always reach nbd_internal_free_option() prior to
nbd_unlocked_poll() progressing the state machine to the point that we
either succeeded or transitioned out of state_connecting?



When I look over the older lib/rw.c, I remember spending quite a bit
of time patching it to get the handling correct.  Using
nbd_unlocked_pread_structured as the comparison: if
nbd_unlocked_aio_pread_structured() returns -1, then either *chunk has
not yet been set to NULL (so the api.c nbd_pread_structured will call
the free callback), or the code ensured that the transfer semantics
happened (and now the state machine will take care of calling the free
callback at the right point in time on cmd->cb).  There's even code in
nbd_internal_common_command() that frees the callback before returning
-1 when the command could not be queued to the state machine.  I even
remember how much time I spent writing tests/test-errors.c (which Rich
then later subdivided into finer-grained tests) covering various
aspects (failure when called in the wrong state, failure with
parameter validation before contacting the server, failure returned by
the server, ...) and that each of them didn't leak.

But I would not be surprised if my more recent additions of lib/opt.c
missed some steps (unlike commands, you don't have parallel options in
flight, so the code is simpler - but maybe I simplified too much),
where we could have a leak.

At any rate, I could argue that in the synchronous nbd_unlocked_opt
functions, we should probably be able to assert that the callback was
already NULL'd by the nbd_unlocked_aio_opt counterpart on success,
rather than manually trying to re-NULL it ourselves.

> >   if (s.err) {
> 
> But here I think that the callback has been freed.
> 
> >     set_error (s.err, "server replied with error to list request");
> >     return -1;
> >   }
> >   return s.count;
> > }
> 
> The problem is that if `nbd_opt_list()` returns an error, I as a user cannot
> know whether I should free the data associated with the list callback myself
> or if that has been done already.

The synchronous calls should ALWAYS call the free callback by the time
they complete, whether on success or on error.  If they don't, that's
a bug.

Looks like I should expand tests/opt-list.c to pass in a free
callback, to ensure that it is reached the correct number of times.
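
A test along those lines might look roughly like the sketch below. It does not link against libnbd: `fake_opt_list` is a hypothetical stand-in for a synchronous call such as nbd_opt_list(), and `enum outcome`, `struct counters`, `visit`, `release` and `free_count_for` are likewise invented for illustration. The point is the shape of the check, where the free callback increments a counter and the test asserts that it was reached exactly once for each outcome.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in, loosely modelled on a libnbd list callback. */
typedef struct {
  int (*callback) (void *user_data, const char *name,
                   const char *description);
  void *user_data;
  void (*free) (void *user_data);
} list_cb;

/* Outcomes the stand-in call can simulate. */
enum outcome { WRONG_STATE, SERVER_ERROR, SUCCESS };

struct counters {
  int visits;   /* how often the list callback ran */
  int frees;    /* how often the free callback ran */
};

static int
visit (void *opaque, const char *name, const char *description)
{
  struct counters *c = opaque;

  (void) name;
  (void) description;
  ++c->visits;
  return 0;
}

static void
release (void *opaque)
{
  struct counters *c = opaque;

  ++c->frees;
}

/* Hypothetical synchronous call: NOT libnbd.  It always runs the free
 * function before returning, on success and on error. */
static int
fake_opt_list (enum outcome o, list_cb cb)
{
  int ret = -1;

  if (o == SUCCESS) {
    cb.callback (cb.user_data, "export-a", "");
    cb.callback (cb.user_data, "export-b", "");
    ret = 2;
  }
  if (cb.free != NULL)
    cb.free (cb.user_data);
  return ret;
}

/* Runs one scenario and returns how often the free callback ran. */
int
free_count_for (enum outcome o)
{
  struct counters c = { 0, 0 };
  list_cb cb = { .callback = visit, .user_data = &c, .free = release };
  int ret = fake_opt_list (o, cb);

  assert ((ret == 2) == (o == SUCCESS));
  assert (c.visits == (o == SUCCESS ? 2 : 0));
  return c.frees;
}
```

A count of 0 for any outcome would indicate a leak, and a count above 1 a double free.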

> Maybe I am just misunderstanding the code or the API, but if not then I
> wonder how I should know when I need to free the user data associated with a
> callback myself and when that is done by libnbd?

It should always be done by libnbd.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Richard W.M. Jones

2023-Jul-13 08:16 UTC

On Wed, Jul 12, 2023 at 03:19:59PM +0000, Tage Johansson wrote:
> While writing some tests for the Rust bindings, I discovered a
> memory leak with Valgrind due to a completion callback not being
> freed.

A note about the valgrind tests:

They are quite sensitive to versions of software installed.  Basically
they work on my machine, they may work on other Fedora developers'
machines.  They probably won't work in general.

Another thing is they are known to fail if you don't have full debug
symbols installed (and probably frame pointers enabled), since the
stack traces won't be accurate and won't be comparable to the symbol
names in valgrind/suppressions.

As Eric said, if you see another valgrind failure please post the
complete error on the mailing list so we can take a look at it.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
