thr3ads.net - Libguestfs - [Libguestfs] libnbd: When are callbacks freed [Jul 2023]

If this information is useful, please help other people find it:
Share via:

Eric Blake

2023-Jul-13 14:36 UTC

[Libguestfs] libnbd: When are callbacks freed

On Thu, Jul 13, 2023 at 01:37:49PM +0000, Tage Johansson
wrote:> 
> On 7/13/2023 3:26 PM, Richard W.M. Jones wrote:
> > On Thu, Jul 13, 2023 at 08:01:09AM -0500, Eric Blake wrote:
> > > On Thu, Jul 13, 2023 at 07:13:37AM -0500, Eric Blake wrote:
> > > > > > I have replaced a call to `nbd_opt_info()` with a
call to
> > > > > > `nbd_aio_opt_info()` and passed in a completion
callback which just
> > > > > > calls `exit(EXIT_FAILURE)`. So if the completion
callback is called
> > > > > > the test should fail, which it doesn't, at
least not on my machine.
> > > > > Isn't that OK?  Only .free is required to be
called.
> > > > For the context callback (for opt_set_meta), .callback may
be called
> > > > zero, one, or multiple times, but .free should be called
exactly once.
> > > > But for the completion callback, I agree that the docs state
that both
> > > > .callback and .free should each be called exactly once
("On the other
> > > > hand, if a completion callback is supplied (only possible
with
> > > > asynchronous commands), it will always be reached exactly
once, and
> > > > the completion callback must not ignore the value pointed to
by
> > > > C<error>."); we are indeed missing the call to
.callback.  I'll work
> > > > on a patch.
> > > Eww, the bug is bigger than just nbd_aio_opt* not always calling
> > > completion.callback exactly once.  With just this diff (to be
folded
> > > into the larger patch I'm working on), I'm getting an
assertion
> > > failure that we fail to call completion.callback for
> > > nbd_aio_pread_structured when calling nbd_close() prior to the
command
> > > running to completion, so I'll have to fix that too.
> > Just to be clear about this, are we really sure the completion
> > callback should always be called once?  I'm not clear why that
should
> > be the case.  (The .free callback however should be.)
> > 
> > For example, if I call a function with bogus invalid parameters so
> > that it fails very early on (or when the handle is in an incorrect
> > state), should I expect the completion callback to be called?  I would
> > expect not.
> > 
> > Rich.
> 
> 
> The user needs a way to know if an error occurred. So the completion
> callback must be called if the asynchronous function did not fail (returned
> 0). If the completion callback should be called, with the error parameter
> set, even when the asynchronous function immediately failed with a non-zero
> return value is another question. I see two possibilities: Either the
> completion callback should always be called. Or it should be called iff the
> asynchronous function returned 0 (did not fail).
Good point.

Our documentation currently states (docs/libnbd.pod) that the
completion callback is ALWAYS called, but this is inconsistent with
current code - you are correct that at present, the completion
callback is NOT called if the aio command returns non-zero (easy test:
call nbd_aio_block_status before nbd_connect*).  I'm game to updating
that part of the documentation to match existing practice (changing
our guarantee to the completion callback will be called once iff the
aio command reported success), since our testsuite wasn't covering it,
and it is probably an easier fix than munging the generator to call
completion.callback even for aio failures.  Meanwhile, the .free
callback MUST be called unconditionally, and I think our testsuite is
already covering that.

Life-cycle wise, I see the following sequence as being something we
could usefully rely on (although we don't yet have enough testsuite
coverage to prove that we already have it).  Note that it is not only
important how many times things are called, but I would like it if we
can guarantee the specific ordering between them (neither .free should
be called until all .callback opportunities are complete finished, to
avoid any use-after-free issues regardless of which of the two .free
slots the programmer actually uses):

- if aio command fails:
mid-command .callback: 0 calls
completion .callback: 0 calls
mid-command .free: 1 call
completion .free: 1 call

- if aio command succeeds:
mid-command .callback: 0, 1, or multiple times
completion .callback: 1 call
mid-command .free: 1 call
completion .free: 1 call

What I'm not sure of yet (without more code inspection) is whether we
can sometimes call completion.callback after mid-command.free.
> 
> 
> By the way, if the error parameter is set in the completion callback, how
> can the user retrieve the error text? Is it possible to call
> `get_nbd_error(3)` from inside the completion callback?
You mean nbd_get_error().  We currently document that it is not safe
to call any nbd_* from within a callback.  Maybe we could relax that
to carve out exceptions for nbd_get_error/errno as special cases,
since they only inspect thread-local state and can be used even when
there is no current nbd_handle.  The problem is that I'm not sure if
there is a reliable string at that point in time: part of the reason
the completion callback has an errnum parameter is that the point of
reporting the completion failure may be in a different thread than
when the failure was first detected, and we only preserved a numeric
value in cmd->error rather than also preserving a string.

Put another way, there is no way to guarantee that nbd_get_errno()
will return the same value as the errnum parameter to the callback;
and if that is true, then even if calling nbd_get_error() doesn't
crash or deadlock from within a callback, it is likewise probable that
any such string returned is not consistent with the errnum parameter
passed to the callback.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Eric Blake

2023-Jul-13 14:55 UTC

head link

[Libguestfs] libnbd: When are callbacks freed

On Thu, Jul 13, 2023 at 09:36:58AM -0500, Eric Blake
wrote:> > The user needs a way to know if an error occurred. So the completion
> > callback must be called if the asynchronous function did not fail
(returned
> > 0). If the completion callback should be called, with the error
parameter
> > set, even when the asynchronous function immediately failed with a
non-zero
> > return value is another question. I see two possibilities: Either the
> > completion callback should always be called. Or it should be called
iff the
> > asynchronous function returned 0 (did not fail).
> 
> Good point.
> 
> Our documentation currently states (docs/libnbd.pod) that the
> completion callback is ALWAYS called, but this is inconsistent with
> current code - you are correct that at present, the completion
> callback is NOT called if the aio command returns non-zero (easy test:
> call nbd_aio_block_status before nbd_connect*).  I'm game to updating
> that part of the documentation to match existing practice (changing
> our guarantee to the completion callback will be called once iff the
> aio command reported success), since our testsuite wasn't covering it,
> and it is probably an easier fix than munging the generator to call
> completion.callback even for aio failures.  Meanwhile, the .free
> callback MUST be called unconditionally, and I think our testsuite is
> already covering that.
> 
> Life-cycle wise, I see the following sequence as being something we
> could usefully rely on (although we don't yet have enough testsuite
> coverage to prove that we already have it).  Note that it is not only
> important how many times things are called, but I would like it if we
> can guarantee the specific ordering between them (neither .free should
> be called until all .callback opportunities are complete finished, to
> avoid any use-after-free issues regardless of which of the two .free
> slots the programmer actually uses):
> 
> - if aio command fails:
> mid-command .callback: 0 calls
> completion .callback: 0 calls
This is the part that is inconsistent with our docs added in commit
6f4dcdab, but which matches current code from all the tests I've come
up with so far.
> mid-command .free: 1 call
> completion .free: 1 call
> 
> - if aio command succeeds:
> mid-command .callback: 0, 1, or multiple times
> completion .callback: 1 call
> mid-command .free: 1 call
> completion .free: 1 call
> 
> What I'm not sure of yet (without more code inspection) is whether we
> can sometimes call completion.callback after mid-command.free.
Commit 6f4dcdab documents that we guarantee this order, and I'm trying
to preserve that guarantee.  Thus, I think the only thing up for
debate is whether completion.callback can be skipped if the
corresponding aio command failed early (because that is the one case
where you KNOW there was an error without needing a completion
callback to tell you that fact), but that it does make sense as being
closer to what our code base already does.

Ramifications to user code: for aio commands that take both a
mid-command and completion callback, the user can share the same
allocated data structure between both callbacks, and use their choice
of mid-command.free or completion.free to clean up the allocation.
But if the user instead tries to uss completion.callback to clean up
the allocation, then they must ALSO manually clean up the allocation
if the aio command itself reported -1 (that is, it is better to stick
cleanup in .free than in complete.callback).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Tage Johansson

2023-Jul-13 15:05 UTC

head link

[Libguestfs] libnbd: When are callbacks freed

On 7/13/2023 4:36 PM, Eric Blake wrote:> On Thu, Jul 13, 2023 at 01:37:49PM +0000, Tage Johansson wrote:
>> On 7/13/2023 3:26 PM, Richard W.M. Jones wrote:
>>> On Thu, Jul 13, 2023 at 08:01:09AM -0500, Eric Blake wrote:
>>>> On Thu, Jul 13, 2023 at 07:13:37AM -0500, Eric Blake wrote:
>>>>>>> I have replaced a call to `nbd_opt_info()` with a
call to
>>>>>>> `nbd_aio_opt_info()` and passed in a completion
callback which just
>>>>>>> calls `exit(EXIT_FAILURE)`. So if the completion
callback is called
>>>>>>> the test should fail, which it doesn't, at
least not on my machine.
>>>>>> Isn't that OK?  Only .free is required to be
called.
>>>>> For the context callback (for opt_set_meta), .callback may
be called
>>>>> zero, one, or multiple times, but .free should be called
exactly once.
>>>>> But for the completion callback, I agree that the docs
state that both
>>>>> .callback and .free should each be called exactly once
("On the other
>>>>> hand, if a completion callback is supplied (only possible
with
>>>>> asynchronous commands), it will always be reached exactly
once, and
>>>>> the completion callback must not ignore the value pointed
to by
>>>>> C<error>."); we are indeed missing the call to
.callback.  I'll work
>>>>> on a patch.
>>>> Eww, the bug is bigger than just nbd_aio_opt* not always
calling
>>>> completion.callback exactly once.  With just this diff (to be
folded
>>>> into the larger patch I'm working on), I'm getting an
assertion
>>>> failure that we fail to call completion.callback for
>>>> nbd_aio_pread_structured when calling nbd_close() prior to the
command
>>>> running to completion, so I'll have to fix that too.
>>> Just to be clear about this, are we really sure the completion
>>> callback should always be called once?  I'm not clear why that
should
>>> be the case.  (The .free callback however should be.)
>>>
>>> For example, if I call a function with bogus invalid parameters so
>>> that it fails very early on (or when the handle is in an incorrect
>>> state), should I expect the completion callback to be called?  I
would
>>> expect not.
>>>
>>> Rich.
>>
>> The user needs a way to know if an error occurred. So the completion
>> callback must be called if the asynchronous function did not fail
(returned
>> 0). If the completion callback should be called, with the error
parameter
>> set, even when the asynchronous function immediately failed with a
non-zero
>> return value is another question. I see two possibilities: Either the
>> completion callback should always be called. Or it should be called iff
the
>> asynchronous function returned 0 (did not fail).
> Good point.
>
> Our documentation currently states (docs/libnbd.pod) that the
> completion callback is ALWAYS called, but this is inconsistent with
> current code - you are correct that at present, the completion
> callback is NOT called if the aio command returns non-zero (easy test:
> call nbd_aio_block_status before nbd_connect*).  I'm game to updating
> that part of the documentation to match existing practice (changing
> our guarantee to the completion callback will be called once iff the
> aio command reported success), since our testsuite wasn't covering it,
> and it is probably an easier fix than munging the generator to call
> completion.callback even for aio failures.  Meanwhile, the .free
> callback MUST be called unconditionally, and I think our testsuite is
> already covering that.
>
> Life-cycle wise, I see the following sequence as being something we
> could usefully rely on (although we don't yet have enough testsuite
> coverage to prove that we already have it).  Note that it is not only
> important how many times things are called, but I would like it if we
> can guarantee the specific ordering between them (neither .free should
> be called until all .callback opportunities are complete finished, to
> avoid any use-after-free issues regardless of which of the two .free
> slots the programmer actually uses):
>
> - if aio command fails:
> mid-command .callback: 0 calls
> completion .callback: 0 calls
> mid-command .free: 1 call
> completion .free: 1 call
>
> - if aio command succeeds:
> mid-command .callback: 0, 1, or multiple times
> completion .callback: 1 call
> mid-command .free: 1 call
> completion .free: 1 call
>
> What I'm not sure of yet (without more code inspection) is whether we
> can sometimes call completion.callback after mid-command.free.
>
>>
>> By the way, if the error parameter is set in the completion callback,
how
>> can the user retrieve the error text? Is it possible to call
>> `get_nbd_error(3)` from inside the completion callback?
> You mean nbd_get_error().  We currently document that it is not safe
> to call any nbd_* from within a callback.  Maybe we could relax that
> to carve out exceptions for nbd_get_error/errno as special cases,
> since they only inspect thread-local state and can be used even when
> there is no current nbd_handle.  The problem is that I'm not sure if
> there is a reliable string at that point in time: part of the reason
> the completion callback has an errnum parameter is that the point of
> reporting the completion failure may be in a different thread than
> when the failure was first detected, and we only preserved a numeric
> value in cmd->error rather than also preserving a string.
>
> Put another way, there is no way to guarantee that nbd_get_errno()
> will return the same value as the errnum parameter to the callback;
> and if that is true, then even if calling nbd_get_error() doesn't
> crash or deadlock from within a callback, it is likewise probable that
> any such string returned is not consistent with the errnum parameter
> passed to the callback.

So is there any safe way to get some description of the error from a 
completion callback apart from a non-zero number? It isn't too helpful 
to report to the user that the read operation faild with -1.


Best regards,

Tage

Libguestfs - Jul 2023 - libnbd: When are callbacks freed

[Libguestfs] libnbd: When are callbacks freed

[Libguestfs] libnbd: When are callbacks freed

[Libguestfs] libnbd: When are callbacks freed