thr3ads.net - Libguestfs - [Libguestfs] libnbd: When are callbacks freed [Jul 2023]

If this information is useful, please help other people find it:
Share via:

Tage Johansson

2023-Jul-13 15:05 UTC

[Libguestfs] libnbd: When are callbacks freed

On 7/13/2023 4:36 PM, Eric Blake wrote:> On Thu, Jul 13, 2023 at 01:37:49PM +0000, Tage Johansson wrote:
>> On 7/13/2023 3:26 PM, Richard W.M. Jones wrote:
>>> On Thu, Jul 13, 2023 at 08:01:09AM -0500, Eric Blake wrote:
>>>> On Thu, Jul 13, 2023 at 07:13:37AM -0500, Eric Blake wrote:
>>>>>>> I have replaced a call to `nbd_opt_info()` with a
call to
>>>>>>> `nbd_aio_opt_info()` and passed in a completion
callback which just
>>>>>>> calls `exit(EXIT_FAILURE)`. So if the completion
callback is called
>>>>>>> the test should fail, which it doesn't, at
least not on my machine.
>>>>>> Isn't that OK?  Only .free is required to be
called.
>>>>> For the context callback (for opt_set_meta), .callback may
be called
>>>>> zero, one, or multiple times, but .free should be called
exactly once.
>>>>> But for the completion callback, I agree that the docs
state that both
>>>>> .callback and .free should each be called exactly once
("On the other
>>>>> hand, if a completion callback is supplied (only possible
with
>>>>> asynchronous commands), it will always be reached exactly
once, and
>>>>> the completion callback must not ignore the value pointed
to by
>>>>> C<error>."); we are indeed missing the call to
.callback.  I'll work
>>>>> on a patch.
>>>> Eww, the bug is bigger than just nbd_aio_opt* not always
calling
>>>> completion.callback exactly once.  With just this diff (to be
folded
>>>> into the larger patch I'm working on), I'm getting an
assertion
>>>> failure that we fail to call completion.callback for
>>>> nbd_aio_pread_structured when calling nbd_close() prior to the
command
>>>> running to completion, so I'll have to fix that too.
>>> Just to be clear about this, are we really sure the completion
>>> callback should always be called once?  I'm not clear why that
should
>>> be the case.  (The .free callback however should be.)
>>>
>>> For example, if I call a function with bogus invalid parameters so
>>> that it fails very early on (or when the handle is in an incorrect
>>> state), should I expect the completion callback to be called?  I
would
>>> expect not.
>>>
>>> Rich.
>>
>> The user needs a way to know if an error occurred. So the completion
>> callback must be called if the asynchronous function did not fail
(returned
>> 0). If the completion callback should be called, with the error
parameter
>> set, even when the asynchronous function immediately failed with a
non-zero
>> return value is another question. I see two possibilities: Either the
>> completion callback should always be called. Or it should be called iff
the
>> asynchronous function returned 0 (did not fail).
> Good point.
>
> Our documentation currently states (docs/libnbd.pod) that the
> completion callback is ALWAYS called, but this is inconsistent with
> current code - you are correct that at present, the completion
> callback is NOT called if the aio command returns non-zero (easy test:
> call nbd_aio_block_status before nbd_connect*).  I'm game to updating
> that part of the documentation to match existing practice (changing
> our guarantee to the completion callback will be called once iff the
> aio command reported success), since our testsuite wasn't covering it,
> and it is probably an easier fix than munging the generator to call
> completion.callback even for aio failures.  Meanwhile, the .free
> callback MUST be called unconditionally, and I think our testsuite is
> already covering that.
>
> Life-cycle wise, I see the following sequence as being something we
> could usefully rely on (although we don't yet have enough testsuite
> coverage to prove that we already have it).  Note that it is not only
> important how many times things are called, but I would like it if we
> can guarantee the specific ordering between them (neither .free should
> be called until all .callback opportunities are complete finished, to
> avoid any use-after-free issues regardless of which of the two .free
> slots the programmer actually uses):
>
> - if aio command fails:
> mid-command .callback: 0 calls
> completion .callback: 0 calls
> mid-command .free: 1 call
> completion .free: 1 call
>
> - if aio command succeeds:
> mid-command .callback: 0, 1, or multiple times
> completion .callback: 1 call
> mid-command .free: 1 call
> completion .free: 1 call
>
> What I'm not sure of yet (without more code inspection) is whether we
> can sometimes call completion.callback after mid-command.free.
>
>>
>> By the way, if the error parameter is set in the completion callback,
how
>> can the user retrieve the error text? Is it possible to call
>> `get_nbd_error(3)` from inside the completion callback?
> You mean nbd_get_error().  We currently document that it is not safe
> to call any nbd_* from within a callback.  Maybe we could relax that
> to carve out exceptions for nbd_get_error/errno as special cases,
> since they only inspect thread-local state and can be used even when
> there is no current nbd_handle.  The problem is that I'm not sure if
> there is a reliable string at that point in time: part of the reason
> the completion callback has an errnum parameter is that the point of
> reporting the completion failure may be in a different thread than
> when the failure was first detected, and we only preserved a numeric
> value in cmd->error rather than also preserving a string.
>
> Put another way, there is no way to guarantee that nbd_get_errno()
> will return the same value as the errnum parameter to the callback;
> and if that is true, then even if calling nbd_get_error() doesn't
> crash or deadlock from within a callback, it is likewise probable that
> any such string returned is not consistent with the errnum parameter
> passed to the callback.

So is there any safe way to get some description of the error from a 
completion callback apart from a non-zero number? It isn't too helpful 
to report to the user that the read operation faild with -1.


Best regards,

Tage

Richard W.M. Jones

2023-Jul-13 15:37 UTC

head link

[Libguestfs] libnbd: When are callbacks freed

On Thu, Jul 13, 2023 at 03:05:30PM +0000, Tage Johansson
wrote:> 
> On 7/13/2023 4:36 PM, Eric Blake wrote:
> >On Thu, Jul 13, 2023 at 01:37:49PM +0000, Tage Johansson wrote:
> >>On 7/13/2023 3:26 PM, Richard W.M. Jones wrote:
> >>>On Thu, Jul 13, 2023 at 08:01:09AM -0500, Eric Blake wrote:
> >>>>On Thu, Jul 13, 2023 at 07:13:37AM -0500, Eric Blake wrote:
> >>>>>>>I have replaced a call to `nbd_opt_info()` with
a call to
> >>>>>>>`nbd_aio_opt_info()` and passed in a completion
callback which just
> >>>>>>>calls `exit(EXIT_FAILURE)`. So if the
completion callback is called
> >>>>>>>the test should fail, which it doesn't, at
least not on my machine.
> >>>>>>Isn't that OK?  Only .free is required to be
called.
> >>>>>For the context callback (for opt_set_meta), .callback
may be called
> >>>>>zero, one, or multiple times, but .free should be
called exactly once.
> >>>>>But for the completion callback, I agree that the docs
state that both
> >>>>>.callback and .free should each be called exactly once
("On the other
> >>>>>hand, if a completion callback is supplied (only
possible with
> >>>>>asynchronous commands), it will always be reached
exactly once, and
> >>>>>the completion callback must not ignore the value
pointed to by
> >>>>>C<error>."); we are indeed missing the call
to .callback.  I'll work
> >>>>>on a patch.
> >>>>Eww, the bug is bigger than just nbd_aio_opt* not always
calling
> >>>>completion.callback exactly once.  With just this diff (to
be folded
> >>>>into the larger patch I'm working on), I'm getting
an assertion
> >>>>failure that we fail to call completion.callback for
> >>>>nbd_aio_pread_structured when calling nbd_close() prior to
the command
> >>>>running to completion, so I'll have to fix that too.
> >>>Just to be clear about this, are we really sure the completion
> >>>callback should always be called once?  I'm not clear why
that should
> >>>be the case.  (The .free callback however should be.)
> >>>
> >>>For example, if I call a function with bogus invalid parameters
so
> >>>that it fails very early on (or when the handle is in an
incorrect
> >>>state), should I expect the completion callback to be called? 
I would
> >>>expect not.
> >>>
> >>>Rich.
> >>
> >>The user needs a way to know if an error occurred. So the
completion
> >>callback must be called if the asynchronous function did not fail
(returned
> >>0). If the completion callback should be called, with the error
parameter
> >>set, even when the asynchronous function immediately failed with a
non-zero
> >>return value is another question. I see two possibilities: Either
the
> >>completion callback should always be called. Or it should be called
iff the
> >>asynchronous function returned 0 (did not fail).
> >Good point.
> >
> >Our documentation currently states (docs/libnbd.pod) that the
> >completion callback is ALWAYS called, but this is inconsistent with
> >current code - you are correct that at present, the completion
> >callback is NOT called if the aio command returns non-zero (easy test:
> >call nbd_aio_block_status before nbd_connect*).  I'm game to
updating
> >that part of the documentation to match existing practice (changing
> >our guarantee to the completion callback will be called once iff the
> >aio command reported success), since our testsuite wasn't covering
it,
> >and it is probably an easier fix than munging the generator to call
> >completion.callback even for aio failures.  Meanwhile, the .free
> >callback MUST be called unconditionally, and I think our testsuite is
> >already covering that.
> >
> >Life-cycle wise, I see the following sequence as being something we
> >could usefully rely on (although we don't yet have enough testsuite
> >coverage to prove that we already have it).  Note that it is not only
> >important how many times things are called, but I would like it if we
> >can guarantee the specific ordering between them (neither .free should
> >be called until all .callback opportunities are complete finished, to
> >avoid any use-after-free issues regardless of which of the two .free
> >slots the programmer actually uses):
> >
> >- if aio command fails:
> >mid-command .callback: 0 calls
> >completion .callback: 0 calls
> >mid-command .free: 1 call
> >completion .free: 1 call
> >
> >- if aio command succeeds:
> >mid-command .callback: 0, 1, or multiple times
> >completion .callback: 1 call
> >mid-command .free: 1 call
> >completion .free: 1 call
> >
> >What I'm not sure of yet (without more code inspection) is whether
we
> >can sometimes call completion.callback after mid-command.free.
> >
> >>
> >>By the way, if the error parameter is set in the completion
callback, how
> >>can the user retrieve the error text? Is it possible to call
> >>`get_nbd_error(3)` from inside the completion callback?
> >You mean nbd_get_error().  We currently document that it is not safe
> >to call any nbd_* from within a callback.  Maybe we could relax that
> >to carve out exceptions for nbd_get_error/errno as special cases,
> >since they only inspect thread-local state and can be used even when
> >there is no current nbd_handle.  The problem is that I'm not sure
if
> >there is a reliable string at that point in time: part of the reason
> >the completion callback has an errnum parameter is that the point of
> >reporting the completion failure may be in a different thread than
> >when the failure was first detected, and we only preserved a numeric
> >value in cmd->error rather than also preserving a string.
> >
> >Put another way, there is no way to guarantee that nbd_get_errno()
> >will return the same value as the errnum parameter to the callback;
> >and if that is true, then even if calling nbd_get_error() doesn't
> >crash or deadlock from within a callback, it is likewise probable that
> >any such string returned is not consistent with the errnum parameter
> >passed to the callback.
> 
> 
> So is there any safe way to get some description of the error from a
> completion callback apart from a non-zero number? It isn't too
> helpful to report to the user that the read operation faild with -1.
As I recall, from the callback, no.

The error is not lost however, it will be returned by the API call
itself.  eg. If you're in nbd_aio_opt_list -> callback (error) then
nbd_aio_opt_list will return -1 and at that point you can use
nbd_get_error to report the error.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top

Libguestfs - Jul 2023 - libnbd: When are callbacks freed

[Libguestfs] libnbd: When are callbacks freed

[Libguestfs] libnbd: When are callbacks freed