thr3ads.net - Libguestfs - [Libguestfs] [libnbd PATCH 1/2] api: Tighten rules on completion.callback [Jul 2023]

If this information is useful, please help other people find it:
Share via:

Eric Blake

2023-Jul-14 14:13 UTC

[Libguestfs] [libnbd PATCH 1/2] api: Tighten rules on completion.callback

On Fri, Jul 14, 2023 at 09:13:42AM +0200, Laszlo Ersek
wrote:> On 7/13/23 21:29, Eric Blake wrote:
> > The documentation has claimed since commit 6f4dcdab that any
> > completion callback will be called exactly once; but this is not
> > consistent with the code: if nbd_aio_* itself returns an error, then
> > nothing is queued and the user does not need to wait for a completion
> > callback to know how the command failed.  We could tweak the generator
> > to call completion.callback no matter what, but since the
> > completion.free callback already serves that role, it's easier to
fix
> > the documentation to match reality.  After all, one only needs
> > completion status if an aio command returned success (if it returned
> > failure, we know that there is nothing that is going to complete
> > later).
> > 
> > However, there was one place where we indeed fail to call
> > completion.callback, even though the corresponding aio call returned
> > success, which can strand a user that was depending on the callback to
> > know that the pending aio command failed after all.  That's when a
> > call to nbd_close() interrupts a connection while commands are in
> > flight.  This problem appears to have been around even before commit
> > 52b9b492 (v0.9.8) when we finally settled on having .free callbacks in
> > the first place.
> > 
> > Beef up the closure-lifetimes unit test to more robustly check the
> > various conditions guaranteed by the updated documentation, and to
> > expose the previous skip of a completion callback during nbd_close.
> > 
> > In summary, the behavior we want (where sequence is important) is:
> > 
> > - aio command fails:
> > mid-command .callback: 0 calls
> > completion .callback: 0 calls
> > mid-command .free: 1 call
> > completion .free: 1 call
> > 
> > - aio command succeeds:
> > mid-command .callback: 0, 1, or multiple calls
> > completion .callback: 1 call
> > mid-command .free: 1 call
> > completion .free: 1 call
> > 
> > Reported-by: Tage Johansson <tage.j.lists at posteo.net>
> > Fixes: 6f4dcdab ("docs: Clarify how callbacks should handle
errors", v1.11.8)
> > Signed-off-by: Eric Blake <eblake at redhat.com>
> > ---
> >  docs/libnbd.pod           | 26 ++++++++++++++++----------
> >  lib/handle.c              |  3 +++
> >  tests/closure-lifetimes.c | 14 ++++++++++++++
> >  3 files changed, 33 insertions(+), 10 deletions(-)
> > 
> > diff --git a/docs/libnbd.pod b/docs/libnbd.pod
> > index 72f74053..433479e6 100644
> > --- a/docs/libnbd.pod
> > +++ b/docs/libnbd.pod
> > @@ -904,9 +904,12 @@ same nbd object, as it would cause deadlock.
> >  =head2 Completion callbacks
> > 
> >  All of the asychronous commands have an optional completion callback
> > -function that is used right before the command is marked complete,
> > -after any mid-command callbacks have finished, and before any free
> > -functions.
> > +function that is used if the asynchronous command succeeded, right
> > +before the command is marked complete, after any mid-command
callbacks
> > +have finished, and before any free functions.  The completion
callback
> > +is not reached if the asynchronous command itself fails, while free
> > +functions are reached regardless of the initial result of the
> > +asynchronous command.
> > 
> >  When the completion callback returns C<1>, the command is
> >  automatically retired (there is no need to call
> 
> I agree with this approach (i.e., with adapting the documentation to
> reality), but I find the language somewhat confusing. We have three terms:
> 
> - "asynchronous command succeeds"
> - "command is marked complete"
> - "command is retired"
> 
> The last two are mostly interchangeable in my view, and are also *not*
> confusing in the documentation. But the first term is confusing, IMO; it
> can easily be mistaken for meanings #2/#3. What we mean by #1 instead is
> the successful queueing (or submission) of the command. The text below
> does say "queued", but the text above doesn't. So I'd
suggest replacing
> term#1 above with "call to asynchronous API succeeds" or
"asynchronous
> command is successfully submitted" or something like that.
> 
> (Side comment: I'd kinda prefer an all-or-nothing approach for async
> APIs. If the API fails at once, it should not take ownership of
> anything; i.e., it shouldn't call either completion or free callbacks.
> And if it succeeds, then it should take complete ownership. I'm not
> suggesting to rework callbacks whole-sale for this, though.)
> 
> With the language clarified a bit:
> 
> Acked-by: Laszlo Ersek <lersek at redhat.com>
> 
> Thanks
> Laszlo
>
Here's what I squashed in, before pushing the series as
dfa9473c..28134d9e

diff --git i/docs/libnbd.pod w/docs/libnbd.pod
index 433479e6..d8b22107 100644
--- i/docs/libnbd.pod
+++ w/docs/libnbd.pod
@@ -171,8 +171,12 @@ There are several things to note here:

 =item *

-This only starts the command.  The command is still in flight when the
-call returns.
+This only starts the command.  The command is (usually) still in
+flight when the call returns success, where you must rely on
+subsequent API calls for learning the final command outcome and
+trigger any remaining callbacks.  However, you must also be able to
+handle the case where system load allows the state machine to advance
+far enough to invoke callbacks before the asynchronous API returns.

 =item *

@@ -194,7 +198,10 @@ calls.  The cookie is unique (per libnbd handle) and
E<ge> 1.

 You may register a function which is called when the command
 completes, see L</Completion callbacks> below.  In this case we have
-specified a null completion callback.
+specified a null completion callback.  If a completion callback is
+specified, it will only be called if the asynchronous command was
+sucessfully submitted (if the asynchronous API itself returns an
+error, there is nothing further to be completed).

 =back

@@ -897,19 +904,25 @@ asynchronous commands are retired.

 =head2 Callbacks and locking

-The callbacks are invoked at a point where the libnbd lock is held; as
-such, it is unsafe for the callback to call any C<nbd_*> APIs on the
-same nbd object, as it would cause deadlock.
+The callbacks are invoked at a point where the libnbd lock is held,
+typically during a call to C<nbd_aio_notify_read>,
+C<nbd_aio_notify_write>, C<nbd_aio_poll>, or other call that can
+advance libnbd's state machine.  Depending on system load, it is even
+possible for a callback to reached before completion of the
+C<nbd_aio_*> call that specified the callback.  As such, it is unsafe
+for the callback to call any C<nbd_*> APIs on the same nbd object, as
+it would cause deadlock.

 =head2 Completion callbacks

 All of the asychronous commands have an optional completion callback
-function that is used if the asynchronous command succeeded, right
-before the command is marked complete, after any mid-command callbacks
+function that is used if the call to the asynchronous API reports
+success.  The completion callback is invoked when the submitted
+command is eventually marked complete, after any mid-command callbacks
 have finished, and before any free functions.  The completion callback
-is not reached if the asynchronous command itself fails, while free
-functions are reached regardless of the initial result of the
-asynchronous command.
+is not reached if the asynchronous API itself fails, while free
+callbacks are reached regardless of the result of the initial
+asynchronous API.

 When the completion callback returns C<1>, the command is
 automatically retired (there is no need to call


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Laszlo Ersek

2023-Jul-14 15:23 UTC

head link

[Libguestfs] [libnbd PATCH 1/2] api: Tighten rules on completion.callback

On 7/14/23 16:13, Eric Blake wrote:> On Fri, Jul 14, 2023 at 09:13:42AM +0200, Laszlo Ersek wrote:
>> On 7/13/23 21:29, Eric Blake wrote:
>>> The documentation has claimed since commit 6f4dcdab that any
>>> completion callback will be called exactly once; but this is not
>>> consistent with the code: if nbd_aio_* itself returns an error,
then
>>> nothing is queued and the user does not need to wait for a
completion
>>> callback to know how the command failed.  We could tweak the
generator
>>> to call completion.callback no matter what, but since the
>>> completion.free callback already serves that role, it's easier
to fix
>>> the documentation to match reality.  After all, one only needs
>>> completion status if an aio command returned success (if it
returned
>>> failure, we know that there is nothing that is going to complete
>>> later).
>>>
>>> However, there was one place where we indeed fail to call
>>> completion.callback, even though the corresponding aio call
returned
>>> success, which can strand a user that was depending on the callback
to
>>> know that the pending aio command failed after all.  That's
when a
>>> call to nbd_close() interrupts a connection while commands are in
>>> flight.  This problem appears to have been around even before
commit
>>> 52b9b492 (v0.9.8) when we finally settled on having .free callbacks
in
>>> the first place.
>>>
>>> Beef up the closure-lifetimes unit test to more robustly check the
>>> various conditions guaranteed by the updated documentation, and to
>>> expose the previous skip of a completion callback during nbd_close.
>>>
>>> In summary, the behavior we want (where sequence is important) is:
>>>
>>> - aio command fails:
>>> mid-command .callback: 0 calls
>>> completion .callback: 0 calls
>>> mid-command .free: 1 call
>>> completion .free: 1 call
>>>
>>> - aio command succeeds:
>>> mid-command .callback: 0, 1, or multiple calls
>>> completion .callback: 1 call
>>> mid-command .free: 1 call
>>> completion .free: 1 call
>>>
>>> Reported-by: Tage Johansson <tage.j.lists at posteo.net>
>>> Fixes: 6f4dcdab ("docs: Clarify how callbacks should handle
errors", v1.11.8)
>>> Signed-off-by: Eric Blake <eblake at redhat.com>
>>> ---
>>>  docs/libnbd.pod           | 26 ++++++++++++++++----------
>>>  lib/handle.c              |  3 +++
>>>  tests/closure-lifetimes.c | 14 ++++++++++++++
>>>  3 files changed, 33 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/docs/libnbd.pod b/docs/libnbd.pod
>>> index 72f74053..433479e6 100644
>>> --- a/docs/libnbd.pod
>>> +++ b/docs/libnbd.pod
>>> @@ -904,9 +904,12 @@ same nbd object, as it would cause deadlock.
>>>  =head2 Completion callbacks
>>>
>>>  All of the asychronous commands have an optional completion
callback
>>> -function that is used right before the command is marked complete,
>>> -after any mid-command callbacks have finished, and before any free
>>> -functions.
>>> +function that is used if the asynchronous command succeeded, right
>>> +before the command is marked complete, after any mid-command
callbacks
>>> +have finished, and before any free functions.  The completion
callback
>>> +is not reached if the asynchronous command itself fails, while
free
>>> +functions are reached regardless of the initial result of the
>>> +asynchronous command.
>>>
>>>  When the completion callback returns C<1>, the command is
>>>  automatically retired (there is no need to call
>>
>> I agree with this approach (i.e., with adapting the documentation to
>> reality), but I find the language somewhat confusing. We have three
terms:
>>
>> - "asynchronous command succeeds"
>> - "command is marked complete"
>> - "command is retired"
>>
>> The last two are mostly interchangeable in my view, and are also *not*
>> confusing in the documentation. But the first term is confusing, IMO;
it
>> can easily be mistaken for meanings #2/#3. What we mean by #1 instead
is
>> the successful queueing (or submission) of the command. The text below
>> does say "queued", but the text above doesn't. So I'd
suggest replacing
>> term#1 above with "call to asynchronous API succeeds" or
"asynchronous
>> command is successfully submitted" or something like that.
>>
>> (Side comment: I'd kinda prefer an all-or-nothing approach for
async
>> APIs. If the API fails at once, it should not take ownership of
>> anything; i.e., it shouldn't call either completion or free
callbacks.
>> And if it succeeds, then it should take complete ownership. I'm not
>> suggesting to rework callbacks whole-sale for this, though.)
>>
>> With the language clarified a bit:
>>
>> Acked-by: Laszlo Ersek <lersek at redhat.com>
>>
>> Thanks
>> Laszlo
>>
> 
> Here's what I squashed in, before pushing the series as
> dfa9473c..28134d9e
> 
> diff --git i/docs/libnbd.pod w/docs/libnbd.pod
> index 433479e6..d8b22107 100644
> --- i/docs/libnbd.pod
> +++ w/docs/libnbd.pod
> @@ -171,8 +171,12 @@ There are several things to note here:
> 
>  =item *
> 
> -This only starts the command.  The command is still in flight when the
> -call returns.
> +This only starts the command.  The command is (usually) still in
> +flight when the call returns success, where you must rely on
> +subsequent API calls for learning the final command outcome and
> +trigger any remaining callbacks.  However, you must also be able to
> +handle the case where system load allows the state machine to advance
> +far enough to invoke callbacks before the asynchronous API returns.
> 
>  =item *
> 
> @@ -194,7 +198,10 @@ calls.  The cookie is unique (per libnbd handle) and
E<ge> 1.
> 
>  You may register a function which is called when the command
>  completes, see L</Completion callbacks> below.  In this case we have
> -specified a null completion callback.
> +specified a null completion callback.  If a completion callback is
> +specified, it will only be called if the asynchronous command was
> +sucessfully submitted (if the asynchronous API itself returns an
> +error, there is nothing further to be completed).
> 
>  =back
> 
> @@ -897,19 +904,25 @@ asynchronous commands are retired.
> 
>  =head2 Callbacks and locking
> 
> -The callbacks are invoked at a point where the libnbd lock is held; as
> -such, it is unsafe for the callback to call any C<nbd_*> APIs on the
> -same nbd object, as it would cause deadlock.
> +The callbacks are invoked at a point where the libnbd lock is held,
> +typically during a call to C<nbd_aio_notify_read>,
> +C<nbd_aio_notify_write>, C<nbd_aio_poll>, or other call that
can
> +advance libnbd's state machine.  Depending on system load, it is even
> +possible for a callback to reached before completion of the
> +C<nbd_aio_*> call that specified the callback.  As such, it is
unsafe
> +for the callback to call any C<nbd_*> APIs on the same nbd object,
as
> +it would cause deadlock.
> 
>  =head2 Completion callbacks
> 
>  All of the asychronous commands have an optional completion callback
> -function that is used if the asynchronous command succeeded, right
> -before the command is marked complete, after any mid-command callbacks
> +function that is used if the call to the asynchronous API reports
> +success.  The completion callback is invoked when the submitted
> +command is eventually marked complete, after any mid-command callbacks
>  have finished, and before any free functions.  The completion callback
> -is not reached if the asynchronous command itself fails, while free
> -functions are reached regardless of the initial result of the
> -asynchronous command.
> +is not reached if the asynchronous API itself fails, while free
> +callbacks are reached regardless of the result of the initial
> +asynchronous API.
> 
>  When the completion callback returns C<1>, the command is
>  automatically retired (there is no need to call
This is great, thanks!
Laszlo

Tage Johansson

2023-Jul-16 16:39 UTC

head link

[Libguestfs] [libnbd PATCH 1/2] api: Tighten rules on completion.callback

On 7/14/2023 4:13 PM, Eric Blake wrote:> On Fri, Jul 14, 2023 at 09:13:42AM +0200, Laszlo Ersek wrote:
>> On 7/13/23 21:29, Eric Blake wrote:
>>> The documentation has claimed since commit 6f4dcdab that any
>>> completion callback will be called exactly once; but this is not
>>> consistent with the code: if nbd_aio_* itself returns an error,
then
>>> nothing is queued and the user does not need to wait for a
completion
>>> callback to know how the command failed.  We could tweak the
generator
>>> to call completion.callback no matter what, but since the
>>> completion.free callback already serves that role, it's easier
to fix
>>> the documentation to match reality.  After all, one only needs
>>> completion status if an aio command returned success (if it
returned
>>> failure, we know that there is nothing that is going to complete
>>> later).
>>>
>>> However, there was one place where we indeed fail to call
>>> completion.callback, even though the corresponding aio call
returned
>>> success, which can strand a user that was depending on the callback
to
>>> know that the pending aio command failed after all.  That's
when a
>>> call to nbd_close() interrupts a connection while commands are in
>>> flight.  This problem appears to have been around even before
commit
>>> 52b9b492 (v0.9.8) when we finally settled on having .free callbacks
in
>>> the first place.
>>>
>>> Beef up the closure-lifetimes unit test to more robustly check the
>>> various conditions guaranteed by the updated documentation, and to
>>> expose the previous skip of a completion callback during nbd_close.
>>>
>>> In summary, the behavior we want (where sequence is important) is:
>>>
>>> - aio command fails:
>>> mid-command .callback: 0 calls
>>> completion .callback: 0 calls
>>> mid-command .free: 1 call
>>> completion .free: 1 call
>>>
>>> - aio command succeeds:
>>> mid-command .callback: 0, 1, or multiple calls
>>> completion .callback: 1 call
>>> mid-command .free: 1 call
>>> completion .free: 1 call
>>>
>>> Reported-by: Tage Johansson <tage.j.lists at posteo.net>
>>> Fixes: 6f4dcdab ("docs: Clarify how callbacks should handle
errors", v1.11.8)
>>> Signed-off-by: Eric Blake <eblake at redhat.com>
>>> ---
>>>   docs/libnbd.pod           | 26 ++++++++++++++++----------
>>>   lib/handle.c              |  3 +++
>>>   tests/closure-lifetimes.c | 14 ++++++++++++++
>>>   3 files changed, 33 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/docs/libnbd.pod b/docs/libnbd.pod
>>> index 72f74053..433479e6 100644
>>> --- a/docs/libnbd.pod
>>> +++ b/docs/libnbd.pod
>>> @@ -904,9 +904,12 @@ same nbd object, as it would cause deadlock.
>>>   =head2 Completion callbacks
>>>
>>>   All of the asychronous commands have an optional completion
callback
>>> -function that is used right before the command is marked complete,
>>> -after any mid-command callbacks have finished, and before any free
>>> -functions.
>>> +function that is used if the asynchronous command succeeded, right
>>> +before the command is marked complete, after any mid-command
callbacks
>>> +have finished, and before any free functions.  The completion
callback
>>> +is not reached if the asynchronous command itself fails, while
free
>>> +functions are reached regardless of the initial result of the
>>> +asynchronous command.
>>>
>>>   When the completion callback returns C<1>, the command is
>>>   automatically retired (there is no need to call
>> I agree with this approach (i.e., with adapting the documentation to
>> reality), but I find the language somewhat confusing. We have three
terms:
>>
>> - "asynchronous command succeeds"
>> - "command is marked complete"
>> - "command is retired"
>>
>> The last two are mostly interchangeable in my view, and are also *not*
>> confusing in the documentation. But the first term is confusing, IMO;
it
>> can easily be mistaken for meanings #2/#3. What we mean by #1 instead
is
>> the successful queueing (or submission) of the command. The text below
>> does say "queued", but the text above doesn't. So I'd
suggest replacing
>> term#1 above with "call to asynchronous API succeeds" or
"asynchronous
>> command is successfully submitted" or something like that.
>>
>> (Side comment: I'd kinda prefer an all-or-nothing approach for
async
>> APIs. If the API fails at once, it should not take ownership of
>> anything; i.e., it shouldn't call either completion or free
callbacks.
>> And if it succeeds, then it should take complete ownership. I'm not
>> suggesting to rework callbacks whole-sale for this, though.)
>>
>> With the language clarified a bit:
>>
>> Acked-by: Laszlo Ersek <lersek at redhat.com>
>>
>> Thanks
>> Laszlo
>>
> Here's what I squashed in, before pushing the series as
> dfa9473c..28134d9e
>
> diff --git i/docs/libnbd.pod w/docs/libnbd.pod
> index 433479e6..d8b22107 100644
> --- i/docs/libnbd.pod
> +++ w/docs/libnbd.pod
> @@ -171,8 +171,12 @@ There are several things to note here:
>
>   =item *
>
> -This only starts the command.  The command is still in flight when the
> -call returns.
> +This only starts the command.  The command is (usually) still in
> +flight when the call returns success, where you must rely on
> +subsequent API calls for learning the final command outcome and
> +trigger any remaining callbacks.  However, you must also be able to
> +handle the case where system load allows the state machine to advance
> +far enough to invoke callbacks before the asynchronous API returns.
>
>   =item *
>
> @@ -194,7 +198,10 @@ calls.  The cookie is unique (per libnbd handle) and
E<ge> 1.
>
>   You may register a function which is called when the command
>   completes, see L</Completion callbacks> below.  In this case we
have
> -specified a null completion callback.
> +specified a null completion callback.  If a completion callback is
> +specified, it will only be called if the asynchronous command was
> +sucessfully submitted (if the asynchronous API itself returns an

Should probably be "successfully" instead of "sucessfully".

> +error, there is nothing further to be completed).
>
>   =back
>
> @@ -897,19 +904,25 @@ asynchronous commands are retired.
>
>   =head2 Callbacks and locking
>
> -The callbacks are invoked at a point where the libnbd lock is held; as
> -such, it is unsafe for the callback to call any C<nbd_*> APIs on the
> -same nbd object, as it would cause deadlock.
> +The callbacks are invoked at a point where the libnbd lock is held,
> +typically during a call to C<nbd_aio_notify_read>,
> +C<nbd_aio_notify_write>, C<nbd_aio_poll>, or other call that
can
> +advance libnbd's state machine.  Depending on system load, it is even
> +possible for a callback to reached before completion of the

Shouldn't it be "to be reached" instead of "to reached"?


Best regards,

Tage

> +C<nbd_aio_*> call that specified the callback.  As such, it is
unsafe
> +for the callback to call any C<nbd_*> APIs on the same nbd object,
as
> +it would cause deadlock.
>
>   =head2 Completion callbacks
>
>   All of the asychronous commands have an optional completion callback
> -function that is used if the asynchronous command succeeded, right
> -before the command is marked complete, after any mid-command callbacks
> +function that is used if the call to the asynchronous API reports
> +success.  The completion callback is invoked when the submitted
> +command is eventually marked complete, after any mid-command callbacks
>   have finished, and before any free functions.  The completion callback
> -is not reached if the asynchronous command itself fails, while free
> -functions are reached regardless of the initial result of the
> -asynchronous command.
> +is not reached if the asynchronous API itself fails, while free
> +callbacks are reached regardless of the result of the initial
> +asynchronous API.
>
>   When the completion callback returns C<1>, the command is
>   automatically retired (there is no need to call
>
>-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_0xBFBE172D49294052.asc
Type: application/pgp-keys
Size: 2432 bytes
Desc: OpenPGP public key
URL:
<http://listman.redhat.com/archives/libguestfs/attachments/20230716/dc118859/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature
Type: application/pgp-signature
Size: 665 bytes
Desc: OpenPGP digital signature
URL:
<http://listman.redhat.com/archives/libguestfs/attachments/20230716/dc118859/attachment.sig>

Possibly Parallel Threads

Search for more apparently analagous threads

Libguestfs - Jul 2023 - [libnbd PATCH 1/2] api: Tighten rules on completion.callback

[Libguestfs] [libnbd PATCH 1/2] api: Tighten rules on completion.callback

[Libguestfs] [libnbd PATCH 1/2] api: Tighten rules on completion.callback

[Libguestfs] [libnbd PATCH 1/2] api: Tighten rules on completion.callback

Possibly Parallel Threads