Eric Blake
2023-Jul-14 14:13 UTC
[Libguestfs] [libnbd PATCH 1/2] api: Tighten rules on completion.callback
On Fri, Jul 14, 2023 at 09:13:42AM +0200, Laszlo Ersek wrote:> On 7/13/23 21:29, Eric Blake wrote: > > The documentation has claimed since commit 6f4dcdab that any > > completion callback will be called exactly once; but this is not > > consistent with the code: if nbd_aio_* itself returns an error, then > > nothing is queued and the user does not need to wait for a completion > > callback to know how the command failed. We could tweak the generator > > to call completion.callback no matter what, but since the > > completion.free callback already serves that role, it's easier to fix > > the documentation to match reality. After all, one only needs > > completion status if an aio command returned success (if it returned > > failure, we know that there is nothing that is going to complete > > later). > > > > However, there was one place where we indeed fail to call > > completion.callback, even though the corresponding aio call returned > > success, which can strand a user that was depending on the callback to > > know that the pending aio command failed after all. That's when a > > call to nbd_close() interrupts a connection while commands are in > > flight. This problem appears to have been around even before commit > > 52b9b492 (v0.9.8) when we finally settled on having .free callbacks in > > the first place. > > > > Beef up the closure-lifetimes unit test to more robustly check the > > various conditions guaranteed by the updated documentation, and to > > expose the previous skip of a completion callback during nbd_close. > > > > In summary, the behavior we want (where sequence is important) is: > > > > - aio command fails: > > mid-command .callback: 0 calls > > completion .callback: 0 calls > > mid-command .free: 1 call > > completion .free: 1 call > > > > - aio command succeeds: > > mid-command .callback: 0, 1, or multiple calls > > completion .callback: 1 call > > mid-command .free: 1 call > > completion .free: 1 call > > > > Reported-by: Tage Johansson <tage.j.lists at posteo.net> > > Fixes: 6f4dcdab ("docs: Clarify how callbacks should handle errors", v1.11.8) > > Signed-off-by: Eric Blake <eblake at redhat.com> > > --- > > docs/libnbd.pod | 26 ++++++++++++++++---------- > > lib/handle.c | 3 +++ > > tests/closure-lifetimes.c | 14 ++++++++++++++ > > 3 files changed, 33 insertions(+), 10 deletions(-) > > > > diff --git a/docs/libnbd.pod b/docs/libnbd.pod > > index 72f74053..433479e6 100644 > > --- a/docs/libnbd.pod > > +++ b/docs/libnbd.pod > > @@ -904,9 +904,12 @@ same nbd object, as it would cause deadlock. > > =head2 Completion callbacks > > > > All of the asychronous commands have an optional completion callback > > -function that is used right before the command is marked complete, > > -after any mid-command callbacks have finished, and before any free > > -functions. > > +function that is used if the asynchronous command succeeded, right > > +before the command is marked complete, after any mid-command callbacks > > +have finished, and before any free functions. The completion callback > > +is not reached if the asynchronous command itself fails, while free > > +functions are reached regardless of the initial result of the > > +asynchronous command. > > > > When the completion callback returns C<1>, the command is > > automatically retired (there is no need to call > > I agree with this approach (i.e., with adapting the documentation to > reality), but I find the language somewhat confusing. We have three terms: > > - "asynchronous command succeeds" > - "command is marked complete" > - "command is retired" > > The last two are mostly interchangeable in my view, and are also *not* > confusing in the documentation. But the first term is confusing, IMO; it > can easily be mistaken for meanings #2/#3. What we mean by #1 instead is > the successful queueing (or submission) of the command. The text below > does say "queued", but the text above doesn't. So I'd suggest replacing > term#1 above with "call to asynchronous API succeeds" or "asynchronous > command is successfully submitted" or something like that. > > (Side comment: I'd kinda prefer an all-or-nothing approach for async > APIs. If the API fails at once, it should not take ownership of > anything; i.e., it shouldn't call either completion or free callbacks. > And if it succeeds, then it should take complete ownership. I'm not > suggesting to rework callbacks whole-sale for this, though.) > > With the language clarified a bit: > > Acked-by: Laszlo Ersek <lersek at redhat.com> > > Thanks > Laszlo >Here's what I squashed in, before pushing the series as dfa9473c..28134d9e diff --git i/docs/libnbd.pod w/docs/libnbd.pod index 433479e6..d8b22107 100644 --- i/docs/libnbd.pod +++ w/docs/libnbd.pod @@ -171,8 +171,12 @@ There are several things to note here: =item * -This only starts the command. The command is still in flight when the -call returns. +This only starts the command. The command is (usually) still in +flight when the call returns success, where you must rely on +subsequent API calls for learning the final command outcome and +trigger any remaining callbacks. However, you must also be able to +handle the case where system load allows the state machine to advance +far enough to invoke callbacks before the asynchronous API returns. =item * @@ -194,7 +198,10 @@ calls. The cookie is unique (per libnbd handle) and E<ge> 1. You may register a function which is called when the command completes, see L</Completion callbacks> below. In this case we have -specified a null completion callback. +specified a null completion callback. If a completion callback is +specified, it will only be called if the asynchronous command was +sucessfully submitted (if the asynchronous API itself returns an +error, there is nothing further to be completed). =back @@ -897,19 +904,25 @@ asynchronous commands are retired. =head2 Callbacks and locking -The callbacks are invoked at a point where the libnbd lock is held; as -such, it is unsafe for the callback to call any C<nbd_*> APIs on the -same nbd object, as it would cause deadlock. +The callbacks are invoked at a point where the libnbd lock is held, +typically during a call to C<nbd_aio_notify_read>, +C<nbd_aio_notify_write>, C<nbd_aio_poll>, or other call that can +advance libnbd's state machine. Depending on system load, it is even +possible for a callback to reached before completion of the +C<nbd_aio_*> call that specified the callback. As such, it is unsafe +for the callback to call any C<nbd_*> APIs on the same nbd object, as +it would cause deadlock. =head2 Completion callbacks All of the asychronous commands have an optional completion callback -function that is used if the asynchronous command succeeded, right -before the command is marked complete, after any mid-command callbacks +function that is used if the call to the asynchronous API reports +success. The completion callback is invoked when the submitted +command is eventually marked complete, after any mid-command callbacks have finished, and before any free functions. The completion callback -is not reached if the asynchronous command itself fails, while free -functions are reached regardless of the initial result of the -asynchronous command. +is not reached if the asynchronous API itself fails, while free +callbacks are reached regardless of the result of the initial +asynchronous API. When the completion callback returns C<1>, the command is automatically retired (there is no need to call -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Laszlo Ersek
2023-Jul-14 15:23 UTC
[Libguestfs] [libnbd PATCH 1/2] api: Tighten rules on completion.callback
On 7/14/23 16:13, Eric Blake wrote:> On Fri, Jul 14, 2023 at 09:13:42AM +0200, Laszlo Ersek wrote: >> On 7/13/23 21:29, Eric Blake wrote: >>> The documentation has claimed since commit 6f4dcdab that any >>> completion callback will be called exactly once; but this is not >>> consistent with the code: if nbd_aio_* itself returns an error, then >>> nothing is queued and the user does not need to wait for a completion >>> callback to know how the command failed. We could tweak the generator >>> to call completion.callback no matter what, but since the >>> completion.free callback already serves that role, it's easier to fix >>> the documentation to match reality. After all, one only needs >>> completion status if an aio command returned success (if it returned >>> failure, we know that there is nothing that is going to complete >>> later). >>> >>> However, there was one place where we indeed fail to call >>> completion.callback, even though the corresponding aio call returned >>> success, which can strand a user that was depending on the callback to >>> know that the pending aio command failed after all. That's when a >>> call to nbd_close() interrupts a connection while commands are in >>> flight. This problem appears to have been around even before commit >>> 52b9b492 (v0.9.8) when we finally settled on having .free callbacks in >>> the first place. >>> >>> Beef up the closure-lifetimes unit test to more robustly check the >>> various conditions guaranteed by the updated documentation, and to >>> expose the previous skip of a completion callback during nbd_close. >>> >>> In summary, the behavior we want (where sequence is important) is: >>> >>> - aio command fails: >>> mid-command .callback: 0 calls >>> completion .callback: 0 calls >>> mid-command .free: 1 call >>> completion .free: 1 call >>> >>> - aio command succeeds: >>> mid-command .callback: 0, 1, or multiple calls >>> completion .callback: 1 call >>> mid-command .free: 1 call >>> completion .free: 1 call >>> >>> Reported-by: Tage Johansson <tage.j.lists at posteo.net> >>> Fixes: 6f4dcdab ("docs: Clarify how callbacks should handle errors", v1.11.8) >>> Signed-off-by: Eric Blake <eblake at redhat.com> >>> --- >>> docs/libnbd.pod | 26 ++++++++++++++++---------- >>> lib/handle.c | 3 +++ >>> tests/closure-lifetimes.c | 14 ++++++++++++++ >>> 3 files changed, 33 insertions(+), 10 deletions(-) >>> >>> diff --git a/docs/libnbd.pod b/docs/libnbd.pod >>> index 72f74053..433479e6 100644 >>> --- a/docs/libnbd.pod >>> +++ b/docs/libnbd.pod >>> @@ -904,9 +904,12 @@ same nbd object, as it would cause deadlock. >>> =head2 Completion callbacks >>> >>> All of the asychronous commands have an optional completion callback >>> -function that is used right before the command is marked complete, >>> -after any mid-command callbacks have finished, and before any free >>> -functions. >>> +function that is used if the asynchronous command succeeded, right >>> +before the command is marked complete, after any mid-command callbacks >>> +have finished, and before any free functions. The completion callback >>> +is not reached if the asynchronous command itself fails, while free >>> +functions are reached regardless of the initial result of the >>> +asynchronous command. >>> >>> When the completion callback returns C<1>, the command is >>> automatically retired (there is no need to call >> >> I agree with this approach (i.e., with adapting the documentation to >> reality), but I find the language somewhat confusing. We have three terms: >> >> - "asynchronous command succeeds" >> - "command is marked complete" >> - "command is retired" >> >> The last two are mostly interchangeable in my view, and are also *not* >> confusing in the documentation. But the first term is confusing, IMO; it >> can easily be mistaken for meanings #2/#3. What we mean by #1 instead is >> the successful queueing (or submission) of the command. The text below >> does say "queued", but the text above doesn't. So I'd suggest replacing >> term#1 above with "call to asynchronous API succeeds" or "asynchronous >> command is successfully submitted" or something like that. >> >> (Side comment: I'd kinda prefer an all-or-nothing approach for async >> APIs. If the API fails at once, it should not take ownership of >> anything; i.e., it shouldn't call either completion or free callbacks. >> And if it succeeds, then it should take complete ownership. I'm not >> suggesting to rework callbacks whole-sale for this, though.) >> >> With the language clarified a bit: >> >> Acked-by: Laszlo Ersek <lersek at redhat.com> >> >> Thanks >> Laszlo >> > > Here's what I squashed in, before pushing the series as > dfa9473c..28134d9e > > diff --git i/docs/libnbd.pod w/docs/libnbd.pod > index 433479e6..d8b22107 100644 > --- i/docs/libnbd.pod > +++ w/docs/libnbd.pod > @@ -171,8 +171,12 @@ There are several things to note here: > > =item * > > -This only starts the command. The command is still in flight when the > -call returns. > +This only starts the command. The command is (usually) still in > +flight when the call returns success, where you must rely on > +subsequent API calls for learning the final command outcome and > +trigger any remaining callbacks. However, you must also be able to > +handle the case where system load allows the state machine to advance > +far enough to invoke callbacks before the asynchronous API returns. > > =item * > > @@ -194,7 +198,10 @@ calls. The cookie is unique (per libnbd handle) and E<ge> 1. > > You may register a function which is called when the command > completes, see L</Completion callbacks> below. In this case we have > -specified a null completion callback. > +specified a null completion callback. If a completion callback is > +specified, it will only be called if the asynchronous command was > +sucessfully submitted (if the asynchronous API itself returns an > +error, there is nothing further to be completed). > > =back > > @@ -897,19 +904,25 @@ asynchronous commands are retired. > > =head2 Callbacks and locking > > -The callbacks are invoked at a point where the libnbd lock is held; as > -such, it is unsafe for the callback to call any C<nbd_*> APIs on the > -same nbd object, as it would cause deadlock. > +The callbacks are invoked at a point where the libnbd lock is held, > +typically during a call to C<nbd_aio_notify_read>, > +C<nbd_aio_notify_write>, C<nbd_aio_poll>, or other call that can > +advance libnbd's state machine. Depending on system load, it is even > +possible for a callback to reached before completion of the > +C<nbd_aio_*> call that specified the callback. As such, it is unsafe > +for the callback to call any C<nbd_*> APIs on the same nbd object, as > +it would cause deadlock. > > =head2 Completion callbacks > > All of the asychronous commands have an optional completion callback > -function that is used if the asynchronous command succeeded, right > -before the command is marked complete, after any mid-command callbacks > +function that is used if the call to the asynchronous API reports > +success. The completion callback is invoked when the submitted > +command is eventually marked complete, after any mid-command callbacks > have finished, and before any free functions. The completion callback > -is not reached if the asynchronous command itself fails, while free > -functions are reached regardless of the initial result of the > -asynchronous command. > +is not reached if the asynchronous API itself fails, while free > +callbacks are reached regardless of the result of the initial > +asynchronous API. > > When the completion callback returns C<1>, the command is > automatically retired (there is no need to callThis is great, thanks! Laszlo
Tage Johansson
2023-Jul-16 16:39 UTC
[Libguestfs] [libnbd PATCH 1/2] api: Tighten rules on completion.callback
On 7/14/2023 4:13 PM, Eric Blake wrote:> On Fri, Jul 14, 2023 at 09:13:42AM +0200, Laszlo Ersek wrote: >> On 7/13/23 21:29, Eric Blake wrote: >>> The documentation has claimed since commit 6f4dcdab that any >>> completion callback will be called exactly once; but this is not >>> consistent with the code: if nbd_aio_* itself returns an error, then >>> nothing is queued and the user does not need to wait for a completion >>> callback to know how the command failed. We could tweak the generator >>> to call completion.callback no matter what, but since the >>> completion.free callback already serves that role, it's easier to fix >>> the documentation to match reality. After all, one only needs >>> completion status if an aio command returned success (if it returned >>> failure, we know that there is nothing that is going to complete >>> later). >>> >>> However, there was one place where we indeed fail to call >>> completion.callback, even though the corresponding aio call returned >>> success, which can strand a user that was depending on the callback to >>> know that the pending aio command failed after all. That's when a >>> call to nbd_close() interrupts a connection while commands are in >>> flight. This problem appears to have been around even before commit >>> 52b9b492 (v0.9.8) when we finally settled on having .free callbacks in >>> the first place. >>> >>> Beef up the closure-lifetimes unit test to more robustly check the >>> various conditions guaranteed by the updated documentation, and to >>> expose the previous skip of a completion callback during nbd_close. >>> >>> In summary, the behavior we want (where sequence is important) is: >>> >>> - aio command fails: >>> mid-command .callback: 0 calls >>> completion .callback: 0 calls >>> mid-command .free: 1 call >>> completion .free: 1 call >>> >>> - aio command succeeds: >>> mid-command .callback: 0, 1, or multiple calls >>> completion .callback: 1 call >>> mid-command .free: 1 call >>> completion .free: 1 call >>> >>> Reported-by: Tage Johansson <tage.j.lists at posteo.net> >>> Fixes: 6f4dcdab ("docs: Clarify how callbacks should handle errors", v1.11.8) >>> Signed-off-by: Eric Blake <eblake at redhat.com> >>> --- >>> docs/libnbd.pod | 26 ++++++++++++++++---------- >>> lib/handle.c | 3 +++ >>> tests/closure-lifetimes.c | 14 ++++++++++++++ >>> 3 files changed, 33 insertions(+), 10 deletions(-) >>> >>> diff --git a/docs/libnbd.pod b/docs/libnbd.pod >>> index 72f74053..433479e6 100644 >>> --- a/docs/libnbd.pod >>> +++ b/docs/libnbd.pod >>> @@ -904,9 +904,12 @@ same nbd object, as it would cause deadlock. >>> =head2 Completion callbacks >>> >>> All of the asychronous commands have an optional completion callback >>> -function that is used right before the command is marked complete, >>> -after any mid-command callbacks have finished, and before any free >>> -functions. >>> +function that is used if the asynchronous command succeeded, right >>> +before the command is marked complete, after any mid-command callbacks >>> +have finished, and before any free functions. The completion callback >>> +is not reached if the asynchronous command itself fails, while free >>> +functions are reached regardless of the initial result of the >>> +asynchronous command. >>> >>> When the completion callback returns C<1>, the command is >>> automatically retired (there is no need to call >> I agree with this approach (i.e., with adapting the documentation to >> reality), but I find the language somewhat confusing. We have three terms: >> >> - "asynchronous command succeeds" >> - "command is marked complete" >> - "command is retired" >> >> The last two are mostly interchangeable in my view, and are also *not* >> confusing in the documentation. But the first term is confusing, IMO; it >> can easily be mistaken for meanings #2/#3. What we mean by #1 instead is >> the successful queueing (or submission) of the command. The text below >> does say "queued", but the text above doesn't. So I'd suggest replacing >> term#1 above with "call to asynchronous API succeeds" or "asynchronous >> command is successfully submitted" or something like that. >> >> (Side comment: I'd kinda prefer an all-or-nothing approach for async >> APIs. If the API fails at once, it should not take ownership of >> anything; i.e., it shouldn't call either completion or free callbacks. >> And if it succeeds, then it should take complete ownership. I'm not >> suggesting to rework callbacks whole-sale for this, though.) >> >> With the language clarified a bit: >> >> Acked-by: Laszlo Ersek <lersek at redhat.com> >> >> Thanks >> Laszlo >> > Here's what I squashed in, before pushing the series as > dfa9473c..28134d9e > > diff --git i/docs/libnbd.pod w/docs/libnbd.pod > index 433479e6..d8b22107 100644 > --- i/docs/libnbd.pod > +++ w/docs/libnbd.pod > @@ -171,8 +171,12 @@ There are several things to note here: > > =item * > > -This only starts the command. The command is still in flight when the > -call returns. > +This only starts the command. The command is (usually) still in > +flight when the call returns success, where you must rely on > +subsequent API calls for learning the final command outcome and > +trigger any remaining callbacks. However, you must also be able to > +handle the case where system load allows the state machine to advance > +far enough to invoke callbacks before the asynchronous API returns. > > =item * > > @@ -194,7 +198,10 @@ calls. The cookie is unique (per libnbd handle) and E<ge> 1. > > You may register a function which is called when the command > completes, see L</Completion callbacks> below. In this case we have > -specified a null completion callback. > +specified a null completion callback. If a completion callback is > +specified, it will only be called if the asynchronous command was > +sucessfully submitted (if the asynchronous API itself returns anShould probably be "successfully" instead of "sucessfully".> +error, there is nothing further to be completed). > > =back > > @@ -897,19 +904,25 @@ asynchronous commands are retired. > > =head2 Callbacks and locking > > -The callbacks are invoked at a point where the libnbd lock is held; as > -such, it is unsafe for the callback to call any C<nbd_*> APIs on the > -same nbd object, as it would cause deadlock. > +The callbacks are invoked at a point where the libnbd lock is held, > +typically during a call to C<nbd_aio_notify_read>, > +C<nbd_aio_notify_write>, C<nbd_aio_poll>, or other call that can > +advance libnbd's state machine. Depending on system load, it is even > +possible for a callback to reached before completion of theShouldn't it be "to be reached" instead of "to reached"? Best regards, Tage> +C<nbd_aio_*> call that specified the callback. As such, it is unsafe > +for the callback to call any C<nbd_*> APIs on the same nbd object, as > +it would cause deadlock. > > =head2 Completion callbacks > > All of the asychronous commands have an optional completion callback > -function that is used if the asynchronous command succeeded, right > -before the command is marked complete, after any mid-command callbacks > +function that is used if the call to the asynchronous API reports > +success. The completion callback is invoked when the submitted > +command is eventually marked complete, after any mid-command callbacks > have finished, and before any free functions. The completion callback > -is not reached if the asynchronous command itself fails, while free > -functions are reached regardless of the initial result of the > -asynchronous command. > +is not reached if the asynchronous API itself fails, while free > +callbacks are reached regardless of the result of the initial > +asynchronous API. > > When the completion callback returns C<1>, the command is > automatically retired (there is no need to call > >-------------- next part -------------- A non-text attachment was scrubbed... Name: OpenPGP_0xBFBE172D49294052.asc Type: application/pgp-keys Size: 2432 bytes Desc: OpenPGP public key URL: <http://listman.redhat.com/archives/libguestfs/attachments/20230716/dc118859/attachment.bin> -------------- next part -------------- A non-text attachment was scrubbed... Name: OpenPGP_signature Type: application/pgp-signature Size: 665 bytes Desc: OpenPGP digital signature URL: <http://listman.redhat.com/archives/libguestfs/attachments/20230716/dc118859/attachment.sig>
Reasonably Related Threads
- [libnbd PATCH 1/2] api: Tighten rules on completion.callback
- [libnbd PATCH 1/2] api: Tighten rules on completion.callback
- [libnbd PATCH 0/2] Fix docs and testing of completion callback
- Re: [PATCH libnbd] api: Rename nbd_aio_*_callback to nbd_aio_*.
- Libnbd asynchronous API with epoll