On Thu, Jul 13, 2023 at 03:05:30PM +0000, Tage Johansson wrote:> > On 7/13/2023 4:36 PM, Eric Blake wrote: > >On Thu, Jul 13, 2023 at 01:37:49PM +0000, Tage Johansson wrote: > >>On 7/13/2023 3:26 PM, Richard W.M. Jones wrote: > >>>On Thu, Jul 13, 2023 at 08:01:09AM -0500, Eric Blake wrote: > >>>>On Thu, Jul 13, 2023 at 07:13:37AM -0500, Eric Blake wrote: > >>>>>>>I have replaced a call to `nbd_opt_info()` with a call to > >>>>>>>`nbd_aio_opt_info()` and passed in a completion callback which just > >>>>>>>calls `exit(EXIT_FAILURE)`. So if the completion callback is called > >>>>>>>the test should fail, which it doesn't, at least not on my machine. > >>>>>>Isn't that OK? Only .free is required to be called. > >>>>>For the context callback (for opt_set_meta), .callback may be called > >>>>>zero, one, or multiple times, but .free should be called exactly once. > >>>>>But for the completion callback, I agree that the docs state that both > >>>>>.callback and .free should each be called exactly once ("On the other > >>>>>hand, if a completion callback is supplied (only possible with > >>>>>asynchronous commands), it will always be reached exactly once, and > >>>>>the completion callback must not ignore the value pointed to by > >>>>>C<error>."); we are indeed missing the call to .callback. I'll work > >>>>>on a patch. > >>>>Eww, the bug is bigger than just nbd_aio_opt* not always calling > >>>>completion.callback exactly once. With just this diff (to be folded > >>>>into the larger patch I'm working on), I'm getting an assertion > >>>>failure that we fail to call completion.callback for > >>>>nbd_aio_pread_structured when calling nbd_close() prior to the command > >>>>running to completion, so I'll have to fix that too. > >>>Just to be clear about this, are we really sure the completion > >>>callback should always be called once? I'm not clear why that should > >>>be the case. (The .free callback however should be.) > >>> > >>>For example, if I call a function with bogus invalid parameters so > >>>that it fails very early on (or when the handle is in an incorrect > >>>state), should I expect the completion callback to be called? I would > >>>expect not. > >>> > >>>Rich. > >> > >>The user needs a way to know if an error occurred. So the completion > >>callback must be called if the asynchronous function did not fail (returned > >>0). If the completion callback should be called, with the error parameter > >>set, even when the asynchronous function immediately failed with a non-zero > >>return value is another question. I see two possibilities: Either the > >>completion callback should always be called. Or it should be called iff the > >>asynchronous function returned 0 (did not fail). > >Good point. > > > >Our documentation currently states (docs/libnbd.pod) that the > >completion callback is ALWAYS called, but this is inconsistent with > >current code - you are correct that at present, the completion > >callback is NOT called if the aio command returns non-zero (easy test: > >call nbd_aio_block_status before nbd_connect*). I'm game to updating > >that part of the documentation to match existing practice (changing > >our guarantee to the completion callback will be called once iff the > >aio command reported success), since our testsuite wasn't covering it, > >and it is probably an easier fix than munging the generator to call > >completion.callback even for aio failures. Meanwhile, the .free > >callback MUST be called unconditionally, and I think our testsuite is > >already covering that. > > > >Life-cycle wise, I see the following sequence as being something we > >could usefully rely on (although we don't yet have enough testsuite > >coverage to prove that we already have it). Note that it is not only > >important how many times things are called, but I would like it if we > >can guarantee the specific ordering between them (neither .free should > >be called until all .callback opportunities are complete finished, to > >avoid any use-after-free issues regardless of which of the two .free > >slots the programmer actually uses): > > > >- if aio command fails: > >mid-command .callback: 0 calls > >completion .callback: 0 calls > >mid-command .free: 1 call > >completion .free: 1 call > > > >- if aio command succeeds: > >mid-command .callback: 0, 1, or multiple times > >completion .callback: 1 call > >mid-command .free: 1 call > >completion .free: 1 call > > > >What I'm not sure of yet (without more code inspection) is whether we > >can sometimes call completion.callback after mid-command.free. > > > >> > >>By the way, if the error parameter is set in the completion callback, how > >>can the user retrieve the error text? Is it possible to call > >>`get_nbd_error(3)` from inside the completion callback? > >You mean nbd_get_error(). We currently document that it is not safe > >to call any nbd_* from within a callback. Maybe we could relax that > >to carve out exceptions for nbd_get_error/errno as special cases, > >since they only inspect thread-local state and can be used even when > >there is no current nbd_handle. The problem is that I'm not sure if > >there is a reliable string at that point in time: part of the reason > >the completion callback has an errnum parameter is that the point of > >reporting the completion failure may be in a different thread than > >when the failure was first detected, and we only preserved a numeric > >value in cmd->error rather than also preserving a string. > > > >Put another way, there is no way to guarantee that nbd_get_errno() > >will return the same value as the errnum parameter to the callback; > >and if that is true, then even if calling nbd_get_error() doesn't > >crash or deadlock from within a callback, it is likewise probable that > >any such string returned is not consistent with the errnum parameter > >passed to the callback. > > > So is there any safe way to get some description of the error from a > completion callback apart from a non-zero number? It isn't too > helpful to report to the user that the read operation faild with -1.As I recall, from the callback, no. The error is not lost however, it will be returned by the API call itself. eg. If you're in nbd_aio_opt_list -> callback (error) then nbd_aio_opt_list will return -1 and at that point you can use nbd_get_error to report the error. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top
On 7/13/2023 5:37 PM, Richard W.M. Jones wrote:> On Thu, Jul 13, 2023 at 03:05:30PM +0000, Tage Johansson wrote: >> On 7/13/2023 4:36 PM, Eric Blake wrote: >>> On Thu, Jul 13, 2023 at 01:37:49PM +0000, Tage Johansson wrote: >>>> On 7/13/2023 3:26 PM, Richard W.M. Jones wrote: >>>>> On Thu, Jul 13, 2023 at 08:01:09AM -0500, Eric Blake wrote: >>>>>> On Thu, Jul 13, 2023 at 07:13:37AM -0500, Eric Blake wrote: >>>>>>>>> I have replaced a call to `nbd_opt_info()` with a call to >>>>>>>>> `nbd_aio_opt_info()` and passed in a completion callback which just >>>>>>>>> calls `exit(EXIT_FAILURE)`. So if the completion callback is called >>>>>>>>> the test should fail, which it doesn't, at least not on my machine. >>>>>>>> Isn't that OK? Only .free is required to be called. >>>>>>> For the context callback (for opt_set_meta), .callback may be called >>>>>>> zero, one, or multiple times, but .free should be called exactly once. >>>>>>> But for the completion callback, I agree that the docs state that both >>>>>>> .callback and .free should each be called exactly once ("On the other >>>>>>> hand, if a completion callback is supplied (only possible with >>>>>>> asynchronous commands), it will always be reached exactly once, and >>>>>>> the completion callback must not ignore the value pointed to by >>>>>>> C<error>."); we are indeed missing the call to .callback. I'll work >>>>>>> on a patch. >>>>>> Eww, the bug is bigger than just nbd_aio_opt* not always calling >>>>>> completion.callback exactly once. With just this diff (to be folded >>>>>> into the larger patch I'm working on), I'm getting an assertion >>>>>> failure that we fail to call completion.callback for >>>>>> nbd_aio_pread_structured when calling nbd_close() prior to the command >>>>>> running to completion, so I'll have to fix that too. >>>>> Just to be clear about this, are we really sure the completion >>>>> callback should always be called once? I'm not clear why that should >>>>> be the case. (The .free callback however should be.) >>>>> >>>>> For example, if I call a function with bogus invalid parameters so >>>>> that it fails very early on (or when the handle is in an incorrect >>>>> state), should I expect the completion callback to be called? I would >>>>> expect not. >>>>> >>>>> Rich. >>>> The user needs a way to know if an error occurred. So the completion >>>> callback must be called if the asynchronous function did not fail (returned >>>> 0). If the completion callback should be called, with the error parameter >>>> set, even when the asynchronous function immediately failed with a non-zero >>>> return value is another question. I see two possibilities: Either the >>>> completion callback should always be called. Or it should be called iff the >>>> asynchronous function returned 0 (did not fail). >>> Good point. >>> >>> Our documentation currently states (docs/libnbd.pod) that the >>> completion callback is ALWAYS called, but this is inconsistent with >>> current code - you are correct that at present, the completion >>> callback is NOT called if the aio command returns non-zero (easy test: >>> call nbd_aio_block_status before nbd_connect*). I'm game to updating >>> that part of the documentation to match existing practice (changing >>> our guarantee to the completion callback will be called once iff the >>> aio command reported success), since our testsuite wasn't covering it, >>> and it is probably an easier fix than munging the generator to call >>> completion.callback even for aio failures. Meanwhile, the .free >>> callback MUST be called unconditionally, and I think our testsuite is >>> already covering that. >>> >>> Life-cycle wise, I see the following sequence as being something we >>> could usefully rely on (although we don't yet have enough testsuite >>> coverage to prove that we already have it). Note that it is not only >>> important how many times things are called, but I would like it if we >>> can guarantee the specific ordering between them (neither .free should >>> be called until all .callback opportunities are complete finished, to >>> avoid any use-after-free issues regardless of which of the two .free >>> slots the programmer actually uses): >>> >>> - if aio command fails: >>> mid-command .callback: 0 calls >>> completion .callback: 0 calls >>> mid-command .free: 1 call >>> completion .free: 1 call >>> >>> - if aio command succeeds: >>> mid-command .callback: 0, 1, or multiple times >>> completion .callback: 1 call >>> mid-command .free: 1 call >>> completion .free: 1 call >>> >>> What I'm not sure of yet (without more code inspection) is whether we >>> can sometimes call completion.callback after mid-command.free. >>> >>>> By the way, if the error parameter is set in the completion callback, how >>>> can the user retrieve the error text? Is it possible to call >>>> `get_nbd_error(3)` from inside the completion callback? >>> You mean nbd_get_error(). We currently document that it is not safe >>> to call any nbd_* from within a callback. Maybe we could relax that >>> to carve out exceptions for nbd_get_error/errno as special cases, >>> since they only inspect thread-local state and can be used even when >>> there is no current nbd_handle. The problem is that I'm not sure if >>> there is a reliable string at that point in time: part of the reason >>> the completion callback has an errnum parameter is that the point of >>> reporting the completion failure may be in a different thread than >>> when the failure was first detected, and we only preserved a numeric >>> value in cmd->error rather than also preserving a string. >>> >>> Put another way, there is no way to guarantee that nbd_get_errno() >>> will return the same value as the errnum parameter to the callback; >>> and if that is true, then even if calling nbd_get_error() doesn't >>> crash or deadlock from within a callback, it is likewise probable that >>> any such string returned is not consistent with the errnum parameter >>> passed to the callback. >> >> So is there any safe way to get some description of the error from a >> completion callback apart from a non-zero number? It isn't too >> helpful to report to the user that the read operation faild with -1. > As I recall, from the callback, no. > > The error is not lost however, it will be returned by the API call > itself. eg. If you're in nbd_aio_opt_list -> callback (error) then > nbd_aio_opt_list will return -1 and at that point you can use > nbd_get_error to report the error.I don't understand. If I call `nbd_aio_opt_list()` with a completion callback. `nbd_aio_opt_list()` will return immediately, maybe with a successful result. But the command is not complete until `nbd_aio_is_connecting()` returns `false`, so the completion callback may be invoked with an error after `nbd_aio_opt_list()` has returned. Also, does the value of `err` (passed into a caller) has any meaning like a known errno value or something else that can be converted to a description? Or is it just an arbitrary non zero integer?> Rich. >