On Thu, Jul 13, 2023 at 05:46:56PM +0100, Richard W.M. Jones
wrote:> On Thu, Jul 13, 2023 at 04:18:03PM +0000, Tage Johansson wrote:
> >
> > On 7/13/2023 5:37 PM, Richard W.M. Jones wrote:
> > >On Thu, Jul 13, 2023 at 03:05:30PM +0000, Tage Johansson wrote:
> > >>On 7/13/2023 4:36 PM, Eric Blake wrote:
> > >>>On Thu, Jul 13, 2023 at 01:37:49PM +0000, Tage Johansson
wrote:
> > >>>>On 7/13/2023 3:26 PM, Richard W.M. Jones wrote:
> > >>>>>On Thu, Jul 13, 2023 at 08:01:09AM -0500, Eric
Blake wrote:
> > >>>>>>On Thu, Jul 13, 2023 at 07:13:37AM -0500, Eric
Blake wrote:
> > >>>>>>>>>I have replaced a call to
`nbd_opt_info()` with a call to
> > >>>>>>>>>`nbd_aio_opt_info()` and passed in
a completion callback which just
> > >>>>>>>>>calls `exit(EXIT_FAILURE)`. So if
the completion callback is called
> > >>>>>>>>>the test should fail, which it
doesn't, at least not on my machine.
> > >>>>>>>>Isn't that OK? Only .free is
required to be called.
> > >>>>>>>For the context callback (for
opt_set_meta), .callback may be called
> > >>>>>>>zero, one, or multiple times, but .free
should be called exactly once.
> > >>>>>>>But for the completion callback, I agree
that the docs state that both
> > >>>>>>>.callback and .free should each be called
exactly once ("On the other
> > >>>>>>>hand, if a completion callback is supplied
(only possible with
> > >>>>>>>asynchronous commands), it will always be
reached exactly once, and
> > >>>>>>>the completion callback must not ignore
the value pointed to by
> > >>>>>>>C<error>."); we are indeed
missing the call to .callback. I'll work
> > >>>>>>>on a patch.
> > >>>>>>Eww, the bug is bigger than just nbd_aio_opt*
not always calling
> > >>>>>>completion.callback exactly once. With just
this diff (to be folded
> > >>>>>>into the larger patch I'm working on),
I'm getting an assertion
> > >>>>>>failure that we fail to call
completion.callback for
> > >>>>>>nbd_aio_pread_structured when calling
nbd_close() prior to the command
> > >>>>>>running to completion, so I'll have to fix
that too.
> > >>>>>Just to be clear about this, are we really sure
the completion
> > >>>>>callback should always be called once? I'm
not clear why that should
> > >>>>>be the case. (The .free callback however should
be.)
> > >>>>>
> > >>>>>For example, if I call a function with bogus
invalid parameters so
> > >>>>>that it fails very early on (or when the handle is
in an incorrect
> > >>>>>state), should I expect the completion callback to
be called? I would
> > >>>>>expect not.
> > >>>>>
> > >>>>>Rich.
> > >>>>The user needs a way to know if an error occurred. So
the completion
> > >>>>callback must be called if the asynchronous function
did not fail (returned
> > >>>>0). If the completion callback should be called, with
the error parameter
> > >>>>set, even when the asynchronous function immediately
failed with a non-zero
> > >>>>return value is another question. I see two
possibilities: Either the
> > >>>>completion callback should always be called. Or it
should be called iff the
> > >>>>asynchronous function returned 0 (did not fail).
> > >>>Good point.
> > >>>
> > >>>Our documentation currently states (docs/libnbd.pod) that
the
> > >>>completion callback is ALWAYS called, but this is
inconsistent with
> > >>>current code - you are correct that at present, the
completion
> > >>>callback is NOT called if the aio command returns non-zero
(easy test:
> > >>>call nbd_aio_block_status before nbd_connect*). I'm
game to updating
> > >>>that part of the documentation to match existing practice
(changing
> > >>>our guarantee to the completion callback will be called
once iff the
> > >>>aio command reported success), since our testsuite
wasn't covering it,
> > >>>and it is probably an easier fix than munging the
generator to call
> > >>>completion.callback even for aio failures. Meanwhile, the
.free
> > >>>callback MUST be called unconditionally, and I think our
testsuite is
> > >>>already covering that.
> > >>>
> > >>>Life-cycle wise, I see the following sequence as being
something we
> > >>>could usefully rely on (although we don't yet have
enough testsuite
> > >>>coverage to prove that we already have it). Note that it
is not only
> > >>>important how many times things are called, but I would
like it if we
> > >>>can guarantee the specific ordering between them (neither
.free should
> > >>>be called until all .callback opportunities are complete
finished, to
> > >>>avoid any use-after-free issues regardless of which of the
two .free
> > >>>slots the programmer actually uses):
> > >>>
> > >>>- if aio command fails:
> > >>>mid-command .callback: 0 calls
> > >>>completion .callback: 0 calls
> > >>>mid-command .free: 1 call
> > >>>completion .free: 1 call
> > >>>
> > >>>- if aio command succeeds:
> > >>>mid-command .callback: 0, 1, or multiple times
> > >>>completion .callback: 1 call
> > >>>mid-command .free: 1 call
> > >>>completion .free: 1 call
> > >>>
> > >>>What I'm not sure of yet (without more code
inspection) is whether we
> > >>>can sometimes call completion.callback after
mid-command.free.
> > >>>
> > >>>>By the way, if the error parameter is set in the
completion callback, how
> > >>>>can the user retrieve the error text? Is it possible
to call
> > >>>>`get_nbd_error(3)` from inside the completion
callback?
> > >>>You mean nbd_get_error(). We currently document that it
is not safe
> > >>>to call any nbd_* from within a callback. Maybe we could
relax that
> > >>>to carve out exceptions for nbd_get_error/errno as special
cases,
> > >>>since they only inspect thread-local state and can be used
even when
> > >>>there is no current nbd_handle. The problem is that
I'm not sure if
> > >>>there is a reliable string at that point in time: part of
the reason
> > >>>the completion callback has an errnum parameter is that
the point of
> > >>>reporting the completion failure may be in a different
thread than
> > >>>when the failure was first detected, and we only preserved
a numeric
> > >>>value in cmd->error rather than also preserving a
string.
> > >>>
> > >>>Put another way, there is no way to guarantee that
nbd_get_errno()
> > >>>will return the same value as the errnum parameter to the
callback;
> > >>>and if that is true, then even if calling nbd_get_error()
doesn't
> > >>>crash or deadlock from within a callback, it is likewise
probable that
> > >>>any such string returned is not consistent with the errnum
parameter
> > >>>passed to the callback.
> > >>
> > >>So is there any safe way to get some description of the error
from a
> > >>completion callback apart from a non-zero number? It isn't
too
> > >>helpful to report to the user that the read operation faild
with -1.
> > >As I recall, from the callback, no.
> > >
> > >The error is not lost however, it will be returned by the API call
> > >itself. eg. If you're in nbd_aio_opt_list -> callback
(error) then
> > >nbd_aio_opt_list will return -1 and at that point you can use
> > >nbd_get_error to report the error.
> >
> >
> > I don't understand. If I call `nbd_aio_opt_list()` with a
completion
> > callback. `nbd_aio_opt_list()` will return immediately, maybe with a
> > successful result. But the command is not complete until
> > `nbd_aio_is_connecting()` returns `false`, so the completion
> > callback may be invoked with an error after `nbd_aio_opt_list()` has
> > returned.
>
> It's returned by some later call, such as nbd_aio_notify_read.
>
> Basically libnbd doesn't create threads, so the only place that
> callbacks can be called is when you call into libnbd. That function
> that you call is what will return an error.
>
> > Also, does the value of `err` (passed into a caller) has any meaning
> > like a known errno value or something else that can be converted to
> > a description? Or is it just an arbitrary non zero integer?
>
> It's a C errno number. Setting it will change what nbd_get_errno
> returns.
>
> In the OCaml bindings we actually provide a function to translate
> OCaml <-> C errnos for this purpose:
>
> val errno_of_unix_error : Unix.error -> int
> (** Return the raw C errno corresponding to a {!Unix.error}. This
> can be used in callbacks to update the [int ref] parameter. *)
ocaml/tests/test_505_aio_pread_structured_callback.ml has an example
of how this is used. The errno is translated twice, once from
OCaml -> C inside the callback. And a second time from C -> OCaml
when the error is returned by the NBD.* call.
Rich.
> Rich.
>
> --
> Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> nbdkit - Flexible, fast NBD server with plugins
> https://gitlab.com/nbdkit/nbdkit
> _______________________________________________
> Libguestfs mailing list
> Libguestfs at redhat.com
> https://listman.redhat.com/mailman/listinfo/libguestfs
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW