Eric Blake
2019-Jun-05 15:23 UTC
Re: [Libguestfs] [PATCH libnbd 4/4] lib: Atomically update h->state when leaving the locked region.
On 6/5/19 6:15 AM, Richard W.M. Jones wrote:
> Split h->state into:
>
>  - h->state = the state on entry to the locked region
>
>  - h->next_state = the current state and what the "publicly visible"
>    state will become when we leave the locked region

The rest of this thread discusses potential other names, such as
h->public_state for the state visible to unlocked code and h->state for
the internal state while still locked; I trust that once you pick a
naming scheme you like, you can redo this patch accordingly. I'll go
ahead and review the changes as spelled here to at least see if I can
spot any problems.

> Some calls to get_state become calls to get_next_state depending on
> which of these they are trying to read. Calls to set_state become
> set_next_state because that is what gets updated.
>
> When we leave the locked region we update h->state.
>
> The purpose of this patch is to make it easier to reason about the
> state in lockless code.

Lockless code (including nbd_aio_get_direction) can no longer see
intermediate states that did not block, which is a lot nicer for
deciding whether we will accidentally let the user's poll() loop block
forever if it didn't see that we were blocked on POLLIN or POLLOUT.
There's still always a TOCTTOU race (anywhere the user reads the public
state while another thread holds the lock, their future actions based on
the observed state may be unexpected by the time they actually grab the
lock, because the other thread changed the state), as well as the issue
I mentioned on 3/4 about any consecutive user calls to nbd_aio_is_*
having a race where the two calls can be done on different public
states.
I'm still thinking that any function which grabs the lock but wants to
gate on state MUST save the public state check until after grabbing the
lock, not beforehand, to avoid the TOCTTOU race.

> ---
>  generator/generator | 23 +++++++++++++----------
>  lib/connect.c       | 10 +++++-----
>  lib/disconnect.c    |  8 ++++----
>  lib/handle.c        |  2 +-
>  lib/internal.h      | 15 +++++++++++++--
>  lib/rw.c            |  4 ++--
>  6 files changed, 38 insertions(+), 24 deletions(-)
>

> @@ -2462,7 +2462,7 @@ let generate_lib_states_c ()
>    pr "  }\n";
>    pr "\n";
>    pr "  set_error (EINVAL, \"external event %%d is invalid in state %%s\",\n";
> -  pr "             ev, nbd_internal_state_short_string (get_state (h)));\n";
> +  pr "             ev, nbd_internal_state_short_string (get_next_state (h)));\n";

I'm also wondering if we should treat external events more like
interrupts, where calling nbd_aio_notify_read() in an incorrect state
succeeds by setting a flag that is not cleared until we later reach a
state that can actually care about the flag, rather than by rejecting
the call outright for being in the wrong state. (That is, states where
we are NOT expecting a notify_read() are the same as setting an IRQ mask
because we don't want to be interrupted by a read notification yet, but
we don't want to lose the interrupt when we later clear the mask.) In
fact, CmdIssue is somewhat already like this - the code determines if
the state machine is unmasked (in the READY state) to kick it off
immediately; or is masked (in a processing state) to queue things but
leave the interrupt set (the next time we enter READY, we unmask,
observe the queue is non-empty, and so fire off the next command).
Another thought - if we ever want to allow the user the ability to send
custom NBD_OPT_ commands during handshake, or even to emulate 'qemu-nbd
--list' where the client can make queries to see what the server
supports before finally settling on whether to run NBD_OPT_GO or
NBD_OPT_ABORT, we'll need to add an external event after nbd_aio_connect
but before nbd_aio_is_connected for doing those additional handshake
steps. It's easier to think about adding a mandatory nbd_aio_go() called
in between nbd_aio_connect*() and the first nbd_aio_pread now, before we
have to start worrying about back-compat issues for all existing AIO
clients.

> @@ -2866,8 +2866,11 @@ let generate_lib_api_c ()

As said above, I think you're still missing a change at the beginning of
the wrapper that grabs the lock, to do the state check after the lock.

>      let argnames = List.flatten (List.map name_of_arg args) in
>      List.iter (pr ", %s") argnames;
>      pr ");\n";
> -    if is_locked then
> -      pr "  pthread_mutex_unlock (&h->lock);\n";
> +    if is_locked then (
> +      pr "  if (h->state != h->next_state)\n";
> +      pr "    h->state = h->next_state;\n";
> +      pr "  pthread_mutex_unlock (&h->lock);\n"
> +    );
>      pr "  return ret;\n";
>      pr "}\n";
>      pr "\n";
> diff --git a/lib/connect.c b/lib/connect.c
> index b889f80..4e3141f 100644
> --- a/lib/connect.c

> +++ b/lib/internal.h
> @@ -80,7 +80,17 @@ struct nbd_handle {
>    /* Linked list of close callbacks. */
>    struct close_callback *close_callbacks;
>
> -  _Atomic enum state state;        /* State machine. */
> +  /* State machine.
> +   *
> +   * The actual current state is ‘next_state’.  ‘state’ is updated
> +   * before we release the lock.
> +   *
> +   * Note don't access these fields directly, use the SET_NEXT_STATE
> +   * macro in generator/states* code, or the set_next_state,
> +   * get_next_state and get_state macros in regular code.
> +   */
> +  _Atomic enum state state;
> +  enum state next_state;
>

Otherwise the patch looks like you caught the right places.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
Richard W.M. Jones
2019-Jun-08 17:45 UTC
Re: [Libguestfs] [PATCH libnbd 4/4] lib: Atomically update h->state when leaving the locked region.
I think in general we should have a good idea of the thread model that
we support with libnbd. At the moment we have a handle lock which only
means that simultaneous calls into the handle won't corrupt it; however
it doesn't mean that the handle state won't change between calls if the
handle is accessible by two threads. Is that the model we are trying to
achieve, or something else? Does libnbd need to support more complex
thread models, or can we rely on callers to use their own mutex on top?

More comments below ...

On Wed, Jun 05, 2019 at 10:23:57AM -0500, Eric Blake wrote:
> I'm also wondering if we should treat external events more like
> interrupts, where calling nbd_aio_notify_read() in an incorrect state
> succeeds by setting a flag that is not cleared until we later reach a
> state that can actually care about the flag, rather than by rejecting
> the call outright for being in the wrong state. (That is, states where
> we are NOT expecting a notify_read() are the same as setting an IRQ
> mask because we don't want to be interrupted by a read notification
> yet, but we don't want to lose the interrupt when we later clear the
> mask.)

Yes, I think this is a good idea, and not difficult to implement.

> In fact, CmdIssue is somewhat already like this - the code determines
> if the state machine is unmasked (in the READY state) to kick it off
> immediately; or is masked (in a processing state) to queue things but
> leave the interrupt set (the next time we enter READY, we unmask,
> observe the queue is non-empty, and so fire off the next command).

CmdIssue is also an external event. If it was a level-triggered
interrupt then we'd set a flag in the handle when it happens, and the
state machine could do all of the above in generated code. The only
potential problem I can think of at the moment is the usual one with
interrupts, which is that second and subsequent interrupts are lost if
the first interrupt hasn't been processed.
However for the external events we have at the moment:

  | NotifyRead           (* fd becomes ready to read *)
  | NotifyWrite          (* fd becomes ready to write *)
  | CmdCreate            (* [nbd_create] function called *)
  | CmdConnectSockAddr   (* [nbd_aio_connect] function called *)
  | CmdConnectUnix       (* [nbd_aio_connect_unix] *)
  | CmdConnectTCP        (* [nbd_aio_connect_tcp] *)
  | CmdConnectCommand    (* [nbd_aio_connect_command] *)
  | CmdIssue             (* issuing an NBD command *)

it seems as if this wouldn't be an issue for any of them. My reasoning
is: NotifyRead/Write work like this in the kernel already. CmdCreate is
only sent once. CmdConnect* are only sent once. CmdIssue can be sent
multiple times, but commands are queued, so as long as the writer side
always checks the issue queue at appropriate points we're OK.

> Another thought - if we ever want to allow the user the ability to
> send custom NBD_OPT_ commands during handshake, or even to emulate
> 'qemu-nbd --list' where the client can make queries to see what the
> server supports before finally settling on whether to run NBD_OPT_GO
> or NBD_OPT_ABORT, we'll need to add an external event after
> nbd_aio_connect but before nbd_aio_is_connected for doing those
> additional handshake steps. It's easier to think about adding a
> mandatory nbd_aio_go() called in between nbd_aio_connect*() and the
> first nbd_aio_pread now, before we have to start worrying about
> back-compat issues to all existing AIO clients.

I envisaged that we'd just set a flag in the handle to indicate that we
were in "list mode".

> > @@ -2866,8 +2866,11 @@ let generate_lib_api_c ()
>
> As said above, I think you're still missing a change at the beginning
> of the wrapper that grabs the lock to do the state check after the
> lock.

AFAIK the code is safe now?
I'm not sure what you mean by this.

> >      let argnames = List.flatten (List.map name_of_arg args) in
> >      List.iter (pr ", %s") argnames;
> >      pr ");\n";
> > -    if is_locked then
> > -      pr "  pthread_mutex_unlock (&h->lock);\n";
> > +    if is_locked then (
> > +      pr "  if (h->state != h->next_state)\n";
> > +      pr "    h->state = h->next_state;\n";
> > +      pr "  pthread_mutex_unlock (&h->lock);\n"
> > +    );
> >      pr "  return ret;\n";
> >      pr "}\n";
> >      pr "\n";
> > diff --git a/lib/connect.c b/lib/connect.c
> > index b889f80..4e3141f 100644
> > --- a/lib/connect.c
>
> > +++ b/lib/internal.h
> > @@ -80,7 +80,17 @@ struct nbd_handle {
> >    /* Linked list of close callbacks. */
> >    struct close_callback *close_callbacks;
> >
> > -  _Atomic enum state state;        /* State machine. */
> > +  /* State machine.
> > +   *
> > +   * The actual current state is ‘next_state’.  ‘state’ is updated
> > +   * before we release the lock.
> > +   *
> > +   * Note don't access these fields directly, use the SET_NEXT_STATE
> > +   * macro in generator/states* code, or the set_next_state,
> > +   * get_next_state and get_state macros in regular code.
> > +   */
> > +  _Atomic enum state state;
> > +  enum state next_state;
> >
>
> Otherwise the patch looks like you caught the right places.

Rich.

--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
Richard W.M. Jones
2019-Jun-08 17:50 UTC
Re: [Libguestfs] [PATCH libnbd 4/4] lib: Atomically update h->state when leaving the locked region.
On Sat, Jun 08, 2019 at 06:45:53PM +0100, Richard W.M. Jones wrote:
> > > @@ -2866,8 +2866,11 @@ let generate_lib_api_c ()
> >
> > As said above, I think you're still missing a change at the beginning
> > of the wrapper that grabs the lock to do the state check after the
> > lock.
>
> AFAIK the code is safe now?  I'm not sure what you mean by this.

Ignore this, your review of the updated patch makes it clear what you
mean.

Rich.

--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html