On Tue, Aug 30, 2022 at 10:07:40AM +0100, Richard W.M. Jones wrote:
> On Tue, Aug 30, 2022 at 04:30:40PM +0800, Ming Lei wrote:
> > On Tue, Aug 30, 2022 at 09:04:07AM +0100, Richard W.M. Jones wrote:
> > > On Tue, Aug 30, 2022 at 10:32:02AM +0800, Ming Lei wrote:
> > > > Hi Jones,
> > > >
> > > > On Thu, Aug 25, 2022 at 01:10:55PM +0100, Richard W.M. Jones wrote:
> > > > > This patch adds simple support for a ublk-based NBD client.
> > > > > It is also available here:
> > > > > https://gitlab.com/rwmjones/libnbd/-/tree/nbdublk/ublk
> > > > >
> > > > > ublk is a way to write Linux block device drivers in userspace:
> > > >
> > > > I just looked at your nbdublk implementation a bit; basically it is
> > > > good, and really nice work.
> > > >
> > > > Two suggestions follow:
> > > >
> > > > 1) the io_uring context is multiplexed with ublk io command handling,
> > > > so we should avoid blocking in both ->handle_io_async() and
> > > > ->handle_event(), otherwise performance may be bad
> > >
> > > The nbd_aio_* calls don't block.
> > >
> > > However I noticed that I made a mistake with the trim and zero paths
> > > because I am using synchronous (blocking) nbd_flush / nbd_trim /
> > > nbd_zero instead of nbd_aio_flush / nbd_aio_trim / nbd_aio_zero.  I
> > > will fix this soon.
> > >
> > > Nothing in handle_event should block except for the call to
> > > pthread_mutex_lock.  This lock is necessary because new commands can
> > > be retired on the nbd_work_thread while handle_event is being called
> > > from the io_uring thread.
> >
> > Yes, I meant the mutex.
> >
> > Also, given that nbd_work_thread() is required, it is reasonable to move
> > the nbd_aio_* calls into nbd_work_thread(), so as to avoid waiting on
> > the mutex in io_uring context; otherwise io commands may not be
> > forwarded to userspace in time.
>
> nbd_aio_* calls shouldn't block.  But how can we move those calls to
> nbd_work_thread()?  If we queue up commands in handle_io_async we
> would still have to use a lock on that queue.  I don't see how we can
> avoid some kind of locking in handle_io_async.

Please see aio_submitter() in my patch; the call has been moved to the nbd
work thread already.

pthread_spin_lock isn't supposed to sleep; it is only there to protect the
list.  Going completely lockless is possible too, by using a per-queue
bitmap to mark which requests are submitted & completed, plus one per-queue
array to store the results.  But per my test, nbd performance is still a
bit low, so I am not sure this kind of optimization makes a difference.
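
A minimal sketch of that lockless per-queue idea could look like the
following.  All names here (struct nbd_queue, mark_submitted(),
reap_completions(), QUEUE_DEPTH) are made up for illustration and are not
ubdsrv, libnbd or nbdublk API; the point is only that submitted/completed
state is published with atomic bit operations, with a per-tag result slot
written before the completion bit is set.

/* Hypothetical per-queue state: one bit per ublk tag in each bitmap,
 * plus one result slot per tag.  Not real ubdsrv/libnbd code. */
#include <stdatomic.h>
#include <stdint.h>

#define QUEUE_DEPTH   128
#define BITS_PER_WORD 64
#define NR_WORDS      (QUEUE_DEPTH / BITS_PER_WORD)

struct nbd_queue {
    _Atomic uint64_t submitted[NR_WORDS]; /* set by the io_uring thread */
    _Atomic uint64_t completed[NR_WORDS]; /* set by the nbd work thread */
    int result[QUEUE_DEPTH];              /* 0 or -errno, one per tag */
};

/* io_uring thread: hand request 'tag' over without taking a lock,
 * then wake the worker (for example by writing to an eventfd). */
static void mark_submitted(struct nbd_queue *q, unsigned tag)
{
    atomic_fetch_or_explicit(&q->submitted[tag / BITS_PER_WORD],
                             UINT64_C(1) << (tag % BITS_PER_WORD),
                             memory_order_release);
}

/* nbd work thread: store the result first, then publish completion. */
static void mark_completed(struct nbd_queue *q, unsigned tag, int res)
{
    q->result[tag] = res;
    atomic_fetch_or_explicit(&q->completed[tag / BITS_PER_WORD],
                             UINT64_C(1) << (tag % BITS_PER_WORD),
                             memory_order_release);
}

/* io_uring thread: reap all completed tags in one batch.  The worker
 * drains the 'submitted' bitmap the same way on its side. */
static void reap_completions(struct nbd_queue *q)
{
    for (unsigned w = 0; w < NR_WORDS; w++) {
        uint64_t done = atomic_exchange_explicit(&q->completed[w], 0,
                                                 memory_order_acquire);
        while (done) {
            /* __builtin_ctzll is a GCC/Clang builtin: index of lowest set bit. */
            unsigned tag = w * BITS_PER_WORD + __builtin_ctzll(done);
            done &= done - 1;
            /* ...complete the ublk command for 'tag' using q->result[tag]. */
        }
    }
}

The release/acquire pairing is what lets whichever thread observes a set
bit also trust the matching result[] slot without any lock.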
> > BTW what is the context for running the callback of nbd_aio_*()?
>
> It depends on whether nbd_aio_* can complete the command entirely
> without blocking.
>
> The most common case is that nbd_aio_* calls a network function
> (eg. recv(2)) which returns EWOULDBLOCK.  Later, on nbd_work_thread,
> recv(2) is completed, the command is processed, and
> command_completed() is called.
>
> It's unlikely, but possible, that the command_completed() function
> could be called directly from handle_io_async -> nbd_aio_* ->
> command_completed, eg. if recv(2) never blocks for some reason, like
> it's talking over a Unix domain socket to a server which is very quick
> to reply.

OK, got it.  I was just trying to understand the whole flow.

> > > > 2) in the implementation of the nbd worker thread, there are two
> > > > sleep points (waiting for incoming io commands, and the network FD);
> > > > I'd suggest using poll to wait on either of them
> > > >
> > > > Recently I have been working on adding ublksrv io offloading / aio
> > > > interfaces for this sort of case in which io_uring can't be used,
> > > > which may simplify this area; please see the attached patch which
> > > > applies the above two points against your patch.  An obvious
> > > > improvement can be observed in my simple fio test (randread, io, 4k
> > > > bs, libaio) against a backend of 'nbdkit file'.
> > > >
> > > > But these interfaces aren't merged into the ublksrv github tree yet;
> > > > you can find them in the aio branch, and demo_event.c is one example
> > > > of how to use them:
> > > >
> > > > https://github.com/ming1/ubdsrv/tree/aio
> > > >
> > > > Actually this interface can be improved further for the nbdublk case,
> > > > and the request allocation isn't actually needed for this direct
> > > > offloading.  But it is there to cover some IOs not from the ublk
> > > > driver, such as metadata, so 'struct ublksrv_aio' is allocated.
> > > > I will try my best to finalize them and merge them into the master
> > > > branch.
> > >
> > > I didn't really understand what these patches to ubdsrv do when I
> > > looked at them before.  Maybe add some diagrams?
> >
> > The patches were added recently.  They are just for simplifying the
> > offloading of IO handling to another worker context, such as
> > nbd_work_thread.
> >
> > It uses one eventfd to notify the target worker thread when new requests
> > are added to a list.  Once the worker thread is woken up, it fetches
> > requests from the list; after handling these requests, the worker thread
> > waits on both the eventfd and the target IO (network FD) via poll().
>
> In the patch, it seems there is no place which creates a pthread for
> nbd_work_thread.  Also the ubdsrv aio branch does not call pthread_create.
> So I don't understand in what context nbd_work_thread is called.

The nbd work thread is created by the nbd target code just like before, but
its body is changed to the following loop, basically bound to one aio_ctx:

    while (!ublksrv_aio_ctx_dead(aio_ctx)) {
        struct aio_list compl;

        aio_list_init(&compl);

        //retrieve requests from the submit list and submit each one via
        //aio_submitter(); if any one is already done, add it to &compl
        ublksrv_aio_submit_worker(aio_ctx, aio_submitter, &compl);

        //add requests completed from command_completed() to &compl
        pthread_spin_lock(&c->lock);
        aio_list_splice(&c->list, &compl);
        pthread_spin_unlock(&c->lock);

        //notify the io_uring thread about the completed requests in &compl,
        //so batched completion & re-issue can be done in io_uring context
        ublksrv_aio_complete_worker(aio_ctx, &compl);

        //wait for network IO and the eventfd from io_uring at the same time,
        //so if either one is ready, nbd_poll2() will return from its sleep
        if (nbd_poll2 (h, aio_ctx->efd, -1) == -1) {
            fprintf (stderr, "%s\n", nbd_get_error ());
            exit (EXIT_FAILURE);
        }
    }
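
The wait at the bottom of that loop is a poll-on-two-file-descriptors
pattern: nbd_poll2 (h, aio_ctx->efd, -1) returns when either the NBD socket
or the io_uring eventfd becomes readable.  A minimal, self-contained sketch
of the same pattern with plain eventfd(2) and poll(2), independent of
ubdsrv and libnbd (kick_worker(), worker_loop() and the stubbed handlers
are made-up names for illustration), could look like this:

#include <poll.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Placeholder work: in the real code these would be draining the submit
 * list and running the NBD state machine respectively. */
static void handle_queued_requests(void) { /* ... */ }
static void handle_network_io(void)      { /* ... */ }

/* io_uring thread: after adding a request to the submit list, kick the
 * worker by adding 1 to the eventfd counter. */
static void kick_worker(int efd)
{
    uint64_t one = 1;
    if (write(efd, &one, sizeof one) != sizeof one)
        perror("write(eventfd)");
}

/* Worker thread: sleep until either the network socket or the eventfd
 * becomes readable, then handle whichever is ready. */
static void worker_loop(int efd, int sock)
{
    for (;;) {
        struct pollfd fds[2] = {
            { .fd = sock, .events = POLLIN },
            { .fd = efd,  .events = POLLIN },
        };

        if (poll(fds, 2, -1) == -1) {
            perror("poll");
            exit(EXIT_FAILURE);
        }

        if (fds[1].revents & POLLIN) {
            uint64_t cnt;
            /* Reading drains (resets) the eventfd counter. */
            if (read(efd, &cnt, sizeof cnt) == sizeof cnt)
                handle_queued_requests();
        }

        if (fds[0].revents & POLLIN)
            handle_network_io();
    }
}

/* Setup (done once):
 *   int efd = eventfd(0, EFD_CLOEXEC);
 * then pass efd to both the io_uring side (kick_worker) and the worker
 * thread (worker_loop), together with the NBD socket fd. */

Reading the eventfd resets its counter, so a single wakeup can cover any
number of queued requests.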
> I've never used eventfd before and the man page for it is very opaque.
>
> > In your previous implementation, the nbd work thread may wait on one
> > pthread mutex and on aio_poll(); this may not be efficient, given that
> > while waiting on one event, the other events can't be handled.
>
> I'm not sure what "event" means in this context.  Does it mean
> "NBD command"?  Or poll(2) event?

Here "event" is generic; I meant: NBD IO ready (exactly what aio_poll()
waits for) or io_uring eventfd ready (one write done from
nbd_handle_io_async()).

> There are multiple (usually 4) nbd_work threads, one for each NBD
> network connection.  Each NBD network connection can handle many
> commands in flight at once.

OK, but aio_poll() is supposed to get notified when one command is done,
so here that is just an implementation detail, but correct me if I am
wrong.

Thanks,
Ming
Richard W.M. Jones
2022-Aug-30 10:51 UTC
[Libguestfs] [PATCH libnbd] ublk: Add new nbdublk program
On Tue, Aug 30, 2022 at 05:38:35PM +0800, Ming Lei wrote:
> The nbd work thread is created by the nbd target code just like before,
> but its body is changed to the following loop, basically bound to one
> aio_ctx:
>
>     while (!ublksrv_aio_ctx_dead(aio_ctx)) {
>         struct aio_list compl;
>
>         aio_list_init(&compl);
>
>         //retrieve requests from the submit list and submit each one via
>         //aio_submitter(); if any one is already done, add it to &compl
>         ublksrv_aio_submit_worker(aio_ctx, aio_submitter, &compl);
>
>         //add requests completed from command_completed() to &compl
>         pthread_spin_lock(&c->lock);
>         aio_list_splice(&c->list, &compl);
>         pthread_spin_unlock(&c->lock);
>
>         //notify the io_uring thread about the completed requests in &compl,
>         //so batched completion & re-issue can be done in io_uring context
>         ublksrv_aio_complete_worker(aio_ctx, &compl);
>
>         //wait for network IO and the eventfd from io_uring at the same time,
>         //so if either one is ready, nbd_poll2() will return from its sleep
>         if (nbd_poll2 (h, aio_ctx->efd, -1) == -1) {
>             fprintf (stderr, "%s\n", nbd_get_error ());
>             exit (EXIT_FAILURE);
>         }
>     }

I think where I'm confused is: where is pthread_create called to create
this thread?  (Or maybe this is being called in an io_uring thread
context?)

> > I've never used eventfd before and the man page for it is very opaque.
> >
> > > In your previous implementation, the nbd work thread may wait on one
> > > pthread mutex and on aio_poll(); this may not be efficient, given that
> > > while waiting on one event, the other events can't be handled.
> >
> > I'm not sure what "event" means in this context.  Does it mean
> > "NBD command"?  Or poll(2) event?
>
> Here "event" is generic; I meant: NBD IO ready (exactly what aio_poll()
> waits for) or io_uring eventfd ready (one write done from
> nbd_handle_io_async()).
>
> > There are multiple (usually 4) nbd_work threads, one for each NBD
> > network connection.  Each NBD network connection can handle many
> > commands in flight at once.
>
> OK, but aio_poll() is supposed to get notified when one command is done,
> so here that is just an implementation detail, but correct me if I am
> wrong.

nbd_poll -> poll(2) -> POLLIN.  The state machine code (lib/states.c)
will call recv(2) and may complete zero, one or several NBD commands
before it blocks again.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
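
To make that flow concrete (issue a command with nbd_aio_*, then let
nbd_poll() run the state machine until the completion callback fires),
here is a minimal standalone sketch.  It follows libnbd's documented aio
calling convention as recalled from its manual pages, so treat the exact
signatures as an assumption; the URI, buffer size and offset are arbitrary
and error handling is minimal.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libnbd.h>

/* Completion callback: called from inside the libnbd state machine once
 * this particular command has been fully received.  Per the libnbd docs,
 * returning 1 asks libnbd to retire the command automatically. */
static int
read_completed (void *user_data, int *error)
{
  int *done = user_data;

  if (*error != 0)
    fprintf (stderr, "pread failed: %s\n", strerror (*error));
  *done = 1;
  return 1;
}

int
main (void)
{
  struct nbd_handle *nbd = nbd_create ();
  static char buf[4096];
  int done = 0;

  if (nbd == NULL || nbd_connect_uri (nbd, "nbd://localhost") == -1) {
    fprintf (stderr, "%s\n", nbd_get_error ());
    exit (EXIT_FAILURE);
  }

  /* Issue the read asynchronously; this returns as soon as the request
   * is queued, long before the reply arrives. */
  if (nbd_aio_pread (nbd, buf, sizeof buf, 0,
                     (nbd_completion_callback) { .callback = read_completed,
                                                 .user_data = &done },
                     0) == -1) {
    fprintf (stderr, "%s\n", nbd_get_error ());
    exit (EXIT_FAILURE);
  }

  /* Each nbd_poll() call does one poll(2) on the socket and runs the
   * state machine; a single POLLIN wakeup may complete zero, one or
   * several in-flight commands before the loop sleeps again. */
  while (!done) {
    if (nbd_poll (nbd, -1) == -1) {
      fprintf (stderr, "%s\n", nbd_get_error ());
      exit (EXIT_FAILURE);
    }
  }

  nbd_close (nbd);
  return 0;
}

Roughly speaking, nbdublk runs an analogous poll-and-dispatch loop on each
nbd_work thread (one per NBD connection), with completions handed back to
the io_uring context as discussed above.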