On Tue, Aug 30, 2022 at 10:07:40AM +0100, Richard W.M. Jones wrote:
> On Tue, Aug 30, 2022 at 04:30:40PM +0800, Ming Lei wrote:
> > On Tue, Aug 30, 2022 at 09:04:07AM +0100, Richard W.M. Jones wrote:
> > > On Tue, Aug 30, 2022 at 10:32:02AM +0800, Ming Lei wrote:
> > > > Hi Jones,
> > > >
> > > > On Thu, Aug 25, 2022 at 01:10:55PM +0100, Richard W.M. Jones wrote:
> > > > > This patch adds simple support for a ublk-based NBD client.
> > > > > It is also available here:
> > > > > https://gitlab.com/rwmjones/libnbd/-/tree/nbdublk/ublk
> > > > >
> > > > > ublk is a way to write Linux block device drivers in userspace:
> > > >
> > > > I just looked at your nbdublk implementation a bit; basically it is
> > > > good, and really nice work.
> > > >
> > > > Two suggestions follow:
> > > >
> > > > 1) the io_uring context is multiplexed with ublk io command handling,
> > > > so we should avoid blocking in both ->handle_io_async() and
> > > > ->handle_event(), otherwise performance may be bad
> > >
> > > The nbd_aio_* calls don't block.
> > >
> > > However I noticed that I made a mistake with the trim and zero paths
> > > because I am using synchronous (blocking) nbd_flush / nbd_trim /
> > > nbd_zero instead of nbd_aio_flush / nbd_aio_trim / nbd_aio_zero.  I
> > > will fix this soon.
> > >
> > > Nothing in handle_event should block except for the call to
> > > pthread_mutex_lock.  This lock is necessary because new commands can
> > > be retired on the nbd_work_thread while handle_event is being called
> > > from the io_uring thread.
> >
> > Yes, I meant the mutex.
> >
> > Also, given that nbd_work_thread() is required, it is reasonable to move
> > the nbd_aio_* calls into nbd_work_thread(), so as to avoid waiting on
> > the mutex in io_uring context; otherwise io commands may not be
> > forwarded to userspace in time.
>
> nbd_aio_* calls shouldn't block.  But how can we move those calls to
> nbd_work_thread()?  If we queue up commands in handle_io_async we
> would still have to use a lock on that queue.  I don't see how we can
> avoid some kind of locking in handle_io_async.

Please see aio_submitter() in my patch; the call has been moved to the nbd
work thread already.

pthread_spin_lock isn't supposed to sleep; it is only there to protect the
list.  Going completely lockless is possible too, by using a per-queue
bitmap to mark which requests are submitted & completed, plus one per-queue
array to store the results.  But per my test, nbd performance is still a
bit low, so I am not sure this kind of optimization makes a difference.
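
A minimal sketch of that lockless per-queue idea could look like the
following.  All names here (struct nbd_queue, mark_submitted(),
reap_completions(), QUEUE_DEPTH) are made up for illustration and are not
ubdsrv, libnbd or nbdublk API; the point is only that submitted/completed
state is published with atomic bit operations, with a per-tag result slot
written before the completion bit is set.

/* Hypothetical per-queue state: one bit per ublk tag in each bitmap,
 * plus one result slot per tag.  Not real ubdsrv/libnbd code. */
#include <stdatomic.h>
#include <stdint.h>

#define QUEUE_DEPTH   128
#define BITS_PER_WORD 64
#define NR_WORDS      (QUEUE_DEPTH / BITS_PER_WORD)

struct nbd_queue {
    _Atomic uint64_t submitted[NR_WORDS]; /* set by the io_uring thread */
    _Atomic uint64_t completed[NR_WORDS]; /* set by the nbd work thread */
    int result[QUEUE_DEPTH];              /* 0 or -errno, one per tag */
};

/* io_uring thread: hand request 'tag' over without taking a lock,
 * then wake the worker (for example by writing to an eventfd). */
static void mark_submitted(struct nbd_queue *q, unsigned tag)
{
    atomic_fetch_or_explicit(&q->submitted[tag / BITS_PER_WORD],
                             UINT64_C(1) << (tag % BITS_PER_WORD),
                             memory_order_release);
}

/* nbd work thread: store the result first, then publish completion. */
static void mark_completed(struct nbd_queue *q, unsigned tag, int res)
{
    q->result[tag] = res;
    atomic_fetch_or_explicit(&q->completed[tag / BITS_PER_WORD],
                             UINT64_C(1) << (tag % BITS_PER_WORD),
                             memory_order_release);
}

/* io_uring thread: reap all completed tags in one batch.  The worker
 * drains the 'submitted' bitmap the same way on its side. */
static void reap_completions(struct nbd_queue *q)
{
    for (unsigned w = 0; w < NR_WORDS; w++) {
        uint64_t done = atomic_exchange_explicit(&q->completed[w], 0,
                                                 memory_order_acquire);
        while (done) {
            /* __builtin_ctzll is a GCC/Clang builtin: index of lowest set bit. */
            unsigned tag = w * BITS_PER_WORD + __builtin_ctzll(done);
            done &= done - 1;
            /* ...complete the ublk command for 'tag' using q->result[tag]. */
        }
    }
}

The release/acquire pairing is what lets whichever thread observes a set
bit also trust the matching result[] slot without any lock.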
> > BTW what is the context for running the callback of nbd_aio_*()?
>
> It depends on whether nbd_aio_* can complete the command entirely
> without blocking.
>
> The most common case is that nbd_aio_* calls a network function
> (eg. recv(2)) which returns EWOULDBLOCK.  Later, on nbd_work_thread,
> recv(2) is completed, the command is processed, and
> command_completed() is called.
>
> It's unlikely, but possible, that the command_completed() function
> could be called directly from handle_io_async -> nbd_aio_* ->
> command_completed, eg. if recv(2) never blocks for some reason, like
> it's talking over a Unix domain socket to a server which is very quick
> to reply.

OK, got it.  I was just trying to understand the whole flow.

> > > > 2) in the implementation of the nbd worker thread, there are two
> > > > sleep points (waiting for incoming io commands, and the network FD);
> > > > I'd suggest using poll to wait on either of them
> > > >
> > > > Recently I have been working on adding ublksrv io offloading / aio
> > > > interfaces for this sort of case in which io_uring can't be used,
> > > > which may simplify this area; please see the attached patch which
> > > > applies the above two points against your patch.  An obvious
> > > > improvement can be observed in my simple fio test (randread, io, 4k
> > > > bs, libaio) against a backend of 'nbdkit file'.
> > > >
> > > > But these interfaces aren't merged into the ublksrv github tree yet;
> > > > you can find them in the aio branch, and demo_event.c is one example
> > > > of how to use them:
> > > >
> > > > https://github.com/ming1/ubdsrv/tree/aio
> > > >
> > > > Actually this interface can be improved further for the nbdublk case,
> > > > and the request allocation isn't actually needed for this direct
> > > > offloading.  But it is there to cover some IOs not from the ublk
> > > > driver, such as metadata, so 'struct ublksrv_aio' is allocated.
> > > > I will try my best to finalize them and merge them into the master
> > > > branch.
> > >
> > > I didn't really understand what these patches to ubdsrv do when I
> > > looked at them before.  Maybe add some diagrams?
> >
> > The patches were added recently.  They are just for simplifying the
> > offloading of IO handling to another worker context, such as
> > nbd_work_thread.
> >
> > It uses one eventfd to notify the target worker thread when new requests
> > are added to a list.  Once the worker thread is woken up, it fetches
> > requests from the list; after handling these requests, the worker thread
> > waits on both the eventfd and the target IO (network FD) via poll().
>
> In the patch, it seems there is no place which creates a pthread for
> nbd_work_thread.  Also the ubdsrv aio branch does not call pthread_create.
> So I don't understand in what context nbd_work_thread is called.

The nbd work thread is created by the nbd target code just like before, but
its body is changed to the following loop, basically bound to one aio_ctx:

    while (!ublksrv_aio_ctx_dead(aio_ctx)) {
        struct aio_list compl;

        aio_list_init(&compl);

        //retrieve requests from the submit list and submit each one via
        //aio_submitter(); if any one is already done, add it to &compl
        ublksrv_aio_submit_worker(aio_ctx, aio_submitter, &compl);

        //add requests completed from command_completed() to &compl
        pthread_spin_lock(&c->lock);
        aio_list_splice(&c->list, &compl);
        pthread_spin_unlock(&c->lock);

        //notify the io_uring thread about the completed requests in &compl,
        //so batched completion & re-issue can be done in io_uring context
        ublksrv_aio_complete_worker(aio_ctx, &compl);

        //wait for network IO and the eventfd from io_uring at the same time,
        //so if either one is ready, nbd_poll2() will return from its sleep
        if (nbd_poll2 (h, aio_ctx->efd, -1) == -1) {
            fprintf (stderr, "%s\n", nbd_get_error ());
            exit (EXIT_FAILURE);
        }
    }
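
The wait at the bottom of that loop is a poll-on-two-file-descriptors
pattern: nbd_poll2 (h, aio_ctx->efd, -1) returns when either the NBD socket
or the io_uring eventfd becomes readable.  A minimal, self-contained sketch
of the same pattern with plain eventfd(2) and poll(2), independent of
ubdsrv and libnbd (kick_worker(), worker_loop() and the stubbed handlers
are made-up names for illustration), could look like this:

#include <poll.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Placeholder work: in the real code these would be draining the submit
 * list and running the NBD state machine respectively. */
static void handle_queued_requests(void) { /* ... */ }
static void handle_network_io(void)      { /* ... */ }

/* io_uring thread: after adding a request to the submit list, kick the
 * worker by adding 1 to the eventfd counter. */
static void kick_worker(int efd)
{
    uint64_t one = 1;
    if (write(efd, &one, sizeof one) != sizeof one)
        perror("write(eventfd)");
}

/* Worker thread: sleep until either the network socket or the eventfd
 * becomes readable, then handle whichever is ready. */
static void worker_loop(int efd, int sock)
{
    for (;;) {
        struct pollfd fds[2] = {
            { .fd = sock, .events = POLLIN },
            { .fd = efd,  .events = POLLIN },
        };

        if (poll(fds, 2, -1) == -1) {
            perror("poll");
            exit(EXIT_FAILURE);
        }

        if (fds[1].revents & POLLIN) {
            uint64_t cnt;
            /* Reading drains (resets) the eventfd counter. */
            if (read(efd, &cnt, sizeof cnt) == sizeof cnt)
                handle_queued_requests();
        }

        if (fds[0].revents & POLLIN)
            handle_network_io();
    }
}

/* Setup (done once):
 *   int efd = eventfd(0, EFD_CLOEXEC);
 * then pass efd to both the io_uring side (kick_worker) and the worker
 * thread (worker_loop), together with the NBD socket fd. */

Reading the eventfd resets its counter, so a single wakeup can cover any
number of queued requests.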
> I've never used eventfd before and the man page for it is very opaque.
>
> > In your previous implementation, the nbd work thread may wait on one
> > pthread mutex and on aio_poll(); this may not be efficient, given that
> > while waiting on one event, the other events can't be handled.
>
> I'm not sure what "event" means in this context.  Does it mean
> "NBD command"?  Or poll(2) event?

Here "event" is generic; I meant: NBD IO ready (exactly what aio_poll()
waits for) or io_uring eventfd ready (one write done from
nbd_handle_io_async()).

> There are multiple (usually 4) nbd_work threads, one for each NBD
> network connection.  Each NBD network connection can handle many
> commands in flight at once.

OK, but aio_poll() is supposed to get notified when one command is done,
so here that is just an implementation detail, but correct me if I am
wrong.

Thanks,
Ming
Richard W.M. Jones
2022-Aug-30 10:51 UTC
[Libguestfs] [PATCH libnbd] ublk: Add new nbdublk program
On Tue, Aug 30, 2022 at 05:38:35PM +0800, Ming Lei wrote:
> The nbd work thread is created by the nbd target code just like before,
> but its body is changed to the following loop, basically bound to one
> aio_ctx:
>
>     while (!ublksrv_aio_ctx_dead(aio_ctx)) {
>         struct aio_list compl;
>
>         aio_list_init(&compl);
>
>         //retrieve requests from the submit list and submit each one via
>         //aio_submitter(); if any one is already done, add it to &compl
>         ublksrv_aio_submit_worker(aio_ctx, aio_submitter, &compl);
>
>         //add requests completed from command_completed() to &compl
>         pthread_spin_lock(&c->lock);
>         aio_list_splice(&c->list, &compl);
>         pthread_spin_unlock(&c->lock);
>
>         //notify the io_uring thread about the completed requests in &compl,
>         //so batched completion & re-issue can be done in io_uring context
>         ublksrv_aio_complete_worker(aio_ctx, &compl);
>
>         //wait for network IO and the eventfd from io_uring at the same time,
>         //so if either one is ready, nbd_poll2() will return from its sleep
>         if (nbd_poll2 (h, aio_ctx->efd, -1) == -1) {
>             fprintf (stderr, "%s\n", nbd_get_error ());
>             exit (EXIT_FAILURE);
>         }
>     }

I think where I'm confused is: where is pthread_create called to create
this thread?  (Or maybe this is being called in an io_uring thread
context?)

> > I've never used eventfd before and the man page for it is very opaque.
> >
> > > In your previous implementation, the nbd work thread may wait on one
> > > pthread mutex and on aio_poll(); this may not be efficient, given that
> > > while waiting on one event, the other events can't be handled.
> >
> > I'm not sure what "event" means in this context.  Does it mean
> > "NBD command"?  Or poll(2) event?
>
> Here "event" is generic; I meant: NBD IO ready (exactly what aio_poll()
> waits for) or io_uring eventfd ready (one write done from
> nbd_handle_io_async()).
>
> > There are multiple (usually 4) nbd_work threads, one for each NBD
> > network connection.  Each NBD network connection can handle many
> > commands in flight at once.
>
> OK, but aio_poll() is supposed to get notified when one command is done,
> so here that is just an implementation detail, but correct me if I am
> wrong.

nbd_poll -> poll(2) -> POLLIN.  The state machine code (lib/states.c)
will call recv(2) and may complete zero, one or several NBD commands
before it blocks again.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
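
To make that flow concrete (issue a command with nbd_aio_*, then let
nbd_poll() run the state machine until the completion callback fires),
here is a minimal standalone sketch.  It follows libnbd's documented aio
calling convention as recalled from its manual pages, so treat the exact
signatures as an assumption; the URI, buffer size and offset are arbitrary
and error handling is minimal.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libnbd.h>

/* Completion callback: called from inside the libnbd state machine once
 * this particular command has been fully received.  Per the libnbd docs,
 * returning 1 asks libnbd to retire the command automatically. */
static int
read_completed (void *user_data, int *error)
{
  int *done = user_data;

  if (*error != 0)
    fprintf (stderr, "pread failed: %s\n", strerror (*error));
  *done = 1;
  return 1;
}

int
main (void)
{
  struct nbd_handle *nbd = nbd_create ();
  static char buf[4096];
  int done = 0;

  if (nbd == NULL || nbd_connect_uri (nbd, "nbd://localhost") == -1) {
    fprintf (stderr, "%s\n", nbd_get_error ());
    exit (EXIT_FAILURE);
  }

  /* Issue the read asynchronously; this returns as soon as the request
   * is queued, long before the reply arrives. */
  if (nbd_aio_pread (nbd, buf, sizeof buf, 0,
                     (nbd_completion_callback) { .callback = read_completed,
                                                 .user_data = &done },
                     0) == -1) {
    fprintf (stderr, "%s\n", nbd_get_error ());
    exit (EXIT_FAILURE);
  }

  /* Each nbd_poll() call does one poll(2) on the socket and runs the
   * state machine; a single POLLIN wakeup may complete zero, one or
   * several in-flight commands before the loop sleeps again. */
  while (!done) {
    if (nbd_poll (nbd, -1) == -1) {
      fprintf (stderr, "%s\n", nbd_get_error ());
      exit (EXIT_FAILURE);
    }
  }

  nbd_close (nbd);
  return 0;
}

Roughly speaking, nbdublk runs an analogous poll-and-dispatch loop on each
nbd_work thread (one per NBD connection), with completions handed back to
the io_uring context as discussed above.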