The following patches apply over Linus's tree or mst's vhost branch and
my cleanup patchset:

https://lists.linuxfoundation.org/pipermail/virtualization/2021-May/054354.html

These patches allow us to support multiple vhost workers per device. I
ended up just doing Stefan's original idea where userspace has the
kernel create a worker and we pass back the pid. Compared to the
workqueue and userspace-thread approaches, this has the benefit that
there is only one-ish code path in the kernel: only the setup code has
to detect old tools, while the main IO paths and the device/vq
setup/teardown paths all use common code.

The kernel patches here allow us to then do N workers per device and
also share workers across devices.

I've also included a patch for qemu so you can get an idea of how it
works. If we are ok with the kernel code then I'll break that up into a
patchset and send it to qemu-devel.

Results:
--------
When running with the null_blk driver and vhost-scsi I can get 1.2
million IOPS by just running a simple

fio --filename=/dev/sda --direct=1 --rw=randrw --bs=4k --ioengine=libaio
--iodepth=128 --numjobs=8 --time_based --group_reporting --name=iops
--runtime=60 --eta-newline=1

The VM has 8 vCPUs and sda has 8 virtqueues, and we can do a total of
1024 cmds per device. To get 1.2 million IOPS I did have to tune and
run the virsh emulatorpin command so the vhost threads were running
on different CPUs than the VM. If the vhost threads share CPUs then I
get around 800K.

For a more realistic device that is also a CPU hog, like iscsi, I can
still get 1 million IOPS using 1 dm-multipath device over 8 iscsi paths
(natively it gets 1.1 million IOPS).

Results/TODO Note:

- I ported the vdpa sim code to support multiple workers, and as-is it
made perf much worse. If I increase vdpa_sim_blk's num queues to 4-8 I
get 700K IOPS with the fio command above. However, with the multiple
worker support it drops to 400K. The problem is the vdpa_sim lock and
the iommu_lock. If I hack around them (e.g. comment out the locks and
ignore the data corruption and crashes) then I can get around 1.2M -
1.6M IOPS with 8 queues and the fio command above.

So these patches could help other drivers, but it will just take more
work to remove those types of locks. I was hoping the 2 items could be
done independently since this helps vhost-scsi immediately.

TODO:
- Stefano has 2 questions about security issues with passing the pid
back to userspace and about whether we should do a feature bit. We are
waiting to hear back from the list.

v2:
- Changed the loop in which we take a refcount on the worker.
- Replaced pid == -1 with a define.
- Fixed tabbing/spacing coding style issues.
- Used a hash instead of a list to look up workers.
- Dropped the patch that added an ioctl cmd to get a vq's worker's pid,
  since we might do a generic netlink interface instead.
Mike Christie
2021-May-25 18:05 UTC
[PATCH 1/9] vhost: move worker thread fields to new struct
This is just a prep patch. It moves the worker related fields to a new
vhost_worker struct and moves the code around to create some helpers that
will be used in the next patches.

Signed-off-by: Mike Christie <michael.christie at oracle.com>
---
 drivers/vhost/vhost.c | 94 +++++++++++++++++++++++++++++--------------
 drivers/vhost/vhost.h |  9 ++++-
 2 files changed, 70 insertions(+), 33 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index b9e853e6094d..0cd19b1a832e 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -263,8 +263,8 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
 		 * sure it was not in the list.
 		 * test_and_set_bit() implies a memory barrier.
 		 */
-		llist_add(&work->node, &dev->work_list);
-		wake_up_process(dev->worker);
+		llist_add(&work->node, &dev->worker->work_list);
+		wake_up_process(dev->worker->task);
 	}
 }
 EXPORT_SYMBOL_GPL(vhost_work_queue);
@@ -272,7 +272,7 @@ EXPORT_SYMBOL_GPL(vhost_work_queue);
 /* A lockless hint for busy polling code to exit the loop */
 bool vhost_has_work(struct vhost_dev *dev)
 {
-	return !llist_empty(&dev->work_list);
+	return dev->worker && !llist_empty(&dev->worker->work_list);
 }
 EXPORT_SYMBOL_GPL(vhost_has_work);
@@ -343,7 +343,8 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 static int vhost_worker(void *data)
 {
-	struct vhost_dev *dev = data;
+	struct vhost_worker *worker = data;
+	struct vhost_dev *dev = worker->dev;
 	struct vhost_work *work, *work_next;
 	struct llist_node *node;
@@ -358,7 +359,7 @@ static int vhost_worker(void *data)
 			break;
 		}
-		node = llist_del_all(&dev->work_list);
+		node = llist_del_all(&worker->work_list);
 		if (!node)
 			schedule();
@@ -487,7 +488,6 @@ void vhost_dev_init(struct vhost_dev *dev,
 	dev->byte_weight = byte_weight;
 	dev->use_worker = use_worker;
 	dev->msg_handler = msg_handler;
-	init_llist_head(&dev->work_list);
 	init_waitqueue_head(&dev->wait);
 	INIT_LIST_HEAD(&dev->read_list);
 	INIT_LIST_HEAD(&dev->pending_list);
@@ -579,10 +579,59 @@ static void vhost_detach_mm(struct vhost_dev *dev)
 	dev->mm = NULL;
 }
 
+static void vhost_worker_free(struct vhost_dev *dev)
+{
+	struct vhost_worker *worker = dev->worker;
+
+	if (!worker)
+		return;
+
+	dev->worker = NULL;
+	WARN_ON(!llist_empty(&worker->work_list));
+	kthread_stop(worker->task);
+	kfree(worker);
+}
+
+static int vhost_worker_create(struct vhost_dev *dev)
+{
+	struct vhost_worker *worker;
+	struct task_struct *task;
+	int ret;
+
+	worker = kzalloc(sizeof(*worker), GFP_KERNEL);
+	if (!worker)
+		return -ENOMEM;
+
+	dev->worker = worker;
+	worker->dev = dev;
+	init_llist_head(&worker->work_list);
+
+	task = kthread_create(vhost_worker, worker, "vhost-%d", current->pid);
+	if (IS_ERR(task)) {
+		ret = PTR_ERR(task);
+		goto free_worker;
+	}
+
+	worker->task = task;
+	wake_up_process(task); /* avoid contributing to loadavg */
+
+	ret = vhost_attach_cgroups(dev);
+	if (ret)
+		goto stop_worker;
+
+	return 0;
+
+stop_worker:
+	kthread_stop(worker->task);
+free_worker:
+	kfree(worker);
+	dev->worker = NULL;
+	return ret;
+}
+
 /* Caller should have device mutex */
 long vhost_dev_set_owner(struct vhost_dev *dev)
 {
-	struct task_struct *worker;
 	int err;
 
 	/* Is there an owner already? */
@@ -595,31 +644,18 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
 
 	dev->kcov_handle = kcov_common_handle();
 	if (dev->use_worker) {
-		worker = kthread_create(vhost_worker, dev,
-					"vhost-%d", current->pid);
-		if (IS_ERR(worker)) {
-			err = PTR_ERR(worker);
-			goto err_worker;
-		}
-
-		dev->worker = worker;
-		wake_up_process(worker); /* avoid contributing to loadavg */
-
-		err = vhost_attach_cgroups(dev);
+		err = vhost_worker_create(dev);
 		if (err)
-			goto err_cgroup;
+			goto err_worker;
 	}
 
 	err = vhost_dev_alloc_iovecs(dev);
 	if (err)
-		goto err_cgroup;
+		goto err_iovecs;
 
 	return 0;
-err_cgroup:
-	if (dev->worker) {
-		kthread_stop(dev->worker);
-		dev->worker = NULL;
-	}
+err_iovecs:
+	vhost_worker_free(dev);
 err_worker:
 	vhost_detach_mm(dev);
 	dev->kcov_handle = 0;
@@ -712,13 +748,9 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
 	dev->iotlb = NULL;
 	vhost_clear_msg(dev);
 	wake_up_interruptible_poll(&dev->wait, EPOLLIN | EPOLLRDNORM);
-	WARN_ON(!llist_empty(&dev->work_list));
-	if (dev->worker) {
-		kthread_stop(dev->worker);
-		dev->worker = NULL;
-		dev->kcov_handle = 0;
-	}
+	vhost_worker_free(dev);
 	vhost_detach_mm(dev);
+	dev->kcov_handle = 0;
 }
 EXPORT_SYMBOL_GPL(vhost_dev_cleanup);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 7d5306d1229d..bfc4563e612f 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -25,6 +25,12 @@ struct vhost_work {
 	unsigned long flags;
 };
 
+struct vhost_worker {
+	struct task_struct *task;
+	struct llist_head work_list;
+	struct vhost_dev *dev;
+};
+
 /* Poll a file (eventfd or socket) */
 /* Note: there's nothing vhost specific about this structure. */
 struct vhost_poll {
@@ -149,8 +155,7 @@ struct vhost_dev {
 	struct vhost_virtqueue **vqs;
 	int nvqs;
 	struct eventfd_ctx *log_ctx;
-	struct llist_head work_list;
-	struct task_struct *worker;
+	struct vhost_worker *worker;
 	struct vhost_iotlb *umem;
 	struct vhost_iotlb *iotlb;
 	spinlock_t iotlb_lock;
-- 
2.25.1
Mike Christie
2021-May-25 18:05 UTC
[PATCH 2/9] vhost: move vhost worker creation to kick setup
The next patch will add new ioctls that allow userspace to create workers
and bind them to devs and vqs after VHOST_SET_OWNER. To support older
tools, newer tools that want to go wild with worker threads, and newer
tools that want the old/default behavior, this patch moves the default
worker setup to the kick setup. After the first vq's kick/poll setup is
done we could start to get work queued, so this is the point when we must
have a worker set up. If we are using older tools, or the newer tools
just want the default single vhost thread per dev behavior, then it will
automatically be done here. If the tools are using the newer ioctls and
have already created the workers, then we also detect that here and do
nothing.

Signed-off-by: Mike Christie <michael.christie at oracle.com>
---
 drivers/vhost/vhost.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 0cd19b1a832e..a291cde95c43 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -629,6 +629,15 @@ static int vhost_worker_create(struct vhost_dev *dev)
 	return ret;
 }
 
+/* Caller must have device mutex */
+static int vhost_worker_try_create_def(struct vhost_dev *dev)
+{
+	if (!dev->use_worker || dev->worker)
+		return 0;
+
+	return vhost_worker_create(dev);
+}
+
 /* Caller should have device mutex */
 long vhost_dev_set_owner(struct vhost_dev *dev)
 {
@@ -643,11 +652,6 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
 	vhost_attach_mm(dev);
 
 	dev->kcov_handle = kcov_common_handle();
-	if (dev->use_worker) {
-		err = vhost_worker_create(dev);
-		if (err)
-			goto err_worker;
-	}
 
 	err = vhost_dev_alloc_iovecs(dev);
 	if (err)
@@ -655,8 +659,6 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
 	return 0;
 
 err_iovecs:
-	vhost_worker_free(dev);
-err_worker:
 	vhost_detach_mm(dev);
 	dev->kcov_handle = 0;
 err_mm:
@@ -1665,6 +1667,13 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
 			r = -EFAULT;
 			break;
 		}
+
+		if (f.fd != VHOST_FILE_UNBIND) {
+			r = vhost_worker_try_create_def(d);
+			if (r)
+				break;
+		}
+
 		eventfp = f.fd == VHOST_FILE_UNBIND ? NULL : eventfd_fget(f.fd);
 		if (IS_ERR(eventfp)) {
 			r = PTR_ERR(eventfp);
-- 
2.25.1
Mike Christie
2021-May-25 18:05 UTC
[PATCH 3/9] vhost: modify internal functions to take a vhost_worker
The final patches will allow us to have multiple vhost_workers per device
and to share them across devices. To prepare for that, this patch
converts our internal work queueing, flush and cgroup attach functions to
take a vhost_worker as an arg.

The poll code required a change to the drivers, so the next patch will
convert that code.

Signed-off-by: Mike Christie <michael.christie at oracle.com>
---
 drivers/vhost/vhost.c | 136 ++++++++++++++++++++++++++++--------------
 drivers/vhost/vhost.h |   4 +-
 2 files changed, 94 insertions(+), 46 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index a291cde95c43..4bfa9a7a51bb 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -231,20 +231,6 @@ void vhost_poll_stop(struct vhost_poll *poll)
 }
 EXPORT_SYMBOL_GPL(vhost_poll_stop);
 
-void vhost_work_dev_flush(struct vhost_dev *dev)
-{
-	struct vhost_flush_struct flush;
-
-	if (dev->worker) {
-		init_completion(&flush.wait_event);
-		vhost_work_init(&flush.work, vhost_flush_work);
-
-		vhost_work_queue(dev, &flush.work);
-		wait_for_completion(&flush.wait_event);
-	}
-}
-EXPORT_SYMBOL_GPL(vhost_work_dev_flush);
-
 /* Flush any work that has been scheduled. When calling this, don't hold any
  * locks that are also used by the callback. */
 void vhost_poll_flush(struct vhost_poll *poll)
@@ -253,26 +239,62 @@ void vhost_poll_flush(struct vhost_poll *poll)
 }
 EXPORT_SYMBOL_GPL(vhost_poll_flush);
 
-void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
+static void vhost_work_queue_on(struct vhost_work *work,
+				struct vhost_worker *worker)
 {
-	if (!dev->worker)
-		return;
-
 	if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags)) {
 		/* We can only add the work to the list after we're
 		 * sure it was not in the list.
 		 * test_and_set_bit() implies a memory barrier.
 		 */
-		llist_add(&work->node, &dev->worker->work_list);
-		wake_up_process(dev->worker->task);
+		llist_add(&work->node, &worker->work_list);
+		wake_up_process(worker->task);
 	}
 }
+
+void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
+{
+	if (!dev->workers)
+		return;
+	/*
+	 * devs with use_worker=true created by tools that do not support the
+	 * worker creation ioctl will always have at least one worker.
+	 */
+	vhost_work_queue_on(work, dev->workers[0]);
+}
 EXPORT_SYMBOL_GPL(vhost_work_queue);
 
+static void vhost_work_dev_flush_on(struct vhost_worker *worker)
+{
+	struct vhost_flush_struct flush;
+
+	init_completion(&flush.wait_event);
+	vhost_work_init(&flush.work, vhost_flush_work);
+
+	vhost_work_queue_on(&flush.work, worker);
+	wait_for_completion(&flush.wait_event);
+}
+
+void vhost_work_dev_flush(struct vhost_dev *dev)
+{
+	int i;
+
+	for (i = 0; i < dev->num_workers; i++)
+		vhost_work_dev_flush_on(dev->workers[i]);
+}
+EXPORT_SYMBOL_GPL(vhost_work_dev_flush);
+
 /* A lockless hint for busy polling code to exit the loop */
 bool vhost_has_work(struct vhost_dev *dev)
 {
-	return dev->worker && !llist_empty(&dev->worker->work_list);
+	int i;
+
+	for (i = 0; i < dev->num_workers; i++) {
+		if (!llist_empty(&dev->workers[i]->work_list))
+			return true;
+	}
+
+	return false;
 }
 EXPORT_SYMBOL_GPL(vhost_has_work);
@@ -482,7 +504,8 @@ void vhost_dev_init(struct vhost_dev *dev,
 	dev->umem = NULL;
 	dev->iotlb = NULL;
 	dev->mm = NULL;
-	dev->worker = NULL;
+	dev->workers = NULL;
+	dev->num_workers = 0;
 	dev->iov_limit = iov_limit;
 	dev->weight = weight;
 	dev->byte_weight = byte_weight;
@@ -531,14 +554,14 @@ static void vhost_attach_cgroups_work(struct vhost_work *work)
 	s->ret = cgroup_attach_task_all(s->owner, current);
 }
 
-static int vhost_attach_cgroups(struct vhost_dev *dev)
+static int vhost_attach_cgroups_on(struct vhost_worker *worker)
 {
 	struct vhost_attach_cgroups_struct attach;
 
 	attach.owner = current;
 	vhost_work_init(&attach.work, vhost_attach_cgroups_work);
-	vhost_work_queue(dev, &attach.work);
-	vhost_work_dev_flush(dev);
+	vhost_work_queue_on(&attach.work, worker);
+	vhost_work_dev_flush_on(worker);
 	return attach.ret;
 }
@@ -579,20 +602,29 @@ static void vhost_detach_mm(struct vhost_dev *dev)
 	dev->mm = NULL;
 }
 
-static void vhost_worker_free(struct vhost_dev *dev)
+static void vhost_worker_free(struct vhost_worker *worker)
 {
-	struct vhost_worker *worker = dev->worker;
-
-	if (!worker)
-		return;
-
-	dev->worker = NULL;
 	WARN_ON(!llist_empty(&worker->work_list));
 	kthread_stop(worker->task);
 	kfree(worker);
 }
 
-static int vhost_worker_create(struct vhost_dev *dev)
+static void vhost_workers_free(struct vhost_dev *dev)
+{
+	int i;
+
+	if (!dev->workers)
+		return;
+
+	for (i = 0; i < dev->num_workers; i++)
+		vhost_worker_free(dev->workers[i]);
+
+	kfree(dev->workers);
+	dev->num_workers = 0;
+	dev->workers = NULL;
+}
+
+static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
 {
 	struct vhost_worker *worker;
 	struct task_struct *task;
@@ -600,42 +632,56 @@ static int vhost_worker_create(struct vhost_dev *dev)
 	worker = kzalloc(sizeof(*worker), GFP_KERNEL);
 	if (!worker)
-		return -ENOMEM;
+		return NULL;
 
-	dev->worker = worker;
+	worker->id = dev->num_workers;
 	worker->dev = dev;
 	init_llist_head(&worker->work_list);
 
 	task = kthread_create(vhost_worker, worker, "vhost-%d", current->pid);
-	if (IS_ERR(task)) {
-		ret = PTR_ERR(task);
+	if (IS_ERR(task))
 		goto free_worker;
-	}
 
 	worker->task = task;
 	wake_up_process(task); /* avoid contributing to loadavg */
 
-	ret = vhost_attach_cgroups(dev);
+	ret = vhost_attach_cgroups_on(worker);
 	if (ret)
 		goto stop_worker;
 
-	return 0;
+	return worker;
 
 stop_worker:
 	kthread_stop(worker->task);
free_worker:
 	kfree(worker);
-	dev->worker = NULL;
-	return ret;
+	return NULL;
 }
 
 /* Caller must have device mutex */
 static int vhost_worker_try_create_def(struct vhost_dev *dev)
 {
-	if (!dev->use_worker || dev->worker)
+	struct vhost_worker *worker;
+
+	if (!dev->use_worker || dev->workers)
 		return 0;
 
-	return vhost_worker_create(dev);
+	dev->workers = kcalloc(1, sizeof(struct vhost_worker *), GFP_KERNEL);
+	if (!dev->workers)
+		return -ENOMEM;
+
+	worker = vhost_worker_create(dev);
+	if (!worker)
+		goto free_workers;
+
+	dev->workers[worker->id] = worker;
+	dev->num_workers++;
+	return 0;
+
+free_workers:
+	kfree(dev->workers);
+	dev->workers = NULL;
+	return -ENOMEM;
 }
 
 /* Caller should have device mutex */
@@ -750,7 +796,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
 	dev->iotlb = NULL;
 	vhost_clear_msg(dev);
 	wake_up_interruptible_poll(&dev->wait, EPOLLIN | EPOLLRDNORM);
-	vhost_worker_free(dev);
+	vhost_workers_free(dev);
 	vhost_detach_mm(dev);
 	dev->kcov_handle = 0;
 }
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index bfc4563e612f..fa1af4106755 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -29,6 +29,7 @@ struct vhost_worker {
 	struct task_struct *task;
 	struct llist_head work_list;
 	struct vhost_dev *dev;
+	int id;
 };
 
 /* Poll a file (eventfd or socket) */
@@ -155,7 +156,8 @@ struct vhost_dev {
 	struct vhost_virtqueue **vqs;
 	int nvqs;
 	struct eventfd_ctx *log_ctx;
-	struct vhost_worker *worker;
+	struct vhost_worker **workers;
+	int num_workers;
 	struct vhost_iotlb *umem;
 	struct vhost_iotlb *iotlb;
 	spinlock_t iotlb_lock;
-- 
2.25.1
Mike Christie
2021-May-25 18:05 UTC
[PATCH 4/9] vhost: allow vhost_polls to use different vhost_workers
This allows vhost_polls to use the worker of the vq the poll is
associated with.

This also exports a helper, vhost_vq_work_queue, which is used here
internally and will be used in the vhost-scsi patches.

Signed-off-by: Mike Christie <michael.christie at oracle.com>
---
 drivers/vhost/net.c   |  6 ++++--
 drivers/vhost/vhost.c | 19 ++++++++++++++++---
 drivers/vhost/vhost.h |  6 +++++-
 3 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index df82b124170e..948bc3d361ab 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1334,8 +1334,10 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 		       VHOST_NET_PKT_WEIGHT, VHOST_NET_WEIGHT, true,
 		       NULL);
 
-	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, EPOLLOUT, dev);
-	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, EPOLLIN, dev);
+	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, EPOLLOUT, dev,
+			vqs[VHOST_NET_VQ_TX]);
+	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, EPOLLIN, dev,
+			vqs[VHOST_NET_VQ_RX]);
 
 	f->private_data = n;
 	n->page_frag.page = NULL;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 4bfa9a7a51bb..3cc1196d465b 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -187,13 +187,15 @@ EXPORT_SYMBOL_GPL(vhost_work_init);
 
 /* Init poll structure */
 void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
-		     __poll_t mask, struct vhost_dev *dev)
+		     __poll_t mask, struct vhost_dev *dev,
+		     struct vhost_virtqueue *vq)
 {
 	init_waitqueue_func_entry(&poll->wait, vhost_poll_wakeup);
 	init_poll_funcptr(&poll->table, vhost_poll_func);
 	poll->mask = mask;
 	poll->dev = dev;
 	poll->wqh = NULL;
+	poll->vq = vq;
 
 	vhost_work_init(&poll->work, fn);
 }
@@ -298,9 +300,15 @@ bool vhost_has_work(struct vhost_dev *dev)
 }
 EXPORT_SYMBOL_GPL(vhost_has_work);
 
+void vhost_vq_work_queue(struct vhost_virtqueue *vq, struct vhost_work *work)
+{
+	vhost_work_queue_on(work, vq->worker);
+}
+EXPORT_SYMBOL_GPL(vhost_vq_work_queue);
+
 void vhost_poll_queue(struct vhost_poll *poll)
 {
-	vhost_work_queue(poll->dev, &poll->work);
+	vhost_vq_work_queue(poll->vq, &poll->work);
 }
 EXPORT_SYMBOL_GPL(vhost_poll_queue);
 
@@ -359,6 +367,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 	vq->busyloop_timeout = 0;
 	vq->umem = NULL;
 	vq->iotlb = NULL;
+	vq->worker = NULL;
 	vhost_vring_call_reset(&vq->call_ctx);
 	__vhost_vq_meta_reset(vq);
 }
@@ -527,7 +536,7 @@ void vhost_dev_init(struct vhost_dev *dev,
 		vhost_vq_reset(dev, vq);
 		if (vq->handle_kick)
 			vhost_poll_init(&vq->poll, vq->handle_kick,
-					EPOLLIN, dev);
+					EPOLLIN, dev, vq);
 	}
 }
 EXPORT_SYMBOL_GPL(vhost_dev_init);
@@ -662,6 +671,7 @@ static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
 static int vhost_worker_try_create_def(struct vhost_dev *dev)
 {
 	struct vhost_worker *worker;
+	int i;
 
 	if (!dev->use_worker || dev->workers)
 		return 0;
@@ -674,6 +684,9 @@ static int vhost_worker_try_create_def(struct vhost_dev *dev)
 	if (!worker)
 		goto free_workers;
 
+	for (i = 0; i < dev->nvqs; i++)
+		dev->vqs[i]->worker = worker;
+
 	dev->workers[worker->id] = worker;
 	dev->num_workers++;
 	return 0;
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index fa1af4106755..6880e7a29872 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -41,14 +41,17 @@ struct vhost_poll {
 	struct vhost_work work;
 	__poll_t mask;
 	struct vhost_dev *dev;
+	struct vhost_virtqueue *vq;
 };
 
 void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
 void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
 bool vhost_has_work(struct vhost_dev *dev);
+void vhost_vq_work_queue(struct vhost_virtqueue *vq, struct vhost_work *work);
 
 void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
-		     __poll_t mask, struct vhost_dev *dev);
+		     __poll_t mask, struct vhost_dev *dev,
+		     struct vhost_virtqueue *vq);
 int vhost_poll_start(struct vhost_poll *poll, struct file *file);
 void vhost_poll_stop(struct vhost_poll *poll);
 void vhost_poll_flush(struct vhost_poll *poll);
@@ -76,6 +79,7 @@ struct vhost_vring_call {
 /* The virtqueue structure describes a queue attached to a device. */
 struct vhost_virtqueue {
 	struct vhost_dev *dev;
+	struct vhost_worker *worker;
 
 	/* The actual ring of buffers. */
 	struct mutex mutex;
-- 
2.25.1
Mike Christie
2021-May-25 18:05 UTC
[PATCH 5/9] vhost-scsi: flush IO vqs then send TMF rsp
With one worker we will always send the scsi cmd responses then send the
TMF rsp, because LIO will always complete the scsi cmds first, which
calls vhost_scsi_release_cmd to add them to the work queue.

When the next patches add multiple worker support, the worker threads
could still be sending their responses when the tmf's work is run.
So this patch has vhost-scsi flush the IO vqs on other worker threads
before we send the tmf response.

Signed-off-by: Mike Christie <michael.christie at oracle.com>
---
 drivers/vhost/scsi.c  | 17 ++++++++++++++---
 drivers/vhost/vhost.c |  6 ++++++
 drivers/vhost/vhost.h |  1 +
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 46f897e41217..e585f2180457 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1168,12 +1168,23 @@ static void vhost_scsi_tmf_resp_work(struct vhost_work *work)
 {
 	struct vhost_scsi_tmf *tmf = container_of(work, struct vhost_scsi_tmf,
 						  vwork);
-	int resp_code;
+	int resp_code, i;
+
+	if (tmf->scsi_resp == TMR_FUNCTION_COMPLETE) {
+		/*
+		 * When processing a TMF lio completes the cmds then the TMF,
+		 * so with one worker the TMF always completes after cmds. For
+		 * multiple worker support we must flush the IO vqs to make
+		 * sure if they have differrent workers then the cmds have
+		 * completed before we send the TMF response.
+		 */
+		for (i = 1; i < tmf->vhost->dev.num_workers; i++)
+			vhost_vq_work_flush(&tmf->vhost->vqs[i + VHOST_SCSI_VQ_IO].vq);
 
-	if (tmf->scsi_resp == TMR_FUNCTION_COMPLETE)
 		resp_code = VIRTIO_SCSI_S_FUNCTION_SUCCEEDED;
-	else
+	} else {
 		resp_code = VIRTIO_SCSI_S_FUNCTION_REJECTED;
+	}
 
 	vhost_scsi_send_tmf_resp(tmf->vhost, &tmf->svq->vq, tmf->in_iovs,
 				 tmf->vq_desc, &tmf->resp_iov, resp_code);
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 3cc1196d465b..345ade0af133 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -300,6 +300,12 @@ bool vhost_has_work(struct vhost_dev *dev)
 }
 EXPORT_SYMBOL_GPL(vhost_has_work);
 
+void vhost_vq_work_flush(struct vhost_virtqueue *vq)
+{
+	vhost_work_dev_flush_on(vq->worker);
+}
+EXPORT_SYMBOL_GPL(vhost_vq_work_flush);
+
 void vhost_vq_work_queue(struct vhost_virtqueue *vq, struct vhost_work *work)
 {
 	vhost_work_queue_on(work, vq->worker);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 6880e7a29872..0a252dd45101 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -47,6 +47,7 @@ struct vhost_poll {
 void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
 void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
 bool vhost_has_work(struct vhost_dev *dev);
+void vhost_vq_work_flush(struct vhost_virtqueue *vq);
 void vhost_vq_work_queue(struct vhost_virtqueue *vq, struct vhost_work *work);
 
 void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
-- 
2.25.1
Mike Christie
2021-May-25 18:05 UTC
[PATCH 6/9] vhost-scsi: make SCSI cmd completion per vq
This patch separates the scsi cmd completion code paths so we can
complete cmds based on their vq instead of having all cmds complete on
the same worker/CPU. This will be useful with the next patch that allows
us to create multiple worker threads and bind them to different vqs, so
we can have completions running on different threads/CPUs.

Signed-off-by: Mike Christie <michael.christie at oracle.com>
---
 drivers/vhost/scsi.c | 48 +++++++++++++++++++++++---------------------
 1 file changed, 25 insertions(+), 23 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index e585f2180457..b607bff41074 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -176,6 +176,7 @@ enum {
 
 struct vhost_scsi_virtqueue {
 	struct vhost_virtqueue vq;
+	struct vhost_scsi *vs;
 	/*
 	 * Reference counting for inflight reqs, used for flush operation. At
 	 * each time, one reference tracks new commands submitted, while we
@@ -190,6 +191,9 @@ struct vhost_scsi_virtqueue {
 	struct vhost_scsi_cmd *scsi_cmds;
 	struct sbitmap scsi_tags;
 	int max_cmds;
+
+	struct vhost_work completion_work;
+	struct llist_head completion_list;
 };
 
 struct vhost_scsi {
@@ -200,9 +204,6 @@ struct vhost_scsi {
 	struct vhost_dev dev;
 	struct vhost_scsi_virtqueue vqs[VHOST_SCSI_MAX_VQ];
 
-	struct vhost_work vs_completion_work; /* cmd completion work item */
-	struct llist_head vs_completion_list; /* cmd completion queue */
-
 	struct vhost_work vs_event_work; /* evt injection work item */
 	struct llist_head vs_event_list; /* evt injection queue */
 
@@ -377,10 +378,11 @@ static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
 	} else {
 		struct vhost_scsi_cmd *cmd = container_of(se_cmd,
 					struct vhost_scsi_cmd, tvc_se_cmd);
-		struct vhost_scsi *vs = cmd->tvc_vhost;
+		struct vhost_scsi_virtqueue *svq = container_of(cmd->tvc_vq,
+					struct vhost_scsi_virtqueue, vq);
 
-		llist_add(&cmd->tvc_completion_list, &vs->vs_completion_list);
-		vhost_work_queue(&vs->dev, &vs->vs_completion_work);
+		llist_add(&cmd->tvc_completion_list, &svq->completion_list);
+		vhost_vq_work_queue(&svq->vq, &svq->completion_work);
 	}
 }
 
@@ -543,18 +545,17 @@ static void vhost_scsi_evt_work(struct vhost_work *work)
  */
 static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
 {
-	struct vhost_scsi *vs = container_of(work, struct vhost_scsi,
-					vs_completion_work);
-	DECLARE_BITMAP(signal, VHOST_SCSI_MAX_VQ);
+	struct vhost_scsi_virtqueue *svq = container_of(work,
+				struct vhost_scsi_virtqueue, completion_work);
 	struct virtio_scsi_cmd_resp v_rsp;
 	struct vhost_scsi_cmd *cmd, *t;
 	struct llist_node *llnode;
 	struct se_cmd *se_cmd;
 	struct iov_iter iov_iter;
-	int ret, vq;
+	bool signal = false;
+	int ret;
 
-	bitmap_zero(signal, VHOST_SCSI_MAX_VQ);
-	llnode = llist_del_all(&vs->vs_completion_list);
+	llnode = llist_del_all(&svq->completion_list);
 	llist_for_each_entry_safe(cmd, t, llnode, tvc_completion_list) {
 		se_cmd = &cmd->tvc_se_cmd;
 
@@ -574,21 +575,16 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
 			      cmd->tvc_in_iovs, sizeof(v_rsp));
 		ret = copy_to_iter(&v_rsp, sizeof(v_rsp), &iov_iter);
 		if (likely(ret == sizeof(v_rsp))) {
-			struct vhost_scsi_virtqueue *q;
+			signal = true;
 			vhost_add_used(cmd->tvc_vq, cmd->tvc_vq_desc, 0);
-			q = container_of(cmd->tvc_vq, struct vhost_scsi_virtqueue, vq);
-			vq = q - vs->vqs;
-			__set_bit(vq, signal);
 		} else
 			pr_err("Faulted on virtio_scsi_cmd_resp\n");
 
 		vhost_scsi_release_cmd_res(se_cmd);
 	}
 
-	vq = -1;
-	while ((vq = find_next_bit(signal, VHOST_SCSI_MAX_VQ, vq + 1))
-		< VHOST_SCSI_MAX_VQ)
-		vhost_signal(&vs->dev, &vs->vqs[vq].vq);
+	if (signal)
+		vhost_signal(&svq->vs->dev, &svq->vq);
 }
 
 static struct vhost_scsi_cmd *
@@ -1799,6 +1795,7 @@ static int vhost_scsi_set_features(struct vhost_scsi *vs, u64 features)
 
 static int vhost_scsi_open(struct inode *inode, struct file *f)
 {
+	struct vhost_scsi_virtqueue *svq;
 	struct vhost_scsi *vs;
 	struct vhost_virtqueue **vqs;
 	int r = -ENOMEM, i;
@@ -1811,7 +1808,6 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
 	if (!vqs)
 		goto err_vqs;
 
-	vhost_work_init(&vs->vs_completion_work, vhost_scsi_complete_cmd_work);
 	vhost_work_init(&vs->vs_event_work, vhost_scsi_evt_work);
 
 	vs->vs_events_nr = 0;
@@ -1822,8 +1818,14 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
 	vs->vqs[VHOST_SCSI_VQ_CTL].vq.handle_kick = vhost_scsi_ctl_handle_kick;
 	vs->vqs[VHOST_SCSI_VQ_EVT].vq.handle_kick = vhost_scsi_evt_handle_kick;
 	for (i = VHOST_SCSI_VQ_IO; i < VHOST_SCSI_MAX_VQ; i++) {
-		vqs[i] = &vs->vqs[i].vq;
-		vs->vqs[i].vq.handle_kick = vhost_scsi_handle_kick;
+		svq = &vs->vqs[i];
+
+		vqs[i] = &svq->vq;
+		svq->vs = vs;
+		init_llist_head(&svq->completion_list);
+		vhost_work_init(&svq->completion_work,
+				vhost_scsi_complete_cmd_work);
+		svq->vq.handle_kick = vhost_scsi_handle_kick;
 	}
 	vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ, UIO_MAXIOV,
 		       VHOST_SCSI_WEIGHT, 0, true, NULL);
-- 
2.25.1

Mike Christie
2021-May-25 18:05 UTC
[PATCH 7/9] vhost: allow userspace to create workers
This patch allows userspace to create workers and bind them to vqs, so you
can have N workers per dev and also share N workers with M vqs. The next
patch will allow sharing across devices.

Signed-off-by: Mike Christie <michael.christie at oracle.com>
---
 drivers/vhost/vhost.c            | 94 +++++++++++++++++++++++++++++++-
 drivers/vhost/vhost.h            |  3 +
 include/uapi/linux/vhost.h       |  6 ++
 include/uapi/linux/vhost_types.h | 12 ++++
 4 files changed, 113 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 345ade0af133..981e9bac7a31 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -30,6 +30,7 @@
 #include <linux/interval_tree_generic.h>
 #include <linux/nospec.h>
 #include <linux/kcov.h>
+#include <linux/hashtable.h>
 
 #include "vhost.h"
 
@@ -42,6 +43,9 @@ module_param(max_iotlb_entries, int, 0444);
 MODULE_PARM_DESC(max_iotlb_entries,
 	"Maximum number of iotlb entries. (default: 2048)");
 
+static DEFINE_HASHTABLE(vhost_workers, 5);
+static DEFINE_SPINLOCK(vhost_workers_lock);
+
 enum {
 	VHOST_MEMORY_F_LOG = 0x1,
 };
@@ -617,8 +621,17 @@ static void vhost_detach_mm(struct vhost_dev *dev)
 	dev->mm = NULL;
 }
 
-static void vhost_worker_free(struct vhost_worker *worker)
+static void vhost_worker_put(struct vhost_worker *worker)
 {
+	spin_lock(&vhost_workers_lock);
+	if (!refcount_dec_and_test(&worker->refcount)) {
+		spin_unlock(&vhost_workers_lock);
+		return;
+	}
+
+	hash_del(&worker->h_node);
+	spin_unlock(&vhost_workers_lock);
+
 	WARN_ON(!llist_empty(&worker->work_list));
 	kthread_stop(worker->task);
 	kfree(worker);
@@ -632,7 +645,7 @@ static void vhost_workers_free(struct vhost_dev *dev)
 		return;
 
 	for (i = 0; i < dev->num_workers; i++)
-		vhost_worker_free(dev->workers[i]);
+		vhost_worker_put(dev->workers[i]);
 
 	kfree(dev->workers);
 	dev->num_workers = 0;
@@ -652,6 +665,8 @@ static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
 	worker->id = dev->num_workers;
 	worker->dev = dev;
 	init_llist_head(&worker->work_list);
+	INIT_HLIST_NODE(&worker->h_node);
+	refcount_set(&worker->refcount, 1);
 
 	task = kthread_create(vhost_worker, worker, "vhost-%d", current->pid);
 	if (IS_ERR(task))
@@ -664,6 +679,9 @@ static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
 	if (ret)
 		goto stop_worker;
 
+	spin_lock(&vhost_workers_lock);
+	hash_add(vhost_workers, &worker->h_node, worker->task->pid);
+	spin_unlock(&vhost_workers_lock);
 	return worker;
 
 stop_worker:
@@ -673,6 +691,67 @@ static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
 	return NULL;
 }
 
+static struct vhost_worker *vhost_worker_find(struct vhost_dev *dev, pid_t pid)
+{
+	struct vhost_worker *worker, *found_worker = NULL;
+
+	spin_lock(&vhost_workers_lock);
+	hash_for_each_possible(vhost_workers, worker, h_node, pid) {
+		if (worker->task->pid == pid) {
+			/* tmp - next patch allows sharing across devs */
+			if (worker->dev != dev)
+				break;
+
+			found_worker = worker;
+			refcount_inc(&worker->refcount);
+			break;
+		}
+	}
+	spin_unlock(&vhost_workers_lock);
+	return found_worker;
+}
+
+/* Caller must have device mutex */
+static int vhost_vq_set_worker(struct vhost_virtqueue *vq,
+			       struct vhost_vring_worker *info)
+{
+	struct vhost_dev *dev = vq->dev;
+	struct vhost_worker *worker;
+
+	if (vq->worker) {
+		/* TODO - support changing while works are running */
+		return -EBUSY;
+	}
+
+	if (info->pid == VHOST_VRING_NEW_WORKER) {
+		worker = vhost_worker_create(dev);
+		if (!worker)
+			return -ENOMEM;
+
+		info->pid = worker->task->pid;
+	} else {
+		worker = vhost_worker_find(dev, info->pid);
+		if (!worker)
+			return -ENODEV;
+	}
+
+	if (!dev->workers) {
+		dev->workers = kcalloc(vq->dev->nvqs,
+				       sizeof(struct vhost_worker *),
+				       GFP_KERNEL);
+		if (!dev->workers) {
+			vhost_worker_put(worker);
+			return -ENOMEM;
+		}
+	}
+
+	vq->worker = worker;
+
+	dev->workers[dev->num_workers] = worker;
+	dev->num_workers++;
+	return 0;
+}
+
 /* Caller must have device mutex */
 static int vhost_worker_try_create_def(struct vhost_dev *dev)
 {
@@ -1680,6 +1759,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
 	struct eventfd_ctx *ctx = NULL;
 	u32 __user *idxp = argp;
 	struct vhost_virtqueue *vq;
+	struct vhost_vring_worker w;
 	struct vhost_vring_state s;
 	struct vhost_vring_file f;
 	u32 idx;
@@ -1794,6 +1874,15 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
 		if (copy_to_user(argp, &s, sizeof(s)))
 			r = -EFAULT;
 		break;
+	case VHOST_SET_VRING_WORKER:
+		if (copy_from_user(&w, argp, sizeof(w))) {
+			r = -EFAULT;
+			break;
+		}
+		r = vhost_vq_set_worker(vq, &w);
+		if (!r && copy_to_user(argp, &w, sizeof(w)))
+			r = -EFAULT;
+		break;
 	default:
 		r = -ENOIOCTLCMD;
 	}
@@ -2726,6 +2815,7 @@ EXPORT_SYMBOL_GPL(vhost_set_backend_features);
 
 static int __init vhost_init(void)
 {
+	hash_init(vhost_workers);
 	return 0;
 }
 
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 0a252dd45101..75b884ad1f17 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -14,6 +14,7 @@
 #include <linux/atomic.h>
 #include <linux/vhost_iotlb.h>
 #include <linux/irqbypass.h>
+#include <linux/refcount.h>
 
 struct vhost_work;
 typedef void (*vhost_work_fn_t)(struct vhost_work *work);
@@ -28,6 +29,8 @@ struct vhost_work {
 struct vhost_worker {
 	struct task_struct *task;
 	struct llist_head work_list;
+	struct hlist_node h_node;
+	refcount_t refcount;
 	struct vhost_dev *dev;
 	int id;
 };
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index c998860d7bbc..ce32119cb139 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -70,6 +70,12 @@
 #define VHOST_VRING_BIG_ENDIAN 1
 #define VHOST_SET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x13, struct vhost_vring_state)
 #define VHOST_GET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x14, struct vhost_vring_state)
+/* Create/bind a vhost worker to a virtqueue. If pid > 0 and matches an existing
+ * vhost_worker thread it will be bound to the vq. If pid is
+ * VHOST_VRING_NEW_WORKER, then a new worker will be created and bound to the
+ * vq.
+ */
+#define VHOST_SET_VRING_WORKER _IOWR(VHOST_VIRTIO, 0x15, struct vhost_vring_worker)
 
 /* The following ioctls use eventfd file descriptors to signal and poll
  * for events. */
diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index f7f6a3a28977..5113baa8bc3e 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -47,6 +47,18 @@ struct vhost_vring_addr {
 	__u64 log_guest_addr;
 };
 
+#define VHOST_VRING_NEW_WORKER -1
+
+struct vhost_vring_worker {
+	unsigned int index;
+	/*
+	 * The pid of the vhost worker that the vq will be bound to. If
+	 * pid is VHOST_VRING_NEW_WORKER a new worker will be created and it's
+	 * pid will be returned in pid.
+	 */
+	__kernel_pid_t pid;
+};
+
 /* no alignment requirement */
 struct vhost_iotlb_msg {
 	__u64 iova;
-- 
2.25.1
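For reference, here is a minimal userspace sketch of how a tool might
drive the interface above. This is hypothetical example code, not part of
the series: it assumes a vhost device fd on which VHOST_SET_OWNER has
already been called, headers carrying the VHOST_SET_VRING_WORKER uapi
from this patch, and (per patch 2) that it runs before kick setup.

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/*
 * Hypothetical helper: create one new worker for vq 0 and share it
 * with vq 1 by passing back the pid the kernel returned.
 */
static int bind_two_vqs_to_one_worker(int vhost_fd)
{
	struct vhost_vring_worker w = {
		.index = 0,
		.pid = VHOST_VRING_NEW_WORKER, /* ask the kernel to create a worker */
	};

	if (ioctl(vhost_fd, VHOST_SET_VRING_WORKER, &w) < 0)
		return -1;
	/* On success the kernel has filled in the new worker's pid. */
	printf("vq 0 bound to vhost worker pid %d\n", w.pid);

	/* Reuse the returned pid to bind vq 1 to the same worker. */
	w.index = 1;
	return ioctl(vhost_fd, VHOST_SET_VRING_WORKER, &w);
}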
Mike Christie
2021-May-25 18:05 UTC
[PATCH 8/9] vhost: add vhost_dev pointer to vhost_work
The next patch allows a vhost_worker to handle different devices. To
prepare for that, this patch adds a pointer to the device on the work so
we can get to the different mms in the vhost_worker thread.

Signed-off-by: Mike Christie <michael.christie at oracle.com>
---
 drivers/vhost/scsi.c  |  7 ++++---
 drivers/vhost/vhost.c | 24 ++++++++++++++----------
 drivers/vhost/vhost.h |  4 +++-
 drivers/vhost/vsock.c |  3 ++-
 4 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index b607bff41074..073b20bca257 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1808,7 +1808,7 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
 	if (!vqs)
 		goto err_vqs;
 
-	vhost_work_init(&vs->vs_event_work, vhost_scsi_evt_work);
+	vhost_work_init(&vs->dev, &vs->vs_event_work, vhost_scsi_evt_work);
 
 	vs->vs_events_nr = 0;
 	vs->vs_events_missed = false;
@@ -1823,7 +1823,7 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
 		vqs[i] = &svq->vq;
 		svq->vs = vs;
 		init_llist_head(&svq->completion_list);
-		vhost_work_init(&svq->completion_work,
+		vhost_work_init(&vs->dev, &svq->completion_work,
 				vhost_scsi_complete_cmd_work);
 		svq->vq.handle_kick = vhost_scsi_handle_kick;
 	}
@@ -2017,7 +2017,8 @@ static int vhost_scsi_port_link(struct se_portal_group *se_tpg,
 		if (!tmf)
 			return -ENOMEM;
 		INIT_LIST_HEAD(&tmf->queue_entry);
-		vhost_work_init(&tmf->vwork, vhost_scsi_tmf_resp_work);
+		vhost_work_init(&tpg->vhost_scsi->dev, &tmf->vwork,
+				vhost_scsi_tmf_resp_work);
 
 		mutex_lock(&vhost_scsi_mutex);
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 981e9bac7a31..eb16eb2bbee0 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -182,10 +182,12 @@ static int vhost_poll_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync,
 	return 0;
 }
 
-void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
+void vhost_work_init(struct vhost_dev *dev, struct vhost_work *work,
+		     vhost_work_fn_t fn)
 {
 	clear_bit(VHOST_WORK_QUEUED, &work->flags);
 	work->fn = fn;
+	work->dev = dev;
 }
 EXPORT_SYMBOL_GPL(vhost_work_init);
@@ -201,7 +203,7 @@ void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
 	poll->wqh = NULL;
 	poll->vq = vq;
 
-	vhost_work_init(&poll->work, fn);
+	vhost_work_init(dev, &poll->work, fn);
 }
 EXPORT_SYMBOL_GPL(vhost_poll_init);
@@ -270,12 +272,13 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
 }
 EXPORT_SYMBOL_GPL(vhost_work_queue);
 
-static void vhost_work_dev_flush_on(struct vhost_worker *worker)
+static void vhost_work_dev_flush_on(struct vhost_dev *dev,
+				    struct vhost_worker *worker)
 {
 	struct vhost_flush_struct flush;
 
 	init_completion(&flush.wait_event);
-	vhost_work_init(&flush.work, vhost_flush_work);
+	vhost_work_init(dev, &flush.work, vhost_flush_work);
 
 	vhost_work_queue_on(&flush.work, worker);
 	wait_for_completion(&flush.wait_event);
@@ -286,7 +289,7 @@ void vhost_work_dev_flush(struct vhost_dev *dev)
 	int i;
 
 	for (i = 0; i < dev->num_workers; i++)
-		vhost_work_dev_flush_on(dev->workers[i]);
+		vhost_work_dev_flush_on(dev, dev->workers[i]);
 }
 EXPORT_SYMBOL_GPL(vhost_work_dev_flush);
@@ -306,7 +309,7 @@ EXPORT_SYMBOL_GPL(vhost_has_work);
 
 void vhost_vq_work_flush(struct vhost_virtqueue *vq)
 {
-	vhost_work_dev_flush_on(vq->worker);
+	vhost_work_dev_flush_on(vq->dev, vq->worker);
 }
 EXPORT_SYMBOL_GPL(vhost_vq_work_flush);
@@ -573,14 +576,15 @@ static void vhost_attach_cgroups_work(struct vhost_work *work)
 	s->ret = cgroup_attach_task_all(s->owner, current);
 }
 
-static int vhost_attach_cgroups_on(struct vhost_worker *worker)
+static int vhost_attach_cgroups_on(struct vhost_dev *dev,
+				   struct vhost_worker *worker)
 {
 	struct vhost_attach_cgroups_struct attach;
 
 	attach.owner = current;
-	vhost_work_init(&attach.work, vhost_attach_cgroups_work);
+	vhost_work_init(dev, &attach.work, vhost_attach_cgroups_work);
 	vhost_work_queue_on(&attach.work, worker);
-	vhost_work_dev_flush_on(worker);
+	vhost_work_dev_flush_on(dev, worker);
 	return attach.ret;
 }
@@ -675,7 +679,7 @@ static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
 	worker->task = task;
 	wake_up_process(task); /* avoid contributing to loadavg */
 
-	ret = vhost_attach_cgroups_on(worker);
+	ret = vhost_attach_cgroups_on(dev, worker);
 	if (ret)
 		goto stop_worker;
 
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 75b884ad1f17..75ad3aa5adca 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -24,6 +24,7 @@ struct vhost_work {
 	struct llist_node node;
 	vhost_work_fn_t fn;
 	unsigned long flags;
+	struct vhost_dev *dev;
 };
 
 struct vhost_worker {
@@ -47,7 +48,8 @@ struct vhost_poll {
 	struct vhost_virtqueue *vq;
 };
 
-void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
+void vhost_work_init(struct vhost_dev *dev, struct vhost_work *work,
+		     vhost_work_fn_t fn);
 void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
 bool vhost_has_work(struct vhost_dev *dev);
 void vhost_vq_work_flush(struct vhost_virtqueue *vq);
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index f954f4d29c95..302415b6460b 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -648,7 +648,8 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 	file->private_data = vsock;
 	spin_lock_init(&vsock->send_pkt_list_lock);
 	INIT_LIST_HEAD(&vsock->send_pkt_list);
-	vhost_work_init(&vsock->send_pkt_work, vhost_transport_send_pkt_work);
+	vhost_work_init(&vsock->dev, &vsock->send_pkt_work,
+			vhost_transport_send_pkt_work);
 	return 0;
 
 out:
-- 
2.25.1
Mike Christie
2021-May-25 18:06 UTC
[PATCH 9/9] vhost: support sharing workers across devs
This allows a worker to handle multiple devices' vqs.

TODO:
- The worker is attached to the cgroup of the device that created it.
In this patch you can share workers with devices that have different
owners, which could be in different cgroups. Do we want to restrict
sharing workers to devices that have the same owner (dev->mm value)?

Signed-off-by: Mike Christie <michael.christie at oracle.com>
---
 drivers/vhost/vhost.c | 16 +++++++---------
 drivers/vhost/vhost.h |  1 -
 2 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index eb16eb2bbee0..c32f72b1901c 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -388,12 +388,10 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 static int vhost_worker(void *data)
 {
 	struct vhost_worker *worker = data;
-	struct vhost_dev *dev = worker->dev;
 	struct vhost_work *work, *work_next;
+	struct vhost_dev *dev;
 	struct llist_node *node;
 
-	kthread_use_mm(dev->mm);
-
 	for (;;) {
 		/* mb paired w/ kthread_stop */
 		set_current_state(TASK_INTERRUPTIBLE);
@@ -412,15 +410,20 @@ static int vhost_worker(void *data)
 			smp_wmb();
 			llist_for_each_entry_safe(work, work_next, node, node) {
 				clear_bit(VHOST_WORK_QUEUED, &work->flags);
+				dev = work->dev;
+
+				kthread_use_mm(dev->mm);
+
 				__set_current_state(TASK_RUNNING);
 				kcov_remote_start_common(dev->kcov_handle);
 				work->fn(work);
 				kcov_remote_stop();
 				if (need_resched())
 					schedule();
+
+				kthread_unuse_mm(dev->mm);
 			}
 	}
-	kthread_unuse_mm(dev->mm);
 	return 0;
 }
 
@@ -667,7 +670,6 @@ static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
 		return NULL;
 
 	worker->id = dev->num_workers;
-	worker->dev = dev;
 	init_llist_head(&worker->work_list);
 	INIT_HLIST_NODE(&worker->h_node);
 	refcount_set(&worker->refcount, 1);
@@ -702,10 +704,6 @@ static struct vhost_worker *vhost_worker_find(struct vhost_dev *dev, pid_t pid)
 	spin_lock(&vhost_workers_lock);
 	hash_for_each_possible(vhost_workers, worker, h_node, pid) {
 		if (worker->task->pid == pid) {
-			/* tmp - next patch allows sharing across devs */
-			if (worker->dev != dev)
-				break;
-
 			found_worker = worker;
 			refcount_inc(&worker->refcount);
 			break;
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 75ad3aa5adca..40c400172a84 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -32,7 +32,6 @@ struct vhost_worker {
 	struct llist_head work_list;
 	struct hlist_node h_node;
 	refcount_t refcount;
-	struct vhost_dev *dev;
 	int id;
 };
 
-- 
2.25.1
On Tue, May 25, 2021 at 01:05:51PM -0500, Mike Christie wrote:
> Results:
> --------
> When running with the null_blk driver and vhost-scsi I can get 1.2
> million IOPS by just running a simple
>
> fio --filename=/dev/sda --direct=1 --rw=randrw --bs=4k --ioengine=libaio
> --iodepth=128 --numjobs=8 --time_based --group_reporting --name=iops
> --runtime=60 --eta-newline=1
>
> The VM has 8 vCPUs and sda has 8 virtqueues, and we can do a total of
> 1024 cmds per device. To get 1.2 million IOPS I did have to tune and
> run the virsh emulatorpin command so the vhost threads were running
> on different CPUs than the VM. If the vhost threads share CPUs then I
> get around 800K.
>
> For a more realistic device that is also a CPU hog, like iscsi, I can
> still get 1 million IOPS using 1 dm-multipath device over 8 iscsi paths
> (natively it gets 1.1 million IOPS).

There is no comparison against a baseline, but I guess it would be the
same 8 vCPU guest with single queue vhost-scsi?

> Results/TODO Note:
>
> - I ported the vdpa sim code to support multiple workers, and as-is it
> made perf much worse. If I increase vdpa_sim_blk's num queues to 4-8 I
> get 700K IOPS with the fio command above. However, with the multiple
> worker support it drops to 400K. The problem is the vdpa_sim lock and
> the iommu_lock. If I hack around them (e.g. comment out the locks and
> ignore the data corruption and crashes) then I can get around 1.2M -
> 1.6M IOPS with 8 queues and the fio command above.
>
> So these patches could help other drivers, but it will just take more
> work to remove those types of locks. I was hoping the 2 items could be
> done independently since this helps vhost-scsi immediately.

Cool, thank you for taking a look. That's useful info for Stefano. vDPA
and vhost can be handled independently, though in the long term hopefully
they can share code.
Stefan Hajnoczi
2021-Jun-03 10:45 UTC
[PATCH 3/9] vhost: modify internal functions to take a vhost_worker
On Tue, May 25, 2021 at 01:05:54PM -0500, Mike Christie wrote:
> -void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
> +static void vhost_work_queue_on(struct vhost_work *work,
> +				struct vhost_worker *worker)
>  {
> -	if (!dev->worker)
> -		return;
> -
>  	if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags)) {
>  		/* We can only add the work to the list after we're
>  		 * sure it was not in the list.
>  		 * test_and_set_bit() implies a memory barrier.
>  		 */
> -		llist_add(&work->node, &dev->worker->work_list);
> -		wake_up_process(dev->worker->task);
> +		llist_add(&work->node, &worker->work_list);
> +		wake_up_process(worker->task);
>  	}
>  }
> +
> +void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)

When should this function still be used? A doc comment contrasting it
to vhost_work_queue_on() would be helpful. I would expect callers to
switch to that instead of queuing work on dev->workers[0].

>  /* A lockless hint for busy polling code to exit the loop */
>  bool vhost_has_work(struct vhost_dev *dev)
>  {
> -	return dev->worker && !llist_empty(&dev->worker->work_list);
> +	int i;
> +
> +	for (i = 0; i < dev->num_workers; i++) {
> +		if (!llist_empty(&dev->workers[i]->work_list))
> +			return true;
> +	}
> +
> +	return false;
>  }
>  EXPORT_SYMBOL_GPL(vhost_has_work);

It's probably not necessary to poll all workers: drivers/vhost/net.c
calls vhost_has_work() to busy poll a specific virtqueue. If the
vq:worker mapping is 1:1 or N:1 then vhost_has_work() should be extended
to include the struct vhost_virtqueue so we can poll just that vq
worker's work_list.

>  /* Caller must have device mutex */
>  static int vhost_worker_try_create_def(struct vhost_dev *dev)
>  {
> -	if (!dev->use_worker || dev->worker)
> +	struct vhost_worker *worker;
> +
> +	if (!dev->use_worker || dev->workers)
>  		return 0;
>
> -	return vhost_worker_create(dev);
> +	dev->workers = kcalloc(1, sizeof(struct vhost_worker *), GFP_KERNEL);

GFP_KERNEL_ACCOUNT so that vhost memory associated with a process (the
one that invoked the ioctl) is accounted? This may get trickier if the
workers are shared between processes.

The same applies for struct vhost_worker in vhost_worker_create().
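For illustration, the vq-scoped variant suggested above might look
something like the following sketch. It assumes the vq->worker pointer
introduced in patch 4; the function name is hypothetical:

/* Busy-poll hint that only checks the worker serving this vq (sketch). */
bool vhost_vq_has_work(struct vhost_virtqueue *vq)
{
	return vq->worker && !llist_empty(&vq->worker->work_list);
}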
Stefan Hajnoczi
2021-Jun-03 13:51 UTC
[PATCH 4/9] vhost: allow vhost_polls to use different vhost_workers
On Tue, May 25, 2021 at 01:05:55PM -0500, Mike Christie wrote:
> This allows vhost_polls to use the worker of the vq the poll is
> associated with.
>
> This also exports a helper, vhost_vq_work_queue, which is used here
> internally and will be used in the vhost-scsi patches.
>
> Signed-off-by: Mike Christie <michael.christie at oracle.com>
> ---
>  drivers/vhost/net.c   |  6 ++++--
>  drivers/vhost/vhost.c | 19 ++++++++++++++++---
>  drivers/vhost/vhost.h |  6 +++++-
>  3 files changed, 25 insertions(+), 6 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha at redhat.com>
Stefan Hajnoczi
2021-Jun-03 13:54 UTC
[PATCH 5/9] vhost-scsi: flush IO vqs then send TMF rsp
On Tue, May 25, 2021 at 01:05:56PM -0500, Mike Christie wrote:
> With one worker we will always send the scsi cmd responses then send the
> TMF rsp, because LIO will always complete the scsi cmds first, which
> calls vhost_scsi_release_cmd to add them to the work queue.
>
> When the next patches add multiple worker support, the worker threads
> could still be sending their responses when the tmf's work is run.
> So this patch has vhost-scsi flush the IO vqs on other worker threads
> before we send the tmf response.
>
> Signed-off-by: Mike Christie <michael.christie at oracle.com>
> ---
>  drivers/vhost/scsi.c  | 17 ++++++++++++++---
>  drivers/vhost/vhost.c |  6 ++++++
>  drivers/vhost/vhost.h |  1 +
>  3 files changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
> index 46f897e41217..e585f2180457 100644
> --- a/drivers/vhost/scsi.c
> +++ b/drivers/vhost/scsi.c
> @@ -1168,12 +1168,23 @@ static void vhost_scsi_tmf_resp_work(struct vhost_work *work)
>  {
>  	struct vhost_scsi_tmf *tmf = container_of(work, struct vhost_scsi_tmf,
>  						  vwork);
> -	int resp_code;
> +	int resp_code, i;
> +
> +	if (tmf->scsi_resp == TMR_FUNCTION_COMPLETE) {
> +		/*
> +		 * When processing a TMF lio completes the cmds then the TMF,
> +		 * so with one worker the TMF always completes after cmds. For
> +		 * multiple worker support we must flush the IO vqs to make
> +		 * sure if they have differrent workers then the cmds have

s/differrent/different/

Reviewed-by: Stefan Hajnoczi <stefanha at redhat.com>
Stefan Hajnoczi
2021-Jun-03 13:57 UTC
[PATCH 6/9] vhost-scsi: make SCSI cmd completion per vq
On Tue, May 25, 2021 at 01:05:57PM -0500, Mike Christie wrote:
> This patch separates the scsi cmd completion code paths so we can
> complete cmds based on their vq instead of having all cmds complete on
> the same worker/CPU. This will be useful with the next patch that allows
> us to create multiple worker threads and bind them to different vqs, so
> we can have completions running on different threads/CPUs.
>
> Signed-off-by: Mike Christie <michael.christie at oracle.com>
> ---
>  drivers/vhost/scsi.c | 48 +++++++++++++++++++++++---------------------
>  1 file changed, 25 insertions(+), 23 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha at redhat.com>
Stefan Hajnoczi
2021-Jun-03 14:30 UTC
[PATCH 7/9] vhost: allow userspace to create workers
On Tue, May 25, 2021 at 01:05:58PM -0500, Mike Christie wrote:> This patch allows userspace to create workers and bind them to vqs, so you > can have N workers per dev and also share N workers with M vqs. The next > patch will allow sharing across devices. > > Signed-off-by: Mike Christie <michael.christie at oracle.com> > --- > drivers/vhost/vhost.c | 94 +++++++++++++++++++++++++++++++- > drivers/vhost/vhost.h | 3 + > include/uapi/linux/vhost.h | 6 ++ > include/uapi/linux/vhost_types.h | 12 ++++ > 4 files changed, 113 insertions(+), 2 deletions(-) > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > index 345ade0af133..981e9bac7a31 100644 > --- a/drivers/vhost/vhost.c > +++ b/drivers/vhost/vhost.c > @@ -30,6 +30,7 @@ > #include <linux/interval_tree_generic.h> > #include <linux/nospec.h> > #include <linux/kcov.h> > +#include <linux/hashtable.h> > > #include "vhost.h" > > @@ -42,6 +43,9 @@ module_param(max_iotlb_entries, int, 0444); > MODULE_PARM_DESC(max_iotlb_entries, > "Maximum number of iotlb entries. (default: 2048)"); > > +static DEFINE_HASHTABLE(vhost_workers, 5); > +static DEFINE_SPINLOCK(vhost_workers_lock); > + > enum { > VHOST_MEMORY_F_LOG = 0x1, > }; > @@ -617,8 +621,17 @@ static void vhost_detach_mm(struct vhost_dev *dev) > dev->mm = NULL; > } > > -static void vhost_worker_free(struct vhost_worker *worker) > +static void vhost_worker_put(struct vhost_worker *worker) > { > + spin_lock(&vhost_workers_lock); > + if (!refcount_dec_and_test(&worker->refcount)) { > + spin_unlock(&vhost_workers_lock); > + return; > + } > + > + hash_del(&worker->h_node); > + spin_unlock(&vhost_workers_lock); > + > WARN_ON(!llist_empty(&worker->work_list)); > kthread_stop(worker->task); > kfree(worker); > @@ -632,7 +645,7 @@ static void vhost_workers_free(struct vhost_dev *dev) > return; > > for (i = 0; i < dev->num_workers; i++) > - vhost_worker_free(dev->workers[i]); > + vhost_worker_put(dev->workers[i]); > > kfree(dev->workers); > dev->num_workers = 0; > @@ -652,6 +665,8 @@ static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev) > worker->id = dev->num_workers; > worker->dev = dev; > init_llist_head(&worker->work_list); > + INIT_HLIST_NODE(&worker->h_node); > + refcount_set(&worker->refcount, 1); > > task = kthread_create(vhost_worker, worker, "vhost-%d", current->pid); > if (IS_ERR(task)) > @@ -664,6 +679,9 @@ static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev) > if (ret) > goto stop_worker; > > + spin_lock(&vhost_workers_lock); > + hash_add(vhost_workers, &worker->h_node, worker->task->pid); > + spin_unlock(&vhost_workers_lock); > return worker; > > stop_worker: > @@ -673,6 +691,67 @@ static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev) > return NULL; > } > > +static struct vhost_worker *vhost_worker_find(struct vhost_dev *dev, pid_t pid) > +{ > + struct vhost_worker *worker, *found_worker = NULL; > + > + spin_lock(&vhost_workers_lock); > + hash_for_each_possible(vhost_workers, worker, h_node, pid) { > + if (worker->task->pid == pid) { > + /* tmp - next patch allows sharing across devs */ > + if (worker->dev != dev) > + break; > + > + found_worker = worker; > + refcount_inc(&worker->refcount); > + break; > + } > + } > + spin_unlock(&vhost_workers_lock); > + return found_worker; > +} > + > +/* Caller must have device mutex */ > +static int vhost_vq_set_worker(struct vhost_virtqueue *vq, > + struct vhost_vring_worker *info) > +{ > + struct vhost_dev *dev = vq->dev; > + struct vhost_worker *worker; > + > + if (vq->worker) { > + /* TODO - 
> +		/* TODO - support changing while work items are running */
> +		return -EBUSY;
> +	}
> +
> +	if (info->pid == VHOST_VRING_NEW_WORKER) {
> +		worker = vhost_worker_create(dev);

The maximum number of kthreads created is limited by vhost_dev_init(nvqs)?
For example VHOST_SCSI_MAX_VQ 128. IIUC kthread_create is not limited by or
accounted against the current task, so I'm a little worried that a process
can create a lot of kthreads. I haven't investigated other kthread_create()
users reachable from userspace applications to see how they bound the
number of threads effectively. Any thoughts?

> +		if (!worker)
> +			return -ENOMEM;
> +
> +		info->pid = worker->task->pid;
> +	} else {
> +		worker = vhost_worker_find(dev, info->pid);
> +		if (!worker)
> +			return -ENODEV;
> +	}
> +
> +	if (!dev->workers) {
> +		dev->workers = kcalloc(vq->dev->nvqs,
> +				       sizeof(struct vhost_worker *),
> +				       GFP_KERNEL);

Another candidate for GFP_KERNEL_ACCOUNT.

> +		if (!dev->workers) {
> +			vhost_worker_put(worker);
> +			return -ENOMEM;
> +		}
> +	}
> +
> +	vq->worker = worker;
> +
> +	dev->workers[dev->num_workers] = worker;
> +	dev->num_workers++;

Hmm... should we really append to workers[] in the vhost_worker_find() case?

> +	return 0;
> +}
> +
>  /* Caller must have device mutex */
>  static int vhost_worker_try_create_def(struct vhost_dev *dev)
>  {
> @@ -1680,6 +1759,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
>  	struct eventfd_ctx *ctx = NULL;
>  	u32 __user *idxp = argp;
>  	struct vhost_virtqueue *vq;
> +	struct vhost_vring_worker w;
>  	struct vhost_vring_state s;
>  	struct vhost_vring_file f;
>  	u32 idx;
> @@ -1794,6 +1874,15 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
>  		if (copy_to_user(argp, &s, sizeof(s)))
>  			r = -EFAULT;
>  		break;
> +	case VHOST_SET_VRING_WORKER:
> +		if (copy_from_user(&w, argp, sizeof(w))) {
> +			r = -EFAULT;
> +			break;
> +		}
> +		r = vhost_vq_set_worker(vq, &w);
> +		if (!r && copy_to_user(argp, &w, sizeof(w)))
> +			r = -EFAULT;
> +		break;
>  	default:
>  		r = -ENOIOCTLCMD;
>  	}
> @@ -2726,6 +2815,7 @@ EXPORT_SYMBOL_GPL(vhost_set_backend_features);
>
>  static int __init vhost_init(void)
>  {
> +	hash_init(vhost_workers);
>  	return 0;
>  }
>
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 0a252dd45101..75b884ad1f17 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -14,6 +14,7 @@
>  #include <linux/atomic.h>
>  #include <linux/vhost_iotlb.h>
>  #include <linux/irqbypass.h>
> +#include <linux/refcount.h>
>
>  struct vhost_work;
>  typedef void (*vhost_work_fn_t)(struct vhost_work *work);
> @@ -28,6 +29,8 @@ struct vhost_work {
>  struct vhost_worker {
>  	struct task_struct *task;
>  	struct llist_head work_list;
> +	struct hlist_node h_node;

h_node is a generic name. If you're willing to use a longer name, then
vhost_workers_node would make it clear which hlist this is associated with.

> +	refcount_t refcount;
>  	struct vhost_dev *dev;
>  	int id;
>  };
> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> index c998860d7bbc..ce32119cb139 100644
> --- a/include/uapi/linux/vhost.h
> +++ b/include/uapi/linux/vhost.h
> @@ -70,6 +70,12 @@
>  #define VHOST_VRING_BIG_ENDIAN 1
>  #define VHOST_SET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x13, struct vhost_vring_state)
>  #define VHOST_GET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x14, struct vhost_vring_state)
> +/* Create/bind a vhost worker to a virtqueue. If pid > 0 and matches an existing
> + * vhost_worker thread it will be bound to the vq. If pid is
If pid is > + * VHOST_VRING_NEW_WORKER, then a new worker will be created and bound to the > + * vq. > + */Please document when this ioctl must be called (before kick is set up).> +#define VHOST_SET_VRING_WORKER _IOWR(VHOST_VIRTIO, 0x15, struct vhost_vring_worker) > > /* The following ioctls use eventfd file descriptors to signal and poll > * for events. */ > diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h > index f7f6a3a28977..5113baa8bc3e 100644 > --- a/include/uapi/linux/vhost_types.h > +++ b/include/uapi/linux/vhost_types.h > @@ -47,6 +47,18 @@ struct vhost_vring_addr { > __u64 log_guest_addr; > }; > > +#define VHOST_VRING_NEW_WORKER -1 > + > +struct vhost_vring_worker { > + unsigned int index; > + /* > + * The pid of the vhost worker that the vq will be bound to. If > + * pid is VHOST_VRING_NEW_WORKER a new worker will be created and it'ss/it's/its/> + * pid will be returned in pid. > + */ > + __kernel_pid_t pid; > +}; > + > /* no alignment requirement */ > struct vhost_iotlb_msg { > __u64 iova; > -- > 2.25.1 >-------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: <http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20210603/14e9f60e/attachment.sig>
Stefan Hajnoczi
2021-Jun-03 14:31 UTC
[PATCH 8/9] vhost: add vhost_dev pointer to vhost_work
On Tue, May 25, 2021 at 01:05:59PM -0500, Mike Christie wrote:
> The next patch allows a vhost_worker to handle different devices. To
> prepare for that, this patch adds a pointer to the device on the work so
> we can get to the different mms in the vhost_worker thread.
>
> Signed-off-by: Mike Christie <michael.christie at oracle.com>
> ---
>  drivers/vhost/scsi.c  |  7 ++++---
>  drivers/vhost/vhost.c | 24 ++++++++++++++----------
>  drivers/vhost/vhost.h |  4 +++-
>  drivers/vhost/vsock.c |  3 ++-
>  4 files changed, 23 insertions(+), 15 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha at redhat.com>
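For context on what was reviewed here, a sketch of the shape of the change,
assuming vhost_work simply grows a back-pointer next to its existing fields
(the layout below is illustrative, not copied from the patch):

/* Illustrative only: existing fields plus the new dev back-pointer. */
struct vhost_work {
	struct llist_node	node;
	vhost_work_fn_t		fn;
	unsigned long		flags;
	struct vhost_dev	*dev;	/* new: lets a worker shared across
					 * devices find the right device
					 * (and mm) for each work item */
};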
Stefan Hajnoczi
2021-Jun-03 14:32 UTC
[PATCH 9/9] vhost: support sharing workers across devs
On Tue, May 25, 2021 at 01:06:00PM -0500, Mike Christie wrote:
> This allows a worker to handle multiple devices' vqs.
>
> TODO:
> - The worker is attached to the cgroup of the device that created it. In
> this patch you can share workers with devices with different owners, which
> could be in different cgroups. Do we want to restrict sharing workers to
> devices that have the same owner (dev->mm value)?

Question for Michael or Jason.
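If sharing were restricted to same-owner devices, the check could live in
vhost_worker_find() from patch 7. A sketch, assuming dev->mm identifies the
owner as suggested above (not code from this series):

/* Sketch: refuse to hand out a worker to a differently-owned device. */
hash_for_each_possible(vhost_workers, worker, h_node, pid) {
	if (worker->task->pid != pid)
		continue;

	if (worker->dev->mm != dev->mm)	/* different owner */
		break;			/* caller returns -ENODEV */

	found_worker = worker;
	refcount_inc(&worker->refcount);
	break;
}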
On Tue, May 25, 2021 at 01:05:51PM -0500, Mike Christie wrote:
> The following patches apply over linus's tree or mst's vhost branch
> and my cleanup patchset:
>
> https://lists.linuxfoundation.org/pipermail/virtualization/2021-May/054354.html
>
> These patches allow us to support multiple vhost workers per device. I
> ended up just doing Stefan's original idea where userspace has the
> kernel create a worker and we pass back the pid. This has the benefit
> over the workqueue and userspace thread approach where we only have
> one'ish code path in the kernel during setup to detect old tools. The
> main IO paths and device/vq setup/teardown paths all use common code.
>
> The kernel patches here allow us to then do N workers per device and
> also share workers across devices.
>
> I've also included a patch for qemu so you can get an idea of how it
> works. If we are ok with the kernel code then I'll break that up into
> a patchset and send to qemu-devel.

It seems risky to allow userspace process A to "share" a vhost worker
thread with userspace process B based on a matching pid alone. Should they
have ptrace_may_access() or similar?

For example, two competing users could sabotage each other by running lots
of work items on their competitor's vhost worker thread.
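One hypothetical shape for such a check, assuming vhost_worker grew an
owner field recording the task that created it (neither the field nor the
mode choice is from this series):

#include <linux/ptrace.h>

/*
 * Sketch only: 'owner' is a hypothetical vhost_worker field holding the
 * creating task. Sharing is allowed only if the caller could also ptrace
 * that owner, so a pid guessed by an unrelated process is rejected.
 */
static bool vhost_worker_may_share(struct vhost_worker *worker)
{
	return ptrace_may_access(worker->owner,
				 PTRACE_MODE_ATTACH_REALCREDS);
}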