thr3ads.net - Linux Virtualization - [PATCH v6 11/11] vhost: allow userspace to create workers [Apr 2023]

If this information is useful, please help other people find it:
Share via:

Mike Christie

2023-Apr-04 23:08 UTC

[PATCH v6 11/11] vhost: allow userspace to create workers

On 4/4/23 3:00 AM, Jason Wang wrote:>>
>> -static void vhost_worker_free(struct vhost_dev *dev)
>> +static void vhost_worker_put(struct vhost_dev *dev, struct
vhost_worker *worker)
>>  {
>> -       struct vhost_worker *worker = dev->worker;
>> -
>>         if (!worker)
>>                 return;
>>
>> -       dev->worker = NULL;
>> +       if (!refcount_dec_and_test(&worker->refcount))
>> +               return;
>> +
>>         WARN_ON(!llist_empty(&worker->work_list));
>>         vhost_task_stop(worker->vtsk);
>>         kfree(worker);
>>  }
>>
>> +static void vhost_vq_detach_worker(struct vhost_virtqueue *vq)
>> +{
>> +       if (vq->worker)
> 
> What happens to the pending work that queues for the old worker?
I didn't think there would be works queued at this time. I see your comment
below about my assumption about the backend being set being wrong. Will
discuss down there.

>>
>> +/* Caller must have device and virtqueue mutex */
>> +static void __vhost_vq_attach_worker(struct vhost_virtqueue *vq,
>> +                                    struct vhost_worker *worker)
>> +{
>> +       refcount_inc(&worker->refcount);
>> +       vhost_vq_detach_worker(vq);())
>> +       vq->worker = worker;
> 
> What happens if there's a pending flush in a specific device (e.g in
> vhost_scsi_tmf_resp_work())?
We wouldn't hit that specific case where we are running the above code and
vhost_scsi_tmf_resp_work.

Either:

1. The backend is NULL and has never been set, and so we would never have
called vhost_scsi_tmf_resp_work, because it can only be called after the
backend is set.

2. If the backed has been set and vhost_scsi_tmf_resp_work has
run or is running, then we when we would not have called
__vhost_vq_attach_worker
from vhost_vq_attach_worker because it would see vhost_vq_get_backend
returning a non-NULL value.

If vhost_scsi later sets the backend to NULL, then vhost_scsi_clear_endpoint
will have made sure the flush has completed when the clear function returns.
It does that with the device mutex so when we run __vhost_vq_attach_worker
It will only see a vq/worker with no flushes in progress.

For the general case of can we be doing a vhost_dev_flush/vhost_work_flush_on
and __vhost_vq_attach_worker, then I thought we are ok as well because I
thought we have to currently have the device mutex when we flush so those
can't
race with ioctl calls to vhost_vq_attach_worker since we hold the dev mutex
during that ioctls. Or we call flush from the file_operations release function 
so the device is closed and can't race with ioctls.
> 
> Does this mean we need to hold vq mutex when doing the flush? (which
> seems not the case of vhost_scsi_tmf_resp_work()).
> 
>> +}
>> +
>> +/* Caller must have device and virtqueue mutex */
>> +static int vhost_vq_attach_worker(struct vhost_virtqueue *vq,
>> +                                 struct vhost_vring_worker *info)
>> +{
>> +       unsigned long index = info->worker_id;
>> +       struct vhost_dev *dev = vq->dev;
>> +       struct vhost_worker *worker;
>> +
>> +       if (!dev->use_worker)
>> +               return -EINVAL;
>> +
>> +       /*
>> +        * We don't support setting a worker on an active vq to
make flushing
>> +        * and removal simple.
>> +        */
>> +       if (vhost_vq_get_backend(vq))
>> +               return -EBUSY;
> 
> This assumes the worker won't be started until the backend is set
> which is not true.
I can see how we get flushes before setting the backend, but I thought we are
ok because we have the device mutex held.

What are the other possible cases? Is one case we could get a
VHOST_SET_VRING_KICK before setting the backend and so vhost_poll_start starts
to allow vhost_poll_queue before the backend is set? Is there any thing else?

I'll fix this issue.

>> +
>> +/* Caller must have device mutex */
>> +static int vhost_free_worker(struct vhost_dev *dev,
>> +                            struct vhost_worker_state *info)
>> +{
>> +       unsigned long index = info->worker_id;
>> +       struct vhost_worker *worker;
>> +
>> +       if (!dev->use_worker)
>> +               return -EINVAL;
>> +
>> +       worker = xa_find(&dev->worker_xa, &index, UINT_MAX,
XA_PRESENT);
> 
> So we use int for worker_id which conflicts with UINT_MAX here?
I switched from idr in the last versions to xa last second and added this
mistake.
Will fix.

> 
> struct vhost_worker_state {
>         /*
>          * For VHOST_NEW_WORKER the kernel will return the new vhost_worker
id.
>          * For VHOST_FREE_WORKER this must be set to the id of the
vhost_worker
>          * to free.
>          */
>         int worker_id;
> };
> 
> A side effect of using xa_index directly is that userspace can guess
> the xa_index of the default worker and free it here, is this intended?I don't need the functionality. It was only something that I didn't
think mattered much, because you can only free the worker if it's not in
use by any virtqueues, so I didn't add any special code to handle it.
The user would have had to do an ATTACH to the default worker and replace
it then it could free it, so it works like the other workers.
> Should we hide the default worker from xa?
I can change it if you are worried about future problems.

>> -       default:
>> +       case VHOST_ATTACH_VRING_WORKER:
>> +               if (copy_from_user(&w, argp, sizeof(w))) {
>> +                       r = -EFAULT;
>> +                       break;
>> +               }
>> +               r = vhost_vq_attach_worker(vq, &w);
>> +               if (!r && copy_to_user(argp, &w,
sizeof(w)))
>> +                       r = -EFAULT;
>> +               break;
> 
> It's a little odd that we have new and attach but only a free.
Do you mean we would also want a detach? I've been debating that.
I'm not sure what the user wanted though.

Would it leave the virtqueue with no worker? Or, does it pick the first
worker it finds? If there is no worker because we did the former or
because userspace detached all of them, then do we just drop work that
gets queued?

It seemed like a user would never want to drop work, so I made it so
you can only tell it to use new workers (attach which I guess is more
like a swap worker) so we always have a worker and I don't have to
worry about dropping work.

Jason Wang

2023-Apr-10 07:04 UTC

head link

[PATCH v6 11/11] vhost: allow userspace to create workers

On Wed, Apr 5, 2023 at 7:08?AM Mike Christie
<michael.christie at oracle.com> wrote:>
> On 4/4/23 3:00 AM, Jason Wang wrote:
> >>
> >> -static void vhost_worker_free(struct vhost_dev *dev)
> >> +static void vhost_worker_put(struct vhost_dev *dev, struct
vhost_worker *worker)
> >>  {
> >> -       struct vhost_worker *worker = dev->worker;
> >> -
> >>         if (!worker)
> >>                 return;
> >>
> >> -       dev->worker = NULL;
> >> +       if (!refcount_dec_and_test(&worker->refcount))
> >> +               return;
> >> +
> >>         WARN_ON(!llist_empty(&worker->work_list));
> >>         vhost_task_stop(worker->vtsk);
> >>         kfree(worker);
> >>  }
> >>
> >> +static void vhost_vq_detach_worker(struct vhost_virtqueue *vq)
> >> +{
> >> +       if (vq->worker)
> >
> > What happens to the pending work that queues for the old worker?
>
> I didn't think there would be works queued at this time. I see your
comment
> below about my assumption about the backend being set being wrong. Will
> discuss down there.
>
>
> >>
> >> +/* Caller must have device and virtqueue mutex */
> >> +static void __vhost_vq_attach_worker(struct vhost_virtqueue *vq,
> >> +                                    struct vhost_worker *worker)
> >> +{
> >> +       refcount_inc(&worker->refcount);
> >> +       vhost_vq_detach_worker(vq);())
> >> +       vq->worker = worker;
> >
> > What happens if there's a pending flush in a specific device (e.g
in
> > vhost_scsi_tmf_resp_work())?
>
> We wouldn't hit that specific case where we are running the above code
and
> vhost_scsi_tmf_resp_work.
>
> Either:
>
> 1. The backend is NULL and has never been set, and so we would never have
> called vhost_scsi_tmf_resp_work, because it can only be called after the
> backend is set.
>
> 2. If the backed has been set and vhost_scsi_tmf_resp_work has
> run or is running, then we when we would not have called
__vhost_vq_attach_worker
> from vhost_vq_attach_worker because it would see vhost_vq_get_backend
> returning a non-NULL value.
>
> If vhost_scsi later sets the backend to NULL, then
vhost_scsi_clear_endpoint
> will have made sure the flush has completed when the clear function
returns.
> It does that with the device mutex so when we run __vhost_vq_attach_worker
> It will only see a vq/worker with no flushes in progress.
Ok.
>
> For the general case of can we be doing a
vhost_dev_flush/vhost_work_flush_on
> and __vhost_vq_attach_worker, then I thought we are ok as well because I
> thought we have to currently have the device mutex when we flush so those
can't
> race with ioctl calls to vhost_vq_attach_worker since we hold the dev mutex
> during that ioctls.
I'm not sure I understand here, but we can't assume the user of
vhost_work_queue() is called in the ioctl context. See
vhost_zerocopy_callback(). But since you want to limit the call before
set_backend, another question comes, do we really need the dynamic
attaching/creating in this case? How about a simple one ioctl that is
used to build the queue to workers mapping instead?
> Or we call flush from the file_operations release function
> so the device is closed and can't race with ioctls.
>
> >
> > Does this mean we need to hold vq mutex when doing the flush? (which
> > seems not the case of vhost_scsi_tmf_resp_work()).
> >
> >> +}
> >> +
> >> +/* Caller must have device and virtqueue mutex */
> >> +static int vhost_vq_attach_worker(struct vhost_virtqueue *vq,
> >> +                                 struct vhost_vring_worker *info)
> >> +{
> >> +       unsigned long index = info->worker_id;
> >> +       struct vhost_dev *dev = vq->dev;
> >> +       struct vhost_worker *worker;
> >> +
> >> +       if (!dev->use_worker)
> >> +               return -EINVAL;
> >> +
> >> +       /*
> >> +        * We don't support setting a worker on an active vq
to make flushing
> >> +        * and removal simple.
> >> +        */
> >> +       if (vhost_vq_get_backend(vq))
> >> +               return -EBUSY;
> >
> > This assumes the worker won't be started until the backend is set
> > which is not true.
>
> I can see how we get flushes before setting the backend, but I thought we
are
> ok because we have the device mutex held.
>
> What are the other possible cases? Is one case we could get a
> VHOST_SET_VRING_KICK before setting the backend and so vhost_poll_start
starts
> to allow vhost_poll_queue before the backend is set?
Yes, and it's not necessarily the kick, the networking core could wake
up workers before set_backend.
> Is there any thing else?
Haven't found time to do this, but what I want to say is, if we want
this assumption, we need to document it and change the devices that
are affected by this change.
>
> I'll fix this issue.
>
>
> >> +
> >> +/* Caller must have device mutex */
> >> +static int vhost_free_worker(struct vhost_dev *dev,
> >> +                            struct vhost_worker_state *info)
> >> +{
> >> +       unsigned long index = info->worker_id;
> >> +       struct vhost_worker *worker;
> >> +
> >> +       if (!dev->use_worker)
> >> +               return -EINVAL;
> >> +
> >> +       worker = xa_find(&dev->worker_xa, &index,
UINT_MAX, XA_PRESENT);
> >
> > So we use int for worker_id which conflicts with UINT_MAX here?
>
> I switched from idr in the last versions to xa last second and added this
mistake.
> Will fix.
>
>
> >
> > struct vhost_worker_state {
> >         /*
> >          * For VHOST_NEW_WORKER the kernel will return the new
vhost_worker id.
> >          * For VHOST_FREE_WORKER this must be set to the id of the
vhost_worker
> >          * to free.
> >          */
> >         int worker_id;
> > };
> >
> > A side effect of using xa_index directly is that userspace can guess
> > the xa_index of the default worker and free it here, is this intended?
> I don't need the functionality. It was only something that I didn't
> think mattered much, because you can only free the worker if it's not
in
> use by any virtqueues, so I didn't add any special code to handle it.
> The user would have had to do an ATTACH to the default worker and replace
> it then it could free it, so it works like the other workers.
It's about the robustness of the uAPI:

struct vhost_worker_state {
        /*
         * For VHOST_NEW_WORKER the kernel will return the new vhost_worker id.
         * For VHOST_FREE_WORKER this must be set to the id of the vhost_worker
         * to free.
         */
        int worker_id;
};

It looks like the workder_id is the one userspace gets from
VHOST_NEW_WORKER. If yes, we should hide the default worker.

>
> > Should we hide the default worker from xa?
>
> I can change it if you are worried about future problems.
>
>
> >> -       default:
> >> +       case VHOST_ATTACH_VRING_WORKER:
> >> +               if (copy_from_user(&w, argp, sizeof(w))) {
> >> +                       r = -EFAULT;
> >> +                       break;
> >> +               }
> >> +               r = vhost_vq_attach_worker(vq, &w);
> >> +               if (!r && copy_to_user(argp, &w,
sizeof(w)))
> >> +                       r = -EFAULT;
> >> +               break;
> >
> > It's a little odd that we have new and attach but only a free.
>
> Do you mean we would also want a detach? I've been debating that.
> I'm not sure what the user wanted though.
>
> Would it leave the virtqueue with no worker? Or, does it pick the first
> worker it finds? If there is no worker because we did the former or
> because userspace detached all of them, then do we just drop work that
> gets queued?
>
> It seemed like a user would never want to drop work, so I made it so
> you can only tell it to use new workers (attach which I guess is more
> like a swap worker)
swap and free old worker indeed?

static void vhost_vq_detach_worker(struct vhost_virtqueue *vq)
{
        if (vq->worker)
                vhost_worker_put(vq->dev, vq->worker);
        vq->worker = NULL;
}

That makes me think under which case we should use VHOST_FREE_WORKER?

Thanks
> so we always have a worker and I don't have to
> worry about dropping work.
>

Seemingly Similar Threads

Search for more maybe matching threads

Linux Virtualization - Apr 2023 - [PATCH v6 11/11] vhost: allow userspace to create workers

[PATCH v6 11/11] vhost: allow userspace to create workers

[PATCH v6 11/11] vhost: allow userspace to create workers

Seemingly Similar Threads