Stefano Garzarella
2020-Dec-09 15:58 UTC
[RFC PATCH 0/8] vhost: allow userspace to control vq cpu affinity
Hi Mike,
sorry for the delay, but there were holidays.

On Fri, Dec 04, 2020 at 11:33:11AM -0600, Mike Christie wrote:
>On 12/4/20 11:10 AM, Mike Christie wrote:
>>On 12/4/20 10:06 AM, Stefano Garzarella wrote:
>>>Hi Mike,
>>>
>>>On Fri, Dec 04, 2020 at 01:56:25AM -0600, Mike Christie wrote:
>>>>These patches were made over mst's vhost branch.
>>>>
>>>>The following patches allow userspace to set each vq's cpu affinity.
>>>>Currently, with cgroups the worker thread inherits the affinity
>>>>settings, but we are at the mercy of the CPU scheduler for where the
>>>>vq's IO will be executed. This can result in the scheduler sometimes
>>>>hammering a couple of queues on the host instead of spreading the work
>>>>out like the guest's app might have intended if it was mq aware.
>>>>
>>>>This version of the patches is not what you guys were talking about
>>>>initially, like the interface that was similar to nbd's old
>>>>(3.x kernel days) NBD_DO_IT ioctl where userspace calls down to the
>>>>kernel and we run from that context. These patches instead just
>>>>allow userspace to tell the kernel which CPU a vq should run on.
>>>>We then use the kernel's workqueue code to handle the thread
>>>>management.
>>>
>>>I agree that reusing the kernel's workqueue code would be a good
>>>strategy.
>>>
>>>One concern is how easy it is to implement an adaptive polling
>>>strategy using workqueues. From what I've seen, adding some
>>>polling of both backend and virtqueue helps to eliminate
>>>interrupts and reduce latency.
>>>
>>Would the polling you need to do be similar to the vhost net poll
>>code like in vhost_net_busy_poll (different algorithm though)? But,
>>we want to be able to poll multiple devs/vqs from the same CPU
>>right? Something like:
>>
>>retry:
>>
>>for each poller on CPU N
>>        if poller has work
>>                driver->run work fn
>>
>>if (poll limit hit)
>>        return
>>else
>>        cpu_relax();
>>goto retry;
>>

Yeah, something similar. IIUC vhost_net_busy_poll() polls both the vring
and the backend (socket). Maybe we need to limit the amount of work done
in each work->fn to avoid starvation.

>>
>>If so, I had an idea for it. Let me send an additional patch on top
>>of this set.

Sure :-)

>
>Oh yeah, just to make sure I am on the same page for vdpa, because
>scsi and net work so differently.
>
>Were you thinking that you would initially run from
>
>vhost_poll_wakeup -> work->fn
>
>then in the vdpa work->fn you would do the kick_vq still, but then
>also kick off a group backend/vq poller. This would then poll the
>vqs/devs that were bound to that CPU from the worker/wq thread.

Yes, this seems reasonable!

>
>So I was thinking you want something similar to network's NAPI. Here

I don't know NAPI very well, but IIUC the goal is the same: try to avoid
notifications (IRQs from the device, vm-exits from the guest) by doing
adaptive polling.

>our work->fn is the hard irq, and then the worker is like their softirq
>we poll from.
>

I'm a little lost here...

Thanks,
Stefano
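
To make the per-CPU poll loop quoted above a bit more concrete, here is a
minimal C sketch with a work budget, addressing the starvation concern
raised in the reply. All names here (vq_poller, poll_cpu_pollers, the ops)
are hypothetical and illustrative only; they are not taken from the patches.

#include <linux/list.h>
#include <linux/types.h>
#include <asm/processor.h>	/* cpu_relax() */

/*
 * Hypothetical per-CPU poller: each device/vq bound to this CPU
 * registers one of these on the CPU's poller list.
 */
struct vq_poller {
	struct list_head node;				/* on the per-CPU list */
	bool (*has_work)(struct vq_poller *p);		/* vring/backend ready? */
	void (*run)(struct vq_poller *p);		/* driver's work fn */
};

static void poll_cpu_pollers(struct list_head *pollers, int poll_limit)
{
	struct vq_poller *p;
	int rounds = 0;

	for (;;) {
		bool found = false;

		/* Give every poller on this CPU one bounded pass. */
		list_for_each_entry(p, pollers, node) {
			if (p->has_work(p)) {
				p->run(p);
				found = true;
			}
		}

		/* Budget exhausted: stop polling, fall back to kicks/irqs. */
		if (++rounds >= poll_limit)
			return;

		if (!found)
			cpu_relax();
	}
}

The poll_limit budget plays a role similar to NAPI's weight on the network
RX side: once it runs out, the worker stops spinning and relies on
notifications again, so one busy vq cannot monopolize the CPU.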