Stefano Garzarella
2020-Dec-09 15:58 UTC
[RFC PATCH 0/8] vhost: allow userspace to control vq cpu affinity
Hi Mike,
sorry for the delay, but there were holidays.

On Fri, Dec 04, 2020 at 11:33:11AM -0600, Mike Christie wrote:
>On 12/4/20 11:10 AM, Mike Christie wrote:
>>On 12/4/20 10:06 AM, Stefano Garzarella wrote:
>>>Hi Mike,
>>>
>>>On Fri, Dec 04, 2020 at 01:56:25AM -0600, Mike Christie wrote:
>>>>These patches were made over mst's vhost branch.
>>>>
>>>>The following patches allow userspace to set each vq's cpu affinity.
>>>>Currently, with cgroups the worker thread inherits the affinity
>>>>settings, but we are at the mercy of the CPU scheduler for where the
>>>>vq's IO will be executed. This can result in the scheduler sometimes
>>>>hammering a couple of queues on the host instead of spreading the work
>>>>out like the guest's app might have intended if it was mq aware.
>>>>
>>>>This version of the patches is not what you guys were talking about
>>>>initially, like the interface that was similar to nbd's old
>>>>(3.x kernel days) NBD_DO_IT ioctl where userspace calls down to the
>>>>kernel and we run from that context. These patches instead just
>>>>allow userspace to tell the kernel which CPU a vq should run on.
>>>>We then use the kernel's workqueue code to handle the thread
>>>>management.
>>>
>>>I agree that reusing the kernel's workqueue code would be a good
>>>strategy.
>>>
>>>One concern is how easy it is to implement an adaptive polling
>>>strategy using workqueues. From what I've seen, adding some
>>>polling of both backend and virtqueue helps to eliminate
>>>interrupts and reduce latency.
>>>
>>Would the polling you need to do be similar to the vhost net poll
>>code like in vhost_net_busy_poll (different algorithm though)? But,
>>we want to be able to poll multiple devs/vqs from the same CPU
>>right? Something like:
>>
>>retry:
>>
>>for each poller on CPU N
>>        if poller has work
>>                driver->run work fn
>>
>>if (poll limit hit)
>>        return
>>else
>>        cpu_relax();
>>goto retry;
>>

Yeah, something similar. IIUC vhost_net_busy_poll() polls both the vring
and the backend (socket). Maybe we need to limit the amount of work done
in each work->fn to avoid starvation.

>>
>>If so, I had an idea for it. Let me send an additional patch on top
>>of this set.

Sure :-)

>
>Oh yeah, just to make sure I am on the same page for vdpa, because
>scsi and net work so differently.
>
>Were you thinking that you would initially run from
>
>vhost_poll_wakeup -> work->fn
>
>then in the vdpa work->fn you would do the kick_vq still, but then
>also kick off a group backend/vq poller. This would then poll the
>vqs/devs that were bound to that CPU from the worker/wq thread.

Yes, this seems reasonable!

>
>So I was thinking you want something similar to network's NAPI. Here

I don't know NAPI very well, but IIUC the goal is the same: try to avoid
notifications (IRQs from the device, vm-exits from the guest) by doing
adaptive polling.

>our work->fn is the hard irq, and then the worker is like their softirq
>we poll from.
>

I'm a little lost here...

Thanks,
Stefano
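
To make the per-CPU poll loop quoted above a bit more concrete, here is a
minimal C sketch with a work budget, addressing the starvation concern
raised in the reply. All names here (vq_poller, poll_cpu_pollers, the ops)
are hypothetical and illustrative only; they are not taken from the patches.

#include <linux/list.h>
#include <linux/types.h>
#include <asm/processor.h>	/* cpu_relax() */

/*
 * Hypothetical per-CPU poller: each device/vq bound to this CPU
 * registers one of these on the CPU's poller list.
 */
struct vq_poller {
	struct list_head node;				/* on the per-CPU list */
	bool (*has_work)(struct vq_poller *p);		/* vring/backend ready? */
	void (*run)(struct vq_poller *p);		/* driver's work fn */
};

static void poll_cpu_pollers(struct list_head *pollers, int poll_limit)
{
	struct vq_poller *p;
	int rounds = 0;

	for (;;) {
		bool found = false;

		/* Give every poller on this CPU one bounded pass. */
		list_for_each_entry(p, pollers, node) {
			if (p->has_work(p)) {
				p->run(p);
				found = true;
			}
		}

		/* Budget exhausted: stop polling, fall back to kicks/irqs. */
		if (++rounds >= poll_limit)
			return;

		if (!found)
			cpu_relax();
	}
}

The poll_limit budget plays a role similar to NAPI's weight on the network
RX side: once it runs out, the worker stops spinning and relies on
notifications again, so one busy vq cannot monopolize the CPU.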