Michael S. Tsirkin
2018-Jul-12 03:34 UTC
[PATCH net-next v5 0/4] net: vhost: improve performance when enable busyloop
On Thu, Jul 12, 2018 at 11:26:12AM +0800, Jason Wang wrote:
> 
> 
> On 2018年07月11日 19:59, Michael S. Tsirkin wrote:
> > On Wed, Jul 11, 2018 at 01:12:59PM +0800, Jason Wang wrote:
> > > 
> > > On 2018年07月11日 11:49, Tonghao Zhang wrote:
> > > > On Wed, Jul 11, 2018 at 10:56 AM Jason Wang <jasowang at redhat.com> wrote:
> > > > > 
> > > > > On 2018年07月04日 12:31, xiangxia.m.yue at gmail.com wrote:
> > > > > > From: Tonghao Zhang <xiangxia.m.yue at gmail.com>
> > > > > > 
> > > > > > This patches improve the guest receive and transmit performance.
> > > > > > On the handle_tx side, we poll the sock receive queue at the same time.
> > > > > > handle_rx do that in the same way.
> > > > > > 
> > > > > > For more performance report, see patch 4.
> > > > > > 
> > > > > > v4 -> v5:
> > > > > > fix some issues
> > > > > > 
> > > > > > v3 -> v4:
> > > > > > fix some issues
> > > > > > 
> > > > > > v2 -> v3:
> > > > > > This patches are splited from previous big patch:
> > > > > > http://patchwork.ozlabs.org/patch/934673/
> > > > > > 
> > > > > > Tonghao Zhang (4):
> > > > > >   vhost: lock the vqs one by one
> > > > > >   net: vhost: replace magic number of lock annotation
> > > > > >   net: vhost: factor out busy polling logic to vhost_net_busy_poll()
> > > > > >   net: vhost: add rx busy polling in tx path
> > > > > > 
> > > > > >  drivers/vhost/net.c   | 108 ++++++++++++++++++++++++++++----------------------
> > > > > >  drivers/vhost/vhost.c |  24 ++++-------
> > > > > >  2 files changed, 67 insertions(+), 65 deletions(-)
> > > > > > 
> > > > > Hi, any progress on the new version?
> > > > > 
> > > > > I plan to send a new series of packed virtqueue support of vhost. If you
> > > > > plan to send it soon, I can wait. Otherwise, I will send my series.
> > > > I rebase the codes. and find there is no improvement anymore, the
> > > > patches of makita may solve the problem. jason you may send your
> > > > patches, and I will do some research on busypoll.
> > > I see. Maybe you can try some bi-directional traffic.
> > > 
> > > Btw, lots of optimizations could be done for busy polling. E.g integrating
> > > with host NAPI busy polling or a 100% busy polling vhost_net. You're welcome
> > > to work or propose new ideas.
> > > 
> > > Thanks
> > It seems clear we do need adaptive polling.
> 
> Yes.
> 
> > The difficulty with NAPI
> > polling is it can't access guest memory easily. But maybe
> > get_user_pages on the polled memory+NAPI polling can work.
> 
> You mean something like zerocopy? Looks like we can do busy polling without
> it. I mean something like https://patchwork.kernel.org/patch/8707511/.
> 
> Thanks

How does this patch work? vhost_vq_avail_empty can sleep,
you are calling it within an rcu read side critical section.

That's not the only problem btw, another one is that the
CPU time spent polling isn't accounted with the VM.

> > > > > > Thanks
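For readers following the thread: the pattern Michael is objecting to looks roughly like the sketch below. This is an illustrative reconstruction, not code from the patch under discussion (the function name here is hypothetical); vhost_vq_avail_empty() reads the guest's avail index with __get_user(), which can fault and therefore sleep, so it must not be called under rcu_read_lock().

/* Illustrative sketch only -- not taken from the patch being discussed. */
static void napi_busy_poll_sketch(struct vhost_virtqueue *vq)
{
	rcu_read_lock();	/* read-side critical section: sleeping is illegal here */

	/*
	 * vhost_vq_avail_empty() does __get_user() on the guest's avail
	 * ring; if that access faults, the fault handler may sleep, and
	 * CONFIG_DEBUG_ATOMIC_SLEEP reports "sleeping function called
	 * from invalid context".
	 */
	if (!vhost_vq_avail_empty(vq->dev, vq))
		vhost_poll_queue(&vq->poll);	/* wake the vhost worker */

	rcu_read_unlock();
}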
Jason Wang
2018-Jul-12 05:21 UTC
[PATCH net-next v5 0/4] net: vhost: improve performance when enable busyloop
On 2018年07月12日 11:34, Michael S. Tsirkin wrote:
> On Thu, Jul 12, 2018 at 11:26:12AM +0800, Jason Wang wrote:
>>
>> On 2018年07月11日 19:59, Michael S. Tsirkin wrote:
>>> On Wed, Jul 11, 2018 at 01:12:59PM +0800, Jason Wang wrote:
>>>> On 2018年07月11日 11:49, Tonghao Zhang wrote:
>>>>> On Wed, Jul 11, 2018 at 10:56 AM Jason Wang <jasowang at redhat.com> wrote:
>>>>>> On 2018年07月04日 12:31, xiangxia.m.yue at gmail.com wrote:
>>>>>>> From: Tonghao Zhang <xiangxia.m.yue at gmail.com>
>>>>>>>
>>>>>>> This patches improve the guest receive and transmit performance.
>>>>>>> On the handle_tx side, we poll the sock receive queue at the same time.
>>>>>>> handle_rx do that in the same way.
>>>>>>>
>>>>>>> For more performance report, see patch 4.
>>>>>>>
>>>>>>> v4 -> v5:
>>>>>>> fix some issues
>>>>>>>
>>>>>>> v3 -> v4:
>>>>>>> fix some issues
>>>>>>>
>>>>>>> v2 -> v3:
>>>>>>> This patches are splited from previous big patch:
>>>>>>> http://patchwork.ozlabs.org/patch/934673/
>>>>>>>
>>>>>>> Tonghao Zhang (4):
>>>>>>>   vhost: lock the vqs one by one
>>>>>>>   net: vhost: replace magic number of lock annotation
>>>>>>>   net: vhost: factor out busy polling logic to vhost_net_busy_poll()
>>>>>>>   net: vhost: add rx busy polling in tx path
>>>>>>>
>>>>>>>  drivers/vhost/net.c   | 108 ++++++++++++++++++++++++++++----------------------
>>>>>>>  drivers/vhost/vhost.c |  24 ++++-------
>>>>>>>  2 files changed, 67 insertions(+), 65 deletions(-)
>>>>>>>
>>>>>> Hi, any progress on the new version?
>>>>>>
>>>>>> I plan to send a new series of packed virtqueue support of vhost. If you
>>>>>> plan to send it soon, I can wait. Otherwise, I will send my series.
>>>>> I rebase the codes. and find there is no improvement anymore, the
>>>>> patches of makita may solve the problem. jason you may send your
>>>>> patches, and I will do some research on busypoll.
>>>> I see. Maybe you can try some bi-directional traffic.
>>>>
>>>> Btw, lots of optimizations could be done for busy polling. E.g integrating
>>>> with host NAPI busy polling or a 100% busy polling vhost_net. You're welcome
>>>> to work or propose new ideas.
>>>>
>>>> Thanks
>>> It seems clear we do need adaptive polling.
>> Yes.
>>
>>> The difficulty with NAPI
>>> polling is it can't access guest memory easily. But maybe
>>> get_user_pages on the polled memory+NAPI polling can work.
>> You mean something like zerocopy? Looks like we can do busy polling without
>> it. I mean something like https://patchwork.kernel.org/patch/8707511/.
>>
>> Thanks
> How does this patch work? vhost_vq_avail_empty can sleep,
> you are calling it within an rcu read side critical section.

Ok, I get your meaning. I have patches to access vring through
get_user_pages + vmap() which should help here. (And it increase PPS about
10%-20%).

>
> That's not the only problem btw, another one is that the
> CPU time spent polling isn't accounted with the VM.

Yes, but it's not the 'issue' of this patch.

And I believe cgroup can help?

Thanks

>
>>>>>> Thanks
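For context, the get_user_pages() + vmap() approach Jason mentions could look roughly like the sketch below. The struct and function names are invented for illustration and are not from his series; the idea is simply to pin the user pages backing the ring and map them into kernel address space so the ring can be read without faulting (and hence from contexts where sleeping is not allowed).

/* Editor's sketch under stated assumptions; not Jason's actual patch. */
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

struct vring_kmap {
	struct page **pages;	/* pinned user pages backing the ring */
	int npages;
	void *base;		/* vmap()ed kernel address of the first page */
};

static int vring_kmap_ring(struct vring_kmap *m, unsigned long uaddr, size_t len)
{
	int npages = DIV_ROUND_UP((uaddr & ~PAGE_MASK) + len, PAGE_SIZE);
	int pinned = 0;

	m->pages = kmalloc_array(npages, sizeof(*m->pages), GFP_KERNEL);
	if (!m->pages)
		return -ENOMEM;

	/* Pin for write: vhost updates the used ring in place. */
	pinned = get_user_pages_fast(uaddr & PAGE_MASK, npages, 1, m->pages);
	if (pinned != npages)
		goto err;

	/* Map the pinned pages contiguously in kernel virtual space. */
	m->base = vmap(m->pages, npages, VM_MAP, PAGE_KERNEL);
	if (!m->base)
		goto err;

	m->npages = npages;
	/* The ring is now readable at m->base + (uaddr & ~PAGE_MASK),
	 * even from contexts that must not fault or sleep. */
	return 0;

err:
	while (pinned > 0)
		put_page(m->pages[--pinned]);
	kfree(m->pages);
	m->pages = NULL;
	return -EFAULT;
}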
Michael S. Tsirkin
2018-Jul-12 05:24 UTC
[PATCH net-next v5 0/4] net: vhost: improve performance when enable busyloop
On Thu, Jul 12, 2018 at 01:21:03PM +0800, Jason Wang wrote:
> 
> 
> On 2018年07月12日 11:34, Michael S. Tsirkin wrote:
> > On Thu, Jul 12, 2018 at 11:26:12AM +0800, Jason Wang wrote:
> > > 
> > > On 2018年07月11日 19:59, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 11, 2018 at 01:12:59PM +0800, Jason Wang wrote:
> > > > > On 2018年07月11日 11:49, Tonghao Zhang wrote:
> > > > > > On Wed, Jul 11, 2018 at 10:56 AM Jason Wang <jasowang at redhat.com> wrote:
> > > > > > > On 2018年07月04日 12:31, xiangxia.m.yue at gmail.com wrote:
> > > > > > > > From: Tonghao Zhang <xiangxia.m.yue at gmail.com>
> > > > > > > > 
> > > > > > > > This patches improve the guest receive and transmit performance.
> > > > > > > > On the handle_tx side, we poll the sock receive queue at the same time.
> > > > > > > > handle_rx do that in the same way.
> > > > > > > > 
> > > > > > > > For more performance report, see patch 4.
> > > > > > > > 
> > > > > > > > v4 -> v5:
> > > > > > > > fix some issues
> > > > > > > > 
> > > > > > > > v3 -> v4:
> > > > > > > > fix some issues
> > > > > > > > 
> > > > > > > > v2 -> v3:
> > > > > > > > This patches are splited from previous big patch:
> > > > > > > > http://patchwork.ozlabs.org/patch/934673/
> > > > > > > > 
> > > > > > > > Tonghao Zhang (4):
> > > > > > > >   vhost: lock the vqs one by one
> > > > > > > >   net: vhost: replace magic number of lock annotation
> > > > > > > >   net: vhost: factor out busy polling logic to vhost_net_busy_poll()
> > > > > > > >   net: vhost: add rx busy polling in tx path
> > > > > > > > 
> > > > > > > >  drivers/vhost/net.c   | 108 ++++++++++++++++++++++++++++----------------------
> > > > > > > >  drivers/vhost/vhost.c |  24 ++++-------
> > > > > > > >  2 files changed, 67 insertions(+), 65 deletions(-)
> > > > > > > > 
> > > > > > > Hi, any progress on the new version?
> > > > > > > 
> > > > > > > I plan to send a new series of packed virtqueue support of vhost. If you
> > > > > > > plan to send it soon, I can wait. Otherwise, I will send my series.
> > > > > > I rebase the codes. and find there is no improvement anymore, the
> > > > > > patches of makita may solve the problem. jason you may send your
> > > > > > patches, and I will do some research on busypoll.
> > > > > I see. Maybe you can try some bi-directional traffic.
> > > > > 
> > > > > Btw, lots of optimizations could be done for busy polling. E.g integrating
> > > > > with host NAPI busy polling or a 100% busy polling vhost_net. You're welcome
> > > > > to work or propose new ideas.
> > > > > 
> > > > > Thanks
> > > > It seems clear we do need adaptive polling.
> > > Yes.
> > > 
> > > > The difficulty with NAPI
> > > > polling is it can't access guest memory easily. But maybe
> > > > get_user_pages on the polled memory+NAPI polling can work.
> > > You mean something like zerocopy? Looks like we can do busy polling without
> > > it. I mean something like https://patchwork.kernel.org/patch/8707511/.
> > > 
> > > Thanks
> > How does this patch work? vhost_vq_avail_empty can sleep,
> > you are calling it within an rcu read side critical section.
> 
> Ok, I get your meaning. I have patches to access vring through
> get_user_pages + vmap() which should help here. (And it increase PPS about
> 10%-20%).

Remember you must mark it as dirty on unpin too ...

> > 
> > That's not the only problem btw, another one is that the
> > CPU time spent polling isn't accounted with the VM.
> 
> Yes, but it's not the 'issue' of this patch.

Yes it is. polling within thread context accounts CPU correctly.

> And I believe cgroup can help?
> 
> Thanks

cgroups are what's broken by polling in irq context.

> 
> > > > > > > Thanks
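And on the point about marking pages dirty on unpin: continuing the hypothetical sketch above, the teardown side would need set_page_dirty_lock() before put_page(), otherwise data written to the ring through the kernel mapping can be lost when the pages are reclaimed. Again an illustration, not code from either series.

/* Editor's sketch; pairs with the hypothetical vring_kmap_ring() above. */
static void vring_kunmap_ring(struct vring_kmap *m)
{
	int i;

	vunmap(m->base);			/* drop the kernel mapping first */

	for (i = 0; i < m->npages; i++) {
		set_page_dirty_lock(m->pages[i]);	/* vhost wrote the ring */
		put_page(m->pages[i]);			/* unpin */
	}

	kfree(m->pages);
	m->pages = NULL;
}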