Jason Wang
2018-Dec-14 04:29 UTC
[PATCH net-next 0/3] vhost: accelerate metadata access through vmap()
On 2018/12/14 4:12 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 13, 2018 at 06:10:19PM +0800, Jason Wang wrote:
>> Hi:
>>
>> This series tries to access virtqueue metadata through kernel virtual
>> addresses instead of the copy_user() friends, since those have too much
>> overhead like checks, spec barriers or even hardware feature
>> toggling.
>>
>> Tests show about a 24% improvement on TX PPS. It should benefit other
>> cases as well.
>>
>> Please review
> I think the idea of speeding up userspace access is a good one.
> However I think that moving all checks to the start is way too aggressive.

So did packet sockets and AF_XDP. Anyway, sharing the address space and
accessing it directly is the fastest way. Performance is the major
consideration for people choosing a backend. Compared to a userspace
implementation, vhost does not have security advantages at any level.
If vhost stays slow, people will start to develop backends based on
e.g. AF_XDP.

> Instead, let's batch things up but let's not keep them
> around forever.
> Here are some ideas:
>
>
> 1. Disable preemption, process a small number of small packets
>    directly in an atomic context. This should cut latency
>    down significantly; the tricky part is to only do it
>    under a light load and disable it
>    for the streaming case, otherwise it's unfair.
>    This might fail; if it does, just bounce things out to
>    a thread.

I'm not sure what context you meant here. Is this for the TX path of
TUN? A fundamental difference is that my series targets extreme heavy
load, not light load; 100% cpu for vhost is expected.

> 2. Switch to unsafe_put_user/unsafe_get_user,
>    and batch up multiple accesses.

As I said, unless we can batch accesses to two or three of the
different places (avail, descriptor and used), it won't help to batch
accesses to a single place like used. I'm not even sure this can be
done considering the case of the packed virtqueue, where we have a
single descriptor ring. Batching through the unsafe helpers may not
help in this case since it's equivalent to the safe ones. And this
requires non-trivial refactoring of vhost, and such refactoring itself
may give us a noticeable impact (e.g. it may lead to a regression).

> 3. Allow adding a fixup point manually,
>    such that multiple independent get_user accesses
>    can get a single fixup (will allow better compiler
>    optimizations).
>

So for metadata access, I don't see how what you suggest here helps in
the case of a heavy workload.

For data access, this may help, but I've tried batching the data copy
to reduce SMAP/spec barriers in vhost-net and I don't see a performance
improvement.

Thanks

>
>> Jason Wang (3):
>>   vhost: generalize adding used elem
>>   vhost: fine grain userspace memory accessors
>>   vhost: access vq metadata through kernel virtual address
>>
>>  drivers/vhost/vhost.c | 281 ++++++++++++++++++++++++++++++++++++++----
>>  drivers/vhost/vhost.h |  11 ++
>>  2 files changed, 266 insertions(+), 26 deletions(-)
>>
>> --
>> 2.17.1
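
To make the comparison concrete, here is a minimal sketch of the two access
paths being debated. It is illustrative only: the function and variable names
are made up and this is not the code from the posted series; real vhost code
also handles virtio endianness, memory regions and error unwinding.

#include <linux/types.h>
#include <linux/uaccess.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>

/* Current path: every metadata read goes through the uaccess helpers,
 * paying for the access checks, SMAP (stac/clac) toggling and
 * speculation barriers on each call. */
static int read_avail_idx_uaccess(u16 __user *uptr, u16 *idx)
{
	u16 v;

	if (copy_from_user(&v, uptr, sizeof(v)))
		return -EFAULT;
	*idx = v;
	return 0;
}

/* Proposed path: pin the pages backing the ring once, vmap() them, and
 * then read through a plain kernel virtual address. The third argument
 * of get_user_pages_fast() changed meaning across kernel versions
 * (write flag vs. gup_flags); unwinding of partially pinned pages is
 * omitted here for brevity. */
static void *map_ring(unsigned long uaddr, int npages, struct page **pages)
{
	if (get_user_pages_fast(uaddr, npages, 1 /* write */, pages) != npages)
		return NULL;
	return vmap(pages, npages, VM_MAP, PAGE_KERNEL);
}

static u16 read_avail_idx_vmap(u16 *kptr)
{
	return READ_ONCE(*kptr);
}

The reported saving comes from the second path skipping the per-access checks
and barriers, at the cost of keeping the user pages pinned and mapped for the
lifetime of the ring.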
Michael S. Tsirkin
2018-Dec-14 12:52 UTC
[PATCH net-next 0/3] vhost: accelerate metadata access through vmap()
On Fri, Dec 14, 2018 at 12:29:54PM +0800, Jason Wang wrote:
>
> On 2018/12/14 4:12 PM, Michael S. Tsirkin wrote:
> > On Thu, Dec 13, 2018 at 06:10:19PM +0800, Jason Wang wrote:
> > > Hi:
> > >
> > > This series tries to access virtqueue metadata through kernel virtual
> > > addresses instead of the copy_user() friends, since those have too much
> > > overhead like checks, spec barriers or even hardware feature
> > > toggling.
> > >
> > > Tests show about a 24% improvement on TX PPS. It should benefit other
> > > cases as well.
> > >
> > > Please review
> > I think the idea of speeding up userspace access is a good one.
> > However I think that moving all checks to the start is way too aggressive.
>
> So did packet sockets and AF_XDP. Anyway, sharing the address space and
> accessing it directly is the fastest way. Performance is the major
> consideration for people choosing a backend. Compared to a userspace
> implementation, vhost does not have security advantages at any level.
> If vhost stays slow, people will start to develop backends based on
> e.g. AF_XDP.
>

Let them, what's wrong with that?

> > Instead, let's batch things up but let's not keep them
> > around forever.
> > Here are some ideas:
> >
> >
> > 1. Disable preemption, process a small number of small packets
> >    directly in an atomic context. This should cut latency
> >    down significantly; the tricky part is to only do it
> >    under a light load and disable it
> >    for the streaming case, otherwise it's unfair.
> >    This might fail; if it does, just bounce things out to
> >    a thread.
>
> I'm not sure what context you meant here. Is this for the TX path of
> TUN? A fundamental difference is that my series targets extreme heavy
> load, not light load; 100% cpu for vhost is expected.

Interesting. You only shared a TCP RR result though. What's the
performance gain in a heavy load case?

> > 2. Switch to unsafe_put_user/unsafe_get_user,
> >    and batch up multiple accesses.
>
> As I said, unless we can batch accesses to two or three of the
> different places (avail, descriptor and used), it won't help to batch
> accesses to a single place like used. I'm not even sure this can be
> done considering the case of the packed virtqueue, where we have a
> single descriptor ring.

So that's one of the reasons packed should be faster: a single access
and you get the descriptor, no messy redirects. Somehow your
benchmarking so far didn't show a gain with vhost and packed though -
do you know what's wrong?

> Batching through the unsafe helpers may not
> help in this case since it's equivalent to the safe ones. And this
> requires non-trivial refactoring of vhost, and such refactoring itself
> may give us a noticeable impact (e.g. it may lead to a regression).
>
> > 3. Allow adding a fixup point manually,
> >    such that multiple independent get_user accesses
> >    can get a single fixup (will allow better compiler
> >    optimizations).
> >
>
> So for metadata access, I don't see how what you suggest here helps in
> the case of a heavy workload.
>
> For data access, this may help, but I've tried batching the data copy
> to reduce SMAP/spec barriers in vhost-net and I don't see a performance
> improvement.
>
> Thanks

So how about we try to figure out what's actually going on? Can you
drop the barriers and show the same gain? E.g. vmap does not use huge
pages IIRC, so it can in fact be slower than direct access. It's not a
magic faster way.

> > > Jason Wang (3):
> > >   vhost: generalize adding used elem
> > >   vhost: fine grain userspace memory accessors
> > >   vhost: access vq metadata through kernel virtual address
> > >
> > >  drivers/vhost/vhost.c | 281 ++++++++++++++++++++++++++++++++++++++----
> > >  drivers/vhost/vhost.h |  11 ++
> > >  2 files changed, 266 insertions(+), 26 deletions(-)
> > >
> > > --
> > > 2.17.1
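
For context on idea (2) above, a rough sketch of what batching through the
unsafe accessors looks like. This is illustrative pseudo-kernel code with
made-up names, not from any posted patch; the user_access_begin()/access_ok()
signatures have changed across kernel versions, and the point is only that
one begin/end pair amortizes the SMAP toggle and speculation barrier over
several reads.

#include <linux/types.h>
#include <linux/uaccess.h>

static int fetch_avail_entries(u16 __user *ring, unsigned int start,
			       unsigned int n, u16 *out)
{
	unsigned int i;

	/* One stac/barrier pair for the whole batch instead of one per read. */
	if (!user_access_begin(ring + start, n * sizeof(*ring)))
		return -EFAULT;
	for (i = 0; i < n; i++)
		unsafe_get_user(out[i], &ring[start + i], efault);
	user_access_end();
	return 0;

efault:
	user_access_end();
	return -EFAULT;
}

As Jason argues above, this only pays off when several accesses hit the same
area; it does not by itself remove the cost of touching avail, descriptor and
used in separate calls.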
David Miller
2018-Dec-15 19:43 UTC
[PATCH net-next 0/3] vhost: accelerate metadata access through vmap()
From: Jason Wang <jasowang at redhat.com>
Date: Fri, 14 Dec 2018 12:29:54 +0800

> On 2018/12/14 4:12 PM, Michael S. Tsirkin wrote:
>> On Thu, Dec 13, 2018 at 06:10:19PM +0800, Jason Wang wrote:
>>> Hi:
>>>
>>> This series tries to access virtqueue metadata through kernel virtual
>>> addresses instead of the copy_user() friends, since those have too much
>>> overhead like checks, spec barriers or even hardware feature
>>> toggling.
>>>
>>> Tests show about a 24% improvement on TX PPS. It should benefit other
>>> cases as well.
>>>
>>> Please review
>> I think the idea of speeding up userspace access is a good one.
>> However I think that moving all checks to the start is way too aggressive.
>
> So did packet sockets and AF_XDP. Anyway, sharing the address space and
> accessing it directly is the fastest way. Performance is the major
> consideration for people choosing a backend. Compared to a userspace
> implementation, vhost does not have security advantages at any level.
> If vhost stays slow, people will start to develop backends based on
> e.g. AF_XDP.

Exactly, this is precisely how this kind of problem should be solved.

Michael, I strongly support the approach Jason is taking here, and I
would like to ask you to seriously reconsider your objections.

Thank you.
Michael S. Tsirkin
2018-Dec-16 19:57 UTC
[PATCH net-next 0/3] vhost: accelerate metadata access through vmap()
On Sat, Dec 15, 2018 at 11:43:08AM -0800, David Miller wrote:
> From: Jason Wang <jasowang at redhat.com>
> Date: Fri, 14 Dec 2018 12:29:54 +0800
>
> > On 2018/12/14 4:12 PM, Michael S. Tsirkin wrote:
> >> On Thu, Dec 13, 2018 at 06:10:19PM +0800, Jason Wang wrote:
> >>> Hi:
> >>>
> >>> This series tries to access virtqueue metadata through kernel virtual
> >>> addresses instead of the copy_user() friends, since those have too much
> >>> overhead like checks, spec barriers or even hardware feature
> >>> toggling.
> >>>
> >>> Tests show about a 24% improvement on TX PPS. It should benefit other
> >>> cases as well.
> >>>
> >>> Please review
> >> I think the idea of speeding up userspace access is a good one.
> >> However I think that moving all checks to the start is way too aggressive.
> >
> > So did packet sockets and AF_XDP. Anyway, sharing the address space and
> > accessing it directly is the fastest way. Performance is the major
> > consideration for people choosing a backend. Compared to a userspace
> > implementation, vhost does not have security advantages at any level.
> > If vhost stays slow, people will start to develop backends based on
> > e.g. AF_XDP.
>
> Exactly, this is precisely how this kind of problem should be solved.
>
> Michael, I strongly support the approach Jason is taking here, and I
> would like to ask you to seriously reconsider your objections.
>
> Thank you.

Okay. Won't be the first time I'm wrong.

Let's say we ignore the security aspects. We still need to make sure the
following all keep working (they are broken with this revision):

- file-backed memory (I didn't see where we mark memory dirty - if we
  don't, we get guest memory corruption on close; if we do, then a host
  crash, since https://lwn.net/Articles/774411/ seems to apply here?)
- THP
- auto-NUMA

Because vhost isn't like AF_XDP, where you can just tell people "use
hugetlbfs" and "data is removed on close" - people are using it in lots
of configurations with guest memory shared between rings and unrelated
data.

Jason, thoughts on these?

--
MST
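
The file-backed memory point above is about teardown: once ring pages have
been written through a kernel mapping, they have to be marked dirty before
being released, otherwise writeback can lose the updates. A minimal sketch of
the kind of cleanup involved, with made-up names and no claim that this
matches the posted series:

#include <linux/mm.h>
#include <linux/vmalloc.h>

/* Tear down a vmap()-based ring mapping. If the backing pages came from
 * a file mapping and were written through the kernel address, they must
 * be dirtied before put_page(), or the data may never reach the file. */
static void unmap_ring(void *kaddr, struct page **pages, int npages,
		       bool written)
{
	int i;

	vunmap(kaddr);
	for (i = 0; i < npages; i++) {
		if (written)
			set_page_dirty_lock(pages[i]);
		put_page(pages[i]);
	}
}

Even with that, the interaction between long-term page pins and writeback
described in the LWN article linked above was unresolved at the time, which
is why a possible host crash is flagged for the file-backed case.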