Jason Wang
2018-Dec-14 04:29 UTC
[PATCH net-next 0/3] vhost: accelerate metadata access through vmap()
On 2018/12/14 4:12 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 13, 2018 at 06:10:19PM +0800, Jason Wang wrote:
>> Hi:
>>
>> This series tries to access virtqueue metadata through kernel virtual
>> address instead of copy_user() friends since they had too much
>> overheads like checks, spec barriers or even hardware feature
>> toggling.
>>
>> Test shows about 24% improvement on TX PPS. It should benefit other
>> cases as well.
>>
>> Please review
> I think the idea of speeding up userspace access is a good one.
> However I think that moving all checks to start is way too aggressive.

So did packet and AF_XDP. Anyway, sharing the address space and accessing it directly is the fastest way. Performance is the major consideration for people choosing a backend. Compared to a userspace implementation, vhost has no security advantage at any level. If vhost stays slow, people will start to develop backends based on e.g. AF_XDP.

> Instead, let's batch things up but let's not keep them
> around forever.
> Here are some ideas:
>
> 1. Disable preemption, process a small number of small packets
>    directly in an atomic context. This should cut latency
>    down significantly, the tricky part is to only do it
>    on a light load and disable this
>    for the streaming case otherwise it's unfair.
>    This might fail, if it does just bounce things out to
>    a thread.

I'm not sure what context you mean here. Is this for the TX path of TUN? A fundamental difference is that my series targets extremely heavy load, not light load; 100% CPU for vhost is expected.

> 2. Switch to unsafe_put_user/unsafe_get_user,
>    and batch up multiple accesses.

As I said, this only helps if we can batch accesses across two or three of the different areas (avail, descriptor and used); it does not help batch accesses to a single area such as used. I'm not even sure it can be done for a packed virtqueue, where we have a single descriptor ring. Batching through the unsafe helpers may not help there since they are equivalent to the safe ones. It also requires non-trivial refactoring of vhost, and such refactoring may itself have a noticeable impact (e.g. it may introduce a regression).

> 3. Allow adding a fixup point manually,
>    such that multiple independent get_user accesses
>    can get a single fixup (will allow better compiler
>    optimizations).

So for metadata access, I don't see how your suggestions help in the heavy-workload case. For data access they may help, but I have already tried batching the data copy to reduce SMAP/spec barriers in vhost-net and did not see a performance improvement.

Thanks

>> Jason Wang (3):
>>   vhost: generalize adding used elem
>>   vhost: fine grain userspace memory accessors
>>   vhost: access vq metadata through kernel virtual address
>>
>>  drivers/vhost/vhost.c | 281 ++++++++++++++++++++++++++++++++++++++----
>>  drivers/vhost/vhost.h |  11 ++
>>  2 files changed, 266 insertions(+), 26 deletions(-)
>>
>> --
>> 2.17.1
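For context, the technique the cover letter describes boils down to pinning the user pages that back the virtqueue metadata and aliasing them with vmap(), so that the hot path can use plain loads and stores instead of copy_user()/get_user() calls that each pay for access_ok() checks, SMAP toggling and speculation barriers. A minimal sketch of that setup follows; it is illustrative only, not the code in the patches - the struct and function names are invented, and GUP signatures differ between kernel versions.

    #include <linux/kernel.h>
    #include <linux/mm.h>
    #include <linux/vmalloc.h>
    #include <linux/slab.h>

    struct vq_meta_map {            /* illustrative, not the patch's struct */
        struct page **pages;
        int npages;
        void *addr;                 /* kernel alias of the userspace ring */
    };

    static int vq_meta_map_user(struct vq_meta_map *m,
                                unsigned long uaddr, size_t len)
    {
        int npages = DIV_ROUND_UP(offset_in_page(uaddr) + len, PAGE_SIZE);
        int pinned = 0;

        m->pages = kmalloc_array(npages, sizeof(*m->pages), GFP_KERNEL);
        if (!m->pages)
            return -ENOMEM;

        /* Pin the user pages so they stay resident while we alias them. */
        pinned = get_user_pages_fast(uaddr & PAGE_MASK, npages, FOLL_WRITE,
                                     m->pages);
        if (pinned != npages)
            goto err;

        /* Build one contiguous kernel mapping over the pinned pages. */
        m->addr = vmap(m->pages, npages, VM_MAP, PAGE_KERNEL);
        if (!m->addr)
            goto err;

        m->addr += offset_in_page(uaddr);
        m->npages = npages;
        return 0;

    err:
        /* Release whatever we managed to pin. */
        while (pinned > 0)
            put_page(m->pages[--pinned]);
        kfree(m->pages);
        return -EFAULT;
    }

With such a mapping in place, reading e.g. the avail index becomes a plain load plus the usual ordering barriers; the trade-off is that the pages remain pinned for the lifetime of the mapping, which is what the rest of the thread turns on.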
Michael S. Tsirkin
2018-Dec-14 12:52 UTC
[PATCH net-next 0/3] vhost: accelerate metadata access through vmap()
On Fri, Dec 14, 2018 at 12:29:54PM +0800, Jason Wang wrote:
> On 2018/12/14 4:12 PM, Michael S. Tsirkin wrote:
> > On Thu, Dec 13, 2018 at 06:10:19PM +0800, Jason Wang wrote:
> > > Hi:
> > >
> > > This series tries to access virtqueue metadata through kernel virtual
> > > address instead of copy_user() friends since they had too much
> > > overheads like checks, spec barriers or even hardware feature
> > > toggling.
> > >
> > > Test shows about 24% improvement on TX PPS. It should benefit other
> > > cases as well.
> > >
> > > Please review
> > I think the idea of speeding up userspace access is a good one.
> > However I think that moving all checks to start is way too aggressive.
>
> So did packet and AF_XDP. Anyway, sharing the address space and accessing it
> directly is the fastest way. Performance is the major consideration for
> people choosing a backend. Compared to a userspace implementation, vhost has
> no security advantage at any level. If vhost stays slow, people will start
> to develop backends based on e.g. AF_XDP.

Let them - what's wrong with that?

> > Instead, let's batch things up but let's not keep them
> > around forever.
> > Here are some ideas:
> >
> > 1. Disable preemption, process a small number of small packets
> >    directly in an atomic context. This should cut latency
> >    down significantly, the tricky part is to only do it
> >    on a light load and disable this
> >    for the streaming case otherwise it's unfair.
> >    This might fail, if it does just bounce things out to
> >    a thread.
>
> I'm not sure what context you mean here. Is this for the TX path of TUN? A
> fundamental difference is that my series targets extremely heavy load, not
> light load; 100% CPU for vhost is expected.

Interesting. You only shared a TCP RR result though. What's the performance gain in a heavy-load case?

> > 2. Switch to unsafe_put_user/unsafe_get_user,
> >    and batch up multiple accesses.
>
> As I said, this only helps if we can batch accesses across two or three of
> the different areas (avail, descriptor and used); it does not help batch
> accesses to a single area such as used. I'm not even sure it can be done for
> a packed virtqueue, where we have a single descriptor ring.

So that's one of the reasons packed should be faster: a single access gets you the descriptor, with no messy redirects. Somehow your benchmarking so far didn't show a gain with vhost and packed though - do you know what's wrong?

> Batching through the unsafe helpers may not help there since they are
> equivalent to the safe ones. It also requires non-trivial refactoring of
> vhost, and such refactoring may itself have a noticeable impact (e.g. it may
> introduce a regression).
>
> > 3. Allow adding a fixup point manually,
> >    such that multiple independent get_user accesses
> >    can get a single fixup (will allow better compiler
> >    optimizations).
>
> So for metadata access, I don't see how your suggestions help in the
> heavy-workload case.
>
> For data access they may help, but I have already tried batching the data
> copy to reduce SMAP/spec barriers in vhost-net and did not see a performance
> improvement.
>
> Thanks

So how about we try to figure out what's actually going on? Can you drop the barriers and show the same gain? E.g. vmap does not use a huge page IIRC, so it can in fact be slower than direct access. It's not a magic faster way.

> > > Jason Wang (3):
> > >   vhost: generalize adding used elem
> > >   vhost: fine grain userspace memory accessors
> > >   vhost: access vq metadata through kernel virtual address
> > >
> > >  drivers/vhost/vhost.c | 281 ++++++++++++++++++++++++++++++++++++++----
> > >  drivers/vhost/vhost.h |  11 ++
> > >  2 files changed, 266 insertions(+), 26 deletions(-)
> > >
> > > --
> > > 2.17.1
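For reference, items 2 and 3 above amount to something like the sketch below: validate and open the user access window once, then pull a run of avail ring entries with unsafe_get_user(), which compiles to plain loads sharing one exception-table fixup, so the per-access STAC/CLAC and speculation-barrier cost stays out of the loop. This is a sketch under assumptions, not vhost code: user_access_begin(ptr, len) is written in its later two-argument form (at the time of this thread the begin/end pair took no arguments and access_ok() was a separate call), the function name and parameters are invented, and endianness conversion is omitted.

    #include <linux/uaccess.h>
    #include <linux/virtio_types.h>

    static int peek_avail_entries(__virtio16 __user *ring, u16 start,
                                  unsigned int count, unsigned int num,
                                  u16 *heads)
    {
        unsigned int i;

        /* One access check and one SMAP toggle for the whole batch. */
        if (!user_access_begin(ring, num * sizeof(*ring)))
            return -EFAULT;

        for (i = 0; i < count; i++) {
            /* Plain load; a fault jumps to the shared fixup label. */
            unsafe_get_user(heads[i], &ring[(start + i) % num], efault);
        }

        user_access_end();
        return 0;

    efault:
        user_access_end();
        return -EFAULT;
    }

Whether this helps the split ring is exactly the disagreement above: avail, descriptor and used live in three separate areas, so a batch against one of them still leaves per-area transitions.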
David Miller
2018-Dec-15 19:43 UTC
[PATCH net-next 0/3] vhost: accelerate metadata access through vmap()
From: Jason Wang <jasowang at redhat.com>
Date: Fri, 14 Dec 2018 12:29:54 +0800

> On 2018/12/14 4:12 PM, Michael S. Tsirkin wrote:
>> On Thu, Dec 13, 2018 at 06:10:19PM +0800, Jason Wang wrote:
>>> Hi:
>>>
>>> This series tries to access virtqueue metadata through kernel virtual
>>> address instead of copy_user() friends since they had too much
>>> overheads like checks, spec barriers or even hardware feature
>>> toggling.
>>>
>>> Test shows about 24% improvement on TX PPS. It should benefit other
>>> cases as well.
>>>
>>> Please review
>> I think the idea of speeding up userspace access is a good one.
>> However I think that moving all checks to start is way too aggressive.
>
> So did packet and AF_XDP. Anyway, sharing the address space and accessing
> it directly is the fastest way. Performance is the major consideration
> for people choosing a backend. Compared to a userspace implementation,
> vhost has no security advantage at any level. If vhost stays slow, people
> will start to develop backends based on e.g. AF_XDP.

Exactly, this is precisely how this kind of problem should be solved.

Michael, I strongly support the approach Jason is taking here, and I would
like to ask you to seriously reconsider your objections.

Thank you.
Michael S. Tsirkin
2018-Dec-16 19:57 UTC
[PATCH net-next 0/3] vhost: accelerate metadata access through vmap()
On Sat, Dec 15, 2018 at 11:43:08AM -0800, David Miller wrote:
> From: Jason Wang <jasowang at redhat.com>
> Date: Fri, 14 Dec 2018 12:29:54 +0800
>
> > On 2018/12/14 4:12 PM, Michael S. Tsirkin wrote:
> >> On Thu, Dec 13, 2018 at 06:10:19PM +0800, Jason Wang wrote:
> >>> Hi:
> >>>
> >>> This series tries to access virtqueue metadata through kernel virtual
> >>> address instead of copy_user() friends since they had too much
> >>> overheads like checks, spec barriers or even hardware feature
> >>> toggling.
> >>>
> >>> Test shows about 24% improvement on TX PPS. It should benefit other
> >>> cases as well.
> >>>
> >>> Please review
> >> I think the idea of speeding up userspace access is a good one.
> >> However I think that moving all checks to start is way too aggressive.
> >
> > So did packet and AF_XDP. Anyway, sharing the address space and accessing
> > it directly is the fastest way. Performance is the major consideration
> > for people choosing a backend. Compared to a userspace implementation,
> > vhost has no security advantage at any level. If vhost stays slow, people
> > will start to develop backends based on e.g. AF_XDP.
>
> Exactly, this is precisely how this kind of problem should be solved.
>
> Michael, I strongly support the approach Jason is taking here, and I
> would like to ask you to seriously reconsider your objections.
>
> Thank you.

Okay. Won't be the first time I'm wrong.

Let's say we ignore the security aspects. We still need to make sure the
following all keep working (they are broken with this revision):

- file-backed memory (I didn't see where we mark memory dirty - if we
  don't, we get guest memory corruption on close; if we do, then a host
  crash, as https://lwn.net/Articles/774411/ seems to apply here?)
- THP
- auto-NUMA

Because vhost isn't like AF_XDP, where you can just tell people "use
hugetlbfs" and "data is removed on close" - people are using it in lots of
configurations with guest memory shared between rings and unrelated data.

Jason, thoughts on these?

--
MST
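To make the first bullet concrete: a pin-and-vmap scheme has to mark every page the host wrote through the kernel alias dirty before unpinning it, roughly as in the sketch below, which continues the hypothetical vq_meta_map example from earlier (the `written` flag is likewise illustrative). For anonymous guest memory that is straightforward, but doing it for long-term pins of file-backed pages runs into the writeback problem the LWN article above describes.

    static void vq_meta_unmap(struct vq_meta_map *m, bool written)
    {
        int i;

        /* Drop the kernel alias first (vmap() returned a page-aligned VA). */
        vunmap((void *)((unsigned long)m->addr & PAGE_MASK));

        for (i = 0; i < m->npages; i++) {
            /* Without this, writes to file-backed guest memory can be lost. */
            if (written)
                set_page_dirty_lock(m->pages[i]);
            put_page(m->pages[i]);
        }
        kfree(m->pages);
    }

The THP and auto-NUMA bullets stem from the same pin: pages held this way generally cannot be collapsed into huge pages or migrated for NUMA balancing, which is why long-lived mappings usually need an invalidation path (e.g. via MMU notifiers) rather than a permanent pin.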