Michael S. Tsirkin
2018-Dec-16 19:57 UTC
[PATCH net-next 0/3] vhost: accelerate metadata access through vmap()
On Sat, Dec 15, 2018 at 11:43:08AM -0800, David Miller wrote:
> From: Jason Wang <jasowang at redhat.com>
> Date: Fri, 14 Dec 2018 12:29:54 +0800
>
> > On 2018/12/14 4:12 AM, Michael S. Tsirkin wrote:
> >> On Thu, Dec 13, 2018 at 06:10:19PM +0800, Jason Wang wrote:
> >>> Hi:
> >>>
> >>> This series tries to access virtqueue metadata through kernel virtual
> >>> addresses instead of the copy_user() friends, since those carry too
> >>> much overhead from checks, speculation barriers, or even hardware
> >>> feature toggling.
> >>>
> >>> Tests show about a 24% improvement in TX PPS. It should benefit other
> >>> cases as well.
> >>>
> >>> Please review.
> >> I think the idea of speeding up userspace access is a good one.
> >> However I think that moving all checks to the start is way too
> >> aggressive.
> >
> > So did packet sockets and AF_XDP. Anyway, sharing the address space
> > and accessing it directly is the fastest way. Performance is the major
> > consideration when people choose a backend. Compared to a userspace
> > implementation, vhost has no security advantage at any level. If vhost
> > stays slow, people will start to develop backends based on e.g. AF_XDP.
>
> Exactly, this is precisely how this kind of problem should be solved.
>
> Michael, I strongly support the approach Jason is taking here, and I
> would like to ask you to seriously reconsider your objections.
>
> Thank you.

Okay. It won't be the first time I'm wrong.

Let's say we ignore the security aspects, but we need to make sure the
following all keep working (all broken with this revision):

- file-backed memory (I didn't see where we mark memory dirty - if we
  don't, we get guest memory corruption on close; if we do, we get a
  host crash, as https://lwn.net/Articles/774411/ seems to apply here?)
- THP
- auto-NUMA

Because vhost isn't like AF_XDP, where you can just tell people "use
hugetlbfs" and "data is removed on close" - people use it in lots of
configurations with guest memory shared between rings and unrelated
data.

Jason, thoughts on these?

--
MST
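[For context, a minimal sketch of the mechanism under discussion: pin
the userspace pages backing the virtqueue metadata and alias them into
the kernel with vmap(), so the fast path dereferences a kernel pointer
instead of going through the copy_user() machinery. The struct and
function names below are invented for illustration - they are not the
ones used by the series - and the get_user_pages_fast() signature is
the v4.20-era one.]

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

/* Hypothetical names, for illustration only. */
struct vq_meta_map {
	struct page **pages;	/* pinned user pages backing the metadata */
	int npages;
	void *addr;		/* kernel alias returned by vmap() */
};

static int vq_meta_map_init(struct vq_meta_map *m,
			    unsigned long uaddr, size_t size)
{
	int n = DIV_ROUND_UP(size + offset_in_page(uaddr), PAGE_SIZE);
	int pinned = 0;

	m->pages = kcalloc(n, sizeof(*m->pages), GFP_KERNEL);
	if (!m->pages)
		return -ENOMEM;

	/* Pin for write: the used ring is updated through this mapping. */
	pinned = get_user_pages_fast(uaddr, n, 1 /* write */, m->pages);
	if (pinned != n)
		goto err;

	m->addr = vmap(m->pages, n, VM_MAP, PAGE_KERNEL);
	if (!m->addr)
		goto err;

	m->npages = n;
	/* The fast path can now use m->addr + offset_in_page(uaddr). */
	return 0;

err:
	while (pinned > 0)
		put_page(m->pages[--pinned]);
	kfree(m->pages);
	return -EFAULT;
}

[This is also where the concerns above bite: the pin holds a page
reference for the lifetime of the device, which is what defeats THP
collapse and auto-NUMA migration, and which raises the dirty-tracking
question for file-backed pages.]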
Jason Wang
2018-Dec-24 08:44 UTC
[PATCH net-next 0/3] vhost: accelerate metadata access through vmap()
On 2018/12/17 3:57 AM, Michael S. Tsirkin wrote:
> On Sat, Dec 15, 2018 at 11:43:08AM -0800, David Miller wrote:
>> From: Jason Wang <jasowang at redhat.com>
>> Date: Fri, 14 Dec 2018 12:29:54 +0800
>>
>>> On 2018/12/14 4:12 AM, Michael S. Tsirkin wrote:
>>>> On Thu, Dec 13, 2018 at 06:10:19PM +0800, Jason Wang wrote:
>>>>> Hi:
>>>>>
>>>>> This series tries to access virtqueue metadata through kernel virtual
>>>>> addresses instead of the copy_user() friends, since those carry too
>>>>> much overhead from checks, speculation barriers, or even hardware
>>>>> feature toggling.
>>>>>
>>>>> Tests show about a 24% improvement in TX PPS. It should benefit other
>>>>> cases as well.
>>>>>
>>>>> Please review.
>>>> I think the idea of speeding up userspace access is a good one.
>>>> However I think that moving all checks to the start is way too
>>>> aggressive.
>>>
>>> So did packet sockets and AF_XDP. Anyway, sharing the address space
>>> and accessing it directly is the fastest way. Performance is the major
>>> consideration when people choose a backend. Compared to a userspace
>>> implementation, vhost has no security advantage at any level. If vhost
>>> stays slow, people will start to develop backends based on e.g. AF_XDP.
>> Exactly, this is precisely how this kind of problem should be solved.
>>
>> Michael, I strongly support the approach Jason is taking here, and I
>> would like to ask you to seriously reconsider your objections.
>>
>> Thank you.
> Okay. It won't be the first time I'm wrong.
>
> Let's say we ignore the security aspects, but we need to make sure the
> following all keep working (all broken with this revision):
> - file-backed memory (I didn't see where we mark memory dirty - if we
>   don't, we get guest memory corruption on close; if we do, we get a
>   host crash, as https://lwn.net/Articles/774411/ seems to apply here?)

We only pin the metadata pages, so I don't think they can be used for
DMA; that is probably not an issue. The real issue is the zerocopy
code - maybe it's time to disable it by default?

> - THP

We will miss 2 or 4 pages per THP; I wonder whether that's even
measurable.

> - auto-NUMA

I'm not sure auto-NUMA helps for the IPC case. It can hurt performance
in the worst case, if vhost and userspace end up running on two
different nodes. Anyway, I can measure it.

> Because vhost isn't like AF_XDP, where you can just tell people "use
> hugetlbfs" and "data is removed on close" - people use it in lots of
> configurations with guest memory shared between rings and unrelated
> data.

This series doesn't share data; only metadata is shared.

> Jason, thoughts on these?

Based on the above, I can measure the THP impact to see how much it
matters.

As for the unsafe variants, they only work when we can batch the
accesses, and that needs nontrivial rework of the vhost code, with an
unexpected amount of work for archs other than x86. I'm not sure it's
worth trying.

Thanks
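[To make the dirty-marking point concrete, a hedged sketch of the
teardown such a mapping would need for file-backed memory, continuing
the hypothetical vq_meta_map names from the sketch above.
set_page_dirty_lock() is the usual call for this, and issuing it on
long-term-pinned file-backed pages is exactly the pattern the LWN
article above describes as dangerous.]

static void vq_meta_map_release(struct vq_meta_map *m)
{
	int i;

	vunmap(m->addr);

	for (i = 0; i < m->npages; i++) {
		/*
		 * The used ring was written through the kernel alias,
		 * so mark each page dirty before dropping the pin;
		 * skipping this loses the writes on file-backed memory,
		 * while doing it can race with writeback - the
		 * https://lwn.net/Articles/774411/ problem.
		 */
		set_page_dirty_lock(m->pages[i]);
		put_page(m->pages[i]);
	}
	kfree(m->pages);
}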
Michael S. Tsirkin
2018-Dec-24 19:09 UTC
[PATCH net-next 0/3] vhost: accelerate metadata access through vmap()
On Mon, Dec 24, 2018 at 04:44:14PM +0800, Jason Wang wrote:
> On 2018/12/17 3:57 AM, Michael S. Tsirkin wrote:
> > On Sat, Dec 15, 2018 at 11:43:08AM -0800, David Miller wrote:
> > > From: Jason Wang <jasowang at redhat.com>
> > > Date: Fri, 14 Dec 2018 12:29:54 +0800
> > >
> > > > On 2018/12/14 4:12 AM, Michael S. Tsirkin wrote:
> > > > > On Thu, Dec 13, 2018 at 06:10:19PM +0800, Jason Wang wrote:
> > > > > > Hi:
> > > > > >
> > > > > > This series tries to access virtqueue metadata through kernel
> > > > > > virtual addresses instead of the copy_user() friends, since
> > > > > > those carry too much overhead from checks, speculation
> > > > > > barriers, or even hardware feature toggling.
> > > > > >
> > > > > > Tests show about a 24% improvement in TX PPS. It should
> > > > > > benefit other cases as well.
> > > > > >
> > > > > > Please review.
> > > > > I think the idea of speeding up userspace access is a good one.
> > > > > However I think that moving all checks to the start is way too
> > > > > aggressive.
> > > >
> > > > So did packet sockets and AF_XDP. Anyway, sharing the address
> > > > space and accessing it directly is the fastest way. Performance is
> > > > the major consideration when people choose a backend. Compared to
> > > > a userspace implementation, vhost has no security advantage at any
> > > > level. If vhost stays slow, people will start to develop backends
> > > > based on e.g. AF_XDP.
> > > Exactly, this is precisely how this kind of problem should be solved.
> > >
> > > Michael, I strongly support the approach Jason is taking here, and I
> > > would like to ask you to seriously reconsider your objections.
> > >
> > > Thank you.
> > Okay. It won't be the first time I'm wrong.
> >
> > Let's say we ignore the security aspects, but we need to make sure the
> > following all keep working (all broken with this revision):
> > - file-backed memory (I didn't see where we mark memory dirty - if we
> >   don't, we get guest memory corruption on close; if we do, we get a
> >   host crash, as https://lwn.net/Articles/774411/ seems to apply here?)
>
> We only pin the metadata pages, so I don't think they can be used for
> DMA; that is probably not an issue. The real issue is the zerocopy
> code - maybe it's time to disable it by default?
>
> > - THP
>
> We will miss 2 or 4 pages per THP; I wonder whether that's even
> measurable.
>
> > - auto-NUMA
>
> I'm not sure auto-NUMA helps for the IPC case. It can hurt performance
> in the worst case, if vhost and userspace end up running on two
> different nodes. Anyway, I can measure it.
>
> > Because vhost isn't like AF_XDP, where you can just tell people "use
> > hugetlbfs" and "data is removed on close" - people use it in lots of
> > configurations with guest memory shared between rings and unrelated
> > data.
>
> This series doesn't share data; only metadata is shared.

Let me clarify - I mean that the metadata sits in the same huge page as
unrelated guest data.

> > Jason, thoughts on these?
>
> Based on the above, I can measure the THP impact to see how much it
> matters.
>
> As for the unsafe variants, they only work when we can batch the
> accesses, and that needs nontrivial rework of the vhost code, with an
> unexpected amount of work for archs other than x86. I'm not sure it's
> worth trying.
>
> Thanks

Yes, I think we need better APIs in vhost. Right now we have an API
that gets and translates a single buffer. We should have one that gets
a batch of descriptors and stores it, then one that translates the
batch.

IMHO this will benefit everyone, even if we do vmap, due to better code
locality.

--
MST
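[A rough sketch of the API split being suggested - the names
vhost_fetch_descs() and vhost_translate_descs() are invented here for
illustration and are not vhost's actual interface: one call copies a
run of available descriptors out of the ring, a second translates the
stored batch, so checks and barriers are paid once per batch rather
than once per descriptor.]

#include <linux/uio.h>
#include <uapi/linux/virtio_ring.h>
/* struct vhost_virtqueue comes from drivers/vhost/vhost.h. */

#define VHOST_DESC_BATCH 64

/* Hypothetical batch container; not from the vhost source. */
struct vhost_desc_batch {
	struct vring_desc descs[VHOST_DESC_BATCH];
	unsigned int count;	/* descriptors actually fetched */
};

/*
 * Step 1: copy up to VHOST_DESC_BATCH available descriptors out of
 * the ring in one pass.
 */
int vhost_fetch_descs(struct vhost_virtqueue *vq,
		      struct vhost_desc_batch *batch);

/*
 * Step 2: translate every stored descriptor through the memory table
 * into iovecs, again amortizing the per-call overhead.
 */
int vhost_translate_descs(struct vhost_virtqueue *vq,
			  struct vhost_desc_batch *batch,
			  struct iovec *iov, unsigned int iov_size);

[Either step could then be backed by copy_user() or by a vmap'd alias
without changing callers, which is the "benefit everyone" point above.]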