Benjamin Herrenschmidt
2018-Aug-02 21:13 UTC
[RFC 0/4] Virtio uses DMA API for all devices
On Thu, 2018-08-02 at 23:52 +0300, Michael S. Tsirkin wrote:> > Yes, this is the purpose of Anshuman original patch (I haven't looked > > at the details of the patch in a while but that's what I told him to > > implement ;-) : > > > > - Make virtio always use DMA ops to simplify the code path (with a set > > of "transparent" ops for legacy) > > > > and > > > > - Provide an arch hook allowing us to "override" those "transparent" > > DMA ops with some custom ones that do the appropriate swiotlb gunk. > > > > Cheers, > > Ben. > > > > Right but as I tried to say doing that brings us to a bunch of issues > with using DMA APIs in virtio. Put simply DMA APIs weren't designed for > guest to hypervisor communication.I'm not sure I see the problem, see below> When we do (as is the case with PLATFORM_IOMMU right now) this adds a > bunch of overhead which we need to get rid of if we are to switch to > PLATFORM_IOMMU by default. We need to fix that.So let's differenciate the two problems of having an IOMMU (real or emulated) which indeeds adds overhead etc... and using the DMA API. At the moment, virtio does this all over the place: if (use_dma_api) dma_map/alloc_something(...) else use_pa The idea of the patch set is to do two, somewhat orthogonal, changes that together achieve what we want. Let me know where you think there is "a bunch of issues" because I'm missing it: 1- Replace the above if/else constructs with just calling the DMA API, and have virtio, at initialization, hookup its own dma_ops that just "return pa" (roughly) when the IOMMU stuff isn't used. This adds an indirect function call to the path that previously didn't have one (the else case above). Is that a significant/measurable overhead ? This change stands alone, and imho "cleans" up virtio by avoiding all that if/else "2 path" and unless it adds a measurable overhead, should probably be done. 2- Make virtio use the DMA API with our custom platform-provided swiotlb callbacks when needed, that is when not using IOMMU *and* running on a secure VM in our case. This benefits from -1- by making us just plumb in a different set of DMA ops we would have cooked up specifically for virtio in our arch code (or in virtio itself but build arch-conditionally in a separate file). But it doesn't strictly need it -1-: Now, -2- doesn't strictly needs -1-. We could have just done another xen-like hack that forces the DMA API "ON" for virtio when running in a secure VM. The problem if we do that however is that we also then need the arch PCI code to make sure it hooks up the virtio PCI devices with the special "magic" DMA ops that avoid the iommu but still do swiotlb, ie, not the same as other PCI devices. So it will have to play games such as checking vendor/device IDs for virtio, checking the IOMMU flag, etc... from the arch code which really bloody sucks when assigning PCI DMA ops. However, if we do it the way we plan here, on top of -1-, with a hook called from virtio into the arch to "override" the virtio DMA ops, then we avoid the problem completely: The arch hook would only be called by virtio if the IOMMU flag is *not* set. IE only when using that special "hypervisor" iommu bypass. If the IOMMU flag is set, virtio uses normal PCI dma ops as usual. That way, we have a very clear semantic: This hook is purely about replacing those "null" DMA ops that just return PA introduced in -1- with some arch provided specially cooked up DMA ops for non-IOMMU virtio that know about the arch special requirements. For us bounce buffering. Is there something I'm missing ? Cheers, Ben.
On Thu, Aug 02, 2018 at 04:13:09PM -0500, Benjamin Herrenschmidt wrote:> On Thu, 2018-08-02 at 23:52 +0300, Michael S. Tsirkin wrote: > > > Yes, this is the purpose of Anshuman original patch (I haven't looked > > > at the details of the patch in a while but that's what I told him to > > > implement ;-) : > > > > > > - Make virtio always use DMA ops to simplify the code path (with a set > > > of "transparent" ops for legacy) > > > > > > and > > > > > > - Provide an arch hook allowing us to "override" those "transparent" > > > DMA ops with some custom ones that do the appropriate swiotlb gunk. > > > > > > Cheers, > > > Ben. > > > > > > > Right but as I tried to say doing that brings us to a bunch of issues > > with using DMA APIs in virtio. Put simply DMA APIs weren't designed for > > guest to hypervisor communication. > > I'm not sure I see the problem, see below > > > When we do (as is the case with PLATFORM_IOMMU right now) this adds a > > bunch of overhead which we need to get rid of if we are to switch to > > PLATFORM_IOMMU by default. We need to fix that. > > So let's differenciate the two problems of having an IOMMU (real or > emulated) which indeeds adds overhead etc... and using the DMA API.Well actually it's the other way around. An iommu in theory doesn't need to bring overhead if you set it in bypass mode. Which does imply the iommu supports bypass mode. Is that universally the case? DMA API does see Christoph's list of things it does some of which add overhead.> At the moment, virtio does this all over the place: > > if (use_dma_api) > dma_map/alloc_something(...) > else > use_pa > > The idea of the patch set is to do two, somewhat orthogonal, changes > that together achieve what we want. Let me know where you think there > is "a bunch of issues" because I'm missing it: > > 1- Replace the above if/else constructs with just calling the DMA API, > and have virtio, at initialization, hookup its own dma_ops that just > "return pa" (roughly) when the IOMMU stuff isn't used. > > This adds an indirect function call to the path that previously didn't > have one (the else case above). Is that a significant/measurable > overhead ?Seems to be :( Jason reports about 4%. I wonder whether we can support map_sg and friends being NULL, then use that when mapping is an identity. A conditional branch there is likely very cheap. Would this cover all platforms with kvm (which is where we care most about performance)?> This change stands alone, and imho "cleans" up virtio by avoiding all > that if/else "2 path" and unless it adds a measurable overhead, should > probably be done. > > 2- Make virtio use the DMA API with our custom platform-provided > swiotlb callbacks when needed, that is when not using IOMMU *and* > running on a secure VM in our case. > > This benefits from -1- by making us just plumb in a different set of > DMA ops we would have cooked up specifically for virtio in our arch > code (or in virtio itself but build arch-conditionally in a separate > file). But it doesn't strictly need it -1-: > > Now, -2- doesn't strictly needs -1-. We could have just done another > xen-like hack that forces the DMA API "ON" for virtio when running in a > secure VM. > > The problem if we do that however is that we also then need the arch > PCI code to make sure it hooks up the virtio PCI devices with the > special "magic" DMA ops that avoid the iommu but still do swiotlb, ie, > not the same as other PCI devices. So it will have to play games such > as checking vendor/device IDs for virtio, checking the IOMMU flag, > etc... from the arch code which really bloody sucks when assigning PCI > DMA ops. > > However, if we do it the way we plan here, on top of -1-, with a hook > called from virtio into the arch to "override" the virtio DMA ops, then > we avoid the problem completely: The arch hook would only be called by > virtio if the IOMMU flag is *not* set. IE only when using that special > "hypervisor" iommu bypass. If the IOMMU flag is set, virtio uses normal > PCI dma ops as usual. > > That way, we have a very clear semantic: This hook is purely about > replacing those "null" DMA ops that just return PA introduced in -1- > with some arch provided specially cooked up DMA ops for non-IOMMU > virtio that know about the arch special requirements. For us bounce > buffering. > > Is there something I'm missing ? > > Cheers, > Ben.Right so I was trying to write it up in a systematic way, but just to give you one example, if there is a system where DMA API handles coherency issues, or flushing of some buffers, then our PLATFORM_IOMMU flag causes that to happen. And we kinda worked around this without the IOMMU by basically saying "ok we do not really need DMA API so let's just bypass it" and it was kind of ok except now everyone is switching to vIOMMU just in case. So now people do want some parts of what DMA API does, such as the bounce buffer use, or IOMMU mappings. And maybe in the end the solution is going to be to do something similar to virt_Xmb except for DMA APIs: add APIs that handle just the addressing bits but without the overhead. See commit 6a65d26385bf487926a0616650927303058551e3 asm-generic: implement virt_xxx memory barriers for reference, it's a similar set of issues. So it's not a problem with your patches as such, it's just that they don't solve that harder problem. -- MST
On Thu, Aug 02, 2018 at 04:13:09PM -0500, Benjamin Herrenschmidt wrote:> So let's differenciate the two problems of having an IOMMU (real or > emulated) which indeeds adds overhead etc... and using the DMA API. > > At the moment, virtio does this all over the place: > > if (use_dma_api) > dma_map/alloc_something(...) > else > use_pa > > The idea of the patch set is to do two, somewhat orthogonal, changes > that together achieve what we want. Let me know where you think there > is "a bunch of issues" because I'm missing it: > > 1- Replace the above if/else constructs with just calling the DMA API, > and have virtio, at initialization, hookup its own dma_ops that just > "return pa" (roughly) when the IOMMU stuff isn't used. > > This adds an indirect function call to the path that previously didn't > have one (the else case above). Is that a significant/measurable > overhead ?If you call it often enough it does: https://www.spinics.net/lists/netdev/msg495413.html> 2- Make virtio use the DMA API with our custom platform-provided > swiotlb callbacks when needed, that is when not using IOMMU *and* > running on a secure VM in our case.And total NAK the customer platform-provided part of this. We need a flag passed in from the hypervisor that the device needs all bus specific dma api treatment, and then just use the normal plaform dma mapping setup. To get swiotlb you'll need to then use the DT/ACPI dma-range property to limit the addressable range, and a swiotlb capable plaform will use swiotlb automatically.
Benjamin Herrenschmidt
2018-Aug-03 15:58 UTC
[RFC 0/4] Virtio uses DMA API for all devices
On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:> > 2- Make virtio use the DMA API with our custom platform-provided > > swiotlb callbacks when needed, that is when not using IOMMU *and* > > running on a secure VM in our case. > > And total NAK the customer platform-provided part of this. We need > a flag passed in from the hypervisor that the device needs all bus > specific dma api treatment, and then just use the normal plaform > dma mapping setup.Christoph, as I have explained already, we do NOT have a way to provide such a flag as neither the hypervisor nor qemu knows anything about this when the VM is created.> To get swiotlb you'll need to then use the DT/ACPI > dma-range property to limit the addressable range, and a swiotlb > capable plaform will use swiotlb automatically.This cannot be done as you describe it. The VM is created as a *normal* VM. The DT stuff is generated by qemu at a point where it has *no idea* that the VM will later become secure and thus will have to restrict which pages can be used for "DMA". The VM will *at runtime* turn itself into a secure VM via interactions with the security HW and the Ultravisor layer (which sits below the HV). This happens way after the DT has been created and consumed, the qemu devices instanciated etc... Only the guest kernel knows because it initates the transition. When that happens, the virtio devices have already been used by the guest firmware, bootloader, possibly another kernel that kexeced the "secure" one, etc... So instead of running around saying NAK NAK NAK, please explain how we can solve that differently. Ben.
On Fri, Aug 03, 2018 at 12:05:07AM -0700, Christoph Hellwig wrote:> On Thu, Aug 02, 2018 at 04:13:09PM -0500, Benjamin Herrenschmidt wrote: > > So let's differenciate the two problems of having an IOMMU (real or > > emulated) which indeeds adds overhead etc... and using the DMA API. > > > > At the moment, virtio does this all over the place: > > > > if (use_dma_api) > > dma_map/alloc_something(...) > > else > > use_pa > > > > The idea of the patch set is to do two, somewhat orthogonal, changes > > that together achieve what we want. Let me know where you think there > > is "a bunch of issues" because I'm missing it: > > > > 1- Replace the above if/else constructs with just calling the DMA API, > > and have virtio, at initialization, hookup its own dma_ops that just > > "return pa" (roughly) when the IOMMU stuff isn't used. > > > > This adds an indirect function call to the path that previously didn't > > have one (the else case above). Is that a significant/measurable > > overhead ? > > If you call it often enough it does: > > https://www.spinics.net/lists/netdev/msg495413.html > > > 2- Make virtio use the DMA API with our custom platform-provided > > swiotlb callbacks when needed, that is when not using IOMMU *and* > > running on a secure VM in our case. > > And total NAK the customer platform-provided part of this. We need > a flag passed in from the hypervisor that the device needs all bus > specific dma api treatment, and then just use the normal plaform > dma mapping setup. To get swiotlb you'll need to then use the DT/ACPI > dma-range property to limit the addressable range, and a swiotlb > capable plaform will use swiotlb automatically.It seems reasonable to teach a platform to override dma-range for a specific device e.g. in case it knows about bugs in ACPI. -- MST