Benjamin Herrenschmidt
2018-Aug-02 17:53 UTC
[RFC 0/4] Virtio uses DMA API for all devices
On Thu, 2018-08-02 at 20:19 +0300, Michael S. Tsirkin wrote:> > I see. So yes, given that device does not know or care, using > virtio features is an awkward fit. > > So let's say as a quick fix for you maybe we could generalize the > xen_domain hack, instead of just checking xen_domain check some static > branch. Then teach xen and others to enable that.>> OK but problem then becomes this: if you do this and virtio device appears > behind a vIOMMU and it does not advertize the IOMMU flag, the > code will try to use the vIOMMU mappings and fail. > > It does look like even with trick above, you need a special version of > DMA ops that does just swiotlb but not any of the other things DMA API > might do. > > Thoughts?Yes, this is the purpose of Anshuman original patch (I haven't looked at the details of the patch in a while but that's what I told him to implement ;-) : - Make virtio always use DMA ops to simplify the code path (with a set of "transparent" ops for legacy) and - Provide an arch hook allowing us to "override" those "transparent" DMA ops with some custom ones that do the appropriate swiotlb gunk. Cheers, Ben.
On Thu, Aug 02, 2018 at 12:53:41PM -0500, Benjamin Herrenschmidt wrote:> On Thu, 2018-08-02 at 20:19 +0300, Michael S. Tsirkin wrote: > > > > I see. So yes, given that device does not know or care, using > > virtio features is an awkward fit. > > > > So let's say as a quick fix for you maybe we could generalize the > > xen_domain hack, instead of just checking xen_domain check some static > > branch. Then teach xen and others to enable that.> > > > OK but problem then becomes this: if you do this and virtio device appears > > behind a vIOMMU and it does not advertize the IOMMU flag, the > > code will try to use the vIOMMU mappings and fail. > > > > It does look like even with trick above, you need a special version of > > DMA ops that does just swiotlb but not any of the other things DMA API > > might do. > > > > Thoughts? > > Yes, this is the purpose of Anshuman original patch (I haven't looked > at the details of the patch in a while but that's what I told him to > implement ;-) : > > - Make virtio always use DMA ops to simplify the code path (with a set > of "transparent" ops for legacy) > > and > > - Provide an arch hook allowing us to "override" those "transparent" > DMA ops with some custom ones that do the appropriate swiotlb gunk. > > Cheers, > Ben. >Right but as I tried to say doing that brings us to a bunch of issues with using DMA APIs in virtio. Put simply DMA APIs weren't designed for guest to hypervisor communication. When we do (as is the case with PLATFORM_IOMMU right now) this adds a bunch of overhead which we need to get rid of if we are to switch to PLATFORM_IOMMU by default. We need to fix that. -- MST
Benjamin Herrenschmidt
2018-Aug-02 21:13 UTC
[RFC 0/4] Virtio uses DMA API for all devices
On Thu, 2018-08-02 at 23:52 +0300, Michael S. Tsirkin wrote:> > Yes, this is the purpose of Anshuman original patch (I haven't looked > > at the details of the patch in a while but that's what I told him to > > implement ;-) : > > > > - Make virtio always use DMA ops to simplify the code path (with a set > > of "transparent" ops for legacy) > > > > and > > > > - Provide an arch hook allowing us to "override" those "transparent" > > DMA ops with some custom ones that do the appropriate swiotlb gunk. > > > > Cheers, > > Ben. > > > > Right but as I tried to say doing that brings us to a bunch of issues > with using DMA APIs in virtio. Put simply DMA APIs weren't designed for > guest to hypervisor communication.I'm not sure I see the problem, see below> When we do (as is the case with PLATFORM_IOMMU right now) this adds a > bunch of overhead which we need to get rid of if we are to switch to > PLATFORM_IOMMU by default. We need to fix that.So let's differenciate the two problems of having an IOMMU (real or emulated) which indeeds adds overhead etc... and using the DMA API. At the moment, virtio does this all over the place: if (use_dma_api) dma_map/alloc_something(...) else use_pa The idea of the patch set is to do two, somewhat orthogonal, changes that together achieve what we want. Let me know where you think there is "a bunch of issues" because I'm missing it: 1- Replace the above if/else constructs with just calling the DMA API, and have virtio, at initialization, hookup its own dma_ops that just "return pa" (roughly) when the IOMMU stuff isn't used. This adds an indirect function call to the path that previously didn't have one (the else case above). Is that a significant/measurable overhead ? This change stands alone, and imho "cleans" up virtio by avoiding all that if/else "2 path" and unless it adds a measurable overhead, should probably be done. 2- Make virtio use the DMA API with our custom platform-provided swiotlb callbacks when needed, that is when not using IOMMU *and* running on a secure VM in our case. This benefits from -1- by making us just plumb in a different set of DMA ops we would have cooked up specifically for virtio in our arch code (or in virtio itself but build arch-conditionally in a separate file). But it doesn't strictly need it -1-: Now, -2- doesn't strictly needs -1-. We could have just done another xen-like hack that forces the DMA API "ON" for virtio when running in a secure VM. The problem if we do that however is that we also then need the arch PCI code to make sure it hooks up the virtio PCI devices with the special "magic" DMA ops that avoid the iommu but still do swiotlb, ie, not the same as other PCI devices. So it will have to play games such as checking vendor/device IDs for virtio, checking the IOMMU flag, etc... from the arch code which really bloody sucks when assigning PCI DMA ops. However, if we do it the way we plan here, on top of -1-, with a hook called from virtio into the arch to "override" the virtio DMA ops, then we avoid the problem completely: The arch hook would only be called by virtio if the IOMMU flag is *not* set. IE only when using that special "hypervisor" iommu bypass. If the IOMMU flag is set, virtio uses normal PCI dma ops as usual. That way, we have a very clear semantic: This hook is purely about replacing those "null" DMA ops that just return PA introduced in -1- with some arch provided specially cooked up DMA ops for non-IOMMU virtio that know about the arch special requirements. For us bounce buffering. Is there something I'm missing ? Cheers, Ben.