Benjamin Herrenschmidt
2018-Aug-06 22:13 UTC
[RFC 0/4] Virtio uses DMA API for all devices
On Tue, 2018-08-07 at 00:46 +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 07, 2018 at 07:26:35AM +1000, Benjamin Herrenschmidt wrote:
> > On Mon, 2018-08-06 at 23:35 +0300, Michael S. Tsirkin wrote:
> > > > As I said replying to Christoph, we are "leaking" into the interface
> > > > something here that is really what the VM is doing to itself, which
> > > > is to stash its memory away in an inaccessible place.
> > > >
> > > > Cheers,
> > > > Ben.
> > >
> > > I think Christoph merely objects to the specific implementation. If
> > > instead you do something like tweak dev->bus_dma_mask for the virtio
> > > device I think he won't object.
> >
> > Well, we don't have "bus_dma_mask" yet... or do you mean dma_mask?
> >
> > So, something like that would be a possibility, but the problem is that
> > the current virtio (guest side) implementation doesn't honor this when
> > not using dma ops and will not use dma ops if not using iommu, so back
> > to square one.
>
> Well we have the RFC for that - the switch to using DMA ops
> unconditionally isn't problematic in itself IMHO; for now that RFC is
> blocked by its performance overhead, but Christoph says he's trying to
> remove that for direct mappings, so we should hopefully be able to get
> there in X weeks.

That would be good yes.

../..

> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -155,7 +155,7 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
> >  	 * the DMA API if we're a Xen guest, which at least allows
> >  	 * all of the sensible Xen configurations to work correctly.
> >  	 */
> > -	if (xen_domain())
> > +	if (xen_domain() || arch_virtio_direct_dma_ops(&vdev->dev))
> >  		return true;
> >
> >  	return false;
>
> Right but can't we fix the retpoline overhead such that
> vring_use_dma_api will not be called on data path any longer, making
> this a setup time check?

Yes, it needs to be a setup time check regardless, actually!

The above is broken, sorry, I was a bit quick here (too early in the
morning... ugh). We don't want the arch to override the dma ops every
time that is called.

But yes, if we can fix the overhead, it becomes just a matter of
setting up the "right" ops automatically.

> > (Passing the dev allows the arch to know this is a virtio device in
> > "direct" mode or whatever we want to call the !iommu case, and
> > construct appropriate DMA ops for it, which aren't the same as the DMA
> > ops of any other PCI device which *does* use the iommu.)
>
> I think that's where Christoph might have specific ideas about it.

OK well, assuming Christoph can solve the direct case in a way that
also works for the virtio !iommu case, we still want some bit of logic
somewhere that will "switch" to swiotlb based ops if the DMA mask is
limited.

You mentioned an RFC for that? Do you happen to have a link?

It would be indeed ideal if all we had to do was set up some kind of
bus_dma_mask on all PCI devices and have virtio automagically insert
swiotlb when necessary.

Cheers,
Ben.
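For reference, a minimal sketch of the "setup time check" being discussed above, assuming the existing vring_use_dma_api() helper: the decision is evaluated once when the virtqueue is created and cached, so the data path only tests a flag instead of calling back into (potentially retpolined) code. This is purely illustrative, not the actual patch; the use_dma_api field is an assumed addition.

	/*
	 * Sketch only: decide once at virtqueue creation whether this device
	 * goes through the DMA API, and cache the answer.
	 */
	struct vring_virtqueue {
		struct virtqueue vq;

		/* Assumed new field, set once in __vring_new_virtqueue() */
		bool use_dma_api;

		/* ... existing fields ... */
	};

	/* at setup time, e.g. in __vring_new_virtqueue(): */
		vq->use_dma_api = vring_use_dma_api(vdev);

	/* on the data path, e.g. in vring_map_one_sg(), the hot path only
	 * reads the cached flag: */
		if (!vq->use_dma_api)
			return (dma_addr_t)sg_phys(sg);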
Benjamin Herrenschmidt
2018-Aug-06 23:16 UTC
[RFC 0/4] Virtio uses DMA API for all devices
On Tue, 2018-08-07 at 08:13 +1000, Benjamin Herrenschmidt wrote:
> OK well, assuming Christoph can solve the direct case in a way that
> also works for the virtio !iommu case, we still want some bit of logic
> somewhere that will "switch" to swiotlb based ops if the DMA mask is
> limited.
>
> You mentioned an RFC for that? Do you happen to have a link?
>
> It would be indeed ideal if all we had to do was set up some kind of
> bus_dma_mask on all PCI devices and have virtio automagically insert
> swiotlb when necessary.

Actually... I can think of a simpler option (Anshuman, didn't you
prototype this earlier?):

Since that limitation of requiring bounce buffering via swiotlb is true
of any device in a secure VM, whether it goes through the iommu or not,
the iommu remapping is essentially pointless. Thus, we could ensure
that the iommu maps the swiotlb bounce buffer 1:1 (either that or we
configure it as "disabled", which is equivalent in this case).

That way, we can now use the basic swiotlb ops everywhere; the same
dma_ops (swiotlb) will work whether a device uses the iommu or not.

Which boils down to only making virtio use dma ops; there is no need to
override the dma_ops.

Which means all we have to do is either make xen_domain() return true
(yuck) or replace that one test with arch_virtio_force_dma_api(), which
resolves to xen_domain() on x86 and can do something else for us.

As for using a virtio feature flag for that, which is what Christoph
proposes, I'm not a fan of it because it means effectively exposing
this to the peer, ie the interface. I don't think it belongs there. The
interface, from the hypervisor perspective, whether it's qemu, vmware,
hyperv etc., has no business knowing how the guest manages its dma
operations, and may not even be aware of the access limitations (in our
case they are somewhat guest self-imposed).

Now, if this flag really is what we have to do, then we'd probably need
a qemu hack which will go set that flag on all virtio devices when it
detects that the VM is going secure. But I don't think that's where the
"need to use the dma API even for direct mode" information belongs.

Cheers,
Ben.
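To make the proposal above concrete, here is a rough sketch of what that replacement test could look like. arch_virtio_force_dma_api() is only a name suggested in this thread, not an existing kernel interface; the file placements and the is_secure_guest() helper are likewise assumptions.

	/*
	 * Sketch only: arch_virtio_force_dma_api() does not exist upstream.
	 * Each snippet below would live in a different file.
	 */

	/* include/linux/virtio_config.h: generic fallback */
	#ifndef arch_virtio_force_dma_api
	static inline bool arch_virtio_force_dma_api(struct virtio_device *vdev)
	{
		return false;
	}
	#endif

	/* arch/x86/include/asm/virtio.h (hypothetical): keep today's behaviour */
	#define arch_virtio_force_dma_api(vdev)	xen_domain()

	/* arch/powerpc/include/asm/virtio.h (hypothetical): force the DMA API
	 * when the guest has gone secure; is_secure_guest() is a placeholder */
	#define arch_virtio_force_dma_api(vdev)	is_secure_guest()

	/* drivers/virtio/virtio_ring.c: the one remaining test */
	static bool vring_use_dma_api(struct virtio_device *vdev)
	{
		if (!virtio_has_iommu_quirk(vdev))
			return true;

		return arch_virtio_force_dma_api(vdev);
	}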
On Tue, Aug 07, 2018 at 08:13:56AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2018-08-07 at 00:46 +0300, Michael S. Tsirkin wrote:
> > On Tue, Aug 07, 2018 at 07:26:35AM +1000, Benjamin Herrenschmidt wrote:
> > > On Mon, 2018-08-06 at 23:35 +0300, Michael S. Tsirkin wrote:
> > > > > As I said replying to Christoph, we are "leaking" into the interface
> > > > > something here that is really what the VM is doing to itself, which
> > > > > is to stash its memory away in an inaccessible place.
> > > > >
> > > > > Cheers,
> > > > > Ben.
> > > >
> > > > I think Christoph merely objects to the specific implementation. If
> > > > instead you do something like tweak dev->bus_dma_mask for the virtio
> > > > device I think he won't object.
> > >
> > > Well, we don't have "bus_dma_mask" yet... or do you mean dma_mask?
> > >
> > > So, something like that would be a possibility, but the problem is that
> > > the current virtio (guest side) implementation doesn't honor this when
> > > not using dma ops and will not use dma ops if not using iommu, so back
> > > to square one.
> >
> > Well we have the RFC for that - the switch to using DMA ops
> > unconditionally isn't problematic in itself IMHO; for now that RFC is
> > blocked by its performance overhead, but Christoph says he's trying to
> > remove that for direct mappings, so we should hopefully be able to get
> > there in X weeks.
>
> That would be good yes.
>
> ../..
>
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -155,7 +155,7 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
> > >  	 * the DMA API if we're a Xen guest, which at least allows
> > >  	 * all of the sensible Xen configurations to work correctly.
> > >  	 */
> > > -	if (xen_domain())
> > > +	if (xen_domain() || arch_virtio_direct_dma_ops(&vdev->dev))
> > >  		return true;
> > >
> > >  	return false;
> >
> > Right but can't we fix the retpoline overhead such that
> > vring_use_dma_api will not be called on data path any longer, making
> > this a setup time check?
>
> Yes, it needs to be a setup time check regardless, actually!
>
> The above is broken, sorry, I was a bit quick here (too early in the
> morning... ugh). We don't want the arch to override the dma ops every
> time that is called.
>
> But yes, if we can fix the overhead, it becomes just a matter of
> setting up the "right" ops automatically.
>
> > > (Passing the dev allows the arch to know this is a virtio device in
> > > "direct" mode or whatever we want to call the !iommu case, and
> > > construct appropriate DMA ops for it, which aren't the same as the DMA
> > > ops of any other PCI device which *does* use the iommu.)
> >
> > I think that's where Christoph might have specific ideas about it.
>
> OK well, assuming Christoph can solve the direct case in a way that
> also works for the virtio !iommu case, we still want some bit of logic
> somewhere that will "switch" to swiotlb based ops if the DMA mask is
> limited.
>
> You mentioned an RFC for that? Do you happen to have a link?

No, but Christoph did, I think.

> It would be indeed ideal if all we had to do was set up some kind of
> bus_dma_mask on all PCI devices and have virtio automagically insert
> swiotlb when necessary.
>
> Cheers,
> Ben.
Benjamin Herrenschmidt
2018-Aug-07 00:18 UTC
[RFC 0/4] Virtio uses DMA API for all devices
On Tue, 2018-08-07 at 02:45 +0300, Michael S. Tsirkin wrote:
> > OK well, assuming Christoph can solve the direct case in a way that
> > also works for the virtio !iommu case, we still want some bit of logic
> > somewhere that will "switch" to swiotlb based ops if the DMA mask is
> > limited.
> >
> > You mentioned an RFC for that? Do you happen to have a link?
>
> No, but Christoph did, I think.

Ok, I missed that, sorry, I'll dig it out. Thanks.

Cheers,
Ben.
On Tue, Aug 07, 2018 at 08:13:56AM +1000, Benjamin Herrenschmidt wrote:
> It would be indeed ideal if all we had to do was set up some kind of
> bus_dma_mask on all PCI devices and have virtio automagically insert
> swiotlb when necessary.

For 4.20 I plan to remove the swiotlb ops and instead do the bounce
buffering in the common code, including a direct call to the direct ops
to avoid retpoline overhead. For that you still need a flag in virtio
so that, instead of blindly working on physical addresses, the device
is treated like a real device in terms of DMA.

And for powerpc to make use of that I need to get the dma series I
posted last week reviewed and included, otherwise powerpc will have to
be excepted (like arm, where rmk didn't like the way the code was
factored; everything else has already been taken care of).

https://lists.linuxfoundation.org/pipermail/iommu/2018-July/028989.html
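As a rough illustration of the "direct call to the direct ops" part (a sketch of the idea, not the actual series): the common dma-mapping wrappers can special-case the dma-direct ops and call them statically, so the common fast path never goes through an indirect, retpolined function pointer. dma_direct_ops and dma_direct_map_page() are in the tree at this point; the dma_is_direct() helper here is an assumption used for brevity.

	/* Sketch only, loosely based on the idea described above. */
	static inline bool dma_is_direct(const struct dma_map_ops *ops)
	{
		return ops == &dma_direct_ops;
	}

	static inline dma_addr_t dma_map_page_attrs(struct device *dev,
			struct page *page, unsigned long offset, size_t size,
			enum dma_data_direction dir, unsigned long attrs)
	{
		const struct dma_map_ops *ops = get_dma_ops(dev);

		if (dma_is_direct(ops))
			/* static call: no retpoline on the common path */
			return dma_direct_map_page(dev, page, offset, size,
						   dir, attrs);

		return ops->map_page(dev, page, offset, size, dir, attrs);
	}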
On Tue, Aug 07, 2018 at 02:45:25AM +0300, Michael S. Tsirkin wrote:
> > > I think that's where Christoph might have specific ideas about it.
> >
> > OK well, assuming Christoph can solve the direct case in a way that
> > also works for the virtio !iommu case, we still want some bit of logic
> > somewhere that will "switch" to swiotlb based ops if the DMA mask is
> > limited.
> >
> > You mentioned an RFC for that? Do you happen to have a link?
>
> No, but Christoph did, I think.

Do you mean the direct map retpoline mitigation? It is here:

https://www.spinics.net/lists/netdev/msg495413.html
https://www.spinics.net/lists/netdev/msg495785.html
Benjamin Herrenschmidt
2018-Aug-07 06:44 UTC
[RFC 0/4] Virtio uses DMA API for all devices
On Mon, 2018-08-06 at 23:27 -0700, Christoph Hellwig wrote:
> On Tue, Aug 07, 2018 at 08:13:56AM +1000, Benjamin Herrenschmidt wrote:
> > It would be indeed ideal if all we had to do was set up some kind of
> > bus_dma_mask on all PCI devices and have virtio automagically insert
> > swiotlb when necessary.
>
> For 4.20 I plan to remove the swiotlb ops and instead do the bounce
> buffering in the common code, including a direct call to the direct ops
> to avoid retpoline overhead. For that you still need a flag in virtio
> so that, instead of blindly working on physical addresses, the device
> is treated like a real device in terms of DMA.

But you will still call the swiotlb infrastructure, right? I.e., I still
need to control where/how the swiotlb "pool" is allocated.

> And for powerpc to make use of that I need to get the dma series I
> posted last week reviewed and included, otherwise powerpc will have to
> be excepted (like arm, where rmk didn't like the way the code was
> factored; everything else has already been taken care of).
>
> https://lists.linuxfoundation.org/pipermail/iommu/2018-July/028989.html

Yes, I saw your series. I'm just back from a week of travel; I plan to
start reviewing it this week if Michael doesn't beat me to it.

Cheers,
Ben.
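On the "where does the pool come from" question, here is a hedged sketch of one way an arch could place the swiotlb bounce buffer in a region the hypervisor is allowed to access, using the existing swiotlb_init_with_tbl() entry point. The svm_* names and the shared-pool allocator are placeholders, not existing code, and the pool size is arbitrary.

	#include <linux/kernel.h>
	#include <linux/init.h>
	#include <linux/swiotlb.h>

	#define SVM_IO_TLB_BYTES	(64UL << 20)	/* 64MB pool, arbitrary */

	/* placeholder: returns a kernel virtual address inside the region
	 * that has been shared with (made accessible to) the hypervisor */
	extern char *svm_alloc_shared_pool(unsigned long bytes);

	void __init svm_swiotlb_init(void)
	{
		unsigned long nslabs = SVM_IO_TLB_BYTES >> IO_TLB_SHIFT;
		char *tlb = svm_alloc_shared_pool(SVM_IO_TLB_BYTES);

		/* hand the explicitly placed buffer to the swiotlb core */
		if (!tlb || swiotlb_init_with_tbl(tlb, nslabs, 1))
			panic("Cannot set up bounce buffers for the secure VM");
	}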