Benjamin Herrenschmidt
2018-Aug-06 19:56 UTC
[RFC 0/4] Virtio uses DMA API for all devices
On Mon, 2018-08-06 at 16:46 +0300, Michael S. Tsirkin wrote:> > > Right, we'll need some quirk to disable balloons in the guest I > > suppose. > > > > Passing something from libvirt is cumbersome because the end user may > > not even need to know about secure VMs. There are use cases where the > > security is a contract down to some special application running inside > > the secure VM, the sysadmin knows nothing about. > > > > Also there's repercussions all the way to admin tools, web UIs etc... > > so it's fairly wide ranging. > > > > So as long as we only need to quirk a couple of devices, it's much > > better contained that way. > > So just the balloon thing already means that yes management and all the > way to the user tools must know this is going on. Otherwise > user will try to inflate the balloon and wonder why this does not work.There is *dozens* of management systems out there, not even all open source, we won't ever be able to see the end of the tunnel if we need to teach every single of them, including end users, about platform specific new VM flags like that. .../...> Here's another example: you can't migrate a secure vm to hypervisor > which doesn't support this feature. Again management tools above libvirt > need to know otherwise they will try.There will have to be a new machine type for that I suppose, yes, though it's not just the hypervisor that needs to know about the modified migration stream, it's also the need to have a compatible ultravisor with the right keys on the other side. So migration is going to be special and require extra admin work in all cases yes. But not all secure VMs are meant to be migratable. In any case, back to the problem at hand. What a qemu flag gives us is just a way to force iommu at VM creation time. This is rather sub-optimal, we don't really want the iommu in the way, so it's at best a "workaround", and it's not really solving the real problem. As I said replying to Christoph, we are "leaking" into the interface something here that is really what's the VM is doing to itself, which is to stash its memory away in an inaccessible place. Cheers, Ben.
On Tue, Aug 07, 2018 at 05:56:59AM +1000, Benjamin Herrenschmidt wrote:> On Mon, 2018-08-06 at 16:46 +0300, Michael S. Tsirkin wrote: > > > > > Right, we'll need some quirk to disable balloons in the guest I > > > suppose. > > > > > > Passing something from libvirt is cumbersome because the end user may > > > not even need to know about secure VMs. There are use cases where the > > > security is a contract down to some special application running inside > > > the secure VM, the sysadmin knows nothing about. > > > > > > Also there's repercussions all the way to admin tools, web UIs etc... > > > so it's fairly wide ranging. > > > > > > So as long as we only need to quirk a couple of devices, it's much > > > better contained that way. > > > > So just the balloon thing already means that yes management and all the > > way to the user tools must know this is going on. Otherwise > > user will try to inflate the balloon and wonder why this does not work. > > There is *dozens* of management systems out there, not even all open > source, we won't ever be able to see the end of the tunnel if we need > to teach every single of them, including end users, about platform > specific new VM flags like that. > > .../...In the end I suspect you will find you have to.> > Here's another example: you can't migrate a secure vm to hypervisor > > which doesn't support this feature. Again management tools above libvirt > > need to know otherwise they will try. > > There will have to be a new machine type for that I suppose, yes, > though it's not just the hypervisor that needs to know about the > modified migration stream, it's also the need to have a compatible > ultravisor with the right keys on the other side. > > So migration is going to be special and require extra admin work in all > cases yes. But not all secure VMs are meant to be migratable. > > In any case, back to the problem at hand. What a qemu flag gives us is > just a way to force iommu at VM creation time.I don't think a qemu flag is strictly required for a problem at hand.> This is rather sub-optimal, we don't really want the iommu in the way, > so it's at best a "workaround", and it's not really solving the real > problem.This specific problem, I think I agree.> As I said replying to Christoph, we are "leaking" into the interface > something here that is really what's the VM is doing to itself, which > is to stash its memory away in an inaccessible place. > > Cheers, > Ben.I think Christoph merely objects to the specific implementation. If instead you do something like tweak dev->bus_dma_mask for the virtio device I think he won't object. -- MST
Benjamin Herrenschmidt
2018-Aug-06 21:26 UTC
[RFC 0/4] Virtio uses DMA API for all devices
On Mon, 2018-08-06 at 23:35 +0300, Michael S. Tsirkin wrote:> > As I said replying to Christoph, we are "leaking" into the interface > > something here that is really what's the VM is doing to itself, which > > is to stash its memory away in an inaccessible place. > > > > Cheers, > > Ben. > > I think Christoph merely objects to the specific implementation. If > instead you do something like tweak dev->bus_dma_mask for the virtio > device I think he won't object.Well, we don't have "bus_dma_mask" yet ..or you mean dma_mask ? So, something like that would be a possibility, but the problem is that the current virtio (guest side) implementation doesn't honor this when not using dma ops and will not use dma ops if not using iommu, so back to square one. Christoph seems to be wanting to use a flag in the interface to make the guest use dma_ops which is what I don't understand. What would be needed then would be something along the lines of virtio noticing that dma_mask isn't big enough to cover all of memory (which isn't something generic code can easily do here for various reasons I can elaborate if you want, but that specific test more/less has to be arch specific), and in that case, force itself to use DMA ops routed to swiotlb. I'd rather have arch code do the bulk of that work, don't you think ? Which brings me back to this option, which may be the simplest and avoids the overhead of the proposed series (I found the series to be a nice cleanup but retpoline does kick us in the nuts here). So what about this ? --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -155,7 +155,7 @@ static bool vring_use_dma_api(struct virtio_device *vdev) * the DMA API if we're a Xen guest, which at least allows * all of the sensible Xen configurations to work correctly. */ - if (xen_domain()) + if (xen_domain() || arch_virtio_direct_dma_ops(&vdev->dev)) return true; return false; (Passing the dev allows the arch to know this is a virtio device in "direct" mode or whatever we want to call the !iommu case, and construct appropriate DMA ops for it, which aren't the same as the DMA ops of any other PCI device who *do* use the iommu). Otherwise, the harder option would be for us to hack so that xen_domain() returns true in our setup (gross), and have the arch code, when it sets up PCI device DMA ops, have a gross hack to identify virtio PCI devices, checks their F_IOMMU flag itself, and sets up the different ops at that point. As for those "special" ops, they are of course just normal swiotlb ops, there's nothing "special" other that they aren't the ops that other PCI device on that bus use. Cheers, Ben.
Benjamin Herrenschmidt
2018-Aug-06 23:18 UTC
[RFC 0/4] Virtio uses DMA API for all devices
On Mon, 2018-08-06 at 23:35 +0300, Michael S. Tsirkin wrote:> On Tue, Aug 07, 2018 at 05:56:59AM +1000, Benjamin Herrenschmidt wrote: > > On Mon, 2018-08-06 at 16:46 +0300, Michael S. Tsirkin wrote: > > > > > > > Right, we'll need some quirk to disable balloons in the guest I > > > > suppose. > > > > > > > > Passing something from libvirt is cumbersome because the end user may > > > > not even need to know about secure VMs. There are use cases where the > > > > security is a contract down to some special application running inside > > > > the secure VM, the sysadmin knows nothing about. > > > > > > > > Also there's repercussions all the way to admin tools, web UIs etc... > > > > so it's fairly wide ranging. > > > > > > > > So as long as we only need to quirk a couple of devices, it's much > > > > better contained that way. > > > > > > So just the balloon thing already means that yes management and all the > > > way to the user tools must know this is going on. Otherwise > > > user will try to inflate the balloon and wonder why this does not work. > > > > There is *dozens* of management systems out there, not even all open > > source, we won't ever be able to see the end of the tunnel if we need > > to teach every single of them, including end users, about platform > > specific new VM flags like that. > > > > .../... > > In the end I suspect you will find you have to.Maybe... we'll tackle this if/when we have to. For balloon I suspect it's not such a big deal because once secure, all the guest memory goes into the secure memory which isn't visible or accounted by the hypervisor, so there's nothing to steal but the guest is also using no HV memory (other than the few "non-secure" pages used for swiotlb and a couple of other kernel things). Future versions of our secure architecture might allow to turn arbitrary pages of memory secure/non-secure rather than relying on a separate physical pool, in which case, the balloon will be able to work normally. Cheers, Ben.
On Mon, Aug 06, 2018 at 11:35:39PM +0300, Michael S. Tsirkin wrote:> > As I said replying to Christoph, we are "leaking" into the interface > > something here that is really what's the VM is doing to itself, which > > is to stash its memory away in an inaccessible place. > > > > Cheers, > > Ben. > > I think Christoph merely objects to the specific implementation. If > instead you do something like tweak dev->bus_dma_mask for the virtio > device I think he won't object.As long as we also document how dev->bus_dma_mask is tweaked for this particular virtual bus, yes.