Benjamin Herrenschmidt
2018-Aug-03 15:58 UTC
[RFC 0/4] Virtio uses DMA API for all devices
On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > 2- Make virtio use the DMA API with our custom platform-provided
> > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > running on a secure VM in our case.
>
> And total NAK the custom platform-provided part of this. We need
> a flag passed in from the hypervisor that the device needs all bus
> specific dma api treatment, and then just use the normal platform
> dma mapping setup.

Christoph, as I have explained already, we do NOT have a way to provide
such a flag as neither the hypervisor nor qemu knows anything about
this when the VM is created.

> To get swiotlb you'll need to then use the DT/ACPI
> dma-range property to limit the addressable range, and a swiotlb
> capable platform will use swiotlb automatically.

This cannot be done as you describe it.

The VM is created as a *normal* VM. The DT stuff is generated by qemu
at a point where it has *no idea* that the VM will later become secure
and thus will have to restrict which pages can be used for "DMA".

The VM will *at runtime* turn itself into a secure VM via interactions
with the security HW and the Ultravisor layer (which sits below the
HV). This happens way after the DT has been created and consumed, the
qemu devices instantiated, etc...

Only the guest kernel knows because it initiates the transition. When
that happens, the virtio devices have already been used by the guest
firmware, bootloader, possibly another kernel that kexeced the "secure"
one, etc...

So instead of running around saying NAK NAK NAK, please explain how we
can solve that differently.

Ben.
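Christoph's swiotlb suggestion is a form of bounce buffering. Once the guest is secure, the host (here, the qemu virtio backend) can only touch pages the guest has explicitly shared. Every buffer handed to a device must therefore first be copied into a pool of pre-shared pages. The toy Python model below sketches that idea. `GuestMemory`, `BouncePool`, `map_single` and `unmap_single` are hypothetical names chosen for illustration, not the kernel's swiotlb interface. Page-level granularity is a simplifying assumption.

```python
PAGE_SIZE = 4096


class GuestMemory:
    """Toy guest RAM: private by default, with an explicit set of shared pages."""

    def __init__(self, npages):
        self.pages = [bytes(PAGE_SIZE) for _ in range(npages)]
        self.shared = set()  # page frame numbers the host/device may access

    def device_read(self, pfn):
        # Stand-in for the hypervisor's virtio backend touching guest memory.
        if pfn not in self.shared:
            raise PermissionError(f"page {pfn} is private to the secure guest")
        return self.pages[pfn]


class BouncePool:
    """Hypothetical swiotlb-like pool: a page range shared with the host up front."""

    def __init__(self, mem, first_pfn, npages):
        self.mem = mem
        self.free = list(range(first_pfn, first_pfn + npages))
        mem.shared.update(self.free)

    def map_single(self, src_pfn):
        # Copy the private buffer into a shared slot and return the slot's
        # page number; the device is given this address, never src_pfn.
        if not self.free:
            raise MemoryError("bounce pool exhausted")
        slot = self.free.pop(0)
        self.mem.pages[slot] = self.mem.pages[src_pfn]
        return slot

    def unmap_single(self, slot):
        # A real implementation would also copy data back for
        # device-to-memory transfers; this sketch only frees the slot.
        self.free.append(slot)


mem = GuestMemory(16)
pool = BouncePool(mem, first_pfn=12, npages=4)
mem.pages[3] = b"virtio request".ljust(PAGE_SIZE, b"\0")

dma_addr = pool.map_single(3)
print(dma_addr, mem.device_read(dma_addr)[:14])
pool.unmap_single(dma_addr)
```

The model deliberately leaves out the point the thread is actually arguing about: who turns the pool on. A DT/ACPI range limit has to be known when qemu builds the VM. Here, only the guest kernel learns, after its own transition, that bouncing is needed.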
Christoph Hellwig
2018-Aug-03 16:02 UTC
[RFC 0/4] Virtio uses DMA API for all devices
On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > 2- Make virtio use the DMA API with our custom platform-provided
> > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > running on a secure VM in our case.
> >
> > And total NAK the custom platform-provided part of this. We need
> > a flag passed in from the hypervisor that the device needs all bus
> > specific dma api treatment, and then just use the normal platform
> > dma mapping setup.
>
> Christoph, as I have explained already, we do NOT have a way to provide
> such a flag as neither the hypervisor nor qemu knows anything about
> this when the VM is created.

Well, if your setup is so fucked up I see no way to support it in Linux.

Let's end the discussion right now then.
Benjamin Herrenschmidt
2018-Aug-03 18:58 UTC
[RFC 0/4] Virtio uses DMA API for all devices
On Fri, 2018-08-03 at 09:02 -0700, Christoph Hellwig wrote:
> On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > > 2- Make virtio use the DMA API with our custom platform-provided
> > > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > > running on a secure VM in our case.
> > >
> > > And total NAK the custom platform-provided part of this. We need
> > > a flag passed in from the hypervisor that the device needs all bus
> > > specific dma api treatment, and then just use the normal platform
> > > dma mapping setup.
> >
> > Christoph, as I have explained already, we do NOT have a way to provide
> > such a flag as neither the hypervisor nor qemu knows anything about
> > this when the VM is created.
>
> Well, if your setup is so fucked up I see no way to support it in Linux.
>
> Let's end the discussion right now then.

You are saying something along the lines of "I don't like an instruction in your ISA, let's not support your entire CPU architecture in Linux".

Our setup is not fucked. It makes a LOT of sense and it's a very sensible design. It's hitting a problem due to a corner-case oddity in virtio bypassing the MMU. We've worked around such corner cases many times in the past without any problem, so I fail to see what the problem is here.

We aren't going to cancel years of HW and SW development for our security infrastructure because you don't like a two-line hook into virtio to make things work and aren't willing to even consider the options.

Ben.
Michael S. Tsirkin
2018-Aug-03 19:07 UTC
[RFC 0/4] Virtio uses DMA API for all devices
On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > 2- Make virtio use the DMA API with our custom platform-provided
> > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > running on a secure VM in our case.
> >
> > And total NAK the custom platform-provided part of this. We need
> > a flag passed in from the hypervisor that the device needs all bus
> > specific dma api treatment, and then just use the normal platform
> > dma mapping setup.
>
> Christoph, as I have explained already, we do NOT have a way to provide
> such a flag as neither the hypervisor nor qemu knows anything about
> this when the VM is created.

I think the fact you can't add flags from the hypervisor is
a sign of a problematic architecture, you should look at
adding that down the road - you will likely need it at some point.

However in this specific case, the flag does not need to come from the
hypervisor, it can be set by arch boot code I think.
Christoph do you see a problem with that?

> > To get swiotlb you'll need to then use the DT/ACPI
> > dma-range property to limit the addressable range, and a swiotlb
> > capable platform will use swiotlb automatically.
>
> This cannot be done as you describe it.
>
> The VM is created as a *normal* VM. The DT stuff is generated by qemu
> at a point where it has *no idea* that the VM will later become secure
> and thus will have to restrict which pages can be used for "DMA".
>
> The VM will *at runtime* turn itself into a secure VM via interactions
> with the security HW and the Ultravisor layer (which sits below the
> HV). This happens way after the DT has been created and consumed, the
> qemu devices instantiated, etc...
>
> Only the guest kernel knows because it initiates the transition. When
> that happens, the virtio devices have already been used by the guest
> firmware, bootloader, possibly another kernel that kexeced the "secure"
> one, etc...
>
> So instead of running around saying NAK NAK NAK, please explain how we
> can solve that differently.
>
> Ben.
>
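Michael's suggestion, a flag set by arch boot code rather than by the hypervisor, can be sketched as a single decision. Roughly, and ignoring special cases, virtio at the time used the DMA API only when the device negotiated the IOMMU-platform feature (bit 33, `VIRTIO_F_IOMMU_PLATFORM` in the virtio 1.0 specification). Otherwise it handed the host raw guest-physical addresses. The `Platform` class and its `secure_guest` field below are hypothetical stand-ins for whatever per-arch hook such a flag would use.

```python
from dataclasses import dataclass

# Feature bit number from the virtio 1.0 specification.
VIRTIO_F_IOMMU_PLATFORM = 33


@dataclass
class Platform:
    # Hypothetical flag, set by architecture boot code once the guest
    # kernel has completed its own transition to a secure VM.
    secure_guest: bool = False


def use_dma_api(platform: Platform, negotiated_features: int) -> bool:
    """Decide whether virtio should route its buffers through the DMA API."""
    if negotiated_features & (1 << VIRTIO_F_IOMMU_PLATFORM):
        # The device itself asked for normal platform DMA handling.
        return True
    # Otherwise virtio bypasses the DMA API and gives the host guest-physical
    # addresses, unless the guest knows the host cannot reach its memory.
    return platform.secure_guest


plat = Platform()
print(use_dma_api(plat, 0))  # False: still a normal VM
plat.secure_guest = True     # arch boot code, after the secure transition
print(use_dma_api(plat, 0))  # True: forced through the DMA API
```

Because the flag lives entirely inside the guest, setting it needs no cooperation from qemu or the hypervisor.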
Benjamin Herrenschmidt
2018-Aug-04 01:11 UTC
[RFC 0/4] Virtio uses DMA API for all devices
On Fri, 2018-08-03 at 22:07 +0300, Michael S. Tsirkin wrote:
> On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > > 2- Make virtio use the DMA API with our custom platform-provided
> > > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > > running on a secure VM in our case.
> > >
> > > And total NAK the custom platform-provided part of this. We need
> > > a flag passed in from the hypervisor that the device needs all bus
> > > specific dma api treatment, and then just use the normal platform
> > > dma mapping setup.
> >
> > Christoph, as I have explained already, we do NOT have a way to provide
> > such a flag as neither the hypervisor nor qemu knows anything about
> > this when the VM is created.
>
> I think the fact you can't add flags from the hypervisor is
> a sign of a problematic architecture, you should look at
> adding that down the road - you will likely need it at some point.

Well, we can, later in the boot process. At VM creation time, it's just a normal VM. The VM firmware, bootloader etc... are just operating normally.

Later on (we may even have already run Linux at that point, in non-secure mode, as we can use Linux as a bootloader under some circumstances), we start a "secure image".

This is a kernel zImage that includes a "ticket" with the appropriate signature etc..., so that when that kernel starts, it can authenticate with the ultravisor, be verified (along with its ramdisk), and be copied (by the UV) into secure memory and run from there.

At that point, the hypervisor is informed that the VM has become secure.

So at that point, we could exit to qemu to inform it of the change, and have it walk the qtree and "switch" all the virtio devices to use the IOMMU, I suppose, but that feels a lot grosser to me.

That's the only other option I can think of.

> However in this specific case, the flag does not need to come from the
> hypervisor, it can be set by arch boot code I think.
> Christoph do you see a problem with that?

The above could do that, yes.

Another approach would be to do it from a small virtio "quirk" that pokes a bit in the device to force it into IOMMU mode when it detects that we are running in a secure VM. That's a bit warty on the virtio side, but probably not as much as having a qemu one that walks all of the virtio devices to change how they behave.

What do you reckon?

Cheers,
Ben.

> > > To get swiotlb you'll need to then use the DT/ACPI
> > > dma-range property to limit the addressable range, and a swiotlb
> > > capable platform will use swiotlb automatically.
> >
> > This cannot be done as you describe it.
> >
> > The VM is created as a *normal* VM. The DT stuff is generated by qemu
> > at a point where it has *no idea* that the VM will later become secure
> > and thus will have to restrict which pages can be used for "DMA".
> >
> > The VM will *at runtime* turn itself into a secure VM via interactions
> > with the security HW and the Ultravisor layer (which sits below the
> > HV). This happens way after the DT has been created and consumed, the
> > qemu devices instantiated, etc...
> >
> > Only the guest kernel knows because it initiates the transition. When
> > that happens, the virtio devices have already been used by the guest
> > firmware, bootloader, possibly another kernel that kexeced the "secure"
> > one, etc...
> >
> > So instead of running around saying NAK NAK NAK, please explain how we
> > can solve that differently.
> >
> > Ben.
> >
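Ben's alternative switches the device instead. A small virtio quirk detects a secure guest and pokes the device into IOMMU mode before feature negotiation, so that driver and device agree that buffers go through the DMA API. The sketch below models only that ordering. `ToyVirtioDevice`, `force_iommu_mode()` and `probe()` are hypothetical. The thread does not say what mechanism the "poke" would use, and the driver feature mask is illustrative.

```python
# Feature bit number from the virtio 1.0 specification.
VIRTIO_F_IOMMU_PLATFORM = 33
IOMMU_BIT = 1 << VIRTIO_F_IOMMU_PLATFORM


class ToyVirtioDevice:
    """Hypothetical device model: just the feature bits it offers."""

    def __init__(self, offered_features: int):
        self.offered = offered_features

    @property
    def iommu_mode(self) -> bool:
        return bool(self.offered & IOMMU_BIT)

    def force_iommu_mode(self) -> None:
        # Hypothetical "poke": from now on the device behaves, and advertises
        # itself, as if it had been created with IOMMU-platform semantics.
        self.offered |= IOMMU_BIT


def probe(dev: ToyVirtioDevice, driver_features: int, secure_guest: bool) -> int:
    """Negotiate features, applying the secure-guest quirk first."""
    if secure_guest and not dev.iommu_mode:
        dev.force_iommu_mode()
    # A driver may only accept features that the device offers.
    return dev.offered & driver_features


DRIVER_FEATURES = IOMMU_BIT | 0xFF  # illustrative mask of driver-supported bits

dev = ToyVirtioDevice(offered_features=0x0F)
early = probe(dev, DRIVER_FEATURES, secure_guest=False)  # firmware/bootloader
late = probe(dev, DRIVER_FEATURES, secure_guest=True)    # secure kernel
print(bool(early & IOMMU_BIT), bool(late & IOMMU_BIT))   # False True
```

Running the same device through `probe()` twice mirrors the sequence Ben describes. Firmware and bootloader negotiate in bypass mode, and the secure kernel applies the quirk before it negotiates again.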
Benjamin Herrenschmidt
2018-Aug-04 01:16 UTC
[RFC 0/4] Virtio uses DMA API for all devices
On Fri, 2018-08-03 at 22:07 +0300, Michael S. Tsirkin wrote:
> On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > > 2- Make virtio use the DMA API with our custom platform-provided
> > > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > > running on a secure VM in our case.
> > >
> > > And total NAK the custom platform-provided part of this. We need
> > > a flag passed in from the hypervisor that the device needs all bus
> > > specific dma api treatment, and then just use the normal platform
> > > dma mapping setup.
> >
> > Christoph, as I have explained already, we do NOT have a way to provide
> > such a flag as neither the hypervisor nor qemu knows anything about
> > this when the VM is created.
>
> I think the fact you can't add flags from the hypervisor is
> a sign of a problematic architecture, you should look at
> adding that down the road - you will likely need it at some point.

Well, we can, later in the boot process. At VM creation time, it's just a normal VM. The VM firmware, bootloader etc... are just operating normally.

Later on (we may even have already run Linux at that point, in non-secure mode, as we can use Linux as a bootloader under some circumstances), we start a "secure image".

This is a kernel zImage that includes a "ticket" with the appropriate signature etc..., so that when that kernel starts, it can authenticate with the ultravisor, be verified (along with its ramdisk), and be copied (by the UV) into secure memory and run from there.

At that point, the hypervisor is informed that the VM has become secure.

So at that point, we could exit to qemu to inform it of the change, and have it walk the qtree and "switch" all the virtio devices to use the IOMMU, I suppose, but that feels a lot grosser to me.

That's the only other option I can think of.

> However in this specific case, the flag does not need to come from the
> hypervisor, it can be set by arch boot code I think.
> Christoph do you see a problem with that?

The above could do that, yes.

Another approach would be to do it from a small virtio "quirk" that pokes a bit in the device to force it into IOMMU mode when it detects that we are running in a secure VM. That's a bit warty on the virtio side, but probably not as much as having a qemu one that walks all of the virtio devices to change how they behave.

What do you reckon?

What we want to avoid is exposing any of this to the *end user*, libvirt or any other higher level of the management stack. We really want that stuff to remain contained between the VM itself, KVM and maybe qemu. We will need some other qemu changes for migration, so that's OK. But the minute you start touching libvirt and the higher levels, it becomes a nightmare.

Cheers,
Ben.

> > > To get swiotlb you'll need to then use the DT/ACPI
> > > dma-range property to limit the addressable range, and a swiotlb
> > > capable platform will use swiotlb automatically.
> >
> > This cannot be done as you describe it.
> >
> > The VM is created as a *normal* VM. The DT stuff is generated by qemu
> > at a point where it has *no idea* that the VM will later become secure
> > and thus will have to restrict which pages can be used for "DMA".
> >
> > The VM will *at runtime* turn itself into a secure VM via interactions
> > with the security HW and the Ultravisor layer (which sits below the
> > HV). This happens way after the DT has been created and consumed, the
> > qemu devices instantiated, etc...
> >
> > Only the guest kernel knows because it initiates the transition. When
> > that happens, the virtio devices have already been used by the guest
> > firmware, bootloader, possibly another kernel that kexeced the "secure"
> > one, etc...
> >
> > So instead of running around saying NAK NAK NAK, please explain how we
> > can solve that differently.
> >
> > Ben.
> >
Benjamin Herrenschmidt
2018-Aug-04 01:22 UTC
[RFC 0/4] Virtio uses DMA API for all devices
On Fri, 2018-08-03 at 22:07 +0300, Michael S. Tsirkin wrote:
> On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > > 2- Make virtio use the DMA API with our custom platform-provided
> > > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > > running on a secure VM in our case.
> > >
> > > And total NAK the custom platform-provided part of this. We need
> > > a flag passed in from the hypervisor that the device needs all bus
> > > specific dma api treatment, and then just use the normal platform
> > > dma mapping setup.
> >
> > Christoph, as I have explained already, we do NOT have a way to provide
> > such a flag as neither the hypervisor nor qemu knows anything about
> > this when the VM is created.
>
> I think the fact you can't add flags from the hypervisor is
> a sign of a problematic architecture, you should look at
> adding that down the road - you will likely need it at some point.

(Apologies if you got this twice. My mailer had a brain fart and I don't know if the first one got through, and I am about to disappear on a plane for 17 hours.)

Well, we can, later in the boot process. At VM creation time, it's just a normal VM. The VM firmware, bootloader etc... are just operating normally.

Later on (we may even have already run Linux at that point, in non-secure mode, as we can use Linux as a bootloader under some circumstances), we start a "secure image".

This is a kernel zImage that includes a "ticket" with the appropriate signature etc..., so that when that kernel starts, it can authenticate with the ultravisor, be verified (along with its ramdisk), and be copied (by the UV) into secure memory and run from there.

At that point, the hypervisor is informed that the VM has become secure.

So at that point, we could exit to qemu to inform it of the change, and have it walk the qtree and "switch" all the virtio devices to use the IOMMU, I suppose, but that feels a lot grosser to me.

That's the only other option I can think of.

> However in this specific case, the flag does not need to come from the
> hypervisor, it can be set by arch boot code I think.
> Christoph do you see a problem with that?

The above could do that, yes.

Another approach would be to do it from a small virtio "quirk" that pokes a bit in the device to force it into IOMMU mode when it detects that we are running in a secure VM. That's a bit warty on the virtio side, but probably not as much as having a qemu one that walks all of the virtio devices to change how they behave.

What do you reckon?

What we want to avoid is exposing any of this to the *end user*, libvirt or any other higher level of the management stack. We really want that stuff to remain contained between the VM itself, KVM and maybe qemu. We will need some other qemu changes for migration, so that's OK. But the minute you start touching libvirt and the higher levels, it becomes a nightmare.

Cheers,
Ben.

> > > To get swiotlb you'll need to then use the DT/ACPI
> > > dma-range property to limit the addressable range, and a swiotlb
> > > capable platform will use swiotlb automatically.
> >
> > This cannot be done as you describe it.
> >
> > The VM is created as a *normal* VM. The DT stuff is generated by qemu
> > at a point where it has *no idea* that the VM will later become secure
> > and thus will have to restrict which pages can be used for "DMA".
> >
> > The VM will *at runtime* turn itself into a secure VM via interactions
> > with the security HW and the Ultravisor layer (which sits below the
> > HV). This happens way after the DT has been created and consumed, the
> > qemu devices instantiated, etc...
> >
> > Only the guest kernel knows because it initiates the transition. When
> > that happens, the virtio devices have already been used by the guest
> > firmware, bootloader, possibly another kernel that kexeced the "secure"
> > one, etc...
> >
> > So instead of running around saying NAK NAK NAK, please explain how we
> > can solve that differently.
> >
> > Ben.
> >