thr3ads.net - Virtualization - [RFC 0/4] Virtio uses DMA API for all devices [Aug 2018]

Benjamin Herrenschmidt

2018-Aug-04 01:16 UTC

[RFC 0/4] Virtio uses DMA API for all devices

On Fri, 2018-08-03 at 22:07 +0300, Michael S. Tsirkin wrote:
> On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > >   2- Make virtio use the DMA API with our custom platform-provided
> > > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > > running on a secure VM in our case.
> > > 
> > > And total NAK the custom platform-provided part of this.  We need
> > > a flag passed in from the hypervisor that the device needs all bus
> > > specific dma api treatment, and then just use the normal platform
> > > dma mapping setup.
> > 
> > Christoph, as I have explained already, we do NOT have a way to provide
> > such a flag as neither the hypervisor nor qemu knows anything about
> > this when the VM is created.
> 
> I think the fact you can't add flags from the hypervisor is
> a sign of a problematic architecture, you should look at
> adding that down the road - you will likely need it at some point.

Well, we can later in the boot process. At VM creation time, it's just
a normal VM. The VM firmware, bootloader etc... are just operating
normally etc...

Later on, (we may have even already run Linux at that point,
unsecurely, as we can use Linux as a bootloader under some
circumstances), we start a "secure image".

This is a kernel zImage that includes a "ticket" that has the
appropriate signature etc... so that when that kernel starts, it can
authenticate with the ultravisor, be verified (along with its ramdisk)
etc... and copied (by the UV) into secure memory & run from there.

At that point, the hypervisor is informed that the VM has become
secure.

So at that point, we could exit to qemu to inform it of the change, and
have it walk the qtree and "Switch" all the virtio devices to use the
IOMMU I suppose, but it feels a lot grosser to me.

That's the only other option I can think of.

> However in this specific case, the flag does not need to come from the
> hypervisor, it can be set by arch boot code I think.
> Christoph do you see a problem with that?

The above could do that yes. Another approach would be to do it from a
small virtio "quirk" that pokes a bit in the device to force it to
iommu mode when it detects that we are running in a secure VM. That's a
bit warty on the virtio side but probably not as much as having a qemu
one that walks all of the virtio devices to change how they behave.
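
The "quirk" idea above can be sketched as a small userspace model in C. This is only an illustration under stated assumptions, not kernel code: `struct model_virtio_dev`, `model_guest_is_secure`, `model_secure_vm_quirk()` and `model_use_dma_api()` are hypothetical names made up for this sketch. The one detail taken from the virtio specification is that VIRTIO_F_IOMMU_PLATFORM is feature bit 33.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bit 33 in the virtio specification. */
#define VIRTIO_F_IOMMU_PLATFORM 33

/* Hypothetical, heavily simplified stand-in for a virtio device. */
struct model_virtio_dev {
    uint64_t features; /* feature bits seen by the guest driver */
};

/* Hypothetical flag: in a real guest this would come from arch code
 * that knows whether the VM has transitioned to secure mode. */
bool model_guest_is_secure;

/* Return true when the given feature bit is set on the device. */
bool model_has_feature(const struct model_virtio_dev *dev, unsigned int bit)
{
    return (dev->features >> bit) & 1;
}

/* The "quirk": when running as a secure VM, force the device into
 * IOMMU/platform mode regardless of what it originally advertised. */
void model_secure_vm_quirk(struct model_virtio_dev *dev)
{
    if (model_guest_is_secure)
        dev->features |= (uint64_t)1 << VIRTIO_F_IOMMU_PLATFORM;
}

/* The ring code would then go through the DMA API only when the
 * platform bit is set, and use guest physical addresses otherwise. */
bool model_use_dma_api(const struct model_virtio_dev *dev)
{
    return model_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM);
}
```

In this model the decision stays entirely inside the guest: nothing in qemu or the management stack has to change for the device to start going through the DMA path.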

What do you reckon ?

What we want to avoid is to expose any of this to the *end user* or
libvirt or any other higher level of the management stack. We really
want that stuff to remain contained between the VM itself, KVM and
maybe qemu.

We will need some other qemu changes for migration so that's ok. But
the minute you start touching libvirt and the higher levels it becomes
a nightmare.

Cheers,
Ben.

> > >  To get swiotlb you'll need to then use the DT/ACPI
> > > dma-range property to limit the addressable range, and a swiotlb
> > > capable platform will use swiotlb automatically.
> >
> > This cannot be done as you describe it.
> >
> > The VM is created as a *normal* VM. The DT stuff is generated by qemu
> > at a point where it has *no idea* that the VM will later become secure
> > and thus will have to restrict which pages can be used for "DMA".
> >
> > The VM will *at runtime* turn itself into a secure VM via interactions
> > with the security HW and the Ultravisor layer (which sits below the
> > HV). This happens way after the DT has been created and consumed, the
> > qemu devices instantiated etc...
> >
> > Only the guest kernel knows because it initiates the transition. When
> > that happens, the virtio devices have already been used by the guest
> > firmware, bootloader, possibly another kernel that kexeced the "secure"
> > one, etc...
> >
> > So instead of running around saying NAK NAK NAK, please explain how we
> > can solve that differently.
> >
> > Ben.

Michael S. Tsirkin

2018-Aug-05 00:22 UTC

[RFC 0/4] Virtio uses DMA API for all devices

On Fri, Aug 03, 2018 at 08:16:21PM -0500, Benjamin Herrenschmidt wrote:
> On Fri, 2018-08-03 at 22:07 +0300, Michael S. Tsirkin wrote:
> > On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> > > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > > >   2- Make virtio use the DMA API with our custom platform-provided
> > > > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > > > running on a secure VM in our case.
> > > >
> > > > And total NAK the custom platform-provided part of this.  We need
> > > > a flag passed in from the hypervisor that the device needs all bus
> > > > specific dma api treatment, and then just use the normal platform
> > > > dma mapping setup.
> > >
> > > Christoph, as I have explained already, we do NOT have a way to provide
> > > such a flag as neither the hypervisor nor qemu knows anything about
> > > this when the VM is created.
> > 
> > I think the fact you can't add flags from the hypervisor is
> > a sign of a problematic architecture, you should look at
> > adding that down the road - you will likely need it at some point.
> 
> Well, we can later in the boot process. At VM creation time, it's just
> a normal VM. The VM firmware, bootloader etc... are just operating
> normally etc...

I see the allure of this, but I think down the road you will
discover passing a flag in libvirt XML saying
"please use a secure mode" or whatever is a good idea.

Even though it is probably not required to address this
specific issue.

For example, I don't think ballooning works in secure mode,
you will be able to teach libvirt not to try to add a
balloon to the guest.

> Later on, (we may have even already run Linux at that point,
> unsecurely, as we can use Linux as a bootloader under some
> circumstances), we start a "secure image".
> 
> This is a kernel zImage that includes a "ticket" that has the
> appropriate signature etc... so that when that kernel starts, it can
> authenticate with the ultravisor, be verified (along with its ramdisk)
> etc... and copied (by the UV) into secure memory & run from there.
> 
> At that point, the hypervisor is informed that the VM has become
> secure.
> 
> So at that point, we could exit to qemu to inform it of the change,

That's probably a good idea too.

> and
> have it walk the qtree and "Switch" all the virtio devices to use the
> IOMMU I suppose, but it feels a lot grosser to me.

That part feels gross, yes.

> That's the only other option I can think of.
> 
> > However in this specific case, the flag does not need to come from the
> > hypervisor, it can be set by arch boot code I think.
> > Christoph do you see a problem with that?
> 
> The above could do that yes. Another approach would be to do it from a
> small virtio "quirk" that pokes a bit in the device to force it to
> iommu mode when it detects that we are running in a secure VM. That's a
> bit warty on the virtio side but probably not as much as having a qemu
> one that walks all of the virtio devices to change how they behave.
>
> What do you reckon ?

I think you are right that for the dma limit the hypervisor doesn't seem
to need to know.

> What we want to avoid is to expose any of this to the *end user* or
> libvirt or any other higher level of the management stack. We really
> want that stuff to remain contained between the VM itself, KVM and
> maybe qemu.
>
> We will need some other qemu changes for migration so that's ok. But
> the minute you start touching libvirt and the higher levels it becomes
> a nightmare.
> 
> Cheers,
> Ben.

I don't believe you'll be able to avoid that entirely. The split between
libvirt and qemu is more about community than about code, random bits of
functionality tend to land on random sides of that fence.  Better add a
tag in domain XML early is my advice. Having said that, it's your
hypervisor. I'm just suggesting that when the hypervisor does somehow need
to care then I suspect most people won't be receptive to the argument
that changing libvirt is a nightmare.

Benjamin Herrenschmidt

2018-Aug-05 04:52 UTC

[RFC 0/4] Virtio uses DMA API for all devices

On Sun, 2018-08-05 at 03:22 +0300, Michael S. Tsirkin wrote:
> I see the allure of this, but I think down the road you will
> discover passing a flag in libvirt XML saying
> "please use a secure mode" or whatever is a good idea.
>
> Even though it is probably not required to address this
> specific issue.
>
> For example, I don't think ballooning works in secure mode,
> you will be able to teach libvirt not to try to add a
> balloon to the guest.

Right, we'll need some quirk to disable balloons in the guest I
suppose.

Passing something from libvirt is cumbersome because the end user may
not even need to know about secure VMs. There are use cases where the
security is a contract down to some special application running inside
the secure VM, which the sysadmin knows nothing about.

Also there are repercussions all the way to admin tools, web UIs etc...
so it's fairly wide ranging.

So as long as we only need to quirk a couple of devices, it's much
better contained that way.

> > Later on, (we may have even already run Linux at that point,
> > unsecurely, as we can use Linux as a bootloader under some
> > circumstances), we start a "secure image".
> > 
> > This is a kernel zImage that includes a "ticket" that has the
> > appropriate signature etc... so that when that kernel starts, it can
> > authenticate with the ultravisor, be verified (along with its ramdisk)
> > etc... and copied (by the UV) into secure memory & run from there.
> > 
> > At that point, the hypervisor is informed that the VM has become
> > secure.
> > 
> > So at that point, we could exit to qemu to inform it of the change,
> 
> That's probably a good idea too.

We probably will have to tell qemu eventually for migration, as we'll
need some kind of key exchange phase etc... to deal with the crypto
aspects (the actual page copy is sorted via encrypting the secure pages
back to normal pages in qemu, but we'll need extra metadata).

> > and
> > have it walk the qtree and "Switch" all the virtio devices to use the
> > IOMMU I suppose, but it feels a lot grosser to me.
>
> That part feels gross, yes.
> 
> > That's the only other option I can think of.
> > 
> > > However in this specific case, the flag does not need to come from the
> > > hypervisor, it can be set by arch boot code I think.
> > > Christoph do you see a problem with that?
> >
> > The above could do that yes. Another approach would be to do it from a
> > small virtio "quirk" that pokes a bit in the device to force it to
> > iommu mode when it detects that we are running in a secure VM. That's a
> > bit warty on the virtio side but probably not as much as having a qemu
> > one that walks all of the virtio devices to change how they behave.
> >
> > What do you reckon ?
>
> I think you are right that for the dma limit the hypervisor doesn't seem
> to need to know.

It's not just a limit mind you. It's a range, at least if we allocate
just a single pool of insecure pages. swiotlb feels like a better
option for us.
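
To illustrate why a single pool of insecure pages fits a swiotlb-style scheme, here is a minimal userspace sketch of bounce buffering in C. It is a toy model under stated assumptions, not the kernel's swiotlb: `struct bounce_pool`, `bounce_map()` and `bounce_unmap()` are hypothetical names, the pool has a fixed number of fixed-size slots, and both directions are always copied.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_SLOTS 4   /* hypothetical pool geometry */
#define SLOT_SIZE  256

/* Models the one region of "insecure" memory the device may access. */
struct bounce_pool {
    uint8_t mem[POOL_SLOTS][SLOT_SIZE];
    void   *owner[POOL_SLOTS]; /* original (secure) buffer, NULL if free */
    size_t  len[POOL_SLOTS];
};

/* Map: copy the caller's buffer into a free slot of the shared pool.
 * Returns the slot index the device would be pointed at, or -1 when
 * the request does not fit or the pool is exhausted. */
int bounce_map(struct bounce_pool *pool, void *buf, size_t len)
{
    if (buf == NULL || len > SLOT_SIZE)
        return -1;
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (pool->owner[i] == NULL) {
            memcpy(pool->mem[i], buf, len);
            pool->owner[i] = buf;
            pool->len[i] = len;
            return i;
        }
    }
    return -1;
}

/* Unmap: copy whatever the device wrote back into the original
 * buffer, then release the slot for reuse. */
void bounce_unmap(struct bounce_pool *pool, int slot)
{
    if (slot < 0 || slot >= POOL_SLOTS || pool->owner[slot] == NULL)
        return;
    memcpy(pool->owner[slot], pool->mem[slot], pool->len[slot]);
    pool->owner[slot] = NULL;
    pool->len[slot] = 0;
}
```

The device only ever sees addresses inside `mem`, so the restriction is a range of allowed pages rather than an upper address limit, which is the distinction drawn above.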
> > What we want to avoid is to expose any of this to the *end user* or
> > libvirt or any other higher level of the management stack. We really
> > want that stuff to remain contained between the VM itself, KVM and
> > maybe qemu.
> > 
> > We will need some other qemu changes for migration so that's ok. But
> > the minute you start touching libvirt and the higher levels it becomes
> > a nightmare.
> > 
> > Cheers,
> > Ben.
> 
> I don't believe you'll be able to avoid that entirely. The split between
> libvirt and qemu is more about community than about code, random bits of
> functionality tend to land on random sides of that fence.  Better add a
> tag in domain XML early is my advice. Having said that, it's your
> hypervisor. I'm just suggesting that when the hypervisor does somehow need
> to care then I suspect most people won't be receptive to the argument
> that changing libvirt is a nightmare.

It only needs to care at runtime. The problem isn't changing libvirt
per se, I don't have a problem with that. The problem is that it means
creating two categories of machines, "secure" and "non-secure", which is
end-user visible, and thus has to be escalated to all the various
management stacks, UIs, etc... out there.

In addition, there are some cases where the individual creating the VMs
may not have any idea that they are secure.

But yes, if we have to, we'll do it. However, so far, we don't think
it's a great idea.

Cheers,
Ben.
