thr3ads.net - Linux Virtualization - [RFC 0/4] Virtio uses DMA API for all devices [Aug 2018]

Benjamin Herrenschmidt

2018-Aug-03 18:58 UTC

[RFC 0/4] Virtio uses DMA API for all devices

On Fri, 2018-08-03 at 09:02 -0700, Christoph Hellwig wrote:
> On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > >   2- Make virtio use the DMA API with our custom platform-provided
> > > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > > running on a secure VM in our case.
> > > 
> > > And total NAK the custom platform-provided part of this.  We need
> > > a flag passed in from the hypervisor that the device needs all bus
> > > specific dma api treatment, and then just use the normal platform
> > > dma mapping setup.
> > 
> > Christoph, as I have explained already, we do NOT have a way to provide
> > such a flag as neither the hypervisor nor qemu knows anything about
> > this when the VM is created.
> 
> Well, if your setup is so fucked up I see no way to support it in Linux.
> 
> Let's end the discussion right now then.

You are saying something along the lines of "I don't like an
instruction in your ISA, let's not support your entire CPU architecture
in Linux".

Our setup is not fucked. It makes a LOT of sense and it's a very
sensible design. It's hitting a problem due to a corner case oddity in
virtio bypassing the MMU; we've worked around such corner cases many
times in the past without any problem, and I fail to see what the problem
is here.

We aren't going to cancel years of HW and SW development for our
security infrastructure because you don't like a two-line hook into virtio
to make things work and aren't willing to even consider the options.

Ben.

Christoph Hellwig

2018-Aug-04 08:21 UTC

[RFC 0/4] Virtio uses DMA API for all devices

On Fri, Aug 03, 2018 at 01:58:46PM -0500, Benjamin Herrenschmidt wrote:
> You are saying something along the lines of "I don't like an
> instruction in your ISA, let's not support your entire CPU architecture
> in Linux".

No.  I'm saying if you can't describe your architecture in the virtio
spec document it is bogus.

> Our setup is not fucked. It makes a LOT of sense and it's a very
> sensible design. It's hitting a problem due to a corner case oddity in
> virtio bypassing the MMU, we've worked around such corner cases many
> times in the past without any problem, I fail to see what the problem
> is here.

No matter if you like it or not (I don't!) virtio is defined to bypass
dma translations, it is very clearly stated in the spec.  It has some
ill-defined bits to bypass it, so if you want the dma mapping API
to be used you'll have to set that bit (in its original form, a refined
form, or an entirely newly defined sane form) and make sure your
hypervisor always sets it.  It's not rocket science, just a little bit
of work to make sure your setup is actually going to work reliably
and portably.

> We aren't going to cancel years of HW and SW development for our

Maybe you should have actually read the specs you are claiming to
have implemented before spending all that effort.

Benjamin Herrenschmidt

2018-Aug-05 01:10 UTC

[RFC 0/4] Virtio uses DMA API for all devices

On Sat, 2018-08-04 at 01:21 -0700, Christoph Hellwig wrote:
> No matter if you like it or not (I don't!) virtio is defined to bypass
> dma translations, it is very clearly stated in the spec.  It has some
> ill-defined bits to bypass it, so if you want the dma mapping API
> to be used you'll have to set that bit (in its original form, a refined
> form, or an entirely newly defined sane form) and make sure your
> hypervisor always sets it.  It's not rocket science, just a little bit
> of work to make sure your setup is actually going to work reliably
> and portably.

I think you are conflating completely different things, let me try to
clarify, we might actually be talking past each other.

> > We aren't going to cancel years of HW and SW development for our
> 
> Maybe you should have actually read the specs you are claiming to
> have implemented before spending all that effort.

Anyway, let's cool our respective jets and sort that out, there are
indeed other approaches than overriding the DMA ops with special ones,
though I find them less tasty ... but here's my attempt at a (simpler)
description.

Bear with me for the long-ish email, this tries to describe the system
so you get an idea where we come from, and options we can use to get
out of this.

So we *are* implementing the spec, since qemu is currently unmodified:

Default virtio will bypass the iommu emulated by qemu, as per the spec.

On the Linux side, thus, virtio "sees" a normal iommu-bypassing device
and will treat it as such.

The problem is the assumption in the middle that qemu can access all
guest pages directly, which holds true for traditional VMs, but breaks
when the VM in our case turns itself into a secure VM. This isn't done
by (or due to changes in) the hypervisor. KVM operates (almost)
normally here.

But there's this (very thin and open source btw) layer underneath
called ultravisor, which exploits some HW facilities to maintain a
separate pool of "secure" memory, which cannot be physically accessed
by a non-secure entity.

So in our scenario, qemu and KVM create a VM totally normally, there are
no changes required to the VM firmware, bootloader(s), etc... in fact
we support Linux based bootloaders, and those will work as normal linux
would in a VM, virtio works normally, etc...

Until that VM (via grub or kexec for example) loads a "secure image".

That secure image is a Linux kernel which has been "wrapped" (to
simplify, imagine a modified zImage wrapper, though that's not entirely
exact).

When that is run, before it modifies its .data, it will interact with
the ultravisor using a specific HW facility to make itself secure. What
happens then is that the UV cryptographically verifies the kernel and
ramdisk, and copies them to the secure memory where execution returns.

The Ultravisor is then involved as a small shim for hypercalls between
the secure VM and KVM to prevent leakage of information (sanitize
registers etc...).

Now at this point, qemu can no longer access the secure VM pages
(there's more to this, such as using HMM to allow migration/encryption
across etc... but let's not get bogged down).

So virtio can no longer access any page in the VM.

Now the VM *can* request from the Ultravisor some selected pages to be
made "insecure" and thus shared with qemu. This is how we handle some
of the pages used in our paravirt stuff, and that's how we want to deal
with virtio, by creating an insecure swiotlb pool.
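The bounce-buffer idea behind an insecure swiotlb pool can be modelled in a few lines of user-space C. This is only a rough sketch under stated assumptions, not kernel or ultravisor code: `shared_pool`, `bounce_map`, `bounce_unmap` and `host_can_access` are hypothetical names invented for illustration, and the "secure" versus "insecure" split is simulated by whether an address lies inside one static array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: one small pool of "insecure" (shared) memory that the
 * host side may read and write; every other address counts as secure. */
#define SHARED_POOL_SIZE 4096

unsigned char shared_pool[SHARED_POOL_SIZE];
size_t shared_pool_used;

/* Returns 1 if the address lies inside the shared pool, i.e. is host-visible. */
int host_can_access(const void *addr)
{
    uintptr_t a = (uintptr_t)addr;
    uintptr_t base = (uintptr_t)shared_pool;
    return a >= base && a - base < SHARED_POOL_SIZE;
}

/* Copy a secure buffer into the shared pool and return the bounce slot,
 * or NULL when the pool has no room left. Slots are never reused here. */
void *bounce_map(const void *secure_buf, size_t len)
{
    if (len > SHARED_POOL_SIZE - shared_pool_used)
        return NULL;
    void *slot = &shared_pool[shared_pool_used];
    memcpy(slot, secure_buf, len);
    shared_pool_used += len;
    return slot;
}

/* Copy data the device wrote into the bounce slot back to secure memory. */
void bounce_unmap(void *secure_buf, const void *slot, size_t len)
{
    assert(host_can_access(slot));
    memcpy(secure_buf, slot, len);
}
```

In this model the device only ever sees the slot returned by `bounce_map`, never the original buffer, which is the property the insecure pool is meant to provide. The cost is one extra copy in each direction.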

At this point, thus, there are two options.

 - One you have rejected, which is to have a way for "no-iommu" virtio
(which still doesn't use an iommu on the qemu side and doesn't need
to), to be forced to use some custom DMA ops on the VM side.

 - One, which sadly has more overhead and will require modifying more
pieces of the puzzle, which is to make qemu use an emulated iommu.
Once we make qemu do that, we can then layer swiotlb on top of the
emulated iommu on the guest side, and pass that as dma_ops to virtio.
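The difference between the two options can be written down as a small decision function. The following is a simplified model, not the actual virtio ring code: `struct guest_cfg`, `enum dma_path` and both `pick_path_*` functions are hypothetical names, and the single boolean `iommu_platform_negotiated` stands in for whatever flag the device side would advertise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what the guest knows about one virtio device. */
struct guest_cfg {
    bool iommu_platform_negotiated; /* device side asked for platform DMA handling */
    bool secure_vm;                 /* guest has turned itself into a secure VM */
};

enum dma_path {
    DMA_PATH_DIRECT,       /* guest-physical addresses handed straight to the device */
    DMA_PATH_PLATFORM_API, /* go through the platform DMA ops (e.g. swiotlb) */
    DMA_PATH_BROKEN        /* device would touch pages it can no longer access */
};

/* First option: the guest forces DMA ops on its own when it is secure,
 * even though the device side still believes it bypasses the iommu. */
enum dma_path pick_path_guest_override(const struct guest_cfg *cfg)
{
    assert(cfg != NULL);
    if (cfg->iommu_platform_negotiated || cfg->secure_vm)
        return DMA_PATH_PLATFORM_API;
    return DMA_PATH_DIRECT;
}

/* Second option: only the flag negotiated with the device side counts,
 * so a secure VM without that flag has no working path. */
enum dma_path pick_path_flag_only(const struct guest_cfg *cfg)
{
    assert(cfg != NULL);
    if (cfg->iommu_platform_negotiated)
        return DMA_PATH_PLATFORM_API;
    return cfg->secure_vm ? DMA_PATH_BROKEN : DMA_PATH_DIRECT;
}
```

Under this model the two policies disagree in exactly one case: a secure VM whose device never advertised the flag. That case is what the rest of the thread is about.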

Now, assuming you still absolutely want us to go down the second
option, there are several ways to get there. We would prefer to avoid
requiring the user to pass some special option to qemu. That has an
impact up the food chain (libvirt, management tools etc...) and users
probably won't understand what it's about. In fact the *end user* might
not even need to know a VM is secure, though applications inside might.

There's the additional annoyance that currently our guest FW (SLOF)
cannot deal with virtio in IOMMU mode, but that's fixable.

From there, refer to the email chain between Michael and me where we are
discussing options to "switch" virtio at runtime on the qemu side.

Any comments or suggestions?

Cheers,
Ben.
