thr3ads.net - Linux Virtualization - [PATCH v3 0/3] virtio DMA API core stuff [Nov 2015]


Michael S. Tsirkin

2015-Oct-29 09:01 UTC

[PATCH v3 0/3] virtio DMA API core stuff

On Wed, Oct 28, 2015 at 03:51:58PM -0700, Andy Lutomirski wrote:
> On Wed, Oct 28, 2015 at 9:12 AM, Michael S. Tsirkin <mst at redhat.com> wrote:
> > On Wed, Oct 28, 2015 at 11:32:34PM +0900, David Woodhouse wrote:
> >> > I don't have a problem with extending DMA API to address
> >> > more usecases.
> >>
> >> No, this isn't an extension. This is fixing a bug, on certain platforms
> >> where the DMA API has currently done the wrong thing.
> >>
> >> We have historically worked around that bug by introducing *another*
> >> bug, which is not to *use* the DMA API in the virtio driver.
> >>
> >> Sure, we can co-ordinate those two bug-fixes. But let's not talk about
> >> them as anything other than bug-fixes.
> >
> > It was pretty practical not to use it. All virtio devices at the time
> > without exception bypassed the IOMMU, so it was a question of omitting a
> > couple of function calls in virtio versus hacking on DMA implementation
> > on multiple platforms. We have more policy options now, so I agree it's
> > time to revisit this.
> >
> > But for me, the most important thing is that we do coordinate.
> >
> >> > > Drivers use DMA API. No more talky.
> >> >
> >> > Well for virtio they don't ATM. And 1:1 mapping makes perfect sense
> >> > for the vast majority of users, so I can't switch them over
> >> > until the DMA API actually addresses all existing usecases.
> >>
> >> That's still not your business; it's the platform's. And there are
> >> hardware implementations of the virtio protocols on real PCI cards. And
> >> we have the option of doing IOMMU translation for the virtio devices
> >> even in a virtual machine. Just don't get involved.
> >>
> >> --
> >> dwmw2
> >>
> >>
> >
> > I'm involved anyway, though it's possible not to put all the code in the
> > virtio subsystem in the guest.  But I suspect we'll need to find a way for
> > non-Linux drivers within the guest to work correctly too, and they might
> > have trouble poking at things at the system level.  So possibly the virtio
> > subsystem will have to tell the platform "this device wants to bypass the
> > IOMMU" and then the DMA API does the right thing.
> >
> 
> After some discussion at KS, no one came up with an example where it's
> necessary, and the patches to convert virtqueue to use the DMA API are
> much nicer when they convert it unconditionally.
It's very surprising that no one could.  I did above, and I'll try again below.
Note: below discusses configuration *within guest*.

Example: you have a mix of assigned devices and virtio devices. You
don't trust your assigned device vendor not to corrupt your memory so
you want to limit the damage your assigned device can do to your guest,
so you use an IOMMU for that.  Thus existing iommu=pt within guest is out.

But you trust your hypervisor (you have no choice anyway),
and you don't want the overhead of tweaking IOMMU
on data path for virtio. Thus iommu=on is out too.
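A minimal Python model of the configuration problem in this example, under stated assumptions: the device names and the `global_mode`/`per_device_mode` helpers are hypothetical and only model the effect of the guest-wide `iommu=pt`/`iommu=on` choice versus a per-device choice. It is not kernel code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    kind: str  # "assigned" (untrusted vendor hardware) or "virtio" (trusted hypervisor)


def wants_translation(dev: Device) -> bool:
    """What the guest owner in the example wants for each device."""
    # Confine assigned devices; skip per-I/O IOMMU work for virtio.
    return dev.kind == "assigned"


def global_mode(mode: str):
    """Guest-wide setting: 'pt' passes every device through, 'on' translates every device."""
    translate = {"pt": False, "on": True}[mode]
    return lambda dev: translate


def per_device_mode(bypass: set):
    """Hypothetical per-device setting: translate everything except the named devices."""
    return lambda dev: dev.name not in bypass


def satisfies(policy, devices) -> bool:
    return all(policy(d) == wants_translation(d) for d in devices)


devices = [Device("nic0", "assigned"), Device("vda", "virtio"), Device("net0", "virtio")]

for label, policy in [("iommu=pt", global_mode("pt")),
                      ("iommu=on", global_mode("on")),
                      ("per-device", per_device_mode({"vda", "net0"}))]:
    print(label, satisfies(policy, devices))
```

In this model only the per-device policy matches the stated goal; both global settings get at least one device wrong.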


> The two interesting cases we thought of were PPC and x86's emulated
> Q35 IOMMU.  PPC will look in to architecting a devicetree-based way to
> indicate passthrough status and will add quirks for the existing
> virtio devices.
Isn't this specified by the hypervisor? I don't think this is a good way
to do this: guest security should be up to the guest.
> Everyone seems to agree that x86's emulated Q35 thing
> is just buggy right now and should be taught to use the existing ACPI
> mechanism for enumerating passthrough devices.
I'm not sure what ACPI has to do with it.
It's about a way for guest users to specify whether
they want to bypass an IOMMU for a given device.
> I'll send a new version of the series soon.
> 
> --Andy
By the way, a bunch of code is missing on the QEMU side
to make this useful:
1. virtio ignores the iommu
2. vhost user ignores the iommu
3. dataplane ignores the iommu
4. vhost-net ignores the iommu
5. VFIO ignores the iommu

I think so far I only saw patches for 1 above.

-- 
MST

David Woodhouse

2015-Oct-29 16:18 UTC


[PATCH v3 0/3] virtio DMA API core stuff

On Thu, 2015-10-29 at 11:01 +0200, Michael S. Tsirkin wrote:
> 
> Example: you have a mix of assigned devices and virtio devices. You
> don't trust your assigned device vendor not to corrupt your memory so
> you want to limit the damage your assigned device can do to your guest,
> so you use an IOMMU for that.  Thus existing iommu=pt within guest is out.
> 
> But you trust your hypervisor (you have no choice anyway),
> and you don't want the overhead of tweaking IOMMU
> on data path for virtio. Thus iommu=on is out too.
That's not at all special for virtio or guest VMs. Even with real
hardware, we might want performance from *some* devices, and security
from others. See the DMA_ATTR_IOMMU_BYPASS which is currently being
discussed.

But of course the easy answer in *your* case is just to ask the
hypervisor not to put the virtio devices behind an IOMMU at all, which
we were planning to keep as the default behaviour.

In all cases, the DMA API shall do the right thing.

-- 
dwmw2



Joerg Roedel

2015-Oct-30 15:16 UTC


[PATCH v3 0/3] virtio DMA API core stuff

On Thu, Oct 29, 2015 at 11:01:41AM +0200, Michael S. Tsirkin wrote:
> Example: you have a mix of assigned devices and virtio devices. You
> don't trust your assigned device vendor not to corrupt your memory so
> you want to limit the damage your assigned device can do to your guest,
> so you use an IOMMU for that.  Thus existing iommu=pt within guest is out.
> 
> But you trust your hypervisor (you have no choice anyway),
> and you don't want the overhead of tweaking IOMMU
> on data path for virtio. Thus iommu=on is out too.
IOMMUs on x86 usually come with an ACPI table that describes which
IOMMUs are in the system and which devices they translate. So you can
easily describe all devices there that are not behind an IOMMU.

The ACPI table is built by the BIOS, and the platform initialization code
sets the device dma_ops accordingly. If the BIOS provides wrong
information in the ACPI table, this is a platform bug.
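To make the mechanism concrete, here is a rough Python model of how platform code could pick per-device DMA ops from such a table. The `IommuUnit` structure is a hypothetical simplification loosely modelled on ACPI DMAR device scopes, including the catch-all flag an Intel remapping unit can set to cover every device not listed elsewhere; the PCI addresses are made up.

```python
from dataclasses import dataclass, field


@dataclass
class IommuUnit:
    name: str
    scope: set = field(default_factory=set)  # devices explicitly listed for this unit
    include_all: bool = False  # catch-all: covers every device not listed under another unit


def assign_dma_ops(devices, units):
    """Return a mapping device -> 'iommu:<unit>' or 'direct' (no translation)."""
    explicit = {dev: unit.name for unit in units for dev in unit.scope}
    catch_all = next((unit.name for unit in units if unit.include_all), None)
    ops = {}
    for dev in devices:
        unit = explicit.get(dev, catch_all)
        ops[dev] = f"iommu:{unit}" if unit else "direct"
    return ops


devices = ["00:02.0", "00:03.0", "00:04.0"]  # e.g. one assigned device, two virtio devices

# Table that lists only the assigned device as translated.
print(assign_dma_ops(devices, [IommuUnit("dmar0", {"00:02.0"})]))

# Catch-all table: claims translation for the virtio devices too.
print(assign_dma_ops(devices, [IommuUnit("dmar0", include_all=True)]))
```

If the IOMMU does not really translate for the virtio devices, the second table is the kind of wrong platform information described above: the guest would set up IOMMU ops for devices that bypass it.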
> I'm not sure what ACPI has to do with it.  It's about a way for guest
> users to specify whether they want to bypass an IOMMU for a given
> device.
We have no way yet to request passthrough-mode per-device from the IOMMU
drivers, but that can easily be added. But as I see it:
> By the way, a bunch of code is missing on the QEMU side
> to make this useful:
> 1. virtio ignores the iommu
> 2. vhost user ignores the iommu
> 3. dataplane ignores the iommu
> 4. vhost-net ignores the iommu
> 5. VFIO ignores the iommu
Qemu does not implement IOMMU translation for virtio devices anyway
(which is fine), so it just should tell the guest so in the ACPI table
built to describe the emulated IOMMU.


	Joerg

David Woodhouse

2015-Oct-30 16:54 UTC


[PATCH v3 0/3] virtio DMA API core stuff

(Sorry, missed part of this before).

On Thu, 2015-10-29 at 11:01 +0200, Michael S. Tsirkin wrote:
> Isn't this specified by the hypervisor? I don't think this is a good
> way to do this: guest security should be up to the guest.
And it is. When the guest sees an IOMMU, it can choose to use it, or
choose not to (or choose to put it in passthrough mode). But as Jörg
says, we don't have a way for an individual device driver to *request*
passthrough mode or not yet; the choice is made by the core IOMMU code
(iommu=pt on the command line), or by the platform simply stating that
a given device isn't *covered* by an IOMMU, if that is indeed the case.

In *no* circumstance is it sane for a device driver just to "opt out"
of using the correct DMA API function calls, and expect that to
*magically* cause the IOMMU to be bypassed.
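A toy Python model of why opting out cannot work in general: the driver only ever sees per-device DMA ops, and whether a device is translated is the platform's decision. `DirectOps`, `IommuOps` and `device_dma` are hypothetical stand-ins, not the kernel's `dma_map_ops`, and the IOVA allocator is deliberately naive.

```python
class DirectOps:
    """Device not behind a translating IOMMU: bus address == physical address."""

    def map(self, phys: int) -> int:
        return phys

    def translate(self, bus: int) -> int:
        return bus


class IommuOps:
    """Device behind a translating IOMMU: only mapped bus addresses are valid."""

    def __init__(self, iova_base: int = 0x1000_0000):
        self.next_iova = iova_base
        self.table = {}  # IOVA -> physical address

    def map(self, phys: int) -> int:
        iova = self.next_iova
        self.next_iova += 0x1000  # one page per mapping, never reused
        self.table[iova] = phys
        return iova

    def translate(self, bus: int) -> int:
        if bus not in self.table:
            raise LookupError(f"IOMMU fault at bus address {bus:#x}")
        return self.table[bus]


def device_dma(ops, phys: int, use_dma_api: bool = True) -> int:
    """Driver hands an address to the device; the device's DMA then goes through ops."""
    bus = ops.map(phys) if use_dma_api else phys  # use_dma_api=False models opting out
    return ops.translate(bus)


print(hex(device_dma(DirectOps(), 0x5000, use_dma_api=False)))  # works only because bus == phys
print(hex(device_dma(IommuOps(), 0x5000)))                       # driver used the DMA API
try:
    device_dma(IommuOps(), 0x5000, use_dma_api=False)            # driver opted out
except LookupError as err:
    print(err)
```

The opted-out driver only works when the platform happens to give the device 1:1 addressing; a driver that always calls the DMA API works in both cases.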
> > Everyone seems to agree that x86's emulated Q35 thing
> > is just buggy right now and should be taught to use the existing ACPI
> > mechanism for enumerating passthrough devices.
> 
> I'm not sure what ACPI has to do with it.
> It's about a way for guest users to specify whether
> they want to bypass an IOMMU for a given device.
No, it absolutely isn't. You might want that (and if you do, see the
discussion about DMA_ATTR_IOMMU_BYPASS). But that is *utterly* irrelevant
to *this* discussion, in which you seem to be advocating that the
virtio drivers should remain buggy by just unilaterally not using the
DMA API.
> By the way, a bunch of code is missing on the QEMU side
> to make this useful:
> 1. virtio ignores the iommu
> 2. vhost user ignores the iommu
> 3. dataplane ignores the iommu
> 4. vhost-net ignores the iommu
> 5. VFIO ignores the iommu
No, those things are not useful for fixing the virtio driver bug under
discussion here. All we need to do is make the virtio drivers correctly
use the DMA API. They should never have passed review and been accepted
into the Linux kernel without that.

All we need to do first is make sure that the bug we have in the
PowerPC IOMMU code (and potentially ARM and/or SPARC?) is fixed, and
that it doesn't attempt to use an IOMMU that doesn't exist. And ensure
that the virtualised IOMMU on qemu/x86 isn't lying and claiming that it
translates for the virtio devices when it doesn't.

There are other things we might want to do, like fixing the IOMMU that
qemu can emulate, and actually making it work with real assigned
devices (currently it's totally hosed because it doesn't handle that
case at all). And potentially making the virtualised IOMMU actually
*do* translation for virtio devices (as opposed to just admitting
correctly that it doesn't). But those aren't strictly relevant here,
yet.

It's not clear what specific uses of the IOMMU you had in mind in your
above list; could you elucidate?

-- 
dwmw2


Paolo Bonzini

2015-Nov-03 10:24 UTC


[PATCH v3 0/3] virtio DMA API core stuff

On 29/10/2015 10:01, Michael S. Tsirkin wrote:
> > Everyone seems to agree that x86's emulated Q35 thing
> > is just buggy right now and should be taught to use the existing ACPI
> > mechanism for enumerating passthrough devices.
> 
> I'm not sure what ACPI has to do with it.
> It's about a way for guest users to specify whether
> they want to bypass an IOMMU for a given device.
It's not configured in the guest, it's configured _when starting_ the
guest (e.g. -device some-pci-device,iommu-bypass=on) and it is reflected
in the DMAR table or the device tree.

The default for virtio and VFIO is to bypass the IOMMU.  Changing the
default can be supported (virtio) or not (VFIO, vhost-user).  Hotplug
needs to check whether the parent bridge has the same setting that the
user desires for the new device.
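A small sketch of this launch-time model, assuming a per-device `iommu-bypass` property as in the example above (treat the property name as illustrative, not as an existing QEMU option) and the virtio default of bypassing the IOMMU. `Bridge`, `parse_bypass` and `hotplug_allowed` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class Bridge:
    name: str
    iommu_bypass: bool  # fixed when the guest is started and reflected in DMAR / device tree


def parse_bypass(device_opts: str, default: bool = True) -> bool:
    """Read iommu-bypass=on|off from a -device option string; default models virtio bypassing."""
    for opt in device_opts.split(",")[1:]:  # first element is the device type
        key, _, value = opt.partition("=")
        if key == "iommu-bypass":
            return value == "on"
    return default


def hotplug_allowed(parent: Bridge, device_opts: str) -> bool:
    """A hot-plugged device must match its parent bridge's bypass setting."""
    return parse_bypass(device_opts) == parent.iommu_bypass


bridge = Bridge("pcie.1", iommu_bypass=True)
print(hotplug_allowed(bridge, "virtio-net-pci"))                   # default bypass matches
print(hotplug_allowed(bridge, "virtio-net-pci,iommu-bypass=off"))  # mismatch: reject
```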
> 1. virtio ignores the iommu
> 2. vhost user ignores the iommu
> 3. dataplane ignores the iommu
> 4. vhost-net ignores the iommu
> 5. VFIO ignores the iommu
> 
> I think so far I only saw patches for 1 above.
1 and 3 are easy.  For 2 and 5 you can simply forbid configurations with
vhost-user/VFIO behind an IOMMU.  For 4 QEMU can simply not activate
vhost-net and use the userspace fallback.
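The plan above amounts to a simple configuration check. A hedged Python sketch, with hypothetical backend labels and function name, of how it might be applied when a device sits behind the emulated IOMMU:

```python
def plan_backend(backend: str, behind_iommu: bool) -> str:
    """Return the datapath to use, or raise if the combination is unsupported."""
    if not behind_iommu:
        return backend  # nothing to translate
    if backend in ("vhost-user", "vfio"):
        # Items 2 and 5: simply forbid these behind an IOMMU.
        raise ValueError(f"{backend} behind an emulated IOMMU is not supported")
    if backend == "vhost-net":
        # Item 4: don't activate vhost-net; use QEMU's userspace virtio datapath.
        return "userspace"
    # Items 1 and 3 (userspace virtio, dataplane) can translate inside QEMU.
    return backend


for backend in ("userspace", "dataplane", "vhost-net", "vhost-user", "vfio"):
    try:
        print(backend, "->", plan_backend(backend, behind_iommu=True))
    except ValueError as err:
        print(backend, "-> rejected:", err)
```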

However, IOMMU support in QEMU is experimental.  We can do things a step
at a time.

Paolo

Michael S. Tsirkin

2015-Nov-08 10:37 UTC


[PATCH v3 0/3] virtio DMA API core stuff

On Thu, Oct 29, 2015 at 05:18:56PM +0100, David Woodhouse wrote:
> On Thu, 2015-10-29 at 11:01 +0200, Michael S. Tsirkin wrote:
> > 
> > Example: you have a mix of assigned devices and virtio devices. You
> > don't trust your assigned device vendor not to corrupt your memory so
> > you want to limit the damage your assigned device can do to your guest,
> > so you use an IOMMU for that.  Thus existing iommu=pt within guest is out.
> > 
> > But you trust your hypervisor (you have no choice anyway),
> > and you don't want the overhead of tweaking IOMMU
> > on data path for virtio. Thus iommu=on is out too.
> 
> That's not at all special for virtio or guest VMs. Even with real
> hardware, we might want performance from *some* devices, and security
> from others. See the DMA_ATTR_IOMMU_BYPASS which is currently being
> discussed.
Right. So let's wait for that discussion to play out?
> But of course the easy answer in *your* case is just to ask the
> hypervisor not to put the virtio devices behind an IOMMU at all, which
> we were planning to keep as the default behaviour.
One can't do this for x86 ATM, can one?
> In all cases, the DMA API shall do the right thing.
I have no problem with that. For example, can we teach
the DMA API on Intel x86 to use PT for virtio by default?
That would allow merging Andy's patches with
full compatibility with old guests and hosts.
> -- 
> dwmw2
> 
> 


-- 
MST

Michael S. Tsirkin

2015-Nov-11 09:11 UTC


[PATCH v3 0/3] virtio DMA API core stuff

On Sat, Oct 31, 2015 at 12:16:12AM +0900, Joerg Roedel wrote:
> On Thu, Oct 29, 2015 at 11:01:41AM +0200, Michael S. Tsirkin wrote:
> > Example: you have a mix of assigned devices and virtio devices. You
> > don't trust your assigned device vendor not to corrupt your memory so
> > you want to limit the damage your assigned device can do to your guest,
> > so you use an IOMMU for that.  Thus existing iommu=pt within guest is out.
> > 
> > But you trust your hypervisor (you have no choice anyway),
> > and you don't want the overhead of tweaking IOMMU
> > on data path for virtio. Thus iommu=on is out too.
> 
> IOMMUs on x86 usually come with an ACPI table that describes which
> IOMMUs are in the system and which devices they translate. So you can
> easily describe all devices there that are not behind an IOMMU.
> 
> The ACPI table is built by the BIOS, and the platform initialization code
> sets the device dma_ops accordingly. If the BIOS provides wrong
> information in the ACPI table this is a platform bug.
It doesn't look like I managed to put the point across.
My point is that an IOMMU is required for things like
userspace drivers; what we need is a way to express
"there is an IOMMU but it is part of the device itself; use passthrough
unless your driver is untrusted".
> > I'm not sure what ACPI has to do with it.  It's about a way for guest
> > users to specify whether they want to bypass an IOMMU for a given
> > device.
> 
> We have no way yet to request passthrough-mode per-device from the IOMMU
> drivers, but that can easily be added. But as I see it:
> 
> > By the way, a bunch of code is missing on the QEMU side
> > to make this useful:
> > 1. virtio ignores the iommu
> > 2. vhost user ignores the iommu
> > 3. dataplane ignores the iommu
> > 4. vhost-net ignores the iommu
> > 5. VFIO ignores the iommu
> 
> Qemu does not implement IOMMU translation for virtio devices anyway
> (which is fine), so it just should tell the guest so in the ACPI table
> built to describe the emulated IOMMU.
> 
> 
> 	Joerg
This is a short term limitation.
