On Wed, Oct 28, 2015 at 03:51:58PM -0700, Andy Lutomirski wrote:
> On Wed, Oct 28, 2015 at 9:12 AM, Michael S. Tsirkin <mst at redhat.com> wrote:
> > On Wed, Oct 28, 2015 at 11:32:34PM +0900, David Woodhouse wrote:
> >> > I don't have a problem with extending DMA API to address
> >> > more usecases.
> >>
> >> No, this isn't an extension. This is fixing a bug, on certain platforms
> >> where the DMA API has currently done the wrong thing.
> >>
> >> We have historically worked around that bug by introducing *another*
> >> bug, which is not to *use* the DMA API in the virtio driver.
> >>
> >> Sure, we can co-ordinate those two bug-fixes. But let's not talk about
> >> them as anything other than bug-fixes.
> >
> > It was pretty practical not to use it. All virtio devices at the time
> > without exception bypassed the IOMMU, so it was a question of omitting a
> > couple of function calls in virtio versus hacking on DMA implementation
> > on multiple platforms. We have more policy options now, so I agree it's
> > time to revisit this.
> >
> > But for me, the most important thing is that we do coordinate.
> >
> >> > > Drivers use DMA API. No more talky.
> >> >
> >> > Well for virtio they don't ATM. And 1:1 mapping makes perfect sense
> >> > for the vast majority of users, so I can't switch them over
> >> > until the DMA API actually addresses all existing usecases.
> >>
> >> That's still not your business; it's the platform's. And there are
> >> hardware implementations of the virtio protocols on real PCI cards. And
> >> we have the option of doing IOMMU translation for the virtio devices
> >> even in a virtual machine. Just don't get involved.
> >>
> >> --
> >> dwmw2
> >>
> >>
> >
> > I'm involved anyway, it's possible not to put all the code in the virtio
> > subsystem in guest though. But I suspect we'll need to find a way for
> > non-linux drivers within guest to work correctly too, and they might
> > have trouble poking at things at the system level. So possibly virtio
> > subsystem will have to tell platform "this device wants to bypass IOMMU"
> > and then DMA API does the right thing.
> >
>
> After some discussion at KS, no one came up with an example where it's
> necessary, and the patches to convert virtqueue to use the DMA API are
> much nicer when they convert it unconditionally.

It's very surprising no one could. I did above, I try again below.
Note: below discusses configuration *within guest*.

Example: you have a mix of assigned devices and virtio devices. You
don't trust your assigned device vendor not to corrupt your memory so
you want to limit the damage your assigned device can do to your guest,
so you use an IOMMU for that. Thus existing iommu=pt within guest is out.

But you trust your hypervisor (you have no choice anyway),
and you don't want the overhead of tweaking IOMMU
on data path for virtio. Thus iommu=on is out too.

> The two interesting cases we thought of were PPC and x86's emulated
> Q35 IOMMU. PPC will look in to architecting a devicetree-based way to
> indicate passthrough status and will add quirks for the existing
> virtio devices.

Isn't this specified by the hypervisor? I don't think this is a good
way to do this: guest security should be up to guest.

> Everyone seems to agree that x86's emulated Q35 thing
> is just buggy right now and should be taught to use the existing ACPI
> mechanism for enumerating passthrough devices.

I'm not sure what ACPI has to do with it.
It's about a way for guest users to specify whether
they want to bypass an IOMMU for a given device.

> I'll send a new version of the series soon.
>
> --Andy

By the way, a bunch of code is missing on the QEMU side
to make this useful:
1. virtio ignores the iommu
2. vhost user ignores the iommu
3. dataplane ignores the iommu
4. vhost-net ignores the iommu
5. VFIO ignores the iommu

I think so far I only saw patches for 1 above.

--
MST

On Thu, 2015-10-29 at 11:01 +0200, Michael S. Tsirkin wrote:
>
> Example: you have a mix of assigned devices and virtio devices. You
> don't trust your assigned device vendor not to corrupt your memory so
> you want to limit the damage your assigned device can do to your guest,
> so you use an IOMMU for that. Thus existing iommu=pt within guest is out.
>
> But you trust your hypervisor (you have no choice anyway),
> and you don't want the overhead of tweaking IOMMU
> on data path for virtio. Thus iommu=on is out too.

That's not at all special for virtio or guest VMs. Even with real
hardware, we might want performance from *some* devices, and security
from others. See the DMA_ATTR_IOMMU_BYPASS which is currently being
discussed.

But of course the easy answer in *your* case is just to ask the
hypervisor not to put the virtio devices behind an IOMMU at all. Which
we were planning to remain the default behaviour.

In all cases, the DMA API shall do the right thing.

--
dwmw2

On Thu, Oct 29, 2015 at 11:01:41AM +0200, Michael S. Tsirkin wrote:
> Example: you have a mix of assigned devices and virtio devices. You
> don't trust your assigned device vendor not to corrupt your memory so
> you want to limit the damage your assigned device can do to your guest,
> so you use an IOMMU for that. Thus existing iommu=pt within guest is out.
>
> But you trust your hypervisor (you have no choice anyway),
> and you don't want the overhead of tweaking IOMMU
> on data path for virtio. Thus iommu=on is out too.

IOMMUs on x86 usually come with an ACPI table that describes which
IOMMUs are in the system and which devices they translate. So you can
easily describe all devices there that are not behind an IOMMU.

The ACPI table is built by the BIOS, and the platform initialization code
sets the device dma_ops accordingly. If the BIOS provides wrong
information in the ACPI table this is a platform bug.

> I'm not sure what ACPI has to do with it. It's about a way for guest
> users to specify whether they want to bypass an IOMMU for a given
> device.

We have no way yet to request passthrough-mode per-device from the IOMMU
drivers, but that can easily be added. But as I see it:

> By the way, a bunch of code is missing on the QEMU side
> to make this useful:
> 1. virtio ignores the iommu
> 2. vhost user ignores the iommu
> 3. dataplane ignores the iommu
> 4. vhost-net ignores the iommu
> 5. VFIO ignores the iommu

Qemu does not implement IOMMU translation for virtio devices anyway
(which is fine), so it just should tell the guest so in the ACPI table
built to describe the emulated IOMMU.

	Joerg

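The table-driven selection Joerg describes can be sketched as a toy model.
This is not kernel code and does not parse a real ACPI table: IommuUnit,
pick_dma_ops, the device addresses and the returned strings are all
hypothetical names chosen for illustration. The only point it makes is
that a device the firmware table does not list ends up with direct
(untranslated) DMA, without the driver being involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IommuUnit:
    """One IOMMU as a firmware table might describe it (hypothetical model)."""
    name: str
    covered_devices: frozenset  # device addresses this unit claims to translate


def pick_dma_ops(device: str, units: list) -> str:
    """Return which DMA ops the platform would install for `device`.

    A device listed under some IOMMU gets translated ops; a device that
    no unit covers gets direct (1:1) ops.
    """
    for unit in units:
        if device in unit.covered_devices:
            return f"translated:{unit.name}"
    return "direct"


# Hypothetical guest: the assigned NIC is listed, the virtio device is not.
ASSIGNED_NIC = "0000:00:05.0"
VIRTIO_BLK = "0000:00:04.0"
table = [IommuUnit("iommu0", frozenset({ASSIGNED_NIC}))]

print(pick_dma_ops(ASSIGNED_NIC, table))
print(pick_dma_ops(VIRTIO_BLK, table))
```

In this model, a table that wrongly listed the virtio device under iommu0
would be the "platform bug" case from the message above: the lookup itself
would still be doing the right thing with wrong input.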
(Sorry, missed part of this before).

On Thu, 2015-10-29 at 11:01 +0200, Michael S. Tsirkin wrote:
> Isn't this specified by the hypervisor? I don't think this is a good
> way to do this: guest security should be up to guest.

And it is. When the guest sees an IOMMU, it can choose to use it, or
choose not to (or choose to put it in passthrough mode). But as Jörg
says, we don't have a way for an individual device driver to *request*
passthrough mode or not yet; the choice is made by the core IOMMU code
(iommu=pt on the command line) - or by the platform simply stating that
a given device isn't *covered* by an IOMMU, if that is indeed the case.

In *no* circumstance is it sane for a device driver just to "opt out"
of using the correct DMA API function calls, and expect that to
*magically* cause the IOMMU to be bypassed.

> > Everyone seems to agree that x86's emulated Q35 thing
> > is just buggy right now and should be taught to use the existing ACPI
> > mechanism for enumerating passthrough devices.
>
> I'm not sure what ACPI has to do with it.
> It's about a way for guest users to specify whether
> they want to bypass an IOMMU for a given device.

No, it absolutely isn't. You might want that - and see the discussion
about DMA_ATTR_IOMMU_BYPASS if you do. But that is *utterly* irrelevant
to *this* discussion, in which you seem to be advocating that the
virtio drivers should remain buggy by just unilaterally not using the
DMA API.

> By the way, a bunch of code is missing on the QEMU side
> to make this useful:
> 1. virtio ignores the iommu
> 2. vhost user ignores the iommu
> 3. dataplane ignores the iommu
> 4. vhost-net ignores the iommu
> 5. VFIO ignores the iommu

No, those things are not useful for fixing the virtio driver bug under
discussion here. All we need to do is make the virtio drivers correctly
use the DMA API. They should never have passed review and been accepted
into the Linux kernel without that.

All we need to do first is make sure that the bug we have in the
PowerPC IOMMU code (and potentially ARM and/or SPARC?) is fixed, and
that it doesn't attempt to use an IOMMU that doesn't exist. And ensure
that the virtualised IOMMU on qemu/x86 isn't lying and claiming that it
translates for the virtio devices when it doesn't.

There are other things we might want to do - like fixing the IOMMU that
qemu can emulate, and actually making it work with real assigned
devices (currently it's totally hosed because it doesn't handle that
case at all). And potentially making the virtualised IOMMU actually
*do* translation for virtio devices (as opposed to just admitting
correctly that it doesn't). But those aren't strictly relevant here,
yet.

It's not clear what specific uses of the IOMMU you had in mind in your
above list - could you elucidate?

--
dwmw2

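The argument that a driver should always call the DMA API, and leave the
bypass decision to the platform, can be illustrated with a small model.
This is a sketch under stated assumptions, not the kernel's DMA API:
DirectOps, IommuOps, driver_queue_buffer, the IOVA base and the
single-page granularity are all made up for the example. What it shows is
that the driver path is identical in both cases; only the ops object the
platform hands over differs.

```python
PAGE = 4096  # assumed mapping granularity for this toy model


class DirectOps:
    """No IOMMU covers the device: bus address equals physical address."""

    def map(self, phys: int) -> int:
        return phys

    def unmap(self, bus: int) -> None:
        pass  # nothing was set up, so nothing to tear down


class IommuOps:
    """An IOMMU translates for the device: hand out I/O virtual addresses."""

    def __init__(self, iova_base: int = 0x1000_0000) -> None:
        self.next_iova = iova_base
        self.table = {}  # iova -> physical address

    def map(self, phys: int) -> int:
        iova = self.next_iova
        self.next_iova += PAGE
        self.table[iova] = phys
        return iova

    def unmap(self, bus: int) -> None:
        del self.table[bus]


def driver_queue_buffer(ops, phys: int) -> int:
    """Driver code path: the same call whichever ops the platform chose."""
    return ops.map(phys)


iommu = IommuOps()
print(hex(driver_queue_buffer(DirectOps(), 0x2000)))
print(hex(driver_queue_buffer(iommu, 0x2000)))
```

A driver that skipped the map call and handed the device the physical
address would only be correct in the DirectOps case, which is the bug
being described above.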
On 29/10/2015 10:01, Michael S. Tsirkin wrote:
> > Everyone seems to agree that x86's emulated Q35 thing
> > is just buggy right now and should be taught to use the existing ACPI
> > mechanism for enumerating passthrough devices.
>
> I'm not sure what ACPI has to do with it.
> It's about a way for guest users to specify whether
> they want to bypass an IOMMU for a given device.

It's not configured in the guest, it's configured _when starting_ the
guest (e.g. -device some-pci-device,iommu-bypass=on) and it is reflected
in the DMAR table or the device tree.

The default for virtio and VFIO is to bypass the IOMMU. Changing the
default can be supported (virtio) or not (VFIO, vhost-user). Hotplug
needs to check whether the parent bridge has the same setting that the
user desires for the new device.

> 1. virtio ignores the iommu
> 2. vhost user ignores the iommu
> 3. dataplane ignores the iommu
> 4. vhost-net ignores the iommu
> 5. VFIO ignores the iommu
>
> I think so far I only saw patches for 1 above.

1 and 3 are easy. For 2 and 5 you can simply forbid configurations with
vhost-user/VFIO behind an IOMMU. For 4 QEMU can simply not activate
vhost-net and use the userspace fallback.

However, IOMMU support in QEMU is experimental. We can do things a step
at a time.

Paolo

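Paolo's per-backend plan can be written down as a small lookup. This is
only a restatement of the message above in code form, not QEMU code:
plan_backend, the backend keys and the action strings are hypothetical
names, and nothing here corresponds to an existing QEMU function.

```python
def plan_backend(backend: str, behind_iommu: bool) -> str:
    """Return the proposed handling for one backend (toy model).

    `behind_iommu` is True when the user asked for the device to be
    translated instead of keeping the default bypass.
    """
    if not behind_iommu:
        return "bypass (default)"

    actions = {
        "virtio": "add translation (described as easy)",       # item 1
        "vhost-user": "forbid this configuration",             # item 2
        "dataplane": "add translation (described as easy)",    # item 3
        "vhost-net": "disable vhost-net, use userspace path",  # item 4
        "vfio": "forbid this configuration",                   # item 5
    }
    try:
        return actions[backend]
    except KeyError:
        raise ValueError(f"unknown backend: {backend}") from None


for name in ("virtio", "vhost-user", "dataplane", "vhost-net", "vfio"):
    print(f"{name}: {plan_backend(name, behind_iommu=True)}")
```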
On Thu, Oct 29, 2015 at 05:18:56PM +0100, David Woodhouse wrote:
> On Thu, 2015-10-29 at 11:01 +0200, Michael S. Tsirkin wrote:
> >
> > Example: you have a mix of assigned devices and virtio devices. You
> > don't trust your assigned device vendor not to corrupt your memory so
> > you want to limit the damage your assigned device can do to your guest,
> > so you use an IOMMU for that. Thus existing iommu=pt within guest is out.
> >
> > But you trust your hypervisor (you have no choice anyway),
> > and you don't want the overhead of tweaking IOMMU
> > on data path for virtio. Thus iommu=on is out too.
>
> That's not at all special for virtio or guest VMs. Even with real
> hardware, we might want performance from *some* devices, and security
> from others. See the DMA_ATTR_IOMMU_BYPASS which is currently being
> discussed.

Right. So let's wait for that discussion to play out?

> But of course the easy answer in *your* case is just to ask the
> hypervisor not to put the virtio devices behind an IOMMU at all. Which
> we were planning to remain the default behaviour.

One can't do this for x86 ATM, can one?

> In all cases, the DMA API shall do the right thing.

I have no problem with that. For example, can we teach
the DMA API on Intel x86 to use PT for virtio by default?
That would allow merging Andy's patches with
full compatibility with old guests and hosts.

> --
> dwmw2
>
>

--
MST

On Sat, Oct 31, 2015 at 12:16:12AM +0900, Joerg Roedel wrote:
> On Thu, Oct 29, 2015 at 11:01:41AM +0200, Michael S. Tsirkin wrote:
> > Example: you have a mix of assigned devices and virtio devices. You
> > don't trust your assigned device vendor not to corrupt your memory so
> > you want to limit the damage your assigned device can do to your guest,
> > so you use an IOMMU for that. Thus existing iommu=pt within guest is out.
> >
> > But you trust your hypervisor (you have no choice anyway),
> > and you don't want the overhead of tweaking IOMMU
> > on data path for virtio. Thus iommu=on is out too.
>
> IOMMUs on x86 usually come with an ACPI table that describes which
> IOMMUs are in the system and which devices they translate. So you can
> easily describe all devices there that are not behind an IOMMU.
>
> The ACPI table is built by the BIOS, and the platform initialization code
> sets the device dma_ops accordingly. If the BIOS provides wrong
> information in the ACPI table this is a platform bug.

It doesn't look like I managed to put the point across.
My point is that an IOMMU is required to do things like
userspace drivers; what we need is a way to express
"there is an IOMMU but it is part of device itself, use passthrough
unless your driver is untrusted".

> > I'm not sure what ACPI has to do with it. It's about a way for guest
> > users to specify whether they want to bypass an IOMMU for a given
> > device.
>
> We have no way yet to request passthrough-mode per-device from the IOMMU
> drivers, but that can easily be added. But as I see it:
>
> > By the way, a bunch of code is missing on the QEMU side
> > to make this useful:
> > 1. virtio ignores the iommu
> > 2. vhost user ignores the iommu
> > 3. dataplane ignores the iommu
> > 4. vhost-net ignores the iommu
> > 5. VFIO ignores the iommu
>
> Qemu does not implement IOMMU translation for virtio devices anyway
> (which is fine), so it just should tell the guest so in the ACPI table
> built to describe the emulated IOMMU.
>
>
> 	Joerg

This is a short term limitation.