On Mon, 2016-04-18 at 19:27 +0300, Michael S. Tsirkin wrote:> I balk at adding more hacks to a broken system. My goals are > merely to > - make things work correctly with an IOMMU and new guests, > ? so people can use userspace drivers with virtio devices > - prevent security risks when guest kernel mistakenly thinks > ? it's protected by an IOMMU, but in fact isn't > - avoid breaking any working configurationsAFAICT the VIRTIO_F_IOMMU_PASSTHROUGH thing seems orthogonal to this. That's just an optimisation, for telling an OS "you don't really need to bother with the IOMMU, even though you it works". There are two main reasons why an operating system might want to use the IOMMU via the DMA API for native drivers:? ?- To protect against driver bugs triggering rogue DMA. ?- To protect against hardware (or firmware) bugs. With virtio, the first reason still exists. But the second is moot because the device is part of the hypervisor and if the hypervisor is untrustworthy then you're screwed anyway... but then again, in SoC devices you could replace 'hypervisor' with 'chip' and the same is true, isn't it? Is there *really* anything virtio-specific here? Sure, I want my *external* network device on a PCIe card with software- loadable firmware to be behind an IOMMU because I don't trust it as far as I can throw it. But for on-SoC devices surely the situation is *just* the same as devices provided by a hypervisor? And some people want that external network device to use passthrough anyway, for performance reasons. On the whole, there are *plenty* of reasons why we might want to have a passthrough mapping on a per-device basis, and I really struggle to find justification for having this 'hint' in a virtio-specific way. And it's complicating the discussion of the *actual* fix we're looking at.> Looking at guest code, it looks like virtio was always > bypassing the IOMMU even if configured, but no other > guest driver did. > > This makes me think the problem where guest drivers > ignore the IOMMU is virtio specific > and so a virtio specific solution seems cleaner. > > The problem for assigned devices is IMHO different: they bypass > the guest IOMMU too but no guest driver knows about this, > so guests do not work. Seems cleaner to fix QEMU to make > existing guests work.I certainly agree that it's better to fix QEMU. Whether devices are behind an IOMMU or not, the DMAR tables we expose to a guest should tell the truth. Part of the issue here is virtio-specific; part isn't. Basically, we have a conjunction of two separate bugs which happened to work (for virtio) ? the IOMMU support in QEMU wasn't working for virtio (and assigned) devices even though it theoretically *should* have been, and the virtio drivers weren't using the DMA API as they theoretically should have been. So there were corner cases like assigned PCI devices, and real hardware implementations of virtio stuff (and perhaps virtio devices being assigned to nested guests) which didn't work. But for the *common* use case, one bug cancelled out the other. Now we want to fix both bugs, and of course that involves carefully coordinating both fixes. I *like* your idea of a flag from the hypervisor which essentially says "trust me, I'm telling the truth now". But don't think that wants to be virtio-specific, because we actually want it to cover *all* the corner cases, not just the common case which *happened* to work before due to the alignment of the two previous bugs. An updated guest OS can look for this flag (in its generic IOMMU code) and can apply a heuristic of its own to work out which devices *aren't* behind the IOMMU, if the flag isn't present. And it can get that right even for assigned devices, so that new kernels can run happily even on today's QEMU instances. And the virtio driver in new kernels should just use the DMA API and expect it to work. Just as the various drivers for assigned PCI devices do. The other interesting case for compatibility is old kernels running in a new QEMU. And for that case, things are likely to break if you suddenly start putting the virtio devices behind an IOMMU. There's nothing you can do on ARM and Power to stop that breakage, since they don't *have* a way to tell legacy guests that certain devices aren't translated. So I suspect you probably can't enable virtio-behind-IOMMU in QEMU *ever* for those platforms as the default behaviour. For x86, you *can* enable virtio-behind-IOMMU if your DMAR tables tell the truth, and even legacy kernels ought to cope with that. FSVO 'ought to' where I suspect some of them will actually crash with a NULL pointer dereference if there's no "catch-all" DMAR unit in the tables, which puts it back into the same camp as ARM and Power.> True but I think we should fix QEMU to shadow IOMMU > page tables for assigned devices. This seems rather > possible with VT-D, and there are patches already on list. > > It looks like this will fix all legacy guests which is > much nicer than what you suggest which will only help new guests.Yes, we should do that. And in the short term we should at *least* fix the DMAR tables to tell the truth.> > > > Furthermore, some platforms don't *have* a standard way for qemu to > > 'tell the truth' to the guests, and that's where the real fun comes in. > > But still, I'd like to see a generic solution for that lack instead of > > a virtio-specific hack. > But the issue is not just these holes.??E.g. with VT-D it is only easy > to emulate because there's a "caching mode" hook. It is fundamentally > paravirtualization.??So a completely generic solution would be a > paravirtualized IOMMU interface, replacing VT-D for VMs. It might be > justified if many platforms have hard to emulate interfaces.Hm, I'm not sure I understand the point here. Either there is a way for the hypervisor to expose an IOMMU to a guest (be it full hardware virt, or paravirt). Or there isn't. If there is, it doesn't matter *how* it's done. And if there isn't, the whole discussion is moot anyway. -- dwmw2 -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 5691 bytes Desc: not available URL: <http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20160418/fcc2bb27/attachment.bin>
On Mon, Apr 18, 2016 at 11:29 AM, David Woodhouse <dwmw2 at infradead.org> wrote:> For x86, you *can* enable virtio-behind-IOMMU if your DMAR tables tell > the truth, and even legacy kernels ought to cope with that. > FSVO 'ought to' where I suspect some of them will actually crash with a > NULL pointer dereference if there's no "catch-all" DMAR unit in the > tables, which puts it back into the same camp as ARM and Power.I think x86 may get a bit of a free pass here. AFAIK the QEMU IOMMU implementation on x86 has always been "experimental", so it just might be okay to change it in a way that causes some older kernels to OOPS. --Andy
Michael S. Tsirkin
2016-Apr-19 09:13 UTC
[PATCH RFC] fixup! virtio: convert to use DMA api
On Mon, Apr 18, 2016 at 02:29:33PM -0400, David Woodhouse wrote:> On Mon, 2016-04-18 at 19:27 +0300, Michael S. Tsirkin wrote: > > I balk at adding more hacks to a broken system. My goals are > > merely to > > - make things work correctly with an IOMMU and new guests, > > ? so people can use userspace drivers with virtio devices > > - prevent security risks when guest kernel mistakenly thinks > > ? it's protected by an IOMMU, but in fact isn't > > - avoid breaking any working configurations > > AFAICT the VIRTIO_F_IOMMU_PASSTHROUGH thing seems orthogonal to this. > That's just an optimisation, for telling an OS "you don't really need > to bother with the IOMMU, even though you it works". > > There are two main reasons why an operating system might want to use > the IOMMU via the DMA API for native drivers:? > ?- To protect against driver bugs triggering rogue DMA. > ?- To protect against hardware (or firmware) bugs. > > With virtio, the first reason still exists. But the second is moot > because the device is part of the hypervisor and if the hypervisor is > untrustworthy then you're screwed anyway... but then again, in SoC > devices you could replace 'hypervisor' with 'chip' and the same is > true, isn't it? Is there *really* anything virtio-specific here? > > Sure, I want my *external* network device on a PCIe card with software- > loadable firmware to be behind an IOMMU because I don't trust it as far > as I can throw it. But for on-SoC devices surely the situation is > *just* the same as devices provided by a hypervisor?Depends on how SoC is designed I guess. At the moment specifically QEMU runs everything in a single memory space so an IOMMU table lookup does not offer any extra protection. That's not a must, one could come up with modular hypervisor designs - it's just what we have ATM.> And some people want that external network device to use passthrough > anyway, for performance reasons.That's a policy decision though.> On the whole, there are *plenty* of reasons why we might want to have a > passthrough mapping on a per-device basis,That's true. And driver security also might differ, for example maybe I trust a distro-supplied driver more than an out of tree one. Or maybe I trust a distro-supplied userspace driver more than a closed-source one. And maybe I trust devices from same vendor as my chip more than a 3rd party one. So one can generalize this even further, think about device and driver security/trust level as an integer and platform protection as an integer. If platform IOMMU offers you extra protection over trusting the device (trust < protection) it improves you security to use platform to limit the device. If trust >= protection it just adds overhead without increasing the security.> and I really struggle to > find justification for having this 'hint' in a virtio-specific way.It's a way. No system seems to expose this information in a more generic way at the moment, and it's portable. Would you like to push for some kind of standartization of such a hint? I would be interested to hear about that.> And it's complicating the discussion of the *actual* fix we're looking > at.I guess you are right in that we should split this part out. What I wanted is really the combination PASSTHROUGH && !PLATFORM so that we can say "ok we don't need to guess, this device actually bypasses the IOMMU". And I thought it's a nice idea to use PASSTHROUGH && PLATFORM as a hint since it seemed to be unused. But maybe the best thing to do for now is to say - hosts should not set PASSTHROUGH && PLATFORM - guests should ignore PASSTHROUGH if PLATFORM is set and then we can come back to this optimization idea later if it's appropriate. So yes I think we need the two bits but no we don't need to mix the hint discussion in here.> > Looking at guest code, it looks like virtio was always > > bypassing the IOMMU even if configured, but no other > > guest driver did. > > > > This makes me think the problem where guest drivers > > ignore the IOMMU is virtio specific > > and so a virtio specific solution seems cleaner. > > > > The problem for assigned devices is IMHO different: they bypass > > the guest IOMMU too but no guest driver knows about this, > > so guests do not work. Seems cleaner to fix QEMU to make > > existing guests work. > > I certainly agree that it's better to fix QEMU. Whether devices are > behind an IOMMU or not, the DMAR tables we expose to a guest should > tell the truth. > > Part of the issue here is virtio-specific; part isn't. > > Basically, we have a conjunction of two separate bugs which happened to > work (for virtio) ? the IOMMU support in QEMU wasn't working for virtio > (and assigned) devices even though it theoretically *should* have been, > and the virtio drivers weren't using the DMA API as they theoretically > should have been. > > So there were corner cases like assigned PCI devices, and real hardware > implementations of virtio stuff (and perhaps virtio devices being > assigned to nested guests) which didn't work. But for the *common* use > case, one bug cancelled out the other. > > Now we want to fix both bugs, and of course that involves carefully > coordinating both fixes. > > I *like* your idea of a flag from the hypervisor which essentially says > "trust me, I'm telling the truth now". > > But don't think that wants to be virtio-specific, because we actually > want it to cover *all* the corner cases, not just the common case which > *happened* to work before due to the alignment of the two previous > bugs.I guess we differ here. I care about fixing bugs and not breaking working setups but I see little value in working around existing bugs if they can be fixed at their source. Building a generic mechanism to report which devices bypass the IOMMU isn't trivial because there's no simple generic way to address an arbitrary device from hypervisor. For example, DMAR tables commonly use bus numbers for that but these are guest (bios) assigned. So if we used bus numbers we'd have to ask bios to build a custom ACPI table and stick bus numbers there.> An updated guest OS can look for this flag (in its generic IOMMU code) > and can apply a heuristic of its own to work out which devices *aren't* > behind the IOMMU, if the flag isn't present. And it can get that right > even for assigned devices, so that new kernels can run happily even on > today's QEMU instances.With iommu enabled? Point is, I don't really care about that. At this point only a very small number of devices work with this IOMMU at all. I expect that we'll fix assigned devices very soon.> And the virtio driver in new kernels should > just use the DMA API and expect it to work. Just as the various drivers > for assigned PCI devices do.Absolutely but that's a separate discussion.> The other interesting case for compatibility is old kernels running in > a new QEMU. And for that case, things are likely to break if you > suddenly start putting the virtio devices behind an IOMMU. There's > nothing you can do on ARM and Power to stop that breakage, since they > don't *have* a way to tell legacy guests that certain devices aren't > translated. So I suspect you probably can't enable virtio-behind-IOMMU > in QEMU *ever* for those platforms as the default behaviour. > > For x86, you *can* enable virtio-behind-IOMMU if your DMAR tables tell > the truth, and even legacy kernels ought to cope with that.I don't see how in that legacy kernels bypassed the DMA API. To me it looks like we either use physical addresses that they give us or they don't work at all (at least without iommu=pt), since the VT-D spec says: DMA requests processed through root-entries with present field Clear result in translation-fault. So I suspect the IOMMU_PLATFORM flag would have to stay off by default for a while.> FSVO 'ought to' where I suspect some of them will actually crash with a > NULL pointer dereference if there's no "catch-all" DMAR unit in the > tables, which puts it back into the same camp as ARM and Power.Right. That would also be an issue.> > > True but I think we should fix QEMU to shadow IOMMU > > page tables for assigned devices. This seems rather > > possible with VT-D, and there are patches already on list. > > > > It looks like this will fix all legacy guests which is > > much nicer than what you suggest which will only help new guests. > > Yes, we should do that. And in the short term we should at *least* fix > the DMAR tables to tell the truth.Right. However, the way timing happens to work, we are out of time to fix it in 2.6 and we are highly likely to have the proper VFIO fix in 2.7. So I'm not sure there's space for a short term fix.> > > > > > Furthermore, some platforms don't *have* a standard way for qemu to > > > 'tell the truth' to the guests, and that's where the real fun comes in. > > > But still, I'd like to see a generic solution for that lack instead of > > > a virtio-specific hack. > > But the issue is not just these holes.??E.g. with VT-D it is only easy > > to emulate because there's a "caching mode" hook. It is fundamentally > > paravirtualization.??So a completely generic solution would be a > > paravirtualized IOMMU interface, replacing VT-D for VMs. It might be > > justified if many platforms have hard to emulate interfaces. > > Hm, I'm not sure I understand the point here. > > Either there is a way for the hypervisor to expose an IOMMU to a guest > (be it full hardware virt, or paravirt). Or there isn't. > > If there is, it doesn't matter *how* it's done.Well it does matter for people doing it :)> And if there isn't, the > whole discussion is moot anyway.Point was that we can always build a paravirt interface if it does not exist, but it's easier to maintain if it's minimal, being as close to emulating hardware as we can.> -- > dwmw2 > >
Michael S. Tsirkin
2016-Apr-19 10:27 UTC
[PATCH RFC] fixup! virtio: convert to use DMA api
On Mon, Apr 18, 2016 at 12:24:15PM -0700, Andy Lutomirski wrote:> On Mon, Apr 18, 2016 at 11:29 AM, David Woodhouse <dwmw2 at infradead.org> wrote: > > For x86, you *can* enable virtio-behind-IOMMU if your DMAR tables tell > > the truth, and even legacy kernels ought to cope with that. > > FSVO 'ought to' where I suspect some of them will actually crash with a > > NULL pointer dereference if there's no "catch-all" DMAR unit in the > > tables, which puts it back into the same camp as ARM and Power. > > I think x86 may get a bit of a free pass here. AFAIK the QEMU IOMMU > implementation on x86 has always been "experimental", so it just might > be okay to change it in a way that causes some older kernels to OOPS. > > --AndySince it's experimental, it might be OK to change *guest kernels* such that they oops on old QEMU. But guest kernels were not experimental - so we need a QEMU mode that makes them work fine. The more functionality is available in this QEMU mode, the betterm because it's going to be the default for a while. For the same reason, it is preferable to also have new kernels not crash in this mode. -- MST
On Tue, 19 Apr 2016 12:13:29 +0300 "Michael S. Tsirkin" <mst at redhat.com> wrote:> On Mon, Apr 18, 2016 at 02:29:33PM -0400, David Woodhouse wrote: > > On Mon, 2016-04-18 at 19:27 +0300, Michael S. Tsirkin wrote: > > > I balk at adding more hacks to a broken system. My goals are > > > merely to > > > - make things work correctly with an IOMMU and new guests, > > > ? so people can use userspace drivers with virtio devices > > > - prevent security risks when guest kernel mistakenly thinks > > > ? it's protected by an IOMMU, but in fact isn't > > > - avoid breaking any working configurations > > > > AFAICT the VIRTIO_F_IOMMU_PASSTHROUGH thing seems orthogonal to this. > > That's just an optimisation, for telling an OS "you don't really need > > to bother with the IOMMU, even though you it works". > > > > There are two main reasons why an operating system might want to use > > the IOMMU via the DMA API for native drivers:? > > ?- To protect against driver bugs triggering rogue DMA. > > ?- To protect against hardware (or firmware) bugs. > > > > With virtio, the first reason still exists. But the second is moot > > because the device is part of the hypervisor and if the hypervisor is > > untrustworthy then you're screwed anyway... but then again, in SoC > > devices you could replace 'hypervisor' with 'chip' and the same is > > true, isn't it? Is there *really* anything virtio-specific here? > > > > Sure, I want my *external* network device on a PCIe card with software- > > loadable firmware to be behind an IOMMU because I don't trust it as far > > as I can throw it. But for on-SoC devices surely the situation is > > *just* the same as devices provided by a hypervisor? > > Depends on how SoC is designed I guess. At the moment specifically QEMU > runs everything in a single memory space so an IOMMU table lookup does > not offer any extra protection. That's not a must, one could come > up with modular hypervisor designs - it's just what we have ATM. > > > > And some people want that external network device to use passthrough > > anyway, for performance reasons. > > That's a policy decision though. > > > On the whole, there are *plenty* of reasons why we might want to have a > > passthrough mapping on a per-device basis, > > That's true. And driver security also might differ, for example maybe I > trust a distro-supplied driver more than an out of tree one. Or maybe I > trust a distro-supplied userspace driver more than a closed-source one. > And maybe I trust devices from same vendor as my chip more than a 3rd > party one. So one can generalize this even further, think about device > and driver security/trust level as an integer and platform protection as an > integer. > > If platform IOMMU offers you extra protection over trusting the device > (trust < protection) it improves you security to use platform to limit > the device. If trust >= protection it just adds overhead without > increasing the security. > > > and I really struggle to > > find justification for having this 'hint' in a virtio-specific way. > > It's a way. No system seems to expose this information in a more generic > way at the moment, and it's portable. Would you like to push for some > kind of standartization of such a hint? I would be interested > to hear about that. > > > > And it's complicating the discussion of the *actual* fix we're looking > > at. > > I guess you are right in that we should split this part out. > What I wanted is really the combination > PASSTHROUGH && !PLATFORM so that we can say "ok we don't > need to guess, this device actually bypasses the IOMMU". > > And I thought it's a nice idea to use PASSTHROUGH && PLATFORM > as a hint since it seemed to be unused. > But maybe the best thing to do for now is to say > - hosts should not set PASSTHROUGH && PLATFORM > - guests should ignore PASSTHROUGH if PLATFORM is set > > and then we can come back to this optimization idea later > if it's appropriate. > > So yes I think we need the two bits but no we don't need to > mix the hint discussion in here. > > > > Looking at guest code, it looks like virtio was always > > > bypassing the IOMMU even if configured, but no other > > > guest driver did. > > > > > > This makes me think the problem where guest drivers > > > ignore the IOMMU is virtio specific > > > and so a virtio specific solution seems cleaner. > > > > > > The problem for assigned devices is IMHO different: they bypass > > > the guest IOMMU too but no guest driver knows about this, > > > so guests do not work. Seems cleaner to fix QEMU to make > > > existing guests work. > > > > I certainly agree that it's better to fix QEMU. Whether devices are > > behind an IOMMU or not, the DMAR tables we expose to a guest should > > tell the truth. > > > > Part of the issue here is virtio-specific; part isn't. > > > > Basically, we have a conjunction of two separate bugs which happened to > > work (for virtio) ? the IOMMU support in QEMU wasn't working for virtio > > (and assigned) devices even though it theoretically *should* have been, > > and the virtio drivers weren't using the DMA API as they theoretically > > should have been. > > > > So there were corner cases like assigned PCI devices, and real hardware > > implementations of virtio stuff (and perhaps virtio devices being > > assigned to nested guests) which didn't work. But for the *common* use > > case, one bug cancelled out the other. > > > > Now we want to fix both bugs, and of course that involves carefully > > coordinating both fixes. > > > > I *like* your idea of a flag from the hypervisor which essentially says > > "trust me, I'm telling the truth now". > > > > But don't think that wants to be virtio-specific, because we actually > > want it to cover *all* the corner cases, not just the common case which > > *happened* to work before due to the alignment of the two previous > > bugs. > > I guess we differ here. I care about fixing bugs and not breaking > working setups but I see little value in working around > existing bugs if they can be fixed at their source. > > Building a generic mechanism to report which devices bypass the IOMMU > isn't trivial because there's no simple generic way to address > an arbitrary device from hypervisor. For example, DMAR tables > commonly use bus numbers for that but these are guest (bios) assigned. > So if we used bus numbers we'd have to ask bios to build a custom > ACPI table and stick bus numbers there.This is incorrect, the DMAR table specifically uses devices paths in order to avoid the issue with guest assigned bus numbers. The only bus number used is the starting bus number, which is generally provided by the platform anyway. Excluding devices isn't necessarily easy with DMAR though, we don't get to be lazy and use the INCLUDE_PCI_ALL flag. Hotplug is also an issue, we either need to hot-add devices into slots where there's already the correct DMAR coverage (or lack of coverage) to represent the inclusion or exclusion or enable dynamic table support. And really it seems like dynamic tables are the only possible way DMAR could support replacing a device that obeys the IOMMU with one that does not at the same address, or vica versa. For any sort of sane implementation, it probably comes down to fully enumerating root bus devices in the DMAR and creating PCI sub-hierarchy entries for certain subordinate buses, leaving others undefined. Devices making use of the IOMMU could only be attached behind those sub-hierarchies and devices not making use of the IOMMU would be downstream of bridges not covered. The management stack would need to know where to place devices.> > An updated guest OS can look for this flag (in its generic IOMMU code) > > and can apply a heuristic of its own to work out which devices *aren't* > > behind the IOMMU, if the flag isn't present. And it can get that right > > even for assigned devices, so that new kernels can run happily even on > > today's QEMU instances. > > With iommu enabled? Point is, I don't really care about that. > At this point only a very small number of devices work with this > IOMMU at all. I expect that we'll fix assigned devices very soon. > > > And the virtio driver in new kernels should > > just use the DMA API and expect it to work. Just as the various drivers > > for assigned PCI devices do. > > Absolutely but that's a separate discussion. > > > The other interesting case for compatibility is old kernels running in > > a new QEMU. And for that case, things are likely to break if you > > suddenly start putting the virtio devices behind an IOMMU. There's > > nothing you can do on ARM and Power to stop that breakage, since they > > don't *have* a way to tell legacy guests that certain devices aren't > > translated. So I suspect you probably can't enable virtio-behind-IOMMU > > in QEMU *ever* for those platforms as the default behaviour. > > > > For x86, you *can* enable virtio-behind-IOMMU if your DMAR tables tell > > the truth, and even legacy kernels ought to cope with that. > > I don't see how in that legacy kernels bypassed the DMA API.It's a matter of excluding the device from being explicitly covered by the DMAR AIUI. This is theoretically possible, but I wonder if it actually works for all kernels.> To me it looks like we either use physical addresses that they give us > or they don't work at all (at least without iommu=pt), > since the VT-D spec says: > DMA requests processed through root-entries with present field > Clear result in translation-fault. > > So I suspect the IOMMU_PLATFORM flag would have to stay off > by default for a while. > > > > FSVO 'ought to' where I suspect some of them will actually crash with a > > NULL pointer dereference if there's no "catch-all" DMAR unit in the > > tables, which puts it back into the same camp as ARM and Power. > > Right. That would also be an issue. > > > > > > True but I think we should fix QEMU to shadow IOMMU > > > page tables for assigned devices. This seems rather > > > possible with VT-D, and there are patches already on list. > > > > > > It looks like this will fix all legacy guests which is > > > much nicer than what you suggest which will only help new guests. > > > > Yes, we should do that. And in the short term we should at *least* fix > > the DMAR tables to tell the truth. > > Right. However, the way timing happens to work, we are out of time to > fix it in 2.6 and we are highly likely to have the proper VFIO fix in > 2.7. So I'm not sure there's space for a short term fix.Note that vfio already works with IOMMUs on power, the issue I believe we're talking about for assigned devices bypassing the guest IOMMU is limited to the QEMU VT-d implementation failing to do the proper notifies. Legacy KVM device assignment of course has no idea about the IOMMU because it piggy backs on KVM memory slot mapping instead of operating within the QEMU Memory API like vfio does. The issues I believe we're going to hit with vfio assigned devices and QEMU VT-d are that 1) the vfio IOMMU interface is not designed for the frequency of mapping that a DMA API managed guest device will generate, 2) we have accounting issues for locked pages since each device will run in a separate IOMMU domain, accounted separately, and 3) we don't have a way to expose host grouping to the VM so trying to assign multiple devices from the same group is likely to fail. We'd almost need to put all of the devices within a group behind a conventional PCI bridge in the VM to get them into the same address space, but I suspect QEMU VT-d doesn't take that aliasing into account.> > > > > > > > Furthermore, some platforms don't *have* a standard way for qemu to > > > > 'tell the truth' to the guests, and that's where the real fun comes in. > > > > But still, I'd like to see a generic solution for that lack instead of > > > > a virtio-specific hack. > > > But the issue is not just these holes.??E.g. with VT-D it is only easy > > > to emulate because there's a "caching mode" hook. It is fundamentally > > > paravirtualization.??So a completely generic solution would be a > > > paravirtualized IOMMU interface, replacing VT-D for VMs. It might be > > > justified if many platforms have hard to emulate interfaces. > > > > Hm, I'm not sure I understand the point here. > > > > Either there is a way for the hypervisor to expose an IOMMU to a guest > > (be it full hardware virt, or paravirt). Or there isn't. > > > > If there is, it doesn't matter *how* it's done. > > Well it does matter for people doing it :) > > > And if there isn't, the > > whole discussion is moot anyway. > > Point was that we can always build a paravirt interface > if it does not exist, but it's easier to maintain > if it's minimal, being as close to emulating hardware > as we can. > > > -- > > dwmw2 > > > > > >
On Apr 19, 2016 2:13 AM, "Michael S. Tsirkin" <mst at redhat.com> wrote:> > > I guess you are right in that we should split this part out. > What I wanted is really the combination > PASSTHROUGH && !PLATFORM so that we can say "ok we don't > need to guess, this device actually bypasses the IOMMU".What happens when you use a device like this on Xen or with a similar software translation layer? I think that a "please bypass IOMMU" feature would be better in the PCI, IOMMU, or platform code. For Xen, virtio would still want to use the DMA API, just without translating at the DMAR or hardware level. Doing it in virtio is awkward, because virtio is involved at the device level and the driver level, but the translation might be entirely in between. I think a nicer long-term approach would be to have a way to ask the guest to set up a full 1:1 mapping for performance, but to still handle the case where the guest refuses to do so or where there's more than one translation layer involved. But I agree that this part shouldn't delay the other part of your series. --Andy