David Woodhouse
2016-Apr-28 15:11 UTC
[PATCH V2 RFC] fixup! virtio: convert to use DMA api
On Thu, 2016-04-28 at 17:34 +0300, Michael S. Tsirkin wrote:
> I see work-arounds for broken IOMMUs but not for
> individual devices. Could you point me to a more specific
> example?

I think the closest example is probably quirk_ioat_snb_local_iommu().

If we see this particular device, we *know* what the topology actually looks like. We check the hardware setup, and if we're *not* being told the truth, then we stick it in bypass mode because we know it *isn't* actually being translated.

Actually, that's almost *identical* to what we want, isn't it?

Except instead of checking undocumented chipset registers, it wants to be checking "am I on a version of qemu known to lie about virtio being translated?"

> > We don't actually *need* it for the Intel IOMMU; all we need is for
> > QEMU to stop lying in its DMAR tables.
> We need it for legacy QEMU anyway, and it's not easy for QEMU to stop
> lying about virtio, so we'll need it for a while.
> I think it's easy for QEMU to stop lying about assigned devices,
> so we don't need it for non-virtio devices.

Why is it easier for QEMU to tell the truth about assigned devices than it is for virtio? Assuming they both remain actually untranslated for now, why's it easier to fix the DMAR table for one and not the other?

(Implementing translation of assigned devices is on my list, but it's a long way off).

> I don't see how fwcfg can work here. It's a static thing,
> devices can come and go with hotplug.

This touches on something you said elsewhere, that it's painful/impossible to hot-unplug a translated device and hot-plug an untranslated device in the same slot (and vice versa).

So let's assume for now that a given slot is indeed static, and either translated or untranslated. Like the DMAR table, the fwcfg can just give a list of slots which are (or aren't) translated.

And then you can *only* add a translated device to a translated slot, or an untranslated device to an untranslated slot.
All the internally-emulated devices *can* be either translated or untranslated. That's just a matter of software. Surely, you currently *can't* have translated assigned devices (until someone implements the whole VT-d page table shadowing or whatever), so you'll be barred from assigning a device to a slot which *previously* had an untranslated device. But so what? Put it in a different slot instead.

-- 
dwmw2
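The static per-slot scheme proposed above (firmware lists which slots are translated, and hotplug only accepts a device whose translation status matches the slot) can be sketched as a small standalone C model. This is a hypothetical illustration only: the thread never defines a fwcfg layout, and `struct slot_map`, `slot_map_init()`, `can_hotplug()` and `guest_should_bypass()` are invented names, not QEMU or Linux interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model, not a real QEMU or Linux interface: the firmware
 * (e.g. a fwcfg entry) publishes, per PCI slot, whether DMA from that
 * slot is translated by the IOMMU. */
#define NUM_SLOTS 32

struct slot_map {
    bool translated[NUM_SLOTS]; /* true: slot sits behind the IOMMU */
};

/* Build the map from a list of translated slot numbers, much as a
 * DMAR-style table enumerates them. Slots not listed are untranslated. */
static void slot_map_init(struct slot_map *map, const int *translated_slots, int n)
{
    memset(map, 0, sizeof(*map));
    for (int i = 0; i < n; i++) {
        assert(translated_slots[i] >= 0 && translated_slots[i] < NUM_SLOTS);
        map->translated[translated_slots[i]] = true;
    }
}

/* Hotplug rule from the thread: a device may only be added to a slot
 * whose (static) translation status matches its own. */
static bool can_hotplug(const struct slot_map *map, int slot, bool dev_translated)
{
    if (slot < 0 || slot >= NUM_SLOTS)
        return false;
    return map->translated[slot] == dev_translated;
}

/* Guest side: use bypass (untranslated) DMA for devices in slots the
 * firmware does not list as translated. */
static bool guest_should_bypass(const struct slot_map *map, int slot)
{
    assert(slot >= 0 && slot < NUM_SLOTS);
    return !map->translated[slot];
}
```

With slots 3 and 5 listed as translated, an (untranslated) assigned device would be refused in slot 3 but accepted in slot 4, which is exactly the "put it in a different slot instead" trade-off being argued about.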
Michael S. Tsirkin
2016-Apr-28 15:37 UTC
[PATCH V2 RFC] fixup! virtio: convert to use DMA api
On Thu, Apr 28, 2016 at 04:11:54PM +0100, David Woodhouse wrote:
> On Thu, 2016-04-28 at 17:34 +0300, Michael S. Tsirkin wrote:
> > I see work-arounds for broken IOMMUs but not for
> > individual devices. Could you point me to a more specific
> > example?
>
> I think the closest example is probably quirk_ioat_snb_local_iommu().

OK, so for intel, it seems that it's enough to set
pdev->dev.archdata.iommu = DUMMY_DEVICE_DOMAIN_INFO;
for the device.

Do I have to poke at each iommu implementation to find a way to do this, or is there some way to do it portably?

> If we see this particular device, we *know* what the topology actually
> looks like. We check the hardware setup, and if we're *not* being told
> the truth, then we stick it in bypass mode because we know it *isn't*
> actually being translated.
>
> Actually, that's almost *identical* to what we want, isn't it?
>
> Except instead of checking undocumented chipset registers, it wants to
> be checking "am I on a version of qemu known to lie about virtio being
> translated?"

Not exactly - I think that future versions of qemu might lie about some devices but not others.

> > > We don't actually *need* it for the Intel IOMMU; all we need is for
> > > QEMU to stop lying in its DMAR tables.
> > We need it for legacy QEMU anyway, and it's not easy for QEMU to stop
> > lying about virtio, so we'll need it for a while.
> > I think it's easy for QEMU to stop lying about assigned devices,
> > so we don't need it for non-virtio devices.
>
> Why is it easier for QEMU to tell the truth about assigned devices
> than it is for virtio? Assuming they both remain actually untranslated
> for now, why's it easier to fix the DMAR table for one and not the
> other?
>
> (Implementing translation of assigned devices is on my list, but it's a
> long way off).

DMAR is unfortunately not a good match for what people do with QEMU.

There is a patchset on list fixing translation of assigned devices.
So the fix for these will simply be to do translation for all assigned devices. It's harder for virtio as it isn't always processed in QEMU - there's vhost in the kernel and an out-of-process vhost-user plugin. So we can end up, e.g., with a modern QEMU which does translate in-process virtio but not the out-of-process one.

> > I don't see how fwcfg can work here. It's a static thing,
> > devices can come and go with hotplug.
>
> This touches on something you said elsewhere, that it's
> painful/impossible to hot-unplug a translated device and hot-plug an
> untranslated device in the same slot (and vice versa).
>
> So let's assume for now that a given slot is indeed static, and either
> translated or untranslated. Like the DMAR table, the fwcfg can just
> give a list of slots which are (or aren't) translated.
>
> And then you can *only* add a translated device to a translated slot,
> or an untranslated device to an untranslated slot.
>
> All the internally-emulated devices *can* be either translated or
> untranslated. That's just a matter of software. Surely, you currently
> *can't* have translated assigned devices (until someone implements the
> whole VT-d page table shadowing or whatever), so you'll be barred from
> assigning a device to a slot which *previously* had an untranslated
> device. But so what? Put it in a different slot instead.

Unfortunately people got used to being able to put any device in any slot, and built external tools around that ability. It's rather painful to break this assumption.
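The objection that a single "this QEMU version lies" quirk is not enough can be made concrete with a hypothetical C model: whether a virtio device is really translated depends on where its datapath runs, so the firmware claim can be right for one device and wrong for another on the same QEMU. All names here are invented for illustration, and treating the in-kernel vhost backend as untranslated is an assumption extrapolated from the "in-process yes, out-of-process no" example above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: names are illustrative, not QEMU or Linux APIs. */
enum virtio_backend {
    BACKEND_QEMU_INPROCESS, /* virtqueues processed inside QEMU */
    BACKEND_VHOST_KERNEL,   /* vhost in the host kernel */
    BACKEND_VHOST_USER,     /* out-of-process vhost-user plugin */
};

struct virtio_dev {
    enum virtio_backend backend;
    bool advertised_translated; /* what the firmware tables claim */
};

/* Assumption for this sketch: only the in-process backend honours the
 * IOMMU; both vhost variants are treated as untranslated. */
static bool actually_translated(const struct virtio_dev *d)
{
    assert(d != 0);
    return d->backend == BACKEND_QEMU_INPROCESS;
}

/* A per-device check, rather than a per-QEMU-version quirk: does the
 * firmware claim disagree with reality for this particular device? */
static bool firmware_lies_about(const struct virtio_dev *d)
{
    return d->advertised_translated != actually_translated(d);
}
```

Two devices on the same QEMU, both advertised as translated, can then give different answers: the in-process one is told the truth, the vhost-user one is lied about.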
David Woodhouse
2016-Apr-28 15:48 UTC
[PATCH V2 RFC] fixup! virtio: convert to use DMA api
On Thu, 2016-04-28 at 18:37 +0300, Michael S. Tsirkin wrote:
> OK, so for intel, it seems that it's enough to set
> pdev->dev.archdata.iommu = DUMMY_DEVICE_DOMAIN_INFO;
> for the device.

Yes, currently. Although that's vile. In fact what we *want* to happen is for the intel-iommu code simply to decline to provide DMA ops for this device, and let it fall back to the swiotlb or no-op DMA ops, as appropriate.

As it is, we have the intel-iommu DMA ops *unconditionally*, and they have a hack to manually fall back to calling swiotlb. It's all just horrid, which is why I want to clean it up with nice per-device DMA ops and discovery thereof :)

> Do I have to poke at each iommu implementation to find
> a way to do this, or is there some way to do it
> portably?

There *will* be... Christoph has already done some of the cleanup in this space, and I need to take stock of what he's already done, and finish off the parts I want to build on top of it.

> Not exactly - I think that future versions of qemu might lie
> about some devices but not others.

Can we keep this simple? QEMU currently lies about some devices. Let's implement a heuristic for the guest OS to know about that, and react accordingly.

Then let's fix QEMU to tell the truth. All the time, unconditionally. Even on POWER/ARM where there's no obvious *way* for it to tell the truth (because you don't have the flexibility that DMAR tables do), and we need to devise a way to put it in the device-tree or fwcfg or something else.

And only once QEMU consistently tells the *truth*, then we can start to do new stuff and let it actually change its behaviour.

> DMAR is unfortunately not a good match for what people do with QEMU.
>
> There is a patchset on list fixing translation of assigned
> devices. So the fix for these will simply be to do translation for
> all assigned devices. It's harder for virtio as it isn't always
> processed in QEMU - there's vhost in the kernel and an out-of-process
> vhost-user plugin. So we can end up, e.g., with a modern QEMU which
> does translate in-process virtio but not the out-of-process one.

Right... just stop. Fix QEMU to tell the truth first, and *then* once we can trust it, we can start to change its behaviour. :)

> Unfortunately people got used to being able to put any device
> in any slot, and built external tools around that ability.
> It's rather painful to break this assumption.

Well, you just said you have a patch set which allows translation of assigned devices, so you are most of the way there, aren't you? We just need to fix the out-of-process virtio case, and everything can be either translated or untranslated?

-- 
dwmw2
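The per-device DMA ops arrangement described above (the IOMMU driver declining a device so it falls back to swiotlb or no-op ops, instead of unconditionally installing IOMMU ops that call swiotlb internally) can be sketched as a standalone C model. This is a hypothetical illustration, not the kernel's `dma_map_ops` machinery: `struct dma_ops`, `struct device_info`, `iommu_get_ops()` and `select_dma_ops()` are invented names, and the `known_to_bypass` flag stands in for whatever quirk or heuristic detects a QEMU that lies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of per-device DMA ops; not the Linux API. */
struct dma_ops {
    const char *name;
};

static const struct dma_ops iommu_ops  = { "iommu" };
static const struct dma_ops bounce_ops = { "swiotlb" }; /* bounce buffering */
static const struct dma_ops direct_ops = { "no-op" };   /* direct mapping */

struct device_info {
    bool behind_iommu;    /* firmware says the device is translated */
    bool known_to_bypass; /* quirk: firmware lies, device is NOT translated */
    bool needs_bounce;    /* e.g. device cannot address all guest memory */
};

/* The IOMMU driver offers its ops only for devices it really translates;
 * returning NULL means "decline this device". */
static const struct dma_ops *iommu_get_ops(const struct device_info *dev)
{
    if (!dev->behind_iommu || dev->known_to_bypass)
        return NULL;
    return &iommu_ops;
}

/* Generic per-device selection: IOMMU ops if offered, otherwise fall
 * back to bounce-buffer or direct ops as appropriate. */
static const struct dma_ops *select_dma_ops(const struct device_info *dev)
{
    const struct dma_ops *ops;

    assert(dev != NULL);
    ops = iommu_get_ops(dev);
    if (ops)
        return ops;
    return dev->needs_bounce ? &bounce_ops : &direct_ops;
}
```

The design point is that the fallback decision lives in generic code, so no IOMMU driver needs a private "call swiotlb instead" path of its own.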
On 28/04/2016 17:37, Michael S. Tsirkin wrote:
> > All the internally-emulated devices *can* be either translated or
> > untranslated. That's just a matter of software. Surely, you currently
> > *can't* have translated assigned devices (until someone implements the
> > whole VT-d page table shadowing or whatever), so you'll be barred from
> > assigning a device to a slot which *previously* had an untranslated
> > device. But so what? Put it in a different slot instead.
>
> Unfortunately people got used to being able to put any device
> in any slot, and built external tools around that ability.
> It's rather painful to break this assumption.

Once you move to PCIe, a lot of things become more complicated. This is just one of them; instead of needing half a dozen PCI bridges, you'll need half a dozen plus one.

Paolo