On Mon, 2016-04-18 at 17:23 +0300, Michael S. Tsirkin wrote:
>
> This patch doesn't change DMAR tables, it creates a way for virtio
> device to tell guest "I obey what DMAR tables tell you, you can stop
> doing hacks".
>
> And as PPC guys seem adamant that platform tools there are no good for
> that purpose, there's another bit that says "ignore what platform tells
> you, I'm not a real device - I'm part of hypervisor and I bypass the
> IOMMU".

...

+/* Request IOMMU passthrough (if available)
+ * Without VIRTIO_F_IOMMU_PLATFORM: bypass the IOMMU even if enabled.
+ * With VIRTIO_F_IOMMU_PLATFORM: suggest disabling IOMMU.
+ */
+#define VIRTIO_F_IOMMU_PASSTHROUGH      33
+
+/* Do not bypass the IOMMU (if configured) */
+#define VIRTIO_F_IOMMU_PLATFORM         34

OK... let's see if I can reconcile those descriptions coherently.

Setting (only) VIRTIO_F_IOMMU_PASSTHROUGH indicates to the guest that
its own operating system's IOMMU code is expected to be broken, and
that the virtio driver should eschew the DMA API? And that the guest OS
cannot further assign the affected device to any of *its* nested
guests? Not that the broken IOMMU code in said guest OS will know the
latter, of course.

With VIRTIO_F_IOMMU_PLATFORM set, VIRTIO_F_IOMMU_PASSTHROUGH is just a
*hint*, suggesting that the guest OS should *request* a passthrough
mapping from the IOMMU? Via a driver<->IOMMU API which doesn't yet exist
in Linux, since we only have 'iommu=pt' on the command line for that?

And having *neither* of those bits set is the status quo, which means
that your OS code might well be broken and need you to eschew the DMA
API, but maybe not.

-- 
dwmw2
Michael S. Tsirkin
2016-Apr-18 15:30 UTC
[PATCH RFC] fixup! virtio: convert to use DMA api
On Mon, Apr 18, 2016 at 11:22:03AM -0400, David Woodhouse wrote:
> On Mon, 2016-04-18 at 17:23 +0300, Michael S. Tsirkin wrote:
> >
> > This patch doesn't change DMAR tables, it creates a way for virtio
> > device to tell guest "I obey what DMAR tables tell you, you can stop
> > doing hacks".
> >
> > And as PPC guys seem adamant that platform tools there are no good for
> > that purpose, there's another bit that says "ignore what platform tells
> > you, I'm not a real device - I'm part of hypervisor and I bypass the
> > IOMMU".
>
> ...
>
> +/* Request IOMMU passthrough (if available)
> + * Without VIRTIO_F_IOMMU_PLATFORM: bypass the IOMMU even if enabled.
> + * With VIRTIO_F_IOMMU_PLATFORM: suggest disabling IOMMU.
> + */
> +#define VIRTIO_F_IOMMU_PASSTHROUGH      33
> +
> +/* Do not bypass the IOMMU (if configured) */
> +#define VIRTIO_F_IOMMU_PLATFORM         34
>
> OK... let's see if I can reconcile those descriptions coherently.
>
> Setting (only) VIRTIO_F_IOMMU_PASSTHROUGH indicates to the guest that
> its own operating system's IOMMU code is expected to be broken, and
> that the virtio driver should eschew the DMA API?

No - it tells guest that e.g. the ACPI tables (or whatever the
equivalent is) do not match reality with respect to this device
since IOMMU is ignored by hypervisor.
Hypervisor has no idea what guest IOMMU code does - hopefully
it is not actually broken.

> And that the guest OS
> cannot further assign the affected device to any of *its* nested
> guests? Not that the broken IOMMU code in said guest OS will know the
> latter, of course.
>
> With VIRTIO_F_IOMMU_PLATFORM set, VIRTIO_F_IOMMU_PASSTHROUGH is just a
> *hint*, suggesting that the guest OS should *request* a passthrough
> mapping from the IOMMU?

Right. But it'll work correctly if you don't.

> Via a driver<->IOMMU API which doesn't yet exist
> in Linux, since we only have 'iommu=pt' on the command line for that?
>
> And having *neither* of those bits set is the status quo, which means
> that your OS code might well be broken and need you to eschew the DMA
> API, but maybe not.

The status quo is that the IOMMU might well be bypassed
and then you need to program physical addresses into the device,
but maybe not. If DMA API does not give you physical addresses, you
need to bypass it, but hypervisor does not know or care.

> -- 
> dwmw2
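The four combinations of the two proposed bits, as clarified in the reply above, can be summarised in a small C sketch. Only the two bit numbers come from the RFC patch; the enum, its values and both helper functions are hypothetical names made up for illustration, and this is one possible reading of the exchange rather than code from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers as proposed in the RFC patch quoted in this thread. */
#define VIRTIO_F_IOMMU_PASSTHROUGH 33
#define VIRTIO_F_IOMMU_PLATFORM    34

/* Hypothetical classification, not part of the patch. */
enum virtio_dma_policy {
    /* Neither bit: status quo. The IOMMU might well be bypassed, but
     * maybe not; the guest has to fall back on its own heuristics. */
    VIRTIO_DMA_LEGACY_GUESS,
    /* PASSTHROUGH only: the platform description does not match reality
     * for this device; the hypervisor ignores the IOMMU, so the device
     * needs physical addresses. */
    VIRTIO_DMA_BYPASS,
    /* PLATFORM only: the device obeys what the platform tables say, so
     * the DMA API can be used without hacks. */
    VIRTIO_DMA_PLATFORM,
    /* Both: as PLATFORM, plus a hint that a passthrough mapping would
     * be welcome. Ignoring the hint still works correctly. */
    VIRTIO_DMA_PLATFORM_PT_HINT
};

/* Test a single feature bit in a 64-bit feature word. */
static bool virtio_feature_set(uint64_t features, unsigned int bit)
{
    return ((features >> bit) & 1ULL) != 0;
}

/* Map the negotiated feature word onto one of the four cases. */
static enum virtio_dma_policy virtio_pick_dma_policy(uint64_t features)
{
    bool passthrough = virtio_feature_set(features, VIRTIO_F_IOMMU_PASSTHROUGH);
    bool platform = virtio_feature_set(features, VIRTIO_F_IOMMU_PLATFORM);

    if (platform)
        return passthrough ? VIRTIO_DMA_PLATFORM_PT_HINT : VIRTIO_DMA_PLATFORM;
    return passthrough ? VIRTIO_DMA_BYPASS : VIRTIO_DMA_LEGACY_GUESS;
}
```

Note that only the PASSTHROUGH-without-PLATFORM case tells the guest to distrust the platform description; with neither bit set the guest learns nothing new.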
On Mon, 2016-04-18 at 18:30 +0300, Michael S. Tsirkin wrote:
>
> > Setting (only) VIRTIO_F_IOMMU_PASSTHROUGH indicates to the guest that
> > its own operating system's IOMMU code is expected to be broken, and
> > that the virtio driver should eschew the DMA API?
>
> No - it tells guest that e.g. the ACPI tables (or whatever the
> equivalent is) do not match reality with respect to this device
> since IOMMU is ignored by hypervisor.
> Hypervisor has no idea what guest IOMMU code does - hopefully
> it is not actually broken.

OK, that makes sense, thanks.

So where the platform *does* have a way to coherently tell the guest
that some devices are behind an IOMMU and some aren't, we should never
see VIRTIO_F_IOMMU_PASSTHROUGH && !VIRTIO_F_IOMMU_PLATFORM. (Except
perhaps temporarily on x86 until we *do* fix the DMAR tables to tell
the truth; qv.)

This should *only* be a crutch for platforms which cannot properly
convey that information from the hypervisor to the guest. It should be
clearly documented "thou shalt not use this unless you've first
attempted to fix the broken platform to get it right for itself".

And if we look at it as such... does it make more sense for this to be
a more *generic* qemu<->guest interface? That way the software hacks can
live in the OS IOMMU code where they belong, and prevent assignment to
nested guests for example. And can cover cases like assigned PCI
devices in existing qemu/x86 which need the same treatment.

Put another way: if we're going to add code to the guest OS to look at
this information, why can't we add that code in the guest's IOMMU
support instead, to look at an out-of-band qemu-specific "ignore IOMMU
for these devices" list instead?

> The status quo is that the IOMMU might well be bypassed
> and then you need to program physical addresses into the device,
> but maybe not. If DMA API does not give you physical addresses, you
> need to bypass it, but hypervisor does not know or care.

Right. The status quo is that qemu doesn't provide correct information
about IOMMU topology to guests, and they have to have heuristics to
work out whether to eschew the IOMMU for a given device or not. This is
true for virtio and assigned PCI devices alike.

Furthermore, some platforms don't *have* a standard way for qemu to
'tell the truth' to the guests, and that's where the real fun comes in.
But still, I'd like to see a generic solution for that lack instead of
a virtio-specific hack.

-- 
dwmw2
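To make the "ignore IOMMU for these devices" list idea concrete, here is a minimal C sketch of how a guest's IOMMU code might consult such a list. No such qemu interface is defined anywhere in this thread, so the entry layout (PCI segment, bus, devfn), the struct and both function names are assumptions invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: one device that the hypervisor says bypasses the
 * IOMMU regardless of what the platform tables claim. */
struct iommu_bypass_entry {
    uint16_t segment; /* PCI segment (domain) number */
    uint8_t bus;      /* PCI bus number */
    uint8_t devfn;    /* device number in bits 7..3, function in bits 2..0 */
};

/* Return true if the device appears in the out-of-band bypass list. */
static bool iommu_device_bypassed(const struct iommu_bypass_entry *list,
                                  size_t count, uint16_t segment,
                                  uint8_t bus, uint8_t devfn)
{
    for (size_t i = 0; i < count; i++) {
        if (list[i].segment == segment && list[i].bus == bus &&
            list[i].devfn == devfn)
            return true;
    }
    return false;
}

/* A device that bypasses the IOMMU cannot be isolated by it, so under
 * this sketch it must not be assigned to a nested guest. */
static bool iommu_may_assign_to_nested_guest(const struct iommu_bypass_entry *list,
                                             size_t count, uint16_t segment,
                                             uint8_t bus, uint8_t devfn)
{
    return !iommu_device_bypassed(list, count, segment, bus, devfn);
}
```

Keeping the lookup in the IOMMU layer, rather than in each driver, is what would let the same check cover virtio and assigned PCI devices alike.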