Benjamin Herrenschmidt
2014-Sep-02 23:20 UTC
[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
On Tue, 2014-09-02 at 16:11 -0700, Andy Lutomirski wrote:

> I don't think so. I would argue that it's a straight-up bug for QEMU
> to expose a physically-addressed virtio-pci device to the guest behind
> an emulated IOMMU. QEMU may already be doing that on ppc64, but it
> isn't on x86_64 or arm (yet).

Last I looked, it does on everything: it bypasses the DMA layer in
qemu, which is where IOMMUs are implemented.

> On x86_64, I'm pretty sure that QEMU can emulate an IOMMU for
> everything except the virtio-pci devices. The ACPI DMAR stuff is
> quite expressive.

Well, *except* virtio, exactly...

> On ARM, I hope the QEMU will never implement a PCI IOMMU. As far as I
> could tell when I looked last week, none of the newer QEMU-emulated
> ARM machines even support PCI. Even if QEMU were to implement a PCI
> IOMMU on some future ARM machine, it could continue using virtio-mmio
> for virtio devices.

Possibly...

> So ppc might actually be the only system that has or will have
> physically-addressed virtio PCI devices that are behind an IOMMU. Can
> this be handled in a ppc64-specific way?

I wouldn't be so certain. As I said, the way virtio is implemented in
qemu bypasses the DMA layer, which is where IOMMUs sit. The fact that
x86 currently doesn't put an IOMMU there is not even guaranteed, is it?
What happens if you try to mix and match virtio and other emulated
devices that require the IOMMU on the same bus?

If we could discriminate virtio devices to a specific host bridge and
guarantee no mix & match, we could probably add a concept of an
"IOMMU-less" bus, but that would require guest changes, which limits
the usefulness.

> Is there any way that the
> kernel can distinguish a QEMU-provided virtio PCI device from a
> physical PCIe thing?

Not with existing guests, which cannot be changed. Existing distros are
out with those drivers.

If we add a backward compatibility mechanism, then we could add
something, yes, provided we can segregate virtio onto a dedicated host
bridge (which can be a problem with the libvirt trainwreck...)

> It would be kind of nice to address this without
> adding complexity to the virtio spec. Maybe virtio 1.0 devices could
> be assumed to use bus addressing unless a new devicetree property says
> otherwise.

Cheers,
Ben.
Benjamin Herrenschmidt
2014-Sep-03 00:25 UTC
[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
On Tue, 2014-09-02 at 16:42 -0700, Andy Lutomirski wrote:

> But there aren't any ACPI systems with both virtio-pci and IOMMUs,
> right? So we could say that, henceforth, ACPI systems must declare
> whether virtio-pci devices live behind IOMMUs without breaking
> backward compatibility.

I don't know for sure whether that's the case, or whether we can rely
on that not happening. We'll need the x86 folks' opinion here.

> >> On ARM, I hope the QEMU will never implement a PCI IOMMU. As far as I
> >> could tell when I looked last week, none of the newer QEMU-emulated
> >> ARM machines even support PCI. Even if QEMU were to implement a PCI
> >> IOMMU on some future ARM machine, it could continue using virtio-mmio
> >> for virtio devices.
> >
> > Possibly...
> >
> >> So ppc might actually be the only system that has or will have
> >> physically-addressed virtio PCI devices that are behind an IOMMU. Can
> >> this be handled in a ppc64-specific way?
> >
> > I wouldn't be so certain. As I said, the way virtio is implemented in
> > qemu bypasses the DMA layer, which is where IOMMUs sit. The fact that
> > x86 currently doesn't put an IOMMU there is not even guaranteed, is it?
> > What happens if you try to mix and match virtio and other emulated
> > devices that require the IOMMU on the same bus?
>
> AFAIK QEMU doesn't support IOMMUs at all on x86, so current versions
> of QEMU really do guarantee that virtio-pci on x86 has no IOMMU, even
> if that guarantee is purely accidental.

Right.

> > If we could discriminate virtio devices to a specific host bridge and
> > guarantee no mix & match, we could probably add a concept of
> > "IOMMU-less" bus but that would require guest changes which limits the
> > usefulness.
> >
> >> Is there any way that the
> >> kernel can distinguish a QEMU-provided virtio PCI device from a
> >> physical PCIe thing?
> >
> > Not with existing guests which cannot be changed. Existing distros are
> > out with those drivers.
> >
> > If we add a backward compatibility mechanism,
> > then we could add something yes, provided we can segregate virtio onto a
> > dedicated host bridge (which can be a problem with the libvirt
> > trainwreck...)
>
> Ugh.
>
> So here's an ugly proposal:
>
> Step 1: Make virtio-pci use the DMA API only on x86. This will at
> least fix Xen and people experimenting with virtio hardware on x86,
> and it won't break anything, since there are no emulated IOMMUs on
> x86.

I think we should make all virtio drivers use the DMA API and just have
a different set of dma_ops. We can add a simple ifdef powerpc, if
needed, in virtio-pci that forces the dma_ops of the device to some
direct "bypass" ops at init time.

That way there is no need to select whether to use the DMA API or not:
just always use it, and add a tweak to replace the DMA ops with the
direct ones on the archs/platforms that need that. That was my original
proposal and I still think it's the best approach.

> Step 2: Update the virtio spec. Virtio 1.0 PCI devices should set a
> new bit if they are physically addressed. If that bit is clear, then
> the device is assumed to be addressed in accordance with the
> platform's standard addressing model for PCI. Presumably this would
> be something like VIRTIO_F_BUS_ADDRESSING = 33, and the spec would say
> something like "Physical devices compatible with this specification
> MUST offer VIRTIO_F_BUS_ADDRESSING. Drivers MUST implement this
> feature." Alternatively, this could live in a PCI configuration
> capability.

I'll let you sort that out with Rusty, but it makes sense.

> Step 3: Update virtio-pci to use the DMA API for all devices on x86
> and for devices that advertise bus addressing on other architectures.
>
> I think this proposal will work, but I also think it sucks and I'd
> really like to see a better counter-proposal.

As I said, make it always use the DMA API, but add a quirk to replace
the dma_ops with some NULL ops on platforms that need it.

The only issue with that is that the location of the dma ops is arch
specific, so that one function will contain some ifdefs, but the rest
of the code can just use the DMA API.

Cheers,
Ben.
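
For readers following along, the "always use the DMA API, swap the ops
where needed" idea from the message above can be modelled in a few lines
of plain user-space C. This is only a sketch under stated assumptions,
not kernel code and not what any patch in this series does: every name
(model_dma_ops, model_device, model_virtio_pci_quirk, and so on) is
hypothetical, the IOMMU is reduced to a made-up fixed offset, and a
runtime flag stands in for the arch ifdef mentioned in the message.

```c
/*
 * User-space model of "one code path, swappable DMA ops".
 * All names here are hypothetical and are NOT the Linux DMA API.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t model_phys_t; /* guest-physical address */
typedef uint64_t model_bus_t;  /* address the device is handed */

/* Table of mapping callbacks, standing in for a set of dma_ops. */
struct model_dma_ops {
    model_bus_t (*map)(model_phys_t pa);
};

struct model_device {
    const char *name;
    const struct model_dma_ops *ops;
};

/* Made-up translation: pretend the IOMMU exposes memory at an offset. */
#define MODEL_IOMMU_OFFSET 0x80000000ULL

model_bus_t model_iommu_map(model_phys_t pa)
{
    return pa + MODEL_IOMMU_OFFSET;
}

/* "Bypass" mapping: the device is physically addressed. */
model_bus_t model_direct_map(model_phys_t pa)
{
    return pa;
}

const struct model_dma_ops model_iommu_ops  = { .map = model_iommu_map };
const struct model_dma_ops model_direct_ops = { .map = model_direct_map };

/* Driver-side helper: always goes through the ops, never around them. */
model_bus_t model_dma_map(const struct model_device *dev, model_phys_t pa)
{
    assert(dev != NULL && dev->ops != NULL);
    return dev->ops->map(pa);
}

/*
 * Init-time quirk: on platforms where the device bypasses the IOMMU,
 * replace the ops with the direct ones. The flag is an assumption of
 * this model; the message suggests an arch ifdef instead.
 */
void model_virtio_pci_quirk(struct model_device *dev, int bypasses_iommu)
{
    if (bypasses_iommu)
        dev->ops = &model_direct_ops;
}
```

The point of the design is that the driver only ever calls
model_dma_map(); whether the address is translated or passed through is
decided once, at init time, by which ops table the device carries.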