Benjamin Herrenschmidt
2014-Sep-03 00:25 UTC
[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
On Tue, 2014-09-02 at 16:42 -0700, Andy Lutomirski wrote:> But there aren't any ACPI systems with both virtio-pci and IOMMUs, > right? So we could say that, henceforth, ACPI systems must declare > whether virtio-pci devices live behind IOMMUs without breaking > backward compatibility.I don't know for sure whether that's the case and whether we can rely on that not happening, we'll need x86 folks opinion here.> >> On ARM, I hope the QEMU will never implement a PCI IOMMU. As far as I > >> could tell when I looked last week, none of the newer QEMU-emulated > >> ARM machines even support PCI. Even if QEMU were to implement a PCI > >> IOMMU on some future ARM machine, it could continue using virtio-mmio > >> for virtio devices. > > > > Possibly... > > > >> So ppc might actually be the only system that has or will have > >> physically-addressed virtio PCI devices that are behind an IOMMU. Can > >> this be handled in a ppc64-specific way? > > > > I wouldn't be so certain, as I said, the way virtio is implemented in > > qemu bypass the DMA layer which is where IOMMUs sit. The fact that > > currently x86 doesn't put an IOMMU there is not even garanteed, is it ? > > What happens if you try to mix and match virtio and other emulated > > devices that require the iommu on the same bus ? > > AFAIK QEMU doesn't support IOMMUs at all on x86, so current versions > of QEMU really do guarantee that virtio-pci on x86 has no IOMMU, even > if that guarantee is purely accidental.Right.> > If we could discriminate virtio devices to a specific host bridge and > > guarantee no mix & match, we could probably add a concept of > > "IOMMU-less" bus but that would require guest changes which limits the > > usefulness. > > > >> Is there any way that the > >> kernel can distinguish a QEMU-provided virtio PCI device from a > >> physical PCIe thing? > > > > Not with existing guests which cannot be changed. Existing distros are > > out with those drivers. If we add a backward compatibility mechanism, > > then we could add something yes, provided we can segregate virtio onto a > > dedicated host bridge (which can be a problem with the libvirt > > trainwreck...) > > Ugh. > > So here's an ugly proposal: > > Step 1: Make virtio-pci use the DMA API only on x86. This will at > least fix Xen and people experimenting with virtio hardware on x86, > and it won't break anything, since there are no emulated IOMMUs on > x86.I think we should make all virtio drivers use the DMA API and just have different set of dma_ops. We can make a simple ifdef powerpc if needed in virtio-pci that force the dma-ops of the device to some direct "bypass" ops at init time. That way no need to select whether to use the DMA API or not, just always use it, and add a tweak to replace the DMA ops with the direct ones on the archs/platforms that need that. That was my original proposal and I still think it's the best approach.> Step 2: Update the virtio spec. Virtio 1.0 PCI devices should set a > new bit if they are physically addressed. If that bit is clear, then > the device is assumed to be addressed in accordance with the > platform's standard addressing model for PCI. Presumably this would > be something like VIRTIO_F_BUS_ADDRESSING = 33, and the spec would say > something like "Physical devices compatible with this specification > MUST offer VIRTIO_F_BUS_ADDRESSING. Drivers MUST implement this > feature." Alternatively, this could live in a PCI configuration > capability.I'll let you sort that out with Rusty but it makes sense.> Step 3: Update virtio-pci to use the DMA API for all devices on x86 > and for devices that advertise bus addressing on other architectures. > > I think this proposal will work, but I also think it sucks and I'd > really like to see a better counter-proposal.As I said, make it always use the DMA API, but add a quirk to replace the dma_ops with some NULL ops on platforms that need it. The only issue with that is the location of the dma ops is arch specific, so that one function will contain some ifdefs, but the rest of the code can just use the DMA API. Cheers, Ben.
Andy Lutomirski
2014-Sep-03 00:32 UTC
[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
On Tue, Sep 2, 2014 at 5:25 PM, Benjamin Herrenschmidt <benh at au1.ibm.com> wrote:> On Tue, 2014-09-02 at 16:42 -0700, Andy Lutomirski wrote: >> So here's an ugly proposal: >> >> Step 1: Make virtio-pci use the DMA API only on x86. This will at >> least fix Xen and people experimenting with virtio hardware on x86, >> and it won't break anything, since there are no emulated IOMMUs on >> x86. > > I think we should make all virtio drivers use the DMA API and just have > different set of dma_ops. We can make a simple ifdef powerpc if needed > in virtio-pci that force the dma-ops of the device to some direct > "bypass" ops at init time. > > That way no need to select whether to use the DMA API or not, just > always use it, and add a tweak to replace the DMA ops with the direct > ones on the archs/platforms that need that. That was my original > proposal and I still think it's the best approach.I agree *except* that implementing it will be a real PITA and (I think) can't be done without changing code in arch/. My patches plus an ifdef powerpc will be functionally equivalent, just uglier.> > As I said, make it always use the DMA API, but add a quirk to replace > the dma_ops with some NULL ops on platforms that need it. > > The only issue with that is the location of the dma ops is arch > specific, so that one function will contain some ifdefs, but the rest of > the code can just use the DMA API.Bigger quirk: on a standard s390 virtio guest configuration, dma_map_single etc will fail to link. I tried this in v1 of these patches. So we can poke at the archdata all day, but we can't build a kernel like that :( So until the dma_ops pointer move into struct device and CONFIG_HAS_DMA becomes mandatory (or mandatory enough that virtio can depend on it), I don't think we can do it this way. I'll send a v5 that is the same as v4 except with physical addressing hardcoded in for powerpc. --Andy
Benjamin Herrenschmidt
2014-Sep-03 00:43 UTC
[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
On Tue, 2014-09-02 at 17:32 -0700, Andy Lutomirski wrote:> > I agree *except* that implementing it will be a real PITA and (I > think) can't be done without changing code in arch/. My patches plus > an ifdef powerpc will be functionally equivalent, just uglier.So for powerpc, it's a 2 liner inside virtio-pci, but yes, it might be more of a problem for s390, I'm not too sure what they do in that area.> Bigger quirk: on a standard s390 virtio guest configuration, > dma_map_single etc will fail to link.Yuck> I tried this in v1 of these > patches. So we can poke at the archdata all day, but we can't build a > kernel like that :(I would like the s390 people to chime in here, it still looks like the best way to go if they can fix things on their side :-)> So until the dma_ops pointer move into struct device and > CONFIG_HAS_DMA becomes mandatory (or mandatory enough that virtio can > depend on it), I don't think we can do it this way.I see, it's a bummer because it would be a lot cleaner.> I'll send a v5 that is the same as v4 except with physical addressing > hardcoded in for powerpc.Thanks. That will do for now, but ideally we want to make it a function of some flag from the implementation, so let's see what Rusty has to say. Cheers, Ben.
Paolo Bonzini
2014-Sep-03 07:47 UTC
[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
Il 03/09/2014 02:25, Benjamin Herrenschmidt ha scritto:> > But there aren't any ACPI systems with both virtio-pci and IOMMUs, > > right? So we could say that, henceforth, ACPI systems must declare > > whether virtio-pci devices live behind IOMMUs without breaking > > backward compatibility. > > I don't know for sure whether that's the case and whether we can rely on > that not happening, we'll need x86 folks opinion here.IOMMU support for x86 is going to go in this week. However, it is and likely will remain niche enough that I don't really care about performance loss from IOMMU support. If you enable it, you want it. So from the QEMU point of view we can simply add the direct-ram-access property, and have the pseries machine turn it on by default (while other machines can leave it off by default---they have no IOMMU and thus no performance cost). Paolo
Andy Lutomirski
2014-Sep-03 07:52 UTC
[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
On Wed, Sep 3, 2014 at 12:47 AM, Paolo Bonzini <pbonzini at redhat.com> wrote:> Il 03/09/2014 02:25, Benjamin Herrenschmidt ha scritto: >> > But there aren't any ACPI systems with both virtio-pci and IOMMUs, >> > right? So we could say that, henceforth, ACPI systems must declare >> > whether virtio-pci devices live behind IOMMUs without breaking >> > backward compatibility. >> >> I don't know for sure whether that's the case and whether we can rely on >> that not happening, we'll need x86 folks opinion here. > > IOMMU support for x86 is going to go in this week. >Can you try to make sure that qemu-system-x86_64 -device iommu -device virtio-balloon-pci (or whatever the syntax is) doesn't put the virtio-pci device behind the IOMMU? Because, if it does, then the kernel will have to support that, and it'll be messy. --Andy
Benjamin Herrenschmidt
2014-Sep-03 08:05 UTC
[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
On Wed, 2014-09-03 at 09:47 +0200, Paolo Bonzini wrote:> > IOMMU support for x86 is going to go in this week.But won't that break virtio on x86 ? Or will virtio continue bypassing it ? IE, the guest side virtio doesn't expect an IOMMU and doesn't call the dma mappings ops.> However, it is and likely will remain niche enough that I don't really > care about performance loss from IOMMU support. If you enable it, you > want it. > > So from the QEMU point of view we can simply add the direct-ram-access > property, and have the pseries machine turn it on by default (while > other machines can leave it off by default---they have no IOMMU and > thus no performance cost).Well, it's only for virtio and should be on by default on x86 as well if an iommu is installed no ? Cheers, Ben.
Apparently Analagous Threads
- [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
- [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
- [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
- [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
- [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API