thr3ads.net - Virtualization - [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API [Sep 2014]

Andy Lutomirski

2014-Sep-02 23:11 UTC

[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

On Tue, Sep 2, 2014 at 3:10 PM, Benjamin Herrenschmidt <benh at kernel.crashing.org> wrote:
> On Tue, 2014-09-02 at 14:37 -0700, Andy Lutomirski wrote:
>
>> Let's take a step back from the implementation.  What is a driver
>> for a virtio PCI device (i.e. a PCI device with vendor 0x1af4)
>> supposed to do on ppc64?
>
> Today, it's supposed to send guest physical addresses. We can make that
> optional via some nego or capabilities to support more esoteric setups
> but for backward compatibility, this must remain the default behaviour.
I think it only needs to remain the default in cases where the
alternative (bus addressing) won't work.  I think that, so far, this
is just ppc64.  But see below...
>
> My suggestion was that it might be a cleaner approach to do that by
> having the individual virtio drivers always use the dma_map_* API, and
> limiting the kludgery to a combination of virtio_pci "core" and arch
> code by selecting an appropriate set of dma_map_ops, defaulting with a
> "transparent" (or direct) one as our current default case (and thus
> overriding the iommu ones provided by the arch).
I think the cleanest way of all would be to get the bus drivers to do
the right thing so that all of the virtio code can just use the dma
api.  I don't know whether this is achievable.
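To make the "transparent ops" idea concrete, here is a minimal user-space sketch in C. It is only a model: `model_dma_ops`, `direct_map`, `iommu_map`, `select_ops` and the fixed `MODEL_IOMMU_OFFSET` are hypothetical names invented for illustration, not the kernel's `dma_map_ops` interface. The point it tries to show is that driver code calls through one table, and the bus or arch glue decides whether that table is the direct one or the IOMMU one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model; these names are NOT the kernel's dma_map_ops. */
typedef uint64_t bus_addr_t;

struct model_dma_ops {
    /* Translate a guest-physical address into the address handed to the device. */
    bus_addr_t (*map)(uint64_t guest_phys, size_t len);
};

/* "Transparent" (direct) ops: the device sees guest-physical addresses unchanged. */
static bus_addr_t direct_map(uint64_t guest_phys, size_t len)
{
    (void)len;
    return guest_phys;
}

/* Stand-in for arch IOMMU ops: a fixed offset plays the role of an IOMMU mapping. */
#define MODEL_IOMMU_OFFSET 0x80000000ULL

static bus_addr_t iommu_map(uint64_t guest_phys, size_t len)
{
    (void)len;
    return guest_phys + MODEL_IOMMU_OFFSET;
}

static const struct model_dma_ops direct_ops = { .map = direct_map };
static const struct model_dma_ops iommu_ops  = { .map = iommu_map };

/* Bus/arch glue picks the ops once; a device that bypasses the IOMMU gets direct ops. */
static const struct model_dma_ops *select_ops(int device_bypasses_iommu)
{
    return device_bypasses_iommu ? &direct_ops : &iommu_ops;
}

/* Driver-side code is identical in both cases: it only calls through the table. */
static bus_addr_t driver_map_buffer(const struct model_dma_ops *ops,
                                    uint64_t guest_phys, size_t len)
{
    assert(ops != NULL && ops->map != NULL);
    return ops->map(guest_phys, len);
}
```

In this model the "kludgery" is confined to `select_ops`; whether real bus drivers can make that choice correctly is exactly the open question in the thread.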
>
>>   We could teach virtio_pci
>> to use physical addressing on ppc64, but that seems like a pretty
>> awful hack, and it'll start needing quirks as soon as someone tries to
>> plug a virtio-speaking PCI card into a ppc64 machine.
>
> But x86_64 is the same, no? The day it starts growing an iommu emulation
> in qemu (and I've heard it's happening) it will still want to do direct
> bypass for virtio for performance.
I don't think so.  I would argue that it's a straight-up bug for QEMU
to expose a physically-addressed virtio-pci device to the guest behind
an emulated IOMMU.  QEMU may already be doing that on ppc64, but it
isn't on x86_64 or arm (yet).

On x86_64, I'm pretty sure that QEMU can emulate an IOMMU for
everything except the virtio-pci devices.  The ACPI DMAR stuff is
quite expressive.

On ARM, I hope that QEMU will never implement a PCI IOMMU.  As far as I
could tell when I looked last week, none of the newer QEMU-emulated
ARM machines even support PCI.  Even if QEMU were to implement a PCI
IOMMU on some future ARM machine, it could continue using virtio-mmio
for virtio devices.

So ppc might actually be the only system that has or will have
physically-addressed virtio PCI devices that are behind an IOMMU.  Can
this be handled in a ppc64-specific way?  Is there any way that the
kernel can distinguish a QEMU-provided virtio PCI device from a
physical PCIe thing?  It would be kind of nice to address this without
adding complexity to the virtio spec.  Maybe virtio 1.0 devices could
be assumed to use bus addressing unless a new devicetree property says
otherwise.

--Andy

Benjamin Herrenschmidt

2014-Sep-02 23:20 UTC

On Tue, 2014-09-02 at 16:11 -0700, Andy Lutomirski wrote:
> I don't think so.  I would argue that it's a straight-up bug for QEMU
> to expose a physically-addressed virtio-pci device to the guest behind
> an emulated IOMMU.  QEMU may already be doing that on ppc64, but it
> isn't on x86_64 or arm (yet).
Last I looked, it does on everything, it bypasses the DMA layer in qemu
which is where IOMMUs are implemented.
> On x86_64, I'm pretty sure that QEMU can emulate an IOMMU for
> everything except the virtio-pci devices.  The ACPI DMAR stuff is
> quite expressive.
Well, *except* virtio, exactly...
> On ARM, I hope that QEMU will never implement a PCI IOMMU.  As far as I
> could tell when I looked last week, none of the newer QEMU-emulated
> ARM machines even support PCI.  Even if QEMU were to implement a PCI
> IOMMU on some future ARM machine, it could continue using virtio-mmio
> for virtio devices.
Possibly...
> So ppc might actually be the only system that has or will have
> physically-addressed virtio PCI devices that are behind an IOMMU.  Can
> this be handled in a ppc64-specific way?
I wouldn't be so certain. As I said, the way virtio is implemented in
qemu bypasses the DMA layer, which is where IOMMUs sit. The fact that
currently x86 doesn't put an IOMMU there is not even guaranteed, is it?
What happens if you try to mix and match virtio and other emulated
devices that require the iommu on the same bus?

If we could discriminate virtio devices to a specific host bridge and
guarantee no mix & match, we could probably add a concept of
"IOMMU-less" bus but that would require guest changes which limits the
usefulness.
>   Is there any way that the
> kernel can distinguish a QEMU-provided virtio PCI device from a
> physical PCIe thing? 
Not with existing guests which cannot be changed. Existing distros are
out with those drivers. If we add a backward compatibility mechanism,
then we could add something yes, provided we can segregate virtio onto a
dedicated host bridge (which can be a problem with the libvirt
trainwreck...)
>  It would be kind of nice to address this without
> adding complexity to the virtio spec.  Maybe virtio 1.0 devices could
> be assumed to use bus addressing unless a new devicetree property says
> otherwise.
Cheers,
Ben.

Andy Lutomirski

2014-Sep-02 23:42 UTC

On Tue, Sep 2, 2014 at 4:20 PM, Benjamin Herrenschmidt <benh at au1.ibm.com> wrote:
> On Tue, 2014-09-02 at 16:11 -0700, Andy Lutomirski wrote:
>
>> I don't think so.  I would argue that it's a straight-up bug for QEMU
>> to expose a physically-addressed virtio-pci device to the guest behind
>> an emulated IOMMU.  QEMU may already be doing that on ppc64, but it
>> isn't on x86_64 or arm (yet).
>
> Last I looked, it does on everything, it bypasses the DMA layer in qemu
> which is where IOMMUs are implemented.
I believe you, but I'm not convinced that this means much from the
guest's POV, except on ppc64.
>
>> On x86_64, I'm pretty sure that QEMU can emulate an IOMMU for
>> everything except the virtio-pci devices.  The ACPI DMAR stuff is
>> quite expressive.
>
> Well, *except* virtio, exactly...
But there aren't any ACPI systems with both virtio-pci and IOMMUs,
right?  So we could say that, henceforth, ACPI systems must declare
whether virtio-pci devices live behind IOMMUs without breaking
backward compatibility.
>
>> On ARM, I hope that QEMU will never implement a PCI IOMMU.  As far as I
>> could tell when I looked last week, none of the newer QEMU-emulated
>> ARM machines even support PCI.  Even if QEMU were to implement a PCI
>> IOMMU on some future ARM machine, it could continue using virtio-mmio
>> for virtio devices.
>
> Possibly...
>
>> So ppc might actually be the only system that has or will have
>> physically-addressed virtio PCI devices that are behind an IOMMU.  Can
>> this be handled in a ppc64-specific way?
>
> I wouldn't be so certain. As I said, the way virtio is implemented in
> qemu bypasses the DMA layer, which is where IOMMUs sit. The fact that
> currently x86 doesn't put an IOMMU there is not even guaranteed, is it?
> What happens if you try to mix and match virtio and other emulated
> devices that require the iommu on the same bus?
AFAIK QEMU doesn't support IOMMUs at all on x86, so current versions
of QEMU really do guarantee that virtio-pci on x86 has no IOMMU, even
if that guarantee is purely accidental.
>
> If we could discriminate virtio devices to a specific host bridge and
> guarantee no mix & match, we could probably add a concept of
> "IOMMU-less" bus but that would require guest changes which limits the
> usefulness.
>
>>   Is there any way that the
>> kernel can distinguish a QEMU-provided virtio PCI device from a
>> physical PCIe thing?
>
> Not with existing guests which cannot be changed. Existing distros are
> out with those drivers. If we add a backward compatibility mechanism,
> then we could add something yes, provided we can segregate virtio onto a
> dedicated host bridge (which can be a problem with the libvirt
> trainwreck...)
Ugh.

So here's an ugly proposal:

Step 1: Make virtio-pci use the DMA API only on x86.  This will at
least fix Xen and people experimenting with virtio hardware on x86,
and it won't break anything, since there are no emulated IOMMUs on
x86.

Step 2: Update the virtio spec.  Virtio 1.0 PCI devices should set a
new bit if they are physically addressed.  If that bit is clear, then
the device is assumed to be addressed in accordance with the
platform's standard addressing model for PCI.  Presumably this would
be something like VIRTIO_F_BUS_ADDRESSING = 33, and the spec would say
something like "Physical devices compatible with this specification
MUST offer VIRTIO_F_BUS_ADDRESSING.  Drivers MUST implement this
feature."  Alternatively, this could live in a PCI configuration
capability.

Step 3: Update virtio-pci to use the DMA API for all devices on x86
and for devices that advertise bus addressing on other architectures.

I think this proposal will work, but I also think it sucks and I'd
really like to see a better counter-proposal.
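The decision that steps 1 and 3 describe can be sketched as a small predicate. This is an illustrative user-space model, not virtio-pci code: `model_arch` and `virtio_pci_use_dma_api` are hypothetical names, and `VIRTIO_F_BUS_ADDRESSING = 33` is only the value floated in step 2, not a ratified feature bit. The sketch follows the reading in step 3, where a set bit means the device advertises bus addressing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit number taken from the proposal; it is a suggestion, not a ratified feature bit. */
#define VIRTIO_F_BUS_ADDRESSING 33

static_assert(VIRTIO_F_BUS_ADDRESSING < 64,
              "feature bit must fit in a 64-bit feature mask");

/* Hypothetical architecture tag, used only by this model. */
enum model_arch { MODEL_ARCH_X86, MODEL_ARCH_PPC64, MODEL_ARCH_OTHER };

/* Returns true when the driver should go through the DMA API for this device. */
static bool virtio_pci_use_dma_api(enum model_arch arch, uint64_t device_features)
{
    /* Steps 1 and 3: on x86 there is no emulated IOMMU today, so always use the DMA API. */
    if (arch == MODEL_ARCH_X86)
        return true;

    /* Step 3: elsewhere, only devices that advertise bus addressing use the DMA API. */
    return ((device_features >> VIRTIO_F_BUS_ADDRESSING) & 1) != 0;
}
```

Under these assumptions a legacy ppc64 device with no feature bits set keeps sending guest physical addresses, which is the backward-compatible default requested earlier in the thread.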


--Andy

Paolo Bonzini

2014-Sep-03 07:43 UTC

On 03/09/2014 01:20, Benjamin Herrenschmidt wrote:
> I wouldn't be so certain. As I said, the way virtio is implemented in
> qemu bypasses the DMA layer, which is where IOMMUs sit. The fact that
> currently x86 doesn't put an IOMMU there is not even guaranteed, is it?
> What happens if you try to mix and match virtio and other emulated
> devices that require the iommu on the same bus?
As far as QEMU is concerned, it's trivial to add a property like
"direct-ram-access" that selects whether to bypass the IOMMU or not.
And it would have zero performance cost if direct RAM access is enabled,
compared to the current code.

If possible, I would quirk it in the PPC code.

Paolo
