Andy Lutomirski
2014-Sep-02 05:55 UTC
[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
On Mon, Sep 1, 2014 at 3:16 PM, Benjamin Herrenschmidt
<benh at kernel.crashing.org> wrote:
> On Mon, 2014-09-01 at 10:39 -0700, Andy Lutomirski wrote:
>> Changes from v1:
>> - Using the DMA API is optional now. It would be nice to improve the
>>   DMA API to the point that it could be used unconditionally, but s390
>>   proves that we're not there yet.
>> - Includes patch 4, which fixes DMA debugging warnings from virtio_net.
>
> I'm not sure if you saw my reply on the other thread but I have a few
> comments based on the above "it would be nice if ..."

Yeah, sorry, I sort of thought I responded, but I didn't do a very good job.

> So here we have both a yes and a no :-)
>
> It would be nice to avoid those if () games all over and indeed just
> use the DMA API, *however* we most certainly don't want to actually
> create IOMMU mappings for the KVM virtio case. This would be a massive
> loss in performance on several platforms and generally doesn't make
> much sense.
>
> However, we can still use the API without that on any architecture
> where the dma mapping API ends up calling the generic dma_map_ops;
> it becomes just a matter of virtio setting up some special "nop" ops
> when needed.

I'm not quite convinced that this is a good idea. I think that there
are three relevant categories of virtio devices:

a) Any virtio device where the normal DMA ops are nops. This includes
x86 without an IOMMU (e.g. in a QEMU/KVM guest), 32-bit ARM, and
probably many other architectures. In this case, what we do only
matters for performance, not for correctness. Ideally the arch DMA
ops are fast.

b) Virtio devices that use physical addressing on systems where DMA
ops either don't exist at all (most s390) or do something nontrivial.
In this case, we must either override the DMA ops or just not use
them.

c) Virtio devices that use bus addressing. This includes everything
on Xen (because the "physical" addresses are nonsense) and any actual
physical PCI device that speaks virtio on a system with an IOMMU. In
this case, we must use the DMA ops.

The issue is that, on systems with DMA ops that do something, we need
to make sure that we know whether we're in case (b) or (c). In these
patches, I've made the assumption that, if the virtio device lives on
the PCI bus, then it uses the same type of addressing that any other
device on that PCI bus would use.

On x86, at least, I doubt that we'll ever see a physically addressed
PCI virtio device for which ACPI advertises an IOMMU, since any sane
hypervisor will just not advertise an IOMMU for the virtio device.
But are there arm64 or PPC guests that use virtio_pci, that have
IOMMUs, and that will malfunction if the virtio_pci driver ends up
using the IOMMU? I certainly hope not, since these systems might be
very hard-pressed to work right if someone plugged in a physical
virtio-speaking PCI device.

> The difficulty here resides in the fact that we have never completely
> made the dma_map_ops generic. The ops themselves are defined generically,
> as are the dma_map_* interfaces based on them, but the location of the
> ops pointer is still more/less arch-specific, and some architectures
> still chose not to use that indirection at all, I believe.

I'd be happy to update the patches if someone does this, but I don't
really want to attack the DMA API on all architectures right now. In
the meantime, at least s390 requires that we be able to compile out
the DMA API calls. I'd rather see s390 provide working no-op DMA ops
for all of the struct devices that provide virtio interfaces.

On a related note, shouldn't virtio be doing something to provide DMA
ops to the virtio device and any of its children? I don't know how it
would even try to do this, given how architecture-dependent this code
currently is. Calling dma_map_single on the virtio device (as opposed
to its parent) is currently likely to crash on x86. Fortunately,
nothing does this.

--Andy
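For concreteness, here is roughly what the "nop" ops Ben describes could
look like. This is a sketch against the 3.x-era struct dma_map_ops
(struct dma_attrs and all); the names virtio_nop_dma_ops, nop_map_page,
and nop_map_sg are invented for illustration and appear nowhere in the
patches:

/* Sketch of "nop" DMA ops that hand back guest-physical addresses:
 * no IOMMU, no hypercall. Illustrative only. */
#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

static dma_addr_t nop_map_page(struct device *dev, struct page *page,
			       unsigned long offset, size_t size,
			       enum dma_data_direction dir,
			       struct dma_attrs *attrs)
{
	/* The guest-physical address is the bus address. */
	return page_to_phys(page) + offset;
}

static int nop_map_sg(struct device *dev, struct scatterlist *sgl,
		      int nents, enum dma_data_direction dir,
		      struct dma_attrs *attrs)
{
	struct scatterlist *sg;
	int i;

	for_each_sg(sgl, sg, nents, i)
		sg_dma_address(sg) = sg_phys(sg);
	return nents;
}

static struct dma_map_ops virtio_nop_dma_ops = {
	.map_page = nop_map_page,
	.map_sg   = nop_map_sg,
	/* unmap/sync hooks are genuine no-ops and can stay NULL on
	 * most architectures. */
};

An arch (or virtio itself, per the open question above) would then
attach these ops to the relevant struct device, however ops attachment
works on that architecture.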
Benjamin Herrenschmidt
2014-Sep-02 20:53 UTC
[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
On Mon, 2014-09-01 at 22:55 -0700, Andy Lutomirski wrote:
>
> On x86, at least, I doubt that we'll ever see a physically addressed
> PCI virtio device for which ACPI advertises an IOMMU, since any sane
> hypervisor will just not advertise an IOMMU for the virtio device.
> But are there arm64 or PPC guests that use virtio_pci, that have
> IOMMUs, and that will malfunction if the virtio_pci driver ends up
> using the IOMMU? I certainly hope not, since these systems might be
> very hard-pressed to work right if someone plugged in a physical
> virtio-speaking PCI device.

It will definitely not work on ppc64. We always have IOMMUs on pseries,
all PCI busses do, and because it's a paravirtualized environment,
mapping/unmapping pages means hypercalls -> expensive.

But our virtio implementation bypasses it in qemu, so if virtio-pci
starts using the DMA mapping API without changing the DMA ops under the
hood, it will break for us.

Cheers,
Ben.
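To make Ben's point concrete: the vring code of this era builds
descriptors from sg_phys() directly, while the DMA-API path would
return a TCE-mapped bus address on pseries. A sketch of the two
choices; vring_sg_address and its use_dma_api flag are invented here
for illustration, not taken from the patches:

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/* Hypothetical helper contrasting the two addressing schemes; the
 * real virtio_ring.c of this era effectively does the sg_phys()
 * variant unconditionally. */
static dma_addr_t vring_sg_address(struct device *dev,
				   struct scatterlist *sg,
				   bool use_dma_api)
{
	if (!use_dma_api)
		return sg_phys(sg);	/* what QEMU's virtio backend expects */

	/* DMA-API path: on pseries this means an H_PUT_TCE-style
	 * hypercall per map/unmap, and yields a bus address that a
	 * bypassing hypervisor backend would not understand. */
	return dma_map_page(dev, sg_page(sg), sg->offset, sg->length,
			    DMA_BIDIRECTIONAL);
}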
Konrad Rzeszutek Wilk
2014-Sep-02 20:56 UTC
[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
On Wed, Sep 03, 2014 at 06:53:33AM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2014-09-01 at 22:55 -0700, Andy Lutomirski wrote:
> >
> > On x86, at least, I doubt that we'll ever see a physically addressed
> > PCI virtio device for which ACPI advertises an IOMMU, since any sane
> > hypervisor will just not advertise an IOMMU for the virtio device.
> > But are there arm64 or PPC guests that use virtio_pci, that have
> > IOMMUs, and that will malfunction if the virtio_pci driver ends up
> > using the IOMMU? I certainly hope not, since these systems might be
> > very hard-pressed to work right if someone plugged in a physical
> > virtio-speaking PCI device.
>
> It will definitely not work on ppc64. We always have IOMMUs on pseries,
> all PCI busses do, and because it's a paravirtualized environment,
> mapping/unmapping pages means hypercalls -> expensive.
>
> But our virtio implementation bypasses it in qemu, so if virtio-pci
> starts using the DMA mapping API without changing the DMA ops under the
> hood, it will break for us.

What are the default dma_ops that Linux guests start with under ppc64?

Thanks!

> Cheers,
> Ben.
Michael S. Tsirkin
2014-Sep-02 21:10 UTC
[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
On Mon, Sep 01, 2014 at 10:55:29PM -0700, Andy Lutomirski wrote:
> On Mon, Sep 1, 2014 at 3:16 PM, Benjamin Herrenschmidt
> <benh at kernel.crashing.org> wrote:
> > On Mon, 2014-09-01 at 10:39 -0700, Andy Lutomirski wrote:
> >> Changes from v1:
> >> - Using the DMA API is optional now. It would be nice to improve the
> >>   DMA API to the point that it could be used unconditionally, but s390
> >>   proves that we're not there yet.
> >> - Includes patch 4, which fixes DMA debugging warnings from virtio_net.
> >
> > I'm not sure if you saw my reply on the other thread but I have a few
> > comments based on the above "it would be nice if ..."
>
> Yeah, sorry, I sort of thought I responded, but I didn't do a very good job.
>
> > So here we have both a yes and a no :-)
> >
> > It would be nice to avoid those if () games all over and indeed just
> > use the DMA API, *however* we most certainly don't want to actually
> > create IOMMU mappings for the KVM virtio case. This would be a massive
> > loss in performance on several platforms and generally doesn't make
> > much sense.
> >
> > However, we can still use the API without that on any architecture
> > where the dma mapping API ends up calling the generic dma_map_ops;
> > it becomes just a matter of virtio setting up some special "nop" ops
> > when needed.
>
> I'm not quite convinced that this is a good idea. I think that there
> are three relevant categories of virtio devices:
>
> a) Any virtio device where the normal DMA ops are nops. This includes
> x86 without an IOMMU (e.g. in a QEMU/KVM guest), 32-bit ARM, and
> probably many other architectures. In this case, what we do only
> matters for performance, not for correctness. Ideally the arch DMA
> ops are fast.
>
> b) Virtio devices that use physical addressing on systems where DMA
> ops either don't exist at all (most s390) or do something nontrivial.
> In this case, we must either override the DMA ops or just not use
> them.
>
> c) Virtio devices that use bus addressing. This includes everything
> on Xen (because the "physical" addresses are nonsense) and any actual
> physical PCI device that speaks virtio on a system with an IOMMU. In
> this case, we must use the DMA ops.
>
> The issue is that, on systems with DMA ops that do something, we need
> to make sure that we know whether we're in case (b) or (c). In these
> patches, I've made the assumption that, if the virtio device lives on
> the PCI bus, then it uses the same type of addressing that any other
> device on that PCI bus would use.
>
> On x86, at least, I doubt that we'll ever see a physically addressed
> PCI virtio device for which ACPI advertises an IOMMU, since any sane
> hypervisor will just not advertise an IOMMU for the virtio device.

How exactly does one not advertise an IOMMU for a specific
device? Could you please clarify?

> But are there arm64 or PPC guests that use virtio_pci, that have
> IOMMUs, and that will malfunction if the virtio_pci driver ends up
> using the IOMMU? I certainly hope not, since these systems might be
> very hard-pressed to work right if someone plugged in a physical
> virtio-speaking PCI device.

One simple fix is to defer this all until virtio 1.0.
virtio 1.0 has an alternative set of IDs for virtio-pci
that can be used if you are making an incompatible change.
We can use that if there's an IOMMU.

> > The difficulty here resides in the fact that we have never completely
> > made the dma_map_ops generic. The ops themselves are defined generically,
> > as are the dma_map_* interfaces based on them, but the location of the
> > ops pointer is still more/less arch-specific, and some architectures
> > still chose not to use that indirection at all, I believe.
>
> I'd be happy to update the patches if someone does this, but I don't
> really want to attack the DMA API on all architectures right now. In
> the meantime, at least s390 requires that we be able to compile out
> the DMA API calls. I'd rather see s390 provide working no-op DMA ops
> for all of the struct devices that provide virtio interfaces.
>
> On a related note, shouldn't virtio be doing something to provide DMA
> ops to the virtio device and any of its children? I don't know how it
> would even try to do this, given how architecture-dependent this code
> currently is. Calling dma_map_single on the virtio device (as opposed
> to its parent) is currently likely to crash on x86. Fortunately,
> nothing does this.
>
> --Andy
Andy Lutomirski
2014-Sep-02 21:37 UTC
[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
On Tue, Sep 2, 2014 at 1:53 PM, Benjamin Herrenschmidt
<benh at kernel.crashing.org> wrote:
> On Mon, 2014-09-01 at 22:55 -0700, Andy Lutomirski wrote:
>>
>> On x86, at least, I doubt that we'll ever see a physically addressed
>> PCI virtio device for which ACPI advertises an IOMMU, since any sane
>> hypervisor will just not advertise an IOMMU for the virtio device.
>> But are there arm64 or PPC guests that use virtio_pci, that have
>> IOMMUs, and that will malfunction if the virtio_pci driver ends up
>> using the IOMMU? I certainly hope not, since these systems might be
>> very hard-pressed to work right if someone plugged in a physical
>> virtio-speaking PCI device.
>
> It will definitely not work on ppc64. We always have IOMMUs on pseries,
> all PCI busses do, and because it's a paravirtualized environment,
> mapping/unmapping pages means hypercalls -> expensive.
>
> But our virtio implementation bypasses it in qemu, so if virtio-pci
> starts using the DMA mapping API without changing the DMA ops under the
> hood, it will break for us.

Let's take a step back from the implementation. What is a driver for
a virtio PCI device (i.e. a PCI device with vendor 0x1af4) supposed to
do on ppc64?

It can send the device physical addresses and ignore the normal PCI
DMA semantics, which is what the current virtio_pci driver does. This
seems like a layering violation, and this won't work if the device
is a real PCI device. Alternatively, it can treat the device like any
other PCI device and use the IOMMU. This is a bit slower, and it is
also incompatible with current hypervisors.

There really are virtio devices that are pieces of silicon and not
figments of a hypervisor's imagination [1].

We could teach virtio_pci to use physical addressing on ppc64, but
that seems like a pretty awful hack, and it'll start needing quirks as
soon as someone tries to plug a virtio-speaking PCI card into a ppc64
machine.

Ideas? x86 and arm seem to be safe here, since AFAIK there is no such
thing as a physically addressed virtio "PCI" device on a bus with an
IOMMU on x86, arm, or arm64.

[1] https://lwn.net/Articles/580186/

> Cheers,
> Ben.

--
Andy Lutomirski
AMA Capital Management, LLC
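For illustration, the "awful hack" option would look something like the
predicate below; vring_use_dma_api is an invented name, and the
CONFIG_PPC_PSERIES test is a deliberately crude stand-in for whatever
real quirk mechanism such a patch would need:

#include <linux/virtio.h>

/* Hypothetical per-device predicate deciding whether the vring uses
 * the DMA API or raw guest-physical addresses. Purely illustrative;
 * the patch series under discussion makes this an arch/compile-time
 * decision instead. */
static bool vring_use_dma_api(struct virtio_device *vdev)
{
#ifdef CONFIG_PPC_PSERIES
	/* Quirk: current pseries hypervisors expect guest-physical
	 * addresses and bypass the TCE IOMMU in QEMU's virtio backend,
	 * so the DMA API would both slow things down and break them. */
	return false;
#else
	/* Elsewhere, address the device the way its bus would address
	 * any other device. */
	return true;
#endif
}

The obvious failure mode is the one named above: a physical
virtio-speaking card plugged into a pseries slot would need the DMA
API, and this predicate has no way to tell it apart from a
hypervisor-provided device.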
Andy Lutomirski
2014-Sep-02 21:49 UTC
[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
On Tue, Sep 2, 2014 at 2:10 PM, Michael S. Tsirkin <mst at redhat.com> wrote:
> On Mon, Sep 01, 2014 at 10:55:29PM -0700, Andy Lutomirski wrote:
>> On Mon, Sep 1, 2014 at 3:16 PM, Benjamin Herrenschmidt
>> <benh at kernel.crashing.org> wrote:
>> > On Mon, 2014-09-01 at 10:39 -0700, Andy Lutomirski wrote:
>> >> Changes from v1:
>> >> - Using the DMA API is optional now. It would be nice to improve the
>> >>   DMA API to the point that it could be used unconditionally, but s390
>> >>   proves that we're not there yet.
>> >> - Includes patch 4, which fixes DMA debugging warnings from virtio_net.
>> >
>> > I'm not sure if you saw my reply on the other thread but I have a few
>> > comments based on the above "it would be nice if ..."
>>
>> Yeah, sorry, I sort of thought I responded, but I didn't do a very good job.
>>
>> > So here we have both a yes and a no :-)
>> >
>> > It would be nice to avoid those if () games all over and indeed just
>> > use the DMA API, *however* we most certainly don't want to actually
>> > create IOMMU mappings for the KVM virtio case. This would be a massive
>> > loss in performance on several platforms and generally doesn't make
>> > much sense.
>> >
>> > However, we can still use the API without that on any architecture
>> > where the dma mapping API ends up calling the generic dma_map_ops;
>> > it becomes just a matter of virtio setting up some special "nop" ops
>> > when needed.
>>
>> I'm not quite convinced that this is a good idea. I think that there
>> are three relevant categories of virtio devices:
>>
>> a) Any virtio device where the normal DMA ops are nops. This includes
>> x86 without an IOMMU (e.g. in a QEMU/KVM guest), 32-bit ARM, and
>> probably many other architectures. In this case, what we do only
>> matters for performance, not for correctness. Ideally the arch DMA
>> ops are fast.
>>
>> b) Virtio devices that use physical addressing on systems where DMA
>> ops either don't exist at all (most s390) or do something nontrivial.
>> In this case, we must either override the DMA ops or just not use
>> them.
>>
>> c) Virtio devices that use bus addressing. This includes everything
>> on Xen (because the "physical" addresses are nonsense) and any actual
>> physical PCI device that speaks virtio on a system with an IOMMU. In
>> this case, we must use the DMA ops.
>>
>> The issue is that, on systems with DMA ops that do something, we need
>> to make sure that we know whether we're in case (b) or (c). In these
>> patches, I've made the assumption that, if the virtio device lives on
>> the PCI bus, then it uses the same type of addressing that any other
>> device on that PCI bus would use.
>>
>> On x86, at least, I doubt that we'll ever see a physically addressed
>> PCI virtio device for which ACPI advertises an IOMMU, since any sane
>> hypervisor will just not advertise an IOMMU for the virtio device.
>
> How exactly does one not advertise an IOMMU for a specific
> device? Could you please clarify?

See https://software.intel.com/en-us/blogs/2009/09/11/decoding-the-dmar-tables-in-acpiiommu-part-2

I think that all that needs to happen is for ACPI to not list the
device in the scope of any drhd unit. I don't know whether this works
correctly, but it looks like the iommu_dummy and the
init_no_remapping_devices code in intel-iommu.c exists for almost
exactly this purpose.

>> But are there arm64 or PPC guests that use virtio_pci, that have
>> IOMMUs, and that will malfunction if the virtio_pci driver ends up
>> using the IOMMU? I certainly hope not, since these systems might be
>> very hard-pressed to work right if someone plugged in a physical
>> virtio-speaking PCI device.
>
> One simple fix is to defer this all until virtio 1.0.
> virtio 1.0 has an alternative set of IDs for virtio pci,
> that can be used if you are making an incompatible change.
> We can use that if there's an iommu.

How? If someone builds a physical device compliant with the virtio
1.0 specification, how can that device know whether it's behind an
IOMMU? The IOMMU is part of the host (or Xen, sort of), not the PCI
device.

I suppose that virtio 1.0 could add a bit indicating that the virtio
device is a physical piece of hardware (presumably this should be
PCI-specific).

--Andy
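That closing idea could be pictured as an ordinary virtio feature-bit
negotiation. A sketch assuming virtio 1.0 were to reserve such a bit;
VIRTIO_F_REAL_HARDWARE and its bit number are invented for
illustration and are not in any spec:

#include <linux/virtio.h>
#include <linux/virtio_config.h>

/* Hypothetical "I am real hardware, follow the platform's DMA rules"
 * feature bit; the name and number are invented. */
#define VIRTIO_F_REAL_HARDWARE	33

static bool virtio_should_use_dma_api(struct virtio_device *vdev)
{
	/* A physical device would offer the bit; a paravirt device
	 * behind a bypassed IOMMU would not, and the driver would keep
	 * handing it guest-physical addresses. */
	return virtio_has_feature(vdev, VIRTIO_F_REAL_HARDWARE);
}

This sidesteps Andy's objection only partially: the bit describes the
device, while whether an IOMMU sits in front of it is a property of
the platform, so the device maker still has to know which environment
the part will be deployed in.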