thr3ads.net - Linux Virtualization - [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API [Sep 2014]

Andy Lutomirski

2014-Sep-02 21:37 UTC

[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

On Tue, Sep 2, 2014 at 1:53 PM, Benjamin Herrenschmidt
<benh at kernel.crashing.org> wrote:
> On Mon, 2014-09-01 at 22:55 -0700, Andy Lutomirski wrote:
>>
>> On x86, at least, I doubt that we'll ever see a physically addressed
>> PCI virtio device for which ACPI advertises an IOMMU, since any sane
>> hypervisor will just not advertise an IOMMU for the virtio device.
>> But are there arm64 or PPC guests that use virtio_pci, that have
>> IOMMUs, and that will malfunction if the virtio_pci driver ends up
>> using the IOMMU?  I certainly hope not, since these systems might be
>> very hard-pressed to work right if someone plugged in a physical
>> virtio-speaking PCI device.
>
> It will definitely not work on ppc64. We always have IOMMUs on pseries,
> all PCI busses do, and because it's a paravirtualized environment,
> mapping/unmapping pages means hypercalls -> expensive.
>
> But our virtio implementation bypasses it in qemu, so if virtio-pci
> starts using the DMA mapping API without changing the DMA ops under the
> hood, it will break for us.
>
Let's take a step back from the implementation.  What is a driver
for a virtio PCI device (i.e. a PCI device with vendor 0x1af4)
supposed to do on ppc64?

It can send the device physical addresses and ignore the normal PCI
DMA semantics, which is what the current virtio_pci driver does.  This
seems like a layering violation, and this won't work if the device is
a real PCI device.  Alternatively, it can treat the device like any
other PCI device and use the IOMMU.  This is a bit slower, and it is
also incompatible with current hypervisors.

There really are virtio devices that are pieces of silicon and not
figments of a hypervisor's imagination [1].  We could teach virtio_pci
to use physical addressing on ppc64, but that seems like a pretty
awful hack, and it'll start needing quirks as soon as someone tries to
plug a virtio-speaking PCI card into a ppc64 machine.

Ideas?  x86 and arm seem to be safe here, since AFAIK there is no such
thing as a physically addressed virtio "PCI" device on a bus with an
IOMMU on x86, arm, or arm64.

[1] https://lwn.net/Articles/580186/
> Cheers,
> Ben.
>
>

-- 
Andy Lutomirski
AMA Capital Management, LLC

Benjamin Herrenschmidt

2014-Sep-02 22:10 UTC

[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

On Tue, 2014-09-02 at 14:37 -0700, Andy Lutomirski wrote:
> Let's take a step back from the implementation.  What is a driver
> for a virtio PCI device (i.e. a PCI device with vendor 0x1af4)
> supposed to do on ppc64?
Today, it's supposed to send guest physical addresses. We can make that
optional via some nego or capabilities to support more esoteric setups
but for backward compatibility, this must remain the default behaviour.
> It can send the device physical addresses and ignore the normal PCI
> DMA semantics, which is what the current virtio_pci driver does.  This
> seems like a layering violation, and this won't work if the device is
> a real PCI device.
Correct, it's an original virtio implementation choice for maximum
performance.
>   Alternatively, it can treat the device like any
> other PCI device and use the IOMMU.  This is a bit slower, and it is
> also incompatible with current hypervisors.
This is potentially a LOT slower and is backward incompatible with
current qemu/KVM and kvmtool, yes.

The slowness can be alleviated using various techniques, for example on
ppc64 we can create a DMA window that contains a permanent mapping of
the entire guest space, so we could use such a thing for virtio.
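As a rough illustration of why such a window avoids the per-mapping hypercall cost: once all of guest memory is permanently mapped, translation is plain arithmetic. This is a userspace sketch, not kernel or pseries code; `struct dma_window` and `window_translate` are hypothetical names invented for the example, and the real window layout is not described in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a DMA window that permanently maps all
 * guest RAM. Not a real kernel structure. */
struct dma_window {
    uint64_t bus_base;   /* first bus address covered by the window */
    uint64_t guest_size; /* bytes of guest physical memory mapped */
};

/*
 * Translate a guest physical range into a bus address with arithmetic
 * only: no per-buffer map/unmap call is needed. Returns false if the
 * range does not fit inside the window.
 */
bool window_translate(const struct dma_window *w, uint64_t guest_phys,
                      uint64_t len, uint64_t *bus_addr)
{
    if (len > w->guest_size || guest_phys > w->guest_size - len)
        return false;
    *bus_addr = w->bus_base + guest_phys;
    return true;
}
```

A caller would set up one `struct dma_window` at boot and call `window_translate` for each buffer; the bounds check is the only per-buffer work.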

Another thing we could potentially do is advertise via the device-tree
that such a bus uses a direct mapping and have the guest use appropriate
"direct map" dma_ops.

But we need to keep backward compatibility with existing
guest/hypervisors so the default must remain as it is.
> There really are virtio devices that are pieces of silicon and not
> figments of a hypervisor's imagination [1].
I am aware of that. There are also attempts at using virtio to make two
machines communicate via a PCIe link (either with one as endpoint of the
other or via a non-transparent switch).

Which is why I'm not objecting to what you are trying to do ;-)

My suggestion was that it might be a cleaner approach to do that by
having the individual virtio drivers always use the dma_map_* API, and
limiting the kludgery to a combination of virtio_pci "core" and arch
code by selecting an appropriate set of dma_map_ops, defaulting with a
"transparent" (or direct) one as our current default case (and thus
overriding the iommu ones provided by the arch).
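A userspace sketch of that split, under stated assumptions: `struct model_dma_ops` is a simplified stand-in for the kernel's `dma_map_ops`, the toy IOMMU is a bump allocator invented for the example, and `virtio_pick_ops` is a hypothetical name for the selection that virtio_pci "core" plus arch code would make. Drivers would call `map` unconditionally and never learn which table they got.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_MASK  ((UINT64_C(1) << TOY_PAGE_SHIFT) - 1)

/* Simplified stand-in for dma_map_ops: only a map hook. */
struct model_dma_ops {
    uint64_t (*map)(void *ctx, uint64_t guest_phys, size_t len);
};

/* "Transparent" (direct) ops: bus address == guest physical address. */
static uint64_t direct_map(void *ctx, uint64_t guest_phys, size_t len)
{
    (void)ctx;
    (void)len;
    return guest_phys;
}

/* Toy IOMMU state: hands out page-aligned IOVAs from a bump pointer. */
struct toy_iommu {
    uint64_t next_iova;     /* must start page aligned */
    unsigned int map_calls; /* stands in for the hypercall count */
};

static uint64_t toy_iommu_map(void *ctx, uint64_t guest_phys, size_t len)
{
    struct toy_iommu *mmu = ctx;
    uint64_t offset = guest_phys & TOY_PAGE_MASK;
    uint64_t pages = (offset + len + TOY_PAGE_MASK) >> TOY_PAGE_SHIFT;
    uint64_t iova = mmu->next_iova + offset;

    mmu->next_iova += pages << TOY_PAGE_SHIFT;
    mmu->map_calls++;
    return iova;
}

static const struct model_dma_ops direct_ops = { .map = direct_map };
static const struct model_dma_ops iommu_ops = { .map = toy_iommu_map };

/* Default to the direct ops; use the IOMMU ops only when asked to. */
const struct model_dma_ops *virtio_pick_ops(bool use_bus_addressing)
{
    return use_bus_addressing ? &iommu_ops : &direct_ops;
}
```

The point of the design is that the choice lives in one place: the individual drivers stay identical whichever table is selected.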
>   We could teach virtio_pci
> to use physical addressing on ppc64, but that seems like a pretty
> awful hack, and it'll start needing quirks as soon as someone tries to
> plug a virtio-speaking PCI card into a ppc64 machine.
But x86_64 is the same, no? The day it starts growing an iommu emulation
in qemu (and I've heard it's happening) it will still want to do direct
bypass for virtio for performance.
> Ideas?  x86 and arm seem to be safe here, since AFAIK there is no such
> thing as a physically addressed virtio "PCI" device on a bus with an
> IOMMU on x86, arm, or arm64.
Today .... I wouldn't bet on it to remain that way. The qemu
implementation of virtio is physically addressed and you don't
necessarily have a choice of which device gets an iommu and which not.

Cheers,
Ben.
> [1] https://lwn.net/Articles/580186/
> 
> > Cheers,
> > Ben.
> >
> >
> 
> 
>

Andy Lutomirski

2014-Sep-02 23:11 UTC

[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

On Tue, Sep 2, 2014 at 3:10 PM, Benjamin Herrenschmidt
<benh at kernel.crashing.org> wrote:
> On Tue, 2014-09-02 at 14:37 -0700, Andy Lutomirski wrote:
>
>> Let's take a step back from the implementation.  What is a driver
>> for a virtio PCI device (i.e. a PCI device with vendor 0x1af4)
>> supposed to do on ppc64?
>
> Today, it's supposed to send guest physical addresses. We can make that
> optional via some nego or capabilities to support more esoteric setups
> but for backward compatibility, this must remain the default behaviour.
I think it only needs to remain the default in cases where the
alternative (bus addressing) won't work.  I think that, so far, this
is just ppc64.  But see below...
>
> My suggestion was that it might be a cleaner approach to do that by
> having the individual virtio drivers always use the dma_map_* API, and
> limiting the kludgery to a combination of virtio_pci "core" and arch
> code by selecting an appropriate set of dma_map_ops, defaulting with a
> "transparent" (or direct) one as our current default case (and thus
> overriding the iommu ones provided by the arch).
I think the cleanest way of all would be to get the bus drivers to do
the right thing so that all of the virtio code can just use the dma
api.  I don't know whether this is achievable.
>
>>   We could teach virtio_pci
>> to use physical addressing on ppc64, but that seems like a pretty
>> awful hack, and it'll start needing quirks as soon as someone tries to
>> plug a virtio-speaking PCI card into a ppc64 machine.
>
> But x86_64 is the same, no? The day it starts growing an iommu emulation
> in qemu (and I've heard it's happening) it will still want to do direct
> bypass for virtio for performance.
I don't think so.  I would argue that it's a straight-up bug for QEMU
to expose a physically-addressed virtio-pci device to the guest behind
an emulated IOMMU.  QEMU may already be doing that on ppc64, but it
isn't on x86_64 or arm (yet).

On x86_64, I'm pretty sure that QEMU can emulate an IOMMU for
everything except the virtio-pci devices.  The ACPI DMAR stuff is
quite expressive.

On ARM, I hope that QEMU will never implement a PCI IOMMU.  As far as I
could tell when I looked last week, none of the newer QEMU-emulated
ARM machines even support PCI.  Even if QEMU were to implement a PCI
IOMMU on some future ARM machine, it could continue using virtio-mmio
for virtio devices.

So ppc might actually be the only system that has or will have
physically-addressed virtio PCI devices that are behind an IOMMU.  Can
this be handled in a ppc64-specific way?  Is there any way that the
kernel can distinguish a QEMU-provided virtio PCI device from a
physical PCIe thing?  It would be kind of nice to address this without
adding complexity to the virtio spec.  Maybe virtio 1.0 devices could
be assumed to use bus addressing unless a new devicetree property says
otherwise.
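Spelled out as a decision function, the ppc64 policy floated above might look like the sketch below. Everything here is hypothetical: the struct, the enum, and the function name are invented for the example, no such devicetree property exists as of this thread, and the assumption that legacy (pre-1.0) devices keep today's guest-physical behaviour is mine.

```c
#include <stdbool.h>

enum virtio_addr_mode {
    VIRTIO_ADDR_GUEST_PHYS, /* today's behaviour: bypass the IOMMU */
    VIRTIO_ADDR_BUS         /* behave like any other PCI device */
};

/* Hypothetical per-device facts the platform code would gather. */
struct virtio_pci_hint {
    bool is_virtio_1_0;           /* device speaks virtio 1.0 */
    bool dt_says_phys_addressing; /* hypothetical devicetree property */
};

enum virtio_addr_mode ppc64_pick_addr_mode(const struct virtio_pci_hint *h)
{
    /* Assumption: legacy devices keep the existing default. */
    if (!h->is_virtio_1_0)
        return VIRTIO_ADDR_GUEST_PHYS;
    /* A 1.0 device uses bus addressing unless the devicetree objects. */
    return h->dt_says_phys_addressing ? VIRTIO_ADDR_GUEST_PHYS
                                      : VIRTIO_ADDR_BUS;
}
```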

--Andy

Rusty Russell

2014-Sep-03 06:42 UTC

[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

Andy Lutomirski <luto at amacapital.net> writes:
> There really are virtio devices that are pieces of silicon and not
> figments of a hypervisor's imagination [1].
Hi Andy,

As you're discovering, there's a reason no one has done the DMA
API before.

So the problem is that ppc64's IOMMU is a platform thing, not a bus
thing. They really do carve out an exception for virtio devices,
because performance (LOTS of performance). It remains to be seen if
other platforms have the same performance issues, but in absence of
other evidence, the answer is yes.

It's a hack. But having specific virtual-only devices is an even
bigger hack.

Physical virtio devices have been talked about, but don't actually exist
in Real Life. And someone making a virtio PCI card is going to have serious
performance issues: mainly because they'll want the rings in the card's
MMIO region, not allocated by the driver. Being broken on PPC is really
the least of their problems.

So, what do we do? It'd be nice if Linux virtio Just Worked under Xen,
though Xen's IOMMU is outside the virtio spec. Since virtio_pci can be
a module, obvious hacks like having xen_arch_setup initialize a dma_ops pointer
exposed by virtio_pci.c are out.

I think the best approach is to have a new feature bit (25 is free),
VIRTIO_F_USE_BUS_MAPPING which indicates that a device really wants to
use the mapping for the bus it is on. A real device would set this,
or it won't work behind an IOMMU. A Xen device would also set this.
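For concreteness, a driver-side check for such a bit could be as small as the sketch below. The bit number 25 and the name `VIRTIO_F_USE_BUS_MAPPING` are only the proposal made in this message, not a merged constant, and both helper functions are hypothetical. The sketch assumes the 32-bit feature word used by legacy virtio.

```c
#include <stdbool.h>
#include <stdint.h>

/* Proposed in this thread only; not an existing kernel or spec constant. */
#define VIRTIO_F_USE_BUS_MAPPING 25

/* True if the device offers feature `bit` in its 32-bit feature word. */
bool virtio_feature_offered(uint32_t device_features, unsigned int bit)
{
    return bit < 32 && ((device_features >> bit) & 1u) != 0;
}

/* True if the device asks the driver to use the bus's own DMA mapping. */
bool virtio_wants_bus_mapping(uint32_t device_features)
{
    return virtio_feature_offered(device_features, VIRTIO_F_USE_BUS_MAPPING);
}
```

A driver would test `virtio_wants_bus_mapping()` once at probe time and then choose between guest physical addresses and the bus's DMA mapping.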

Thoughts?
Rusty.

PS. I cc'd OASIS virtio-dev: it's subscriber only for IP reasons (to
subscribe you have to promise we can use your suggestion in the
standard). Feel free to remove in any replies, but it's part of
the world we live in...

Andy Lutomirski

2014-Sep-03 07:50 UTC

[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

On Sep 2, 2014 11:53 PM, "Rusty Russell" <rusty at rustcorp.com.au> wrote:
>
> Andy Lutomirski <luto at amacapital.net> writes:
> > There really are virtio devices that are pieces of silicon and not
> > figments of a hypervisor's imagination [1].
>
> Hi Andy,
>
>         As you're discovering, there's a reason no one has done the DMA
> API before.
>
> So the problem is that ppc64's IOMMU is a platform thing, not a bus
> thing.  They really do carve out an exception for virtio devices,
> because performance (LOTS of performance).  It remains to be seen if
> other platforms have the same performance issues, but in absence of
> other evidence, the answer is yes.
>
> It's a hack.  But having specific virtual-only devices is an even
> bigger hack.
>
> Physical virtio devices have been talked about, but don't actually exist
> in Real Life.  And someone making a virtio PCI card is going to have serious
> performance issues: mainly because they'll want the rings in the card's
> MMIO region, not allocated by the driver.  Being broken on PPC is really
> the least of their problems.
>
> So, what do we do?  It'd be nice if Linux virtio Just Worked under Xen,
> though Xen's IOMMU is outside the virtio spec.  Since virtio_pci can be
> a module, obvious hacks like having xen_arch_setup initialize a dma_ops pointer
> exposed by virtio_pci.c are out.
Xen does expose dma_ops.  The trick is knowing when to use it.
>
> I think the best approach is to have a new feature bit (25 is free),
> VIRTIO_F_USE_BUS_MAPPING which indicates that a device really wants to
> use the mapping for the bus it is on.  A real device would set this,
> or it won't work behind an IOMMU.  A Xen device would also set this.
The devices I care about aren't actually Xen devices.  They're devices
supplied by QEMU/KVM, booting a Xen hypervisor, which in turn passes
the virtio device (along with every other PCI device) through to dom0.
So this is exactly the same virtio device that regular x86 KVM guests
would see.  The reason that current code fails is that Xen guest
physical addresses aren't the same as the addresses seen by the outer
hypervisor.
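A toy model of that mismatch, assuming a paravirtualized dom0 whose pseudo-physical frames are remapped by Xen: the table contents are made up, and `toy_p2m` and `toy_phys_to_machine` are hypothetical names, not Xen's actual interfaces. The address the driver would naively put in the ring is `phys`; the address the QEMU/KVM-provided device needs is the translated one.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_MASK  ((UINT64_C(1) << TOY_PAGE_SHIFT) - 1)
#define TOY_BAD_ADDR   UINT64_MAX

/* Made-up table: index is the dom0 pseudo-physical frame number, value
 * is the frame number as seen by the outer hypervisor. */
static const uint64_t toy_p2m[] = { 7, 3, 9, 0 };

/* Translate a dom0 "physical" address into the address the outer
 * hypervisor's device would have to use. Returns TOY_BAD_ADDR when the
 * frame is outside the toy table. */
uint64_t toy_phys_to_machine(uint64_t phys)
{
    uint64_t pfn = phys >> TOY_PAGE_SHIFT;

    if (pfn >= sizeof(toy_p2m) / sizeof(toy_p2m[0]))
        return TOY_BAD_ADDR;
    return (toy_p2m[pfn] << TOY_PAGE_SHIFT) | (phys & TOY_PAGE_MASK);
}
```

With this table, a buffer at dom0 address 0x1010 lives at 0x3010 from the device's point of view, so a ring entry holding 0x1010 points the device at the wrong page.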

These devices don't know that physical addresses != bus addresses, so
they can't advertise that fact.

If we ever end up with a virtio_pci device with physical addressing,
behind an IOMMU (but ignoring it), on Xen, we'll have a problem, since
neither "physical" addressing nor dma ops will work.

That being said, there are also proposals for virtio devices supplied
by Xen dom0 to domU, and these will presumably work the same way,
except that the device implementation will know that it's on Xen.

Grr.  This is mostly a result of the fact that virtio_pci devices
aren't really PCI devices.  I still think that virtio_pci shouldn't
have to worry about this; ideally this would all be handled higher up
in the device hierarchy.  x86 already gets this right.

Are there any hypervisors except PPC that use virtio_pci, have IOMMUs
on the pci slot that virtio_pci lives in, and that use physical
addressing?  If not, I think that just quirking PPC will work (at
least until someone wants IOMMU support in virtio_pci on PPC, in which
case doing something using devicetree seems like a reasonable
solution).

--Andy
>
> Thoughts?
> Rusty.
>
> PS.  I cc'd OASIS virtio-dev: it's subscriber only for IP reasons (to
>      subscribe you have to promise we can use your suggestion in the
>      standard).  Feel free to remove in any replies, but it's part of
>      the world we live in...

Michael S. Tsirkin

2014-Sep-03 12:51 UTC

[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

On Wed, Sep 03, 2014 at 04:12:01PM +0930, Rusty Russell wrote:
> Andy Lutomirski <luto at amacapital.net> writes:
> > There really are virtio devices that are pieces of silicon and not
> > figments of a hypervisor's imagination [1].
> 
> Hi Andy,
> 
>         As you're discovering, there's a reason no one has done the DMA
> API before.
> 
> So the problem is that ppc64's IOMMU is a platform thing, not a bus
> thing.  They really do carve out an exception for virtio devices,
> because performance (LOTS of performance).  It remains to be seen if
> other platforms have the same performance issues, but in absence of
> other evidence, the answer is yes.
> 
> It's a hack.  But having specific virtual-only devices is an even
> bigger hack.
> 
> Physical virtio devices have been talked about, but don't actually exist
> in Real Life.  And someone making a virtio PCI card is going to have serious
> performance issues: mainly because they'll want the rings in the card's
> MMIO region, not allocated by the driver.
Why? What's wrong with rings in memory?
>  Being broken on PPC is really
> the least of their problems.
> 
> So, what do we do?  It'd be nice if Linux virtio Just Worked under Xen,
> though Xen's IOMMU is outside the virtio spec.  Since virtio_pci can be
> a module, obvious hacks like having xen_arch_setup initialize a dma_ops pointer
> exposed by virtio_pci.c are out.
Well virtio could probe for xen, it's not a lot of code.
> I think the best approach is to have a new feature bit (25 is free),
> VIRTIO_F_USE_BUS_MAPPING which indicates that a device really wants to
> use the mapping for the bus it is on.  A real device would set this,
> or it won't work behind an IOMMU.  A Xen device would also set this.
> 
> Thoughts?
> Rusty.
OK, and it should then be active even if the guest does not ack
the feature (so in fact, it would have to be a mandatory feature).
That can work, but I still find this a bit inelegant: this is
a property of the platform, not of the device.

> PS.  I cc'd OASIS virtio-dev: it's subscriber only for IP reasons (to
>      subscribe you have to promise we can use your suggestion in the
>      standard).  Feel free to remove in any replies, but it's part of
>      the world we live in...
