On Tue, Jul 31, 2018 at 03:36:22PM -0500, Benjamin Herrenschmidt wrote:
> On Tue, 2018-07-31 at 10:30 -0700, Christoph Hellwig wrote:
> > > However the question people raise is that the DMA API is already full of
> > > arch-specific tricks the likes of which are outlined in your post linked
> > > above. How is this one much worse?
> >
> > None of these warts is visible to the driver, they are all handled in
> > the architecture (possibly on a per-bus basis).
> >
> > So for virtio we really need to decide if it has one set of behavior
> > as specified in the virtio spec, or if it behaves exactly as if it
> > was on a PCI bus, or in fact probably both as you lined up. But no
> > magic arch-specific behavior in between.
>
> The only arch-specific behaviour is needed in the case where it doesn't
> behave like PCI. In this case, the PCI DMA ops are not suitable, but in
> our secure VMs, we still need to make it use swiotlb in order to bounce
> through non-secure pages.

On arm/arm64, the problem we have is that legacy virtio devices on the MMIO
transport (so definitely not PCI) have historically been advertised by qemu
as not being cache coherent, but because the virtio core has bypassed the
DMA ops, everything has happened to work. If we blindly enable the arch DMA
ops, we'll plumb in the non-coherent ops and start getting data corruption,
so we do need a way to quirk virtio as being "always coherent" if we want to
use the DMA ops (which we do, because our emulation platforms have an IOMMU
for all virtio devices).

Will
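For reference, the bypass being discussed is the decision in the virtio ring
code to use the DMA API only when the device negotiated
VIRTIO_F_IOMMU_PLATFORM (or when running under Xen); everything else is
handed raw guest physical addresses. A simplified sketch of that decision
follows; it is not a verbatim copy of any particular kernel tree:

#include <linux/virtio.h>
#include <linux/virtio_config.h>
#include <xen/xen.h>

/* Sketch of the virtio core's "should we use the DMA API?" decision. */
static bool vring_use_dma_api(struct virtio_device *vdev)
{
        /*
         * VIRTIO_F_IOMMU_PLATFORM means the device honours the platform's
         * IOMMU/DMA rules, so the DMA API must be used.
         */
        if (virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
                return true;

        /* Xen grant mappings always need the DMA API. */
        if (xen_domain())
                return true;

        /*
         * Otherwise: legacy behaviour, hand the device raw guest physical
         * addresses and bypass the arch DMA ops entirely -- which is why a
         * wrongly-described non-coherent virtio-mmio device still "works".
         */
        return false;
}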
On Wed, Aug 01, 2018 at 09:16:38AM +0100, Will Deacon wrote:
> On arm/arm64, the problem we have is that legacy virtio devices on the MMIO
> transport (so definitely not PCI) have historically been advertised by qemu
> as not being cache coherent, but because the virtio core has bypassed the
> DMA ops, everything has happened to work. If we blindly enable the arch DMA
> ops,

No one is suggesting that as far as I can tell.

> we'll plumb in the non-coherent ops and start getting data corruption,
> so we do need a way to quirk virtio as being "always coherent" if we want to
> use the DMA ops (which we do, because our emulation platforms have an IOMMU
> for all virtio devices).

From all that I've gathered so far: no, you do not want that. We really
need to figure out how virtio "dma" interacts with the host / device.

If you look at the current iommu spec it does talk of physical addresses
with a little carveout for VIRTIO_F_IOMMU_PLATFORM.

So between that and our discussion in this thread and its previous
iterations, I think we need to stick to the current always-physical,
bypass-system-dma-ops mode of virtio operation as the default.

We just need to figure out how to deal with devices that deviate
from the default. One thing is that VIRTIO_F_IOMMU_PLATFORM really
should become VIRTIO_F_PLATFORM_DMA to cover the cases of non-iommu
dma tweaks (offsets, cache flushing), which seems well in the spirit of
the original design. The other issue is VIRTIO_F_IO_BARRIER,
which is very vaguely defined, and which needs a better definition.
And last but not least we'll need some text explaining the challenges
of hardware devices - I think VIRTIO_F_PLATFORM_DMA + VIRTIO_F_IO_BARRIER
is what would basically cover them, but a good description including
an explanation of why these matter.
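For concreteness, the existing feature bit and the proposed rename, sketched
as header definitions. VIRTIO_F_IOMMU_PLATFORM is bit 33 in the uapi header
today; VIRTIO_F_PLATFORM_DMA does not exist anywhere, and the definition
below is purely illustrative of the rename being discussed:

/* Existing, from include/uapi/linux/virtio_config.h: the device obeys the
 * platform's IOMMU/DMA rules instead of using raw guest physical addresses. */
#define VIRTIO_F_IOMMU_PLATFORM         33

/* Illustrative only: Christoph's proposal is effectively a rename of the
 * same bit with a broader meaning (any platform DMA quirk: iommu, address
 * offsets, cache flushing), not a new feature bit. */
#define VIRTIO_F_PLATFORM_DMA           33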
Hi Christoph,

On Wed, Aug 01, 2018 at 01:36:39AM -0700, Christoph Hellwig wrote:
> On Wed, Aug 01, 2018 at 09:16:38AM +0100, Will Deacon wrote:
> > On arm/arm64, the problem we have is that legacy virtio devices on the MMIO
> > transport (so definitely not PCI) have historically been advertised by qemu
> > as not being cache coherent, but because the virtio core has bypassed the
> > DMA ops, everything has happened to work. If we blindly enable the arch DMA
> > ops,
>
> No one is suggesting that as far as I can tell.

Apologies: it's me that wants the DMA ops enabled to handle legacy devices
behind an IOMMU, but see below.

> > we'll plumb in the non-coherent ops and start getting data corruption,
> > so we do need a way to quirk virtio as being "always coherent" if we want to
> > use the DMA ops (which we do, because our emulation platforms have an IOMMU
> > for all virtio devices).
>
> From all that I've gathered so far: no, you do not want that. We really
> need to figure out how virtio "dma" interacts with the host / device.
>
> If you look at the current iommu spec it does talk of physical addresses
> with a little carveout for VIRTIO_F_IOMMU_PLATFORM.

That's true, although that doesn't exist in the legacy virtio spec, and we
have an existing emulation platform which puts legacy virtio devices behind
an IOMMU. Currently, Linux is unable to boot on this platform unless the
IOMMU is configured as bypass. If we can use the coherent IOMMU DMA ops,
then it works perfectly.

> So between that and our discussion in this thread and its previous
> iterations, I think we need to stick to the current always-physical,
> bypass-system-dma-ops mode of virtio operation as the default.

As above -- that means we hang during boot because we get stuck trying to
bring up a virtio-block device whose DMA is aborted by the IOMMU. The easy
answer is "just upgrade to the latest virtio and advertise the presence of
the IOMMU". I'm pushing for that in future platforms, but it seems a shame
not to support the current platform, especially given that other systems do
have hacks in mainline to get virtio working.

> We just need to figure out how to deal with devices that deviate
> from the default. One thing is that VIRTIO_F_IOMMU_PLATFORM really
> should become VIRTIO_F_PLATFORM_DMA to cover the cases of non-iommu
> dma tweaks (offsets, cache flushing), which seems well in the spirit of
> the original design. The other issue is VIRTIO_F_IO_BARRIER,
> which is very vaguely defined, and which needs a better definition.
> And last but not least we'll need some text explaining the challenges
> of hardware devices - I think VIRTIO_F_PLATFORM_DMA + VIRTIO_F_IO_BARRIER
> is what would basically cover them, but a good description including
> an explanation of why these matter.

I agree that this makes sense for future revisions of virtio (or perhaps it
can just be a clarification to virtio 1.0), but we're still left in the dark
with legacy devices and it would be nice to have them work on the systems
which currently exist, even if it's a legacy-only hack in the arch code.

Will
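A "legacy-only hack in the arch code" of the kind Will mentions could
plausibly take the shape below: when DMA ops are set up for a device on the
virtio-mmio transport, override the firmware's coherency description and
install coherent ops. This is a sketch under that assumption only; neither
the function nor its call site exists in mainline, and arch_setup_dma_ops()
is shown with its circa-4.18 signature:

#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/of.h>

/*
 * Illustrative only: a hypothetical arch-side quirk that treats legacy
 * virtio-mmio devices ("virtio,mmio" in the DT) as cache-coherent even when
 * qemu's DT description says otherwise, so that the coherent (IOMMU) DMA
 * ops get installed.
 */
static void quirk_setup_virtio_mmio_dma_ops(struct device *dev,
                                            u64 dma_base, u64 size,
                                            const struct iommu_ops *iommu,
                                            bool fw_says_coherent)
{
        bool coherent = fw_says_coherent;

        /* Legacy virtio-mmio: assume the "device" is really coherent. */
        if (dev->of_node &&
            of_device_is_compatible(dev->of_node, "virtio,mmio"))
                coherent = true;

        arch_setup_dma_ops(dev, dma_base, size, iommu, coherent);
}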
On Wed, Aug 01, 2018 at 01:36:39AM -0700, Christoph Hellwig wrote:
> On Wed, Aug 01, 2018 at 09:16:38AM +0100, Will Deacon wrote:
> > On arm/arm64, the problem we have is that legacy virtio devices on the MMIO
> > transport (so definitely not PCI) have historically been advertised by qemu
> > as not being cache coherent, but because the virtio core has bypassed the
> > DMA ops, everything has happened to work. If we blindly enable the arch DMA
> > ops,
>
> No one is suggesting that as far as I can tell.
>
> > we'll plumb in the non-coherent ops and start getting data corruption,
> > so we do need a way to quirk virtio as being "always coherent" if we want to
> > use the DMA ops (which we do, because our emulation platforms have an IOMMU
> > for all virtio devices).
>
> From all that I've gathered so far: no, you do not want that. We really
> need to figure out how virtio "dma" interacts with the host / device.
>
> If you look at the current iommu spec it does talk of physical addresses
> with a little carveout for VIRTIO_F_IOMMU_PLATFORM.
>
> So between that and our discussion in this thread and its previous
> iterations, I think we need to stick to the current always-physical,
> bypass-system-dma-ops mode of virtio operation as the default.
>
> We just need to figure out how to deal with devices that deviate
> from the default. One thing is that VIRTIO_F_IOMMU_PLATFORM really
> should become VIRTIO_F_PLATFORM_DMA to cover the cases of non-iommu
> dma tweaks (offsets, cache flushing), which seems well in the spirit of
> the original design.

Well, I wouldn't say that. VIRTIO_F_IOMMU_PLATFORM is for guest-programmable
protection, which is designed for things like userspace drivers but still
very much with a CPU doing the accesses. I think VIRTIO_F_IO_BARRIER needs
to be extended to VIRTIO_F_PLATFORM_DMA.

> The other issue is VIRTIO_F_IO_BARRIER,
> which is very vaguely defined, and which needs a better definition.
> And last but not least we'll need some text explaining the challenges
> of hardware devices - I think VIRTIO_F_PLATFORM_DMA + VIRTIO_F_IO_BARRIER
> is what would basically cover them, but a good description including
> an explanation of why these matter.

I think VIRTIO_F_IOMMU_PLATFORM + VIRTIO_F_PLATFORM_DMA, but yeah.

-- 
MST
Benjamin Herrenschmidt
2018-Aug-02 15:24 UTC
[RFC 0/4] Virtio uses DMA API for all devices
On Wed, 2018-08-01 at 01:36 -0700, Christoph Hellwig wrote:
> We just need to figure out how to deal with devices that deviate
> from the default. One thing is that VIRTIO_F_IOMMU_PLATFORM really
> should become VIRTIO_F_PLATFORM_DMA to cover the cases of non-iommu
> dma tweaks (offsets, cache flushing), which seems well in the spirit of
> the original design.

I don't completely agree:

1 - VIRTIO_F_IOMMU_PLATFORM is a property of the "other side", ie qemu for
    example. It indicates that the peer bypasses the normal platform iommu.
    The platform code in the guest has no real way to know that this is the
    case; this is a specific "feature" of the qemu implementation.

2 - VIRTIO_F_PLATFORM_DMA (or whatever you want to call it) is a property of
    the guest platform itself (not qemu); there's no way the "peer" can
    advertise it via the virtio negotiated flags. At least for us. I don't
    know for sure whether that would be workable for the ARM case.

In our case, qemu has no idea at VM creation time that the VM will turn
itself into a secure VM and thus will require bounce buffering for IOs
(including virtio).

So unless we have another hook for the arch code to set VIRTIO_F_PLATFORM_DMA
on selected (or all) virtio devices from the guest itself, I don't see that
as a way to deal with it.

> The other issue is VIRTIO_F_IO_BARRIER,
> which is very vaguely defined, and which needs a better definition.
> And last but not least we'll need some text explaining the challenges
> of hardware devices - I think VIRTIO_F_PLATFORM_DMA + VIRTIO_F_IO_BARRIER
> is what would basically cover them, but a good description including
> an explanation of why these matter.

Ben.
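The guest-side hook Ben is asking for might look roughly like the sketch
below, feeding into the same bypass decision shown earlier in the thread.
The name arch_virtio_wants_dma_ops() is invented here for illustration and
does not exist upstream:

#include <linux/virtio.h>
#include <linux/virtio_config.h>

/*
 * Hypothetical arch hook (name invented for illustration): lets the guest
 * platform itself, e.g. a secure VM that must bounce all I/O through
 * swiotlb, force virtio to use the DMA API even though the device never
 * negotiated VIRTIO_F_IOMMU_PLATFORM / VIRTIO_F_PLATFORM_DMA.
 */
bool __weak arch_virtio_wants_dma_ops(struct virtio_device *vdev)
{
        return false;   /* default: keep the existing bypass behaviour */
}

/* The earlier bypass decision, extended with the guest-side override. */
static bool vring_use_dma_api(struct virtio_device *vdev)
{
        if (virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
                return true;

        if (arch_virtio_wants_dma_ops(vdev))
                return true;

        return false;
}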
On Wed, Aug 01, 2018 at 09:16:38AM +0100, Will Deacon wrote:
> On Tue, Jul 31, 2018 at 03:36:22PM -0500, Benjamin Herrenschmidt wrote:
> > On Tue, 2018-07-31 at 10:30 -0700, Christoph Hellwig wrote:
> > > > However the question people raise is that the DMA API is already full of
> > > > arch-specific tricks the likes of which are outlined in your post linked
> > > > above. How is this one much worse?
> > >
> > > None of these warts is visible to the driver, they are all handled in
> > > the architecture (possibly on a per-bus basis).
> > >
> > > So for virtio we really need to decide if it has one set of behavior
> > > as specified in the virtio spec, or if it behaves exactly as if it
> > > was on a PCI bus, or in fact probably both as you lined up. But no
> > > magic arch-specific behavior in between.
> >
> > The only arch-specific behaviour is needed in the case where it doesn't
> > behave like PCI. In this case, the PCI DMA ops are not suitable, but in
> > our secure VMs, we still need to make it use swiotlb in order to bounce
> > through non-secure pages.
>
> On arm/arm64, the problem we have is that legacy virtio devices on the MMIO
> transport (so definitely not PCI) have historically been advertised by qemu
> as not being cache coherent, but because the virtio core has bypassed the
> DMA ops, everything has happened to work. If we blindly enable the arch DMA
> ops, we'll plumb in the non-coherent ops and start getting data corruption,
> so we do need a way to quirk virtio as being "always coherent" if we want to
> use the DMA ops (which we do, because our emulation platforms have an IOMMU
> for all virtio devices).
>
> Will

Right, that's not very different from placing the device within the IOMMU
domain but in fact bypassing the IOMMU.

I wonder whether anyone ever needs a non-coherent virtio-mmio. If yes, we
can extend PLATFORM_IOMMU to cover that or add another bit.

What exactly do the non-coherent ops do that causes the corruption?

-- 
MST
Hi Michael,

On Sun, Aug 05, 2018 at 03:27:42AM +0300, Michael S. Tsirkin wrote:
> On Wed, Aug 01, 2018 at 09:16:38AM +0100, Will Deacon wrote:
> > On Tue, Jul 31, 2018 at 03:36:22PM -0500, Benjamin Herrenschmidt wrote:
> > > On Tue, 2018-07-31 at 10:30 -0700, Christoph Hellwig wrote:
> > > > > However the question people raise is that the DMA API is already full of
> > > > > arch-specific tricks the likes of which are outlined in your post linked
> > > > > above. How is this one much worse?
> > > >
> > > > None of these warts is visible to the driver, they are all handled in
> > > > the architecture (possibly on a per-bus basis).
> > > >
> > > > So for virtio we really need to decide if it has one set of behavior
> > > > as specified in the virtio spec, or if it behaves exactly as if it
> > > > was on a PCI bus, or in fact probably both as you lined up. But no
> > > > magic arch-specific behavior in between.
> > >
> > > The only arch-specific behaviour is needed in the case where it doesn't
> > > behave like PCI. In this case, the PCI DMA ops are not suitable, but in
> > > our secure VMs, we still need to make it use swiotlb in order to bounce
> > > through non-secure pages.
> >
> > On arm/arm64, the problem we have is that legacy virtio devices on the MMIO
> > transport (so definitely not PCI) have historically been advertised by qemu
> > as not being cache coherent, but because the virtio core has bypassed the
> > DMA ops, everything has happened to work. If we blindly enable the arch DMA
> > ops, we'll plumb in the non-coherent ops and start getting data corruption,
> > so we do need a way to quirk virtio as being "always coherent" if we want to
> > use the DMA ops (which we do, because our emulation platforms have an IOMMU
> > for all virtio devices).
> >
> > Will
>
> Right, that's not very different from placing the device within the IOMMU
> domain but in fact bypassing the IOMMU.

Hmm, I'm not sure I follow you here -- the IOMMU bypassing is handled inside
the IOMMU driver, so we'd still end up with non-coherent DMA ops for the
guest accesses. The presence of an IOMMU doesn't imply coherency for us. Or
am I missing your point here?

> I wonder whether anyone ever needs a non-coherent virtio-mmio. If yes, we
> can extend PLATFORM_IOMMU to cover that or add another bit.

I think that's probably the right way around: assume that legacy virtio-mmio
devices are coherent by default.

> What exactly do the non-coherent ops do that causes the corruption?

The non-coherent ops mean that the guest ends up allocating the vring queues
using non-cacheable mappings, whereas the qemu/hypervisor side uses a
cacheable mapping despite not advertising the devices as being
cache-coherent. This hits something in the architecture known as "mismatched
aliases", which means that coherency is lost between the guest and the
hypervisor, consequently resulting in data not being visible and ordering
not being guaranteed. The usual symptom is that the device appears to lock
up, iirc, because the guest and the hypervisor are unable to communicate
with each other.

Does that help to clarify things?

Thanks,

Will
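For readers unfamiliar with the path Will describes: when the DMA API is in
use, the ring memory comes from dma_alloc_coherent(), so the installed
(coherent vs. non-coherent) DMA ops decide whether the guest mapping is
cacheable; in the legacy bypass case the ring is ordinary cacheable kernel
memory and its guest physical address is handed to the host directly. A
simplified sketch of that allocation path follows (vring_use_dma_api() is
the decision sketched earlier in the thread; this is not the exact upstream
code):

#include <linux/dma-mapping.h>
#include <linux/gfp.h>
#include <linux/io.h>
#include <linux/virtio.h>

/*
 * Sketch of how the vring memory is obtained.  With the DMA API in use, a
 * device wrongly described as non-coherent gets its ring via the
 * non-coherent ops, i.e. a non-cacheable guest mapping, while qemu keeps
 * accessing the same pages through a cacheable mapping -- the mismatched
 * alias Will describes.
 */
static void *vring_alloc_queue(struct virtio_device *vdev, size_t size,
                               dma_addr_t *dma_handle, gfp_t flag)
{
        void *queue;

        if (vring_use_dma_api(vdev))
                return dma_alloc_coherent(vdev->dev.parent, size,
                                          dma_handle, flag);

        /* Legacy bypass: plain cacheable pages, guest PA used as the "DMA" address. */
        queue = alloc_pages_exact(PAGE_ALIGN(size), flag);
        if (queue)
                *dma_handle = (dma_addr_t)virt_to_phys(queue);
        return queue;
}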