thr3ads.net - Linux Virtualization - [RFC 0/4] Virtio uses DMA API for all devices [Aug 2018]

If this information is useful, please help other people find it:
Share via:

Benjamin Herrenschmidt

2018-Aug-07 20:32 UTC

[RFC 0/4] Virtio uses DMA API for all devices

On Tue, 2018-08-07 at 06:55 -0700, Christoph Hellwig
wrote:> On Tue, Aug 07, 2018 at 04:42:44PM +1000, Benjamin Herrenschmidt wrote:
> > Note that I can make it so that the same DMA ops (basically standard
> > swiotlb ops without arch hacks) work for both "direct
virtio" and
> > "normal PCI" devices.
> > 
> > The trick is simply in the arch to setup the iommu to map the swiotlb
> > bounce buffer pool 1:1 in the iommu, so the iommu essentially can be
> > ignored without affecting the physical addresses.
> > 
> > If I do that, *all* I need is a way, from the guest itself (again, the
> > other side dosn't know anything about it), to force virtio to use
the
> > DMA ops as if there was an iommu, that is, use whatever dma ops were
> > setup by the platform for the pci device.
> 
> In that case just setting VIRTIO_F_IOMMU_PLATFORM in the flags should
> do the work (even if that isn't strictly what the current definition
> of the flag actually means).  On the qemu side you'll need to make
> sure you have a way to set VIRTIO_F_IOMMU_PLATFORM without emulating
> an iommu, but with code to take dma offsets into account if your
> plaform has any (various power plaforms seem to have them, not sure
> if it affects your config).
Something like that yes. I prefer a slightly different way, see below,
any but in both cases, it should alleviate your concerns since it means
there would be no particular mucking around with DMA ops at all, virtio
would just use whatever "normal" ops we establish for all PCI devices
on that platform, which will be standard ones.

(swiotlb ones today and the new "integrates" ones you're cooking
tomorrow).

As for the flag itself, while we could set it from qemu when we get
notified that the guest is going secure, both Michael and I think it's
rather gross, it requires qemu to go iterate all virtio devices and
"poke" something into them.

It also means qemu will need some other internal nasty flag that says
"set that bit but don't do iommu".

It's nicer if we have a way in the guest virtio driver to do something
along the lines of

	if ((flags & VIRTIO_F_IOMMU_PLATFORM) || arch_virtio_wants_dma_ops())

Which would have the same effect and means the issue is entirely
contained in the guest.

Cheers,
Ben.

Christoph Hellwig

2018-Aug-08 06:31 UTC

head link

[RFC 0/4] Virtio uses DMA API for all devices

On Wed, Aug 08, 2018 at 06:32:45AM +1000, Benjamin Herrenschmidt
wrote:> As for the flag itself, while we could set it from qemu when we get
> notified that the guest is going secure, both Michael and I think it's
> rather gross, it requires qemu to go iterate all virtio devices and
> "poke" something into them.
You don't need to set them the time you go secure.  You just need to
set the flag from the beginning on any VM you might want to go secure.
Or for simplicity just any VM - if the DT/ACPI tables exposed by
qemu are good enough that will always exclude a iommu and not set a
DMA offset, so nothing will change on the qemu side of he processing,
and with the new direct calls for the direct dma ops performance in
the guest won't change either.
> It's nicer if we have a way in the guest virtio driver to do something
> along the lines of
> 
> 	if ((flags & VIRTIO_F_IOMMU_PLATFORM) || arch_virtio_wants_dma_ops())
> 
> Which would have the same effect and means the issue is entirely
> contained in the guest.
It would not be the same effect.  The problem with that is that you must
now assumes that your qemu knows that for example you might be passing
a dma offset if the bus otherwise requires it.  Or in other words:
you potentially break the contract between qemu and the guest of always
passing down physical addresses.  If we explicitly change that contract
through using a flag that says you pass bus address everything is fine.

Note that in practice your scheme will probably just work for your
initial prototype, but chances are it will get us in trouble later on.

Benjamin Herrenschmidt

2018-Aug-08 10:07 UTC

head link

[RFC 0/4] Virtio uses DMA API for all devices

On Tue, 2018-08-07 at 23:31 -0700, Christoph Hellwig
wrote:> 
> You don't need to set them the time you go secure.  You just need to
> set the flag from the beginning on any VM you might want to go secure.
> Or for simplicity just any VM - if the DT/ACPI tables exposed by
> qemu are good enough that will always exclude a iommu and not set a
> DMA offset, so nothing will change on the qemu side of he processing,
> and with the new direct calls for the direct dma ops performance in
> the guest won't change either.
So that's where I'm not sure things are "good enough" due to
how
pseries works. (remember it's paravirtualized).

A pseries system starts with a default iommu on all devices, that uses
translation using 4k entires with a "pinhole" window (usually 2G with
qemu iirc). There's no "pass through" by default.

Qemu virtio bypasses that iommu when the VIRTIO_F_IOMMU_PLATFORM flag
is not set (default) but there's nothing in the device-tree to tell the
guest about this since it's a violation of our pseries architecture, so
we just rely on Linux virtio "knowing" that it happens. It's a bit
yucky but that's now history...

Essentially pseries "architecturally" does not have the concept of not
having an iommu in the way and qemu violates that architecture today.

(Remember it comes from pHyp, our priorietary HV, which we are somewhat
mimmicing here).

So if we always set VIRTIO_F_IOMMU_PLATFORM, it *will* force all virtio
through that iommu and performance will suffer (esp vhost I suspect),
especially since adding/removing translations in the iommu is a
hypercall.

Now, we do have HV APIs to create a second window that's "permanently
mapped" to the guest memory, thus avoiding dynamic map/unmaps, and
Linux can make use of this but I don't know if that works with qemu and
the performance impact with vhost.

So the situation isn't that great.... On the other hand, I think the
other approach works for us:
> > It's nicer if we have a way in the guest virtio driver to do
something
> > along the lines of
> > 
> > 	if ((flags & VIRTIO_F_IOMMU_PLATFORM) ||
arch_virtio_wants_dma_ops())
> > 
> > Which would have the same effect and means the issue is entirely
> > contained in the guest.
> 
> It would not be the same effect.  The problem with that is that you must
> now assumes that your qemu knows that for example you might be passing
> a dma offset if the bus otherwise requires it. 
I would assume that arch_virtio_wants_dma_ops() only returns true when
no such offsets are involved, at least in our case that would be what
happens.
>  Or in other words:
> you potentially break the contract between qemu and the guest of always
> passing down physical addresses.  If we explicitly change that contract
> through using a flag that says you pass bus address everything is fine.
For us a "bus address" is behind the iommu so that's what
VIRTIO_F_IOMMU_PLATFORM does already. We don't have the concept of a
bus address that is different. I suppose it's an ARMism to have DMA
offsets that are separate from iommus ? 
> Note that in practice your scheme will probably just work for your
> initial prototype, but chances are it will get us in trouble later on.
Not on pseries, at least not in any way I can think of mind you... but
maybe other architectures would abuse it... We could add a WARN_ON if
that calls returns true on a bus with an offset I suppose.

Cheers,
Ben.

Maybe Matching Threads

Search for more seemingly similar threads

Linux Virtualization - Aug 2018 - [RFC 0/4] Virtio uses DMA API for all devices

[RFC 0/4] Virtio uses DMA API for all devices

[RFC 0/4] Virtio uses DMA API for all devices

[RFC 0/4] Virtio uses DMA API for all devices

Maybe Matching Threads