Benjamin Herrenschmidt
2018-Aug-06 19:52 UTC
[RFC 0/4] Virtio uses DMA API for all devices
On Mon, 2018-08-06 at 02:42 -0700, Christoph Hellwig wrote:
> On Mon, Aug 06, 2018 at 07:16:47AM +1000, Benjamin Herrenschmidt wrote:
> > Who would set this bit ? qemu ? Under what circumstances ?
>
> I don't really care who sets what. The implementation might not even
> involve qemu.
>
> It is your job to write a coherent interface specification that does
> not depend on the used components. The hypervisor might be PAPR,
> Linux + qemu, VMware, Hyper-V or something so secret that you'd have
> to shoot me if you had to tell me. The guest might be Linux, FreeBSD,
> AIX, OS400 or a Hipster project of the day in Rust. As long as we
> properly specify the interface it simply does not matter.

That's the point Christoph. The interface is today's interface. It does
NOT change. That information is not part of the interface.

It's the VM itself that is stashing away its memory in a secret place,
and thus needs to do bounce buffering. There is no change to the virtio
interface per se.

> > What would be the effect of this bit while VIRTIO_F_IOMMU is NOT set,
> > ie, what would qemu do and what would Linux do ? I'm not sure I fully
> > understand your idea.
>
> In a perfect world we'd just reuse VIRTIO_F_IOMMU and clarify the
> description, which currently is rather vague but basically captures
> the use case. Currently it is:
>
>   VIRTIO_F_IOMMU_PLATFORM(33)
>     This feature indicates that the device is behind an IOMMU that
>     translates bus addresses from the device into physical addresses in
>     memory. If this feature bit is set to 0, then the device emits
>     physical addresses which are not translated further, even though an
>     IOMMU may be present.
>
> And I'd change it to something like:
>
>   VIRTIO_F_PLATFORM_DMA(33)
>     This feature indicates that the device emits platform-specific
>     bus addresses that might not be identical to physical addresses.
>     The translation of physical to bus address is platform-specific
>     and defined by the platform specification for the bus that the
>     virtio device is attached to.
>     If this feature bit is set to 0, then the device emits
>     physical addresses which are not translated further, even if
>     the platform would normally require translations for the bus that
>     the virtio device is attached to.
>
> If we can't change the definition any more we should deprecate the
> old VIRTIO_F_IOMMU_PLATFORM bit, and require that VIRTIO_F_IOMMU_PLATFORM
> and VIRTIO_F_PLATFORM_DMA not be set at the same time.

But this doesn't really change our problem, does it ? None of what
happens in our case is part of the "interface".

The suggestion to force the iommu ON was simply a "workaround": by
doing so, we get to override the DMA ops, but that's just a trick.

Fundamentally, what we need to solve is pretty much entirely a guest
problem.

> > I'm trying to understand because the limitation is not a device side
> > limitation, it's not a qemu limitation, it's actually more of a VM
> > limitation. It has most of its memory pages made inaccessible for
> > security reasons. The platform from a qemu/KVM perspective is almost
> > entirely normal.
>
> Well, find a way to describe this either in the qemu specification using
> new feature bits, or by using something like the above.

But again, why do you want to involve the interface, and thus the
hypervisor, for something that is essentially what the guest is doing
to itself ? It really is something we need to solve locally to the
guest; it's not part of the interface.

Cheers,
Ben.
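For reference, the "force the iommu ON" workaround mentioned above works
because the guest-side virtio core only routes buffers through the
platform DMA ops when the device has negotiated VIRTIO_F_IOMMU_PLATFORM.
A minimal sketch of that decision, loosely modelled on vring_use_dma_api()
in drivers/virtio/virtio_ring.c; the function name and comments here are
illustrative, not the exact kernel code:

/*
 * Illustrative sketch: does this virtio device's DMA go through the
 * platform DMA ops, or straight to guest-physical addresses?
 */
static bool virtio_should_use_dma_api(struct virtio_device *vdev)
{
	/*
	 * With VIRTIO_F_IOMMU_PLATFORM negotiated, the device emits bus
	 * addresses, so the guest must translate through the platform
	 * DMA ops (an IOMMU, swiotlb bounce buffers, or both).
	 */
	if (virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
		return true;

	/*
	 * Otherwise the device bypasses any IOMMU and consumes raw
	 * guest-physical addresses -- the behaviour that breaks once
	 * the guest hides most of its memory from the hypervisor.
	 */
	return false;
}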
Christoph Hellwig
2018-Aug-07 06:21 UTC
[RFC 0/4] Virtio uses DMA API for all devices
On Tue, Aug 07, 2018 at 05:52:12AM +1000, Benjamin Herrenschmidt wrote:
> > It is your job to write a coherent interface specification that does
> > not depend on the used components. The hypervisor might be PAPR,
> > Linux + qemu, VMware, Hyper-V or something so secret that you'd have
> > to shoot me if you had to tell me. The guest might be Linux, FreeBSD,
> > AIX, OS400 or a Hipster project of the day in Rust. As long as we
> > properly specify the interface it simply does not matter.
>
> That's the point Christoph. The interface is today's interface. It does
> NOT change. That information is not part of the interface.
>
> It's the VM itself that is stashing away its memory in a secret place,
> and thus needs to do bounce buffering. There is no change to the virtio
> interface per se.

Any guest that doesn't know about your magic limited addressing is simply
not going to work, so we need to communicate that fact.
Benjamin Herrenschmidt
2018-Aug-07 06:42 UTC
[RFC 0/4] Virtio uses DMA API for all devices
On Mon, 2018-08-06 at 23:21 -0700, Christoph Hellwig wrote:
> On Tue, Aug 07, 2018 at 05:52:12AM +1000, Benjamin Herrenschmidt wrote:
> > > It is your job to write a coherent interface specification that does
> > > not depend on the used components. The hypervisor might be PAPR,
> > > Linux + qemu, VMware, Hyper-V or something so secret that you'd have
> > > to shoot me if you had to tell me. The guest might be Linux, FreeBSD,
> > > AIX, OS400 or a Hipster project of the day in Rust. As long as we
> > > properly specify the interface it simply does not matter.
> >
> > That's the point Christoph. The interface is today's interface. It does
> > NOT change. That information is not part of the interface.
> >
> > It's the VM itself that is stashing away its memory in a secret place,
> > and thus needs to do bounce buffering. There is no change to the virtio
> > interface per se.
>
> Any guest that doesn't know about your magic limited addressing is simply
> not going to work, so we need to communicate that fact.

The guest does. It's the guest itself that initiates it. That's my point,
it's not a factor of the hypervisor, which is unchanged in that area.

It's the guest itself that makes the decision, early on, to stash its
memory away in a secure place, and thus needs to establish some kind of
bounce buffering via a few leftover "insecure" pages.

It's all done by the guest: initiated by the guest and controlled by the
guest. That's why I don't see why this specifically needs to involve the
hypervisor side, and thus a VIRTIO feature bit.

Note that I can make it so that the same DMA ops (basically standard
swiotlb ops without arch hacks) work for both "direct virtio" and "normal
PCI" devices. The trick is simply for the arch to set up the iommu to map
the swiotlb bounce buffer pool 1:1 in the iommu, so the iommu can
essentially be ignored without affecting the physical addresses.

If I do that, *all* I need is a way, from the guest itself (again, the
other side doesn't know anything about it), to force virtio to use the
DMA ops as if there was an iommu, that is, to use whatever dma ops were
set up by the platform for the pci device.

Cheers,
Ben.
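A toy model of the bounce-buffering scheme described above, purely for
illustration: the guest keeps a small pool of "insecure"
(hypervisor-accessible) pages and redirects every virtio DMA mapping
through it. All names here (bounce_pool, dma_map_bounce, ...) are
invented for the example; this is not the kernel swiotlb or powerpc code.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define POOL_SIZE (64 * 1024)	/* the few leftover "insecure" pages */

static uint8_t bounce_pool[POOL_SIZE];	/* left accessible to the device */
static size_t pool_used;

/* "Map" a buffer living in secure memory: copy it into the shared pool
 * and hand back the pool offset, standing in for the bus address the
 * device will see. */
static size_t dma_map_bounce(const void *secure_buf, size_t len)
{
	size_t off = pool_used;

	if (off + len > POOL_SIZE) {
		fprintf(stderr, "bounce pool exhausted\n");
		exit(EXIT_FAILURE);
	}
	memcpy(bounce_pool + off, secure_buf, len);
	pool_used += len;
	return off;
}

/* "Unmap" after the device has responded: copy the device-visible data
 * back into the secure buffer the driver actually owns. */
static void dma_unmap_bounce(void *secure_buf, size_t off, size_t len)
{
	memcpy(secure_buf, bounce_pool + off, len);
}

int main(void)
{
	char req[32] = "request from secure memory";
	char resp[sizeof(req)];

	size_t bus_addr = dma_map_bounce(req, sizeof(req));
	/* ... the device (hypervisor side) reads/writes the pool here ... */
	dma_unmap_bounce(resp, bus_addr, sizeof(resp));

	printf("driver read back: %s\n", resp);
	return 0;
}

Whether such a pool is carved out by the arch at boot and mapped 1:1
behind the iommu (as suggested above) or managed some other way is
exactly the part that stays local to the guest and never appears in the
virtio interface.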