Benjamin Herrenschmidt
2018-Aug-06 21:26 UTC
[RFC 0/4] Virtio uses DMA API for all devices
On Mon, 2018-08-06 at 23:35 +0300, Michael S. Tsirkin wrote:
> > As I said replying to Christoph, we are "leaking" into the interface
> > something here that is really what the VM is doing to itself, which
> > is to stash its memory away in an inaccessible place.
> >
> > Cheers,
> > Ben.
>
> I think Christoph merely objects to the specific implementation. If
> instead you do something like tweak dev->bus_dma_mask for the virtio
> device I think he won't object.

Well, we don't have "bus_dma_mask" yet ... or do you mean dma_mask?

So, something like that would be a possibility, but the problem is that
the current virtio (guest side) implementation doesn't honor this when
not using DMA ops, and will not use DMA ops when not using an IOMMU, so
back to square one.

Christoph seems to want a flag in the interface to make the guest use
dma_ops, which is what I don't understand.

What would be needed then is something along the lines of virtio
noticing that dma_mask isn't big enough to cover all of memory (which
isn't something generic code can easily do here, for various reasons I
can elaborate on if you want, but that specific test more or less has to
be arch specific), and in that case forcing itself to use DMA ops routed
to swiotlb.

I'd rather have arch code do the bulk of that work, don't you think?

Which brings me back to this option, which may be the simplest and
avoids the overhead of the proposed series (I found the series to be a
nice cleanup, but retpolines do kick us in the nuts here).

So what about this?

--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -155,7 +155,7 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
	 * the DMA API if we're a Xen guest, which at least allows
	 * all of the sensible Xen configurations to work correctly.
	 */
-	if (xen_domain())
+	if (xen_domain() || arch_virtio_direct_dma_ops(&vdev->dev))
		return true;

	return false;

(Passing the dev allows the arch to know this is a virtio device in
"direct" mode, or whatever we want to call the !iommu case, and to
construct appropriate DMA ops for it, which aren't the same as the DMA
ops of any other PCI device that *does* use the iommu.)

Otherwise, the harder option would be for us to hack things so that
xen_domain() returns true in our setup (gross), and have the arch code,
when it sets up PCI device DMA ops, use a gross hack to identify virtio
PCI devices, check their F_IOMMU flag itself, and set up the different
ops at that point.

As for those "special" ops, they are of course just normal swiotlb ops;
there's nothing "special" about them other than that they aren't the ops
that other PCI devices on that bus use.

Cheers,
Ben.
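[Editor's note: to make the proposed hook concrete, here is a small self-contained userspace model of the decision the patched vring_use_dma_api() would make. All names and the mask-vs-RAM heuristic are hypothetical stand-ins mirroring the shape of the patch, not real kernel APIs.]

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for kernel state -- not the real kernel types. */
struct device { unsigned long long dma_mask; };
struct virtio_device { struct device dev; };

static bool xen_domain_flag;           /* models xen_domain()            */
static unsigned long long ram_top;     /* one past the highest RAM addr  */

static bool xen_domain(void) { return xen_domain_flag; }

/*
 * Hypothetical arch hook from the patch above: return true when the
 * arch wants direct-mode virtio routed through (swiotlb-backed) DMA
 * ops, e.g. because the device's mask cannot cover all guest memory.
 */
static bool arch_virtio_direct_dma_ops(struct device *dev)
{
	return dev->dma_mask < ram_top - 1;
}

/* Models the patched vring_use_dma_api() from the diff above. */
static bool vring_use_dma_api(struct virtio_device *vdev)
{
	if (xen_domain() || arch_virtio_direct_dma_ops(&vdev->dev))
		return true;
	return false;
}
```

With 1 TiB of modeled RAM, a full 64-bit mask keeps the physical-address bypass, while a 1 GiB mask (or running under Xen) flips the device onto the DMA API.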
On Tue, Aug 07, 2018 at 07:26:35AM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2018-08-06 at 23:35 +0300, Michael S. Tsirkin wrote:
> > > As I said replying to Christoph, we are "leaking" into the interface
> > > something here that is really what the VM is doing to itself, which
> > > is to stash its memory away in an inaccessible place.
> > >
> > > Cheers,
> > > Ben.
> >
> > I think Christoph merely objects to the specific implementation. If
> > instead you do something like tweak dev->bus_dma_mask for the virtio
> > device I think he won't object.
>
> Well, we don't have "bus_dma_mask" yet ... or do you mean dma_mask?
>
> So, something like that would be a possibility, but the problem is that
> the current virtio (guest side) implementation doesn't honor this when
> not using DMA ops, and will not use DMA ops when not using an IOMMU,
> so back to square one.

Well, we have the RFC for that - the switch to using DMA ops
unconditionally isn't problematic in itself, IMHO. For now that RFC is
blocked by its performance overhead, but Christoph says he's trying to
remove that for direct mappings, so we should hopefully be able to get
there in X weeks.

> Christoph seems to want a flag in the interface to make the guest use
> dma_ops, which is what I don't understand.
>
> What would be needed then is something along the lines of virtio
> noticing that dma_mask isn't big enough to cover all of memory (which
> isn't something generic code can easily do here, for various reasons I
> can elaborate on if you want, but that specific test more or less has
> to be arch specific), and in that case forcing itself to use DMA ops
> routed to swiotlb.
>
> I'd rather have arch code do the bulk of that work, don't you think?
>
> Which brings me back to this option, which may be the simplest and
> avoids the overhead of the proposed series (I found the series to be a
> nice cleanup, but retpolines do kick us in the nuts here).
>
> So what about this?
>
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -155,7 +155,7 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
>	 * the DMA API if we're a Xen guest, which at least allows
>	 * all of the sensible Xen configurations to work correctly.
>	 */
> -	if (xen_domain())
> +	if (xen_domain() || arch_virtio_direct_dma_ops(&vdev->dev))
>		return true;
>
>	return false;

Right, but can't we fix the retpoline overhead such that
vring_use_dma_api will not be called on the data path any longer, making
this a setup-time check?

> (Passing the dev allows the arch to know this is a virtio device in
> "direct" mode, or whatever we want to call the !iommu case, and to
> construct appropriate DMA ops for it, which aren't the same as the DMA
> ops of any other PCI device that *does* use the iommu.)

I think that's where Christoph might have specific ideas about it.

> Otherwise, the harder option would be for us to hack things so that
> xen_domain() returns true in our setup (gross), and have the arch code,
> when it sets up PCI device DMA ops, use a gross hack to identify virtio
> PCI devices, check their F_IOMMU flag itself, and set up the different
> ops at that point.
>
> As for those "special" ops, they are of course just normal swiotlb ops;
> there's nothing "special" about them other than that they aren't the
> ops that other PCI devices on that bus use.
>
> Cheers,
> Ben.

-- 
MST
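[Editor's note: MST's suggestion, evaluating the policy once and caching it so the data path never pays the retpoline-heavy check per buffer, can be sketched like this. This is a userspace model with invented names; the real structures and helpers in drivers/virtio/virtio_ring.c differ.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Minimal stand-ins -- not the real kernel structures. */
struct virtio_device { bool platform_iommu; };

struct vring_virtqueue {
	struct virtio_device *vdev;
	bool use_dma_api;       /* cached once at setup time */
};

/* The (potentially expensive) policy check, run only at setup. */
static bool vring_use_dma_api(struct virtio_device *vdev)
{
	return vdev->platform_iommu;
}

/* Setup path: evaluate the policy once and stash the result. */
static struct vring_virtqueue *vring_create(struct virtio_device *vdev)
{
	struct vring_virtqueue *vq = malloc(sizeof(*vq));

	vq->vdev = vdev;
	vq->use_dma_api = vring_use_dma_api(vdev);
	return vq;
}

/* Data path: a plain flag test per buffer, no policy call. */
static unsigned long vring_map(struct vring_virtqueue *vq, void *buf)
{
	if (vq->use_dma_api)
		return (unsigned long)buf ^ 0x1000; /* toy "translated" address */
	return (unsigned long)buf;                  /* physical pass-through */
}
```

The design point is simply that the branch in vring_map() reads a cached bool rather than making an indirect or speculative-barrier-protected call on every descriptor.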
Benjamin Herrenschmidt
2018-Aug-06 22:13 UTC
[RFC 0/4] Virtio uses DMA API for all devices
On Tue, 2018-08-07 at 00:46 +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 07, 2018 at 07:26:35AM +1000, Benjamin Herrenschmidt wrote:
> > On Mon, 2018-08-06 at 23:35 +0300, Michael S. Tsirkin wrote:
> > > > As I said replying to Christoph, we are "leaking" into the interface
> > > > something here that is really what the VM is doing to itself, which
> > > > is to stash its memory away in an inaccessible place.
> > > >
> > > > Cheers,
> > > > Ben.
> > >
> > > I think Christoph merely objects to the specific implementation. If
> > > instead you do something like tweak dev->bus_dma_mask for the virtio
> > > device I think he won't object.
> >
> > Well, we don't have "bus_dma_mask" yet ... or do you mean dma_mask?
> >
> > So, something like that would be a possibility, but the problem is
> > that the current virtio (guest side) implementation doesn't honor
> > this when not using DMA ops, and will not use DMA ops when not using
> > an IOMMU, so back to square one.
>
> Well, we have the RFC for that - the switch to using DMA ops
> unconditionally isn't problematic in itself, IMHO. For now that RFC is
> blocked by its performance overhead, but Christoph says he's trying to
> remove that for direct mappings, so we should hopefully be able to get
> there in X weeks.

That would be good, yes.

../..

> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -155,7 +155,7 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
> >	 * the DMA API if we're a Xen guest, which at least allows
> >	 * all of the sensible Xen configurations to work correctly.
> >	 */
> > -	if (xen_domain())
> > +	if (xen_domain() || arch_virtio_direct_dma_ops(&vdev->dev))
> >		return true;
> >
> >	return false;
>
> Right, but can't we fix the retpoline overhead such that
> vring_use_dma_api will not be called on the data path any longer,
> making this a setup-time check?

Yes, it needs to be a setup-time check regardless, actually!

The above is broken; sorry, I was a bit quick here (too early in the
morning ... ugh). We don't want the arch to override the DMA ops every
time that is called. But yes, if we can fix the overhead, it becomes
just a matter of setting up the "right" ops automatically.

> > (Passing the dev allows the arch to know this is a virtio device in
> > "direct" mode, or whatever we want to call the !iommu case, and to
> > construct appropriate DMA ops for it, which aren't the same as the
> > DMA ops of any other PCI device that *does* use the iommu.)
>
> I think that's where Christoph might have specific ideas about it.

OK, well, assuming Christoph can solve the direct case in a way that
also works for the virtio !iommu case, we still want some bit of logic
somewhere that will "switch" to swiotlb-based ops if the DMA mask is
limited.

You mentioned an RFC for that? Do you happen to have a link?

It would indeed be ideal if all we had to do was set up some kind of
bus_dma_mask on all PCI devices and have virtio automagically insert
swiotlb when necessary.

Cheers,
Ben.
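[Editor's note: the setup-time "switch" Ben describes, picking swiotlb-backed ops when the device's bus mask cannot reach the top of memory and direct ops otherwise, could look roughly like this. Userspace model; the enum, field layout, and helper name are invented for illustration.]

```c
#include <assert.h>

/* Invented stand-ins for the two candidate op sets. */
enum dma_ops_kind { DMA_OPS_DIRECT, DMA_OPS_SWIOTLB };

struct device {
	unsigned long long bus_dma_mask;  /* 0 = no bus-imposed limit */
	enum dma_ops_kind ops;
};

/*
 * Setup-time decision: if the bus mask cannot cover the highest RAM
 * address, bounce through swiotlb; otherwise use direct (physical)
 * mappings. Run once per device, never on the data path.
 */
static void setup_virtio_dma_ops(struct device *dev,
				 unsigned long long ram_top)
{
	if (dev->bus_dma_mask && dev->bus_dma_mask < ram_top - 1)
		dev->ops = DMA_OPS_SWIOTLB;
	else
		dev->ops = DMA_OPS_DIRECT;
}
```

This is exactly the "automagic" outcome Ben asks for: the arch (or platform code) only has to populate bus_dma_mask, and the op selection falls out of one comparison.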
On Tue, Aug 07, 2018 at 07:26:35AM +1000, Benjamin Herrenschmidt wrote:
> > I think Christoph merely objects to the specific implementation. If
> > instead you do something like tweak dev->bus_dma_mask for the virtio
> > device I think he won't object.
>
> Well, we don't have "bus_dma_mask" yet ... or do you mean dma_mask?

It will be new in 4.19:

http://git.infradead.org/users/hch/dma-mapping.git/commitdiff/f07d141fe9430cdf9f8a65a87c41

> So, something like that would be a possibility, but the problem is
> that the current virtio (guest side) implementation doesn't honor this
> when not using DMA ops, and will not use DMA ops when not using an
> IOMMU, so back to square one.
>
> Christoph seems to want a flag in the interface to make the guest use
> dma_ops, which is what I don't understand.

As-is, virtio devices are very clearly and explicitly defined to use
physical addresses in the spec. DMA ops will often do platform-based
translations (IOMMU, offsets), so we can't just use the platform
default DMA ops and will need to opt into them.

> What would be needed then is something along the lines of virtio
> noticing that dma_mask isn't big enough to cover all of memory (which
> isn't something generic code can easily do here, for various reasons I
> can elaborate on if you want, but that specific test more or less has
> to be arch specific), and in that case forcing itself to use DMA ops
> routed to swiotlb.
>
> I'd rather have arch code do the bulk of that work, don't you think?

There is nothing architecture specific about that. I've been working
hard to remove all the bullshit architectures have done in their DMA
ops, consolidating them into common code based on rules. The last thing
I want is another vector for weird, underspecified arch interaction
with DMA ops, which is exactly what your patch below does.
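[Editor's note: Christoph's core point, that spec-compliant legacy virtio hands the device a raw physical address while platform DMA ops may translate it, can be illustrated with a toy fixed-offset translation. The offset value and function names are invented for the example.]

```c
#include <assert.h>

/* Toy platform whose DMA ops apply a fixed bus offset to every mapping. */
#define BUS_OFFSET 0x80000000ULL

/* Legacy virtio: the descriptor carries the physical address verbatim. */
static unsigned long long virtio_legacy_addr(unsigned long long phys)
{
	return phys;
}

/* Platform DMA ops: the same buffer, seen through the bus offset. */
static unsigned long long platform_dma_map(unsigned long long phys)
{
	return phys + BUS_OFFSET;
}
```

On such a platform, blindly routing virtio through the default DMA ops would hand the hypervisor a bus address it cannot interpret, which is why opting in has to be explicit.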
On Tue, Aug 07, 2018 at 12:46:34AM +0300, Michael S. Tsirkin wrote:
> Well, we have the RFC for that - the switch to using DMA ops
> unconditionally isn't problematic in itself, IMHO. For now that RFC is
> blocked by its performance overhead, but Christoph says he's trying to
> remove that for direct mappings, so we should hopefully be able to get
> there in X weeks.

The direct calls to dma_direct_ops aren't going to help you with legacy
virtio, given that virtio is specified to deal with physical addresses
while dma-direct is not in many cases. It would, however, help with the
case where qemu always sets the platform DMA flag, as we'd avoid the
indirect calls for that.
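[Editor's note: the "avoid the indirect calls" idea Christoph mentions amounts to short-circuiting function-pointer dispatch when the ops are known to be the direct ones, so the common case never takes a retpoline-protected indirect branch. A toy version, with invented names:]

```c
#include <assert.h>

struct dma_ops {
	unsigned long long (*map)(unsigned long long phys);
};

/* Identity mapping: what dma-direct does on a well-behaved platform. */
static unsigned long long identity_map(unsigned long long phys)
{
	return phys;
}

static const struct dma_ops direct_ops = { .map = identity_map };

/* A translated op set, standing in for any non-direct implementation. */
static unsigned long long offset_map(unsigned long long phys)
{
	return phys + 0x100;
}

static const struct dma_ops offset_ops = { .map = offset_map };

/*
 * Dispatch helper: when the device uses the known direct ops, call the
 * implementation directly (a predictable direct branch); only fall back
 * to the function pointer for everything else.
 */
static unsigned long long dma_map(const struct dma_ops *ops,
				  unsigned long long phys)
{
	if (ops == &direct_ops)
		return identity_map(phys);  /* direct-call fast path */
	return ops->map(phys);              /* indirect slow path */
}
```

Both paths return the same result as a plain `ops->map(phys)` call; the pointer comparison only changes how the common case is dispatched.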