thr3ads.net - Linux Virtualization - [PATCH v5 2/3] virtio_pci: Use the DMA API for virtqueues when possible [Sep 2014]

If this information is useful, please help other people find it:
Share via:

Benjamin Herrenschmidt

2014-Sep-17 12:02 UTC

[PATCH v5 2/3] virtio_pci: Use the DMA API for virtqueues when possible

On Tue, 2014-09-16 at 22:22 -0700, Andy Lutomirski
wrote:> On non-PPC systems, virtio_pci should use the DMA API.  This fixes
> virtio_pci on Xen.  On PPC, using the DMA API would break things, so
> we need to preserve the old behavior.
> 
> The big comment in this patch explains the considerations in more
> detail.
I still disagree with using CONFIG_PPC as a trigger here.

Fundamentally, the qemu implementation today bypasses IOMMUs on all
platforms as far as I can tell.

If that changes, we'll have a backward compatibility problem.

The virtio device should advertise whether it's using that bypass
mode of operation and virtio_pci should react accordingly.

There is a demand for being able to operate on top of an IOMMU on
powerpc as well for some embedded stuff using PCI as a transport so your
patch precludes that.

Cheers,
Ben.
> 
> Signed-off-by: Andy Lutomirski <luto at amacapital.net>
> ---
>  drivers/virtio/virtio_pci.c | 90
++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 81 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
> index a1f299fa4626..8ddb0a641878 100644
> --- a/drivers/virtio/virtio_pci.c
> +++ b/drivers/virtio/virtio_pci.c
> @@ -80,8 +80,10 @@ struct virtio_pci_vq_info
>  	/* the number of entries in the queue */
>  	int num;
>  
> -	/* the virtual address of the ring queue */
> -	void *queue;
> +	/* the ring queue */
> +	void *queue;			/* virtual address */
> +	dma_addr_t queue_dma_addr;	/* bus address */
> +	bool use_dma_api;		/* are we using the DMA API? */
>  
>  	/* the list node for the virtqueues list */
>  	struct list_head node;
> @@ -388,6 +390,50 @@ static int vp_request_intx(struct virtio_device *vdev)
>  	return err;
>  }
>  
> +static bool vp_use_dma_api(void)
> +{
> +	/*
> +	 * Due to limitations of the DMA API, we only have two choices:
> +	 * use the DMA API (e.g. set up IOMMU mappings or apply Xen's
> +	 * physical-to-machine translation) or use direct physical
> +	 * addressing.  Furthermore, there's no sensible way yet for the
> +	 * PCI bus code to tell us whether we're supposed to act like a
> +	 * normal PCI device (and use the DMA API) or to do something
> +	 * else.  So we're stuck with heuristics here.
> +	 *
> +	 * In general, we would prefer to use the DMA API, since we
> +	 * might be driving a physical device, and such devices *must*
> +	 * use the DMA API if there is an IOMMU involved.
> +	 *
> +	 * On x86, there are no physically-mapped emulated virtio PCI
> +	 * devices that live behind an IOMMU.  On ARM, there don't seem
> +	 * to be any hypervisors that use virtio_pci (as opposed to
> +	 * virtio_mmio) that also emulate an IOMMU.  So using the DMI
> +	 * API is safe.
> +	 *
> +	 * On PowerPC, it's the other way around.  There usually is an
> +	 * IOMMU between us and the virtio PCI device, but the device is
> +	 * probably emulated and ignores the IOMMU.  Unfortunately, we
> +	 * can't tell whether we're talking to an emulated device or to
> +	 * a physical device that really lives behind the IOMMU.  That
> +	 * means that we're stuck with ignoring the DMA API.
> +	 */
> +
> +#ifdef CONFIG_PPC
> +	return false;
> +#else
> +	/*
> +	 * Minor optimization: if the platform promises to have physical
> +	 * PCI DMA, we turn off DMA mapping in virtio_ring.  If the
> +	 * platform's DMA API implementation is well optimized, this
> +	 * should have almost no effect, but we already have a branch in
> +	 * the vring code, and we can avoid any further indirection with
> +	 * very little effort.
> +	 */
> +	return !PCI_DMA_BUS_IS_PHYS;
> +#endif
> +}
> +
>  static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned
index,
>  				  void (*callback)(struct virtqueue *vq),
>  				  const char *name,
> @@ -416,21 +462,30 @@ static struct virtqueue *setup_vq(struct
virtio_device *vdev, unsigned index,
>  
>  	info->num = num;
>  	info->msix_vector = msix_vec;
> +	info->use_dma_api = vp_use_dma_api();
>  
> -	size = PAGE_ALIGN(vring_size(num, VIRTIO_PCI_VRING_ALIGN));
> -	info->queue = alloc_pages_exact(size, GFP_KERNEL|__GFP_ZERO);
> +	size = vring_size(num, VIRTIO_PCI_VRING_ALIGN);
> +	if (info->use_dma_api) {
> +		info->queue = dma_zalloc_coherent(vdev->dev.parent, size,
> +						  &info->queue_dma_addr,
> +						  GFP_KERNEL);
> +	} else {
> +		info->queue = alloc_pages_exact(PAGE_ALIGN(size),
> +						GFP_KERNEL|__GFP_ZERO);
> +		info->queue_dma_addr = virt_to_phys(info->queue);
> +	}
>  	if (info->queue == NULL) {
>  		err = -ENOMEM;
>  		goto out_info;
>  	}
>  
>  	/* activate the queue */
> -	iowrite32(virt_to_phys(info->queue) >>
VIRTIO_PCI_QUEUE_ADDR_SHIFT,
> +	iowrite32(info->queue_dma_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
>  		  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
>  
>  	/* create the vring */
>  	vq = vring_new_virtqueue(index, info->num, VIRTIO_PCI_VRING_ALIGN,
vdev,
> -				 true, false, info->queue,
> +				 true, info->use_dma_api, info->queue,
>  				 vp_notify, callback, name);
>  	if (!vq) {
>  		err = -ENOMEM;
> @@ -463,7 +518,12 @@ out_assign:
>  	vring_del_virtqueue(vq);
>  out_activate_queue:
>  	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> -	free_pages_exact(info->queue, size);
> +	if (info->use_dma_api) {
> +		dma_free_coherent(vdev->dev.parent, size,
> +				  info->queue, info->queue_dma_addr);
> +	} else {
> +		free_pages_exact(info->queue, PAGE_ALIGN(size));
> +	}
>  out_info:
>  	kfree(info);
>  	return ERR_PTR(err);
> @@ -493,8 +553,13 @@ static void vp_del_vq(struct virtqueue *vq)
>  	/* Select and deactivate the queue */
>  	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
>  
> -	size = PAGE_ALIGN(vring_size(info->num, VIRTIO_PCI_VRING_ALIGN));
> -	free_pages_exact(info->queue, size);
> +	size = vring_size(info->num, VIRTIO_PCI_VRING_ALIGN);
> +	if (info->use_dma_api) {
> +		dma_free_coherent(vq->vdev->dev.parent, size,
> +				  info->queue, info->queue_dma_addr);
> +	} else {
> +		free_pages_exact(info->queue, PAGE_ALIGN(size));
> +	}
>  	kfree(info);
>  }
>  
> @@ -713,6 +778,13 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
>  	if (err)
>  		goto out;
>  
> +	err = dma_set_mask_and_coherent(&pci_dev->dev, DMA_BIT_MASK(64));
> +	if (err)
> +		err = dma_set_mask_and_coherent(&pci_dev->dev,
> +						DMA_BIT_MASK(32));
> +	if (err)
> +		dev_warn(&pci_dev->dev, "Failed to enable 64-bit or 32-bit
DMA.  Trying to continue, but this might not work.\n");
> +
>  	err = pci_request_regions(pci_dev, "virtio-pci");
>  	if (err)
>  		goto out_enable_device;

Michael S. Tsirkin

2014-Sep-17 14:16 UTC

head link

[PATCH v5 2/3] virtio_pci: Use the DMA API for virtqueues when possible

On Wed, Sep 17, 2014 at 08:02:31AM -0400, Benjamin Herrenschmidt
wrote:> On Tue, 2014-09-16 at 22:22 -0700, Andy Lutomirski wrote:
> > On non-PPC systems, virtio_pci should use the DMA API.  This fixes
> > virtio_pci on Xen.  On PPC, using the DMA API would break things, so
> > we need to preserve the old behavior.
> > 
> > The big comment in this patch explains the considerations in more
> > detail.
> 
> I still disagree with using CONFIG_PPC as a trigger here.
> 
> Fundamentally, the qemu implementation today bypasses IOMMUs on all
> platforms as far as I can tell.
> 
> If that changes, we'll have a backward compatibility problem.
> 
> The virtio device should advertise whether it's using that bypass
> mode of operation and virtio_pci should react accordingly.
Well if there's a way to detect that - that's outside the
device, then we probably shouldn't use device-specific
interfaces for this capability.

> There is a demand for being able to operate on top of an IOMMU on
> powerpc as well for some embedded stuff using PCI as a transport so your
> patch precludes that.
> 
> Cheers,
> Ben.
As far as I can see, nothing changes on PPC so this patch
does not preclude anything that was working?

> > 
> > Signed-off-by: Andy Lutomirski <luto at amacapital.net>
> > ---
> >  drivers/virtio/virtio_pci.c | 90
++++++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 81 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
> > index a1f299fa4626..8ddb0a641878 100644
> > --- a/drivers/virtio/virtio_pci.c
> > +++ b/drivers/virtio/virtio_pci.c
> > @@ -80,8 +80,10 @@ struct virtio_pci_vq_info
> >  	/* the number of entries in the queue */
> >  	int num;
> >  
> > -	/* the virtual address of the ring queue */
> > -	void *queue;
> > +	/* the ring queue */
> > +	void *queue;			/* virtual address */
> > +	dma_addr_t queue_dma_addr;	/* bus address */
> > +	bool use_dma_api;		/* are we using the DMA API? */
> >  
> >  	/* the list node for the virtqueues list */
> >  	struct list_head node;
> > @@ -388,6 +390,50 @@ static int vp_request_intx(struct virtio_device
*vdev)
> >  	return err;
> >  }
> >  
> > +static bool vp_use_dma_api(void)
> > +{
> > +	/*
> > +	 * Due to limitations of the DMA API, we only have two choices:
> > +	 * use the DMA API (e.g. set up IOMMU mappings or apply Xen's
> > +	 * physical-to-machine translation) or use direct physical
> > +	 * addressing.  Furthermore, there's no sensible way yet for the
> > +	 * PCI bus code to tell us whether we're supposed to act like a
> > +	 * normal PCI device (and use the DMA API) or to do something
> > +	 * else.  So we're stuck with heuristics here.
> > +	 *
> > +	 * In general, we would prefer to use the DMA API, since we
> > +	 * might be driving a physical device, and such devices *must*
> > +	 * use the DMA API if there is an IOMMU involved.
> > +	 *
> > +	 * On x86, there are no physically-mapped emulated virtio PCI
> > +	 * devices that live behind an IOMMU.  On ARM, there don't seem
> > +	 * to be any hypervisors that use virtio_pci (as opposed to
> > +	 * virtio_mmio) that also emulate an IOMMU.  So using the DMI
> > +	 * API is safe.
> > +	 *
> > +	 * On PowerPC, it's the other way around.  There usually is an
> > +	 * IOMMU between us and the virtio PCI device, but the device is
> > +	 * probably emulated and ignores the IOMMU.  Unfortunately, we
> > +	 * can't tell whether we're talking to an emulated device or
to
> > +	 * a physical device that really lives behind the IOMMU.  That
> > +	 * means that we're stuck with ignoring the DMA API.
> > +	 */
> > +
> > +#ifdef CONFIG_PPC
> > +	return false;
> > +#else
> > +	/*
> > +	 * Minor optimization: if the platform promises to have physical
> > +	 * PCI DMA, we turn off DMA mapping in virtio_ring.  If the
> > +	 * platform's DMA API implementation is well optimized, this
> > +	 * should have almost no effect, but we already have a branch in
> > +	 * the vring code, and we can avoid any further indirection with
> > +	 * very little effort.
> > +	 */
> > +	return !PCI_DMA_BUS_IS_PHYS;
> > +#endif
> > +}
> > +
> >  static struct virtqueue *setup_vq(struct virtio_device *vdev,
unsigned index,
> >  				  void (*callback)(struct virtqueue *vq),
> >  				  const char *name,
> > @@ -416,21 +462,30 @@ static struct virtqueue *setup_vq(struct
virtio_device *vdev, unsigned index,
> >  
> >  	info->num = num;
> >  	info->msix_vector = msix_vec;
> > +	info->use_dma_api = vp_use_dma_api();
> >  
> > -	size = PAGE_ALIGN(vring_size(num, VIRTIO_PCI_VRING_ALIGN));
> > -	info->queue = alloc_pages_exact(size, GFP_KERNEL|__GFP_ZERO);
> > +	size = vring_size(num, VIRTIO_PCI_VRING_ALIGN);
> > +	if (info->use_dma_api) {
> > +		info->queue = dma_zalloc_coherent(vdev->dev.parent, size,
> > +						  &info->queue_dma_addr,
> > +						  GFP_KERNEL);
> > +	} else {
> > +		info->queue = alloc_pages_exact(PAGE_ALIGN(size),
> > +						GFP_KERNEL|__GFP_ZERO);
> > +		info->queue_dma_addr = virt_to_phys(info->queue);
> > +	}
> >  	if (info->queue == NULL) {
> >  		err = -ENOMEM;
> >  		goto out_info;
> >  	}
> >  
> >  	/* activate the queue */
> > -	iowrite32(virt_to_phys(info->queue) >>
VIRTIO_PCI_QUEUE_ADDR_SHIFT,
> > +	iowrite32(info->queue_dma_addr >>
VIRTIO_PCI_QUEUE_ADDR_SHIFT,
> >  		  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> >  
> >  	/* create the vring */
> >  	vq = vring_new_virtqueue(index, info->num,
VIRTIO_PCI_VRING_ALIGN, vdev,
> > -				 true, false, info->queue,
> > +				 true, info->use_dma_api, info->queue,
> >  				 vp_notify, callback, name);
> >  	if (!vq) {
> >  		err = -ENOMEM;
> > @@ -463,7 +518,12 @@ out_assign:
> >  	vring_del_virtqueue(vq);
> >  out_activate_queue:
> >  	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> > -	free_pages_exact(info->queue, size);
> > +	if (info->use_dma_api) {
> > +		dma_free_coherent(vdev->dev.parent, size,
> > +				  info->queue, info->queue_dma_addr);
> > +	} else {
> > +		free_pages_exact(info->queue, PAGE_ALIGN(size));
> > +	}
> >  out_info:
> >  	kfree(info);
> >  	return ERR_PTR(err);
> > @@ -493,8 +553,13 @@ static void vp_del_vq(struct virtqueue *vq)
> >  	/* Select and deactivate the queue */
> >  	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> >  
> > -	size = PAGE_ALIGN(vring_size(info->num, VIRTIO_PCI_VRING_ALIGN));
> > -	free_pages_exact(info->queue, size);
> > +	size = vring_size(info->num, VIRTIO_PCI_VRING_ALIGN);
> > +	if (info->use_dma_api) {
> > +		dma_free_coherent(vq->vdev->dev.parent, size,
> > +				  info->queue, info->queue_dma_addr);
> > +	} else {
> > +		free_pages_exact(info->queue, PAGE_ALIGN(size));
> > +	}
> >  	kfree(info);
> >  }
> >  
> > @@ -713,6 +778,13 @@ static int virtio_pci_probe(struct pci_dev
*pci_dev,
> >  	if (err)
> >  		goto out;
> >  
> > +	err = dma_set_mask_and_coherent(&pci_dev->dev,
DMA_BIT_MASK(64));
> > +	if (err)
> > +		err = dma_set_mask_and_coherent(&pci_dev->dev,
> > +						DMA_BIT_MASK(32));
> > +	if (err)
> > +		dev_warn(&pci_dev->dev, "Failed to enable 64-bit or
32-bit DMA.  Trying to continue, but this might not work.\n");
> > +
> >  	err = pci_request_regions(pci_dev, "virtio-pci");
> >  	if (err)
> >  		goto out_enable_device;
>

Andy Lutomirski

2014-Sep-17 16:07 UTC

head link

[PATCH v5 2/3] virtio_pci: Use the DMA API for virtqueues when possible

On Sep 17, 2014 7:13 AM, "Michael S. Tsirkin" <mst at
redhat.com> wrote:>
> On Wed, Sep 17, 2014 at 08:02:31AM -0400, Benjamin Herrenschmidt wrote:
> > On Tue, 2014-09-16 at 22:22 -0700, Andy Lutomirski wrote:
> > > On non-PPC systems, virtio_pci should use the DMA API.  This
fixes
> > > virtio_pci on Xen.  On PPC, using the DMA API would break things,
so
> > > we need to preserve the old behavior.
> > >
> > > The big comment in this patch explains the considerations in more
> > > detail.
> >
> > I still disagree with using CONFIG_PPC as a trigger here.
> >
> > Fundamentally, the qemu implementation today bypasses IOMMUs on all
> > platforms as far as I can tell.
> >
> > If that changes, we'll have a backward compatibility problem.
> >
> > The virtio device should advertise whether it's using that bypass
> > mode of operation and virtio_pci should react accordingly.
>
> Well if there's a way to detect that - that's outside the
> device, then we probably shouldn't use device-specific
> interfaces for this capability.
I still think that this is a property of the bus, not the device.  x86
has such a mechanism, and this patch uses it transparently.

(NB: According to David Woodhouse about ten minutes ago, this should
work with a bit of care right now on x86 via ACPI as long as QEMU
configures the tables correctly.)
>
>
> > There is a demand for being able to operate on top of an IOMMU on
> > powerpc as well for some embedded stuff using PCI as a transport so
your
> > patch precludes that.
> >
> > Cheers,
> > Ben.
>
> As far as I can see, nothing changes on PPC so this patch
> does not preclude anything that was working?
>
It shouldn't.  That being said, at some point this problem will need
solving on PPC, and this patch doesn't help much, other than adding
the virtio_ring piece.

I'd really like to see the generic or arch IOMMU code handle this so
that the PPC special case in the virtio-pci driver can go away.

Another reason that this kind of special casing doesn't belong in
virtio-pci: if anyone ever binds vfio to a virtio-pci device, if the
DMA API doesn't work right, then the kernel will go boom or be
compromised.

--Andy
>
> > >
> > > Signed-off-by: Andy Lutomirski <luto at amacapital.net>
> > > ---
> > >  drivers/virtio/virtio_pci.c | 90
++++++++++++++++++++++++++++++++++++++++-----
> > >  1 file changed, 81 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_pci.c
b/drivers/virtio/virtio_pci.c
> > > index a1f299fa4626..8ddb0a641878 100644
> > > --- a/drivers/virtio/virtio_pci.c
> > > +++ b/drivers/virtio/virtio_pci.c
> > > @@ -80,8 +80,10 @@ struct virtio_pci_vq_info
> > >     /* the number of entries in the queue */
> > >     int num;
> > >
> > > -   /* the virtual address of the ring queue */
> > > -   void *queue;
> > > +   /* the ring queue */
> > > +   void *queue;                    /* virtual address */
> > > +   dma_addr_t queue_dma_addr;      /* bus address */
> > > +   bool use_dma_api;               /* are we using the DMA API?
*/
> > >
> > >     /* the list node for the virtqueues list */
> > >     struct list_head node;
> > > @@ -388,6 +390,50 @@ static int vp_request_intx(struct
virtio_device *vdev)
> > >     return err;
> > >  }
> > >
> > > +static bool vp_use_dma_api(void)
> > > +{
> > > +   /*
> > > +    * Due to limitations of the DMA API, we only have two
choices:
> > > +    * use the DMA API (e.g. set up IOMMU mappings or apply
Xen's
> > > +    * physical-to-machine translation) or use direct physical
> > > +    * addressing.  Furthermore, there's no sensible way yet
for the
> > > +    * PCI bus code to tell us whether we're supposed to act
like a
> > > +    * normal PCI device (and use the DMA API) or to do something
> > > +    * else.  So we're stuck with heuristics here.
> > > +    *
> > > +    * In general, we would prefer to use the DMA API, since we
> > > +    * might be driving a physical device, and such devices
*must*
> > > +    * use the DMA API if there is an IOMMU involved.
> > > +    *
> > > +    * On x86, there are no physically-mapped emulated virtio PCI
> > > +    * devices that live behind an IOMMU.  On ARM, there
don't seem
> > > +    * to be any hypervisors that use virtio_pci (as opposed to
> > > +    * virtio_mmio) that also emulate an IOMMU.  So using the DMI
> > > +    * API is safe.
> > > +    *
> > > +    * On PowerPC, it's the other way around.  There usually
is an
> > > +    * IOMMU between us and the virtio PCI device, but the device
is
> > > +    * probably emulated and ignores the IOMMU.  Unfortunately,
we
> > > +    * can't tell whether we're talking to an emulated
device or to
> > > +    * a physical device that really lives behind the IOMMU. 
That
> > > +    * means that we're stuck with ignoring the DMA API.
> > > +    */
> > > +
> > > +#ifdef CONFIG_PPC
> > > +   return false;
> > > +#else
> > > +   /*
> > > +    * Minor optimization: if the platform promises to have
physical
> > > +    * PCI DMA, we turn off DMA mapping in virtio_ring.  If the
> > > +    * platform's DMA API implementation is well optimized,
this
> > > +    * should have almost no effect, but we already have a branch
in
> > > +    * the vring code, and we can avoid any further indirection
with
> > > +    * very little effort.
> > > +    */
> > > +   return !PCI_DMA_BUS_IS_PHYS;
> > > +#endif
> > > +}
> > > +
> > >  static struct virtqueue *setup_vq(struct virtio_device *vdev,
unsigned index,
> > >                               void (*callback)(struct virtqueue
*vq),
> > >                               const char *name,
> > > @@ -416,21 +462,30 @@ static struct virtqueue *setup_vq(struct
virtio_device *vdev, unsigned index,
> > >
> > >     info->num = num;
> > >     info->msix_vector = msix_vec;
> > > +   info->use_dma_api = vp_use_dma_api();
> > >
> > > -   size = PAGE_ALIGN(vring_size(num, VIRTIO_PCI_VRING_ALIGN));
> > > -   info->queue = alloc_pages_exact(size,
GFP_KERNEL|__GFP_ZERO);
> > > +   size = vring_size(num, VIRTIO_PCI_VRING_ALIGN);
> > > +   if (info->use_dma_api) {
> > > +           info->queue =
dma_zalloc_coherent(vdev->dev.parent, size,
> > > +                                            
&info->queue_dma_addr,
> > > +                                             GFP_KERNEL);
> > > +   } else {
> > > +           info->queue = alloc_pages_exact(PAGE_ALIGN(size),
> > > +                                          
GFP_KERNEL|__GFP_ZERO);
> > > +           info->queue_dma_addr =
virt_to_phys(info->queue);
> > > +   }
> > >     if (info->queue == NULL) {
> > >             err = -ENOMEM;
> > >             goto out_info;
> > >     }
> > >
> > >     /* activate the queue */
> > > -   iowrite32(virt_to_phys(info->queue) >>
VIRTIO_PCI_QUEUE_ADDR_SHIFT,
> > > +   iowrite32(info->queue_dma_addr >>
VIRTIO_PCI_QUEUE_ADDR_SHIFT,
> > >               vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> > >
> > >     /* create the vring */
> > >     vq = vring_new_virtqueue(index, info->num,
VIRTIO_PCI_VRING_ALIGN, vdev,
> > > -                            true, false, info->queue,
> > > +                            true, info->use_dma_api,
info->queue,
> > >                              vp_notify, callback, name);
> > >     if (!vq) {
> > >             err = -ENOMEM;
> > > @@ -463,7 +518,12 @@ out_assign:
> > >     vring_del_virtqueue(vq);
> > >  out_activate_queue:
> > >     iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> > > -   free_pages_exact(info->queue, size);
> > > +   if (info->use_dma_api) {
> > > +           dma_free_coherent(vdev->dev.parent, size,
> > > +                             info->queue,
info->queue_dma_addr);
> > > +   } else {
> > > +           free_pages_exact(info->queue, PAGE_ALIGN(size));
> > > +   }
> > >  out_info:
> > >     kfree(info);
> > >     return ERR_PTR(err);
> > > @@ -493,8 +553,13 @@ static void vp_del_vq(struct virtqueue *vq)
> > >     /* Select and deactivate the queue */
> > >     iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> > >
> > > -   size = PAGE_ALIGN(vring_size(info->num,
VIRTIO_PCI_VRING_ALIGN));
> > > -   free_pages_exact(info->queue, size);
> > > +   size = vring_size(info->num, VIRTIO_PCI_VRING_ALIGN);
> > > +   if (info->use_dma_api) {
> > > +           dma_free_coherent(vq->vdev->dev.parent, size,
> > > +                             info->queue,
info->queue_dma_addr);
> > > +   } else {
> > > +           free_pages_exact(info->queue, PAGE_ALIGN(size));
> > > +   }
> > >     kfree(info);
> > >  }
> > >
> > > @@ -713,6 +778,13 @@ static int virtio_pci_probe(struct pci_dev
*pci_dev,
> > >     if (err)
> > >             goto out;
> > >
> > > +   err = dma_set_mask_and_coherent(&pci_dev->dev,
DMA_BIT_MASK(64));
> > > +   if (err)
> > > +           err = dma_set_mask_and_coherent(&pci_dev->dev,
> > > +                                           DMA_BIT_MASK(32));
> > > +   if (err)
> > > +           dev_warn(&pci_dev->dev, "Failed to enable
64-bit or 32-bit DMA.  Trying to continue, but this might not work.\n");
> > > +
> > >     err = pci_request_regions(pci_dev, "virtio-pci");
> > >     if (err)
> > >             goto out_enable_device;
> >

Benjamin Herrenschmidt

2014-Sep-19 21:31 UTC

head link

[PATCH v5 2/3] virtio_pci: Use the DMA API for virtqueues when possible

On Wed, 2014-09-17 at 17:16 +0300, Michael S. Tsirkin
wrote:> On Wed, Sep 17, 2014 at 08:02:31AM -0400, Benjamin Herrenschmidt wrote:
> > On Tue, 2014-09-16 at 22:22 -0700, Andy Lutomirski wrote:
> > > On non-PPC systems, virtio_pci should use the DMA API.  This
fixes
> > > virtio_pci on Xen.  On PPC, using the DMA API would break things,
so
> > > we need to preserve the old behavior.
> > > 
> > > The big comment in this patch explains the considerations in more
> > > detail.
> > 
> > I still disagree with using CONFIG_PPC as a trigger here.
> > 
> > Fundamentally, the qemu implementation today bypasses IOMMUs on all
> > platforms as far as I can tell.
> > 
> > If that changes, we'll have a backward compatibility problem.
> > 
> > The virtio device should advertise whether it's using that bypass
> > mode of operation and virtio_pci should react accordingly.
> 
> Well if there's a way to detect that - that's outside the
> device, then we probably shouldn't use device-specific
> interfaces for this capability.
Not necessarily. virtio is "special" in that regard, and it's an
attribute of a given virtio "server" implementation as to whether it
honors or bypasses the system or bus iommu.
> > There is a demand for being able to operate on top of an IOMMU on
> > powerpc as well for some embedded stuff using PCI as a transport so
your
> > patch precludes that.
> > 
> > Cheers,
> > Ben.
> 
> As far as I can see, nothing changes on PPC so this patch
> does not preclude anything that was working?
Today but breaks some other intended usages. "PPC" is a very wide
scope.
Some "PPC" users want to use virtio for communicating between machines
over a non-transparent PCI setup an that requires using the iommu for
example. We have other things coming down the pipe that will require the
use of the DMA mapping API as well.
> > > 
> > > Signed-off-by: Andy Lutomirski <luto at amacapital.net>
> > > ---
> > >  drivers/virtio/virtio_pci.c | 90
++++++++++++++++++++++++++++++++++++++++-----
> > >  1 file changed, 81 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/drivers/virtio/virtio_pci.c
b/drivers/virtio/virtio_pci.c
> > > index a1f299fa4626..8ddb0a641878 100644
> > > --- a/drivers/virtio/virtio_pci.c
> > > +++ b/drivers/virtio/virtio_pci.c
> > > @@ -80,8 +80,10 @@ struct virtio_pci_vq_info
> > >  	/* the number of entries in the queue */
> > >  	int num;
> > >  
> > > -	/* the virtual address of the ring queue */
> > > -	void *queue;
> > > +	/* the ring queue */
> > > +	void *queue;			/* virtual address */
> > > +	dma_addr_t queue_dma_addr;	/* bus address */
> > > +	bool use_dma_api;		/* are we using the DMA API? */
> > >  
> > >  	/* the list node for the virtqueues list */
> > >  	struct list_head node;
> > > @@ -388,6 +390,50 @@ static int vp_request_intx(struct
virtio_device *vdev)
> > >  	return err;
> > >  }
> > >  
> > > +static bool vp_use_dma_api(void)
> > > +{
> > > +	/*
> > > +	 * Due to limitations of the DMA API, we only have two choices:
> > > +	 * use the DMA API (e.g. set up IOMMU mappings or apply
Xen's
> > > +	 * physical-to-machine translation) or use direct physical
> > > +	 * addressing.  Furthermore, there's no sensible way yet
for the
> > > +	 * PCI bus code to tell us whether we're supposed to act
like a
> > > +	 * normal PCI device (and use the DMA API) or to do something
> > > +	 * else.  So we're stuck with heuristics here.
> > > +	 *
> > > +	 * In general, we would prefer to use the DMA API, since we
> > > +	 * might be driving a physical device, and such devices *must*
> > > +	 * use the DMA API if there is an IOMMU involved.
> > > +	 *
> > > +	 * On x86, there are no physically-mapped emulated virtio PCI
> > > +	 * devices that live behind an IOMMU.  On ARM, there don't
seem
> > > +	 * to be any hypervisors that use virtio_pci (as opposed to
> > > +	 * virtio_mmio) that also emulate an IOMMU.  So using the DMI
> > > +	 * API is safe.
> > > +	 *
> > > +	 * On PowerPC, it's the other way around.  There usually is
an
> > > +	 * IOMMU between us and the virtio PCI device, but the device
is
> > > +	 * probably emulated and ignores the IOMMU.  Unfortunately, we
> > > +	 * can't tell whether we're talking to an emulated
device or to
> > > +	 * a physical device that really lives behind the IOMMU.  That
> > > +	 * means that we're stuck with ignoring the DMA API.
> > > +	 */
> > > +
> > > +#ifdef CONFIG_PPC
> > > +	return false;
> > > +#else
> > > +	/*
> > > +	 * Minor optimization: if the platform promises to have
physical
> > > +	 * PCI DMA, we turn off DMA mapping in virtio_ring.  If the
> > > +	 * platform's DMA API implementation is well optimized,
this
> > > +	 * should have almost no effect, but we already have a branch
in
> > > +	 * the vring code, and we can avoid any further indirection
with
> > > +	 * very little effort.
> > > +	 */
> > > +	return !PCI_DMA_BUS_IS_PHYS;
> > > +#endif
> > > +}
> > > +
> > >  static struct virtqueue *setup_vq(struct virtio_device *vdev,
unsigned index,
> > >  				  void (*callback)(struct virtqueue *vq),
> > >  				  const char *name,
> > > @@ -416,21 +462,30 @@ static struct virtqueue *setup_vq(struct
virtio_device *vdev, unsigned index,
> > >  
> > >  	info->num = num;
> > >  	info->msix_vector = msix_vec;
> > > +	info->use_dma_api = vp_use_dma_api();
> > >  
> > > -	size = PAGE_ALIGN(vring_size(num, VIRTIO_PCI_VRING_ALIGN));
> > > -	info->queue = alloc_pages_exact(size,
GFP_KERNEL|__GFP_ZERO);
> > > +	size = vring_size(num, VIRTIO_PCI_VRING_ALIGN);
> > > +	if (info->use_dma_api) {
> > > +		info->queue = dma_zalloc_coherent(vdev->dev.parent,
size,
> > > +						  &info->queue_dma_addr,
> > > +						  GFP_KERNEL);
> > > +	} else {
> > > +		info->queue = alloc_pages_exact(PAGE_ALIGN(size),
> > > +						GFP_KERNEL|__GFP_ZERO);
> > > +		info->queue_dma_addr = virt_to_phys(info->queue);
> > > +	}
> > >  	if (info->queue == NULL) {
> > >  		err = -ENOMEM;
> > >  		goto out_info;
> > >  	}
> > >  
> > >  	/* activate the queue */
> > > -	iowrite32(virt_to_phys(info->queue) >>
VIRTIO_PCI_QUEUE_ADDR_SHIFT,
> > > +	iowrite32(info->queue_dma_addr >>
VIRTIO_PCI_QUEUE_ADDR_SHIFT,
> > >  		  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> > >  
> > >  	/* create the vring */
> > >  	vq = vring_new_virtqueue(index, info->num,
VIRTIO_PCI_VRING_ALIGN, vdev,
> > > -				 true, false, info->queue,
> > > +				 true, info->use_dma_api, info->queue,
> > >  				 vp_notify, callback, name);
> > >  	if (!vq) {
> > >  		err = -ENOMEM;
> > > @@ -463,7 +518,12 @@ out_assign:
> > >  	vring_del_virtqueue(vq);
> > >  out_activate_queue:
> > >  	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> > > -	free_pages_exact(info->queue, size);
> > > +	if (info->use_dma_api) {
> > > +		dma_free_coherent(vdev->dev.parent, size,
> > > +				  info->queue, info->queue_dma_addr);
> > > +	} else {
> > > +		free_pages_exact(info->queue, PAGE_ALIGN(size));
> > > +	}
> > >  out_info:
> > >  	kfree(info);
> > >  	return ERR_PTR(err);
> > > @@ -493,8 +553,13 @@ static void vp_del_vq(struct virtqueue *vq)
> > >  	/* Select and deactivate the queue */
> > >  	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> > >  
> > > -	size = PAGE_ALIGN(vring_size(info->num,
VIRTIO_PCI_VRING_ALIGN));
> > > -	free_pages_exact(info->queue, size);
> > > +	size = vring_size(info->num, VIRTIO_PCI_VRING_ALIGN);
> > > +	if (info->use_dma_api) {
> > > +		dma_free_coherent(vq->vdev->dev.parent, size,
> > > +				  info->queue, info->queue_dma_addr);
> > > +	} else {
> > > +		free_pages_exact(info->queue, PAGE_ALIGN(size));
> > > +	}
> > >  	kfree(info);
> > >  }
> > >  
> > > @@ -713,6 +778,13 @@ static int virtio_pci_probe(struct pci_dev
*pci_dev,
> > >  	if (err)
> > >  		goto out;
> > >  
> > > +	err = dma_set_mask_and_coherent(&pci_dev->dev,
DMA_BIT_MASK(64));
> > > +	if (err)
> > > +		err = dma_set_mask_and_coherent(&pci_dev->dev,
> > > +						DMA_BIT_MASK(32));
> > > +	if (err)
> > > +		dev_warn(&pci_dev->dev, "Failed to enable 64-bit
or 32-bit DMA.  Trying to continue, but this might not work.\n");
> > > +
> > >  	err = pci_request_regions(pci_dev, "virtio-pci");
> > >  	if (err)
> > >  		goto out_enable_device;
> >

Andy Lutomirski

2014-Sep-29 18:55 UTC

head link

[PATCH v5 2/3] virtio_pci: Use the DMA API for virtqueues when possible

On Wed, Sep 17, 2014 at 7:16 AM, Michael S. Tsirkin <mst at redhat.com>
wrote:> On Wed, Sep 17, 2014 at 08:02:31AM -0400, Benjamin Herrenschmidt wrote:
>> On Tue, 2014-09-16 at 22:22 -0700, Andy Lutomirski wrote:
>> > On non-PPC systems, virtio_pci should use the DMA API.  This fixes
>> > virtio_pci on Xen.  On PPC, using the DMA API would break things,
so
>> > we need to preserve the old behavior.
>> >
>> > The big comment in this patch explains the considerations in more
>> > detail.
>>
>> I still disagree with using CONFIG_PPC as a trigger here.
>>
>> Fundamentally, the qemu implementation today bypasses IOMMUs on all
>> platforms as far as I can tell.
>>
>> If that changes, we'll have a backward compatibility problem.
>>
>> The virtio device should advertise whether it's using that bypass
>> mode of operation and virtio_pci should react accordingly.
>
> Well if there's a way to detect that - that's outside the
> device, then we probably shouldn't use device-specific
> interfaces for this capability.
>
>
>> There is a demand for being able to operate on top of an IOMMU on
>> powerpc as well for some embedded stuff using PCI as a transport so
your
>> patch precludes that.
>>
>> Cheers,
>> Ben.
>
> As far as I can see, nothing changes on PPC so this patch
> does not preclude anything that was working?
Rusty and Michael, what's the status of this?

I think that (aside from the trivial DMI/DMA typo) the only real issue
here is that the situation on PPC is ugly.  We're failing to enable
physical virtio hardware on PPC with these patches, but that never
worked anyway.  I don't think that there are any regressions other
than ugliness.

My preference would be to apply the patches as is (or with "DMA"
spelled correctly), and then to:

 - Make sure that all virtio-mmio systems have working DMA ops so that
virtio-mmio can the DMA API

 - Fix the DMA API on s390 (probably easy) and on PPC (not necessarily so easy)

 - Remove the non-DMA-API code, which would be a very small change on
top of these patches.

--Andy

Reasonably Related Threads

Search for more possibly parallel threads

Linux Virtualization - Sep 2014 - [PATCH v5 2/3] virtio_pci: Use the DMA API for virtqueues when possible

[PATCH v5 2/3] virtio_pci: Use the DMA API for virtqueues when possible

[PATCH v5 2/3] virtio_pci: Use the DMA API for virtqueues when possible

[PATCH v5 2/3] virtio_pci: Use the DMA API for virtqueues when possible

[PATCH v5 2/3] virtio_pci: Use the DMA API for virtqueues when possible

[PATCH v5 2/3] virtio_pci: Use the DMA API for virtqueues when possible

Reasonably Related Threads