thr3ads.net - Virtualization - [PATCH v5 0/3] virtio: Use the DMA API when appropriate [Sep 2014]

Andy Lutomirski

2014-Sep-17 05:22 UTC

[PATCH v5 0/3] virtio: Use the DMA API when appropriate

This fixes virtio on Xen guests as well as on any other platform
that uses virtio_pci on which physical addresses don't match bus
addresses.

This can be tested with:

    virtme-run --xen xen --kimg arch/x86/boot/bzImage --console

using virtme from here:

    https://git.kernel.org/cgit/utils/kernel/virtme/virtme.git

Without these patches, the guest hangs forever.  With these patches,
everything works.

This should be safe on all platforms that I'm aware of.  That
doesn't mean that there isn't anything that I missed.

Applies to net-next.

Changes from v4:
 - Rebased onto net-next.
 - Dropped the sg end mark changes from virtio_net, as that issue
   is solved differently in net-next.
 - virtio_pci does not use the DMA API on powerpc, for reasons that
   are explained in a detailed comment in virtio_ring.

Changes from v3:
 - virtio_pci only asks virtio_ring to use the DMA API if
   !PCI_DMA_BUS_IS_PHYS.
 - Reduce tools/virtio breakage.  It's now merely as broken as before
   instead of being even more broken.
 - Drop the sg_next changes -- Rusty's version is better.

Changes from v2:
 - Reordered patches.
 - Fixed a virtio_net OOPS.

Changes from v1:
 - Using the DMA API is optional now.  It would be nice to improve the
   DMA API to the point that it could be used unconditionally, but s390
   proves that we're not there yet.
 - Includes patch 4, which fixes DMA debugging warnings from virtio_net.

Andy Lutomirski (3):
  virtio_ring: Support DMA APIs if requested
  virtio_pci: Use the DMA API for virtqueues when possible
  virtio_net: Stop doing DMA from the stack

 drivers/lguest/lguest_device.c         |   3 +-
 drivers/misc/mic/card/mic_virtio.c     |   2 +-
 drivers/net/virtio_net.c               |  53 ++++++---
 drivers/remoteproc/remoteproc_virtio.c |   4 +-
 drivers/s390/kvm/kvm_virtio.c          |   2 +-
 drivers/s390/kvm/virtio_ccw.c          |   4 +-
 drivers/virtio/virtio_mmio.c           |   5 +-
 drivers/virtio/virtio_pci.c            |  91 ++++++++++++++--
 drivers/virtio/virtio_ring.c           | 194 +++++++++++++++++++++++++++------
 include/linux/virtio_ring.h            |   1 +
 tools/virtio/linux/dma-mapping.h       |  17 +++
 tools/virtio/linux/virtio.h            |   1 +
 tools/virtio/virtio_test.c             |   2 +-
 tools/virtio/vringh_test.c             |   3 +-
 14 files changed, 314 insertions(+), 68 deletions(-)
 create mode 100644 tools/virtio/linux/dma-mapping.h

-- 
1.9.3

Andy Lutomirski

2014-Sep-17 05:22 UTC

[PATCH v5 1/3] virtio_ring: Support DMA APIs if requested

virtio_ring currently sends the device (usually a hypervisor)
physical addresses of its I/O buffers.  This is okay when DMA
addresses and physical addresses are the same thing, but this isn't
always the case.  For example, this never works on Xen guests, and
it is likely to fail if a physical "virtio" device ever ends up
behind an IOMMU or swiotlb.

The immediate use case for me is to enable virtio on Xen guests.
For that to work, we need vring to support DMA address translation
as well as a corresponding change to virtio_pci or to another
driver.

With this patch, if enabled, virtfs survives kmemleak and
CONFIG_DMA_API_DEBUG.  virtio-net warns (correctly) about DMA from
the stack in virtnet_set_rx_mode.

This explicitly supports !CONFIG_HAS_DMA.  If vring is asked to use
the DMA API and CONFIG_HAS_DMA is not set, then vring will refuse to
create the virtqueue.
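
The address selection described above can be modelled in userspace
roughly as follows.  This is only a sketch under stated assumptions, not
the kernel code: model_vq, model_map, model_mapping_error and the fixed
bus_offset translation are hypothetical stand-ins for struct
vring_virtqueue, dma_map_single(), dma_mapping_error() and a real IOMMU
or Xen translation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-queue state; not the kernel struct. */
struct model_vq {
	bool use_dma_api;	/* mirrors the new use_dma_api flag */
	uint64_t bus_offset;	/* assumed fixed phys-to-bus translation */
};

/* Hypothetical sentinel for a failed mapping. */
#define MODEL_MAPPING_ERROR UINT64_MAX

/*
 * Return the address the device should be given for a buffer at
 * physical address phys.  With use_dma_api set, the address goes
 * through a translation step (here: adding an offset); otherwise the
 * physical address is passed through unchanged, as before the patch.
 */
static uint64_t model_map(const struct model_vq *vq, uint64_t phys)
{
	assert(vq != NULL);

	if (!vq->use_dma_api)
		return phys;

	/* Report overflow (or a clash with the sentinel) as an error. */
	if (vq->bus_offset >= UINT64_MAX - phys)
		return MODEL_MAPPING_ERROR;

	return phys + vq->bus_offset;
}

/* Only queues that use the translation step can report an error. */
static bool model_mapping_error(const struct model_vq *vq, uint64_t addr)
{
	return vq->use_dma_api && addr == MODEL_MAPPING_ERROR;
}
```

The point of the model is that callers always go through one helper and
check one error predicate, so the direct-physical path and the
translated path share the same call sites.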

Signed-off-by: Andy Lutomirski <luto at amacapital.net>
---
 drivers/lguest/lguest_device.c         |   3 +-
 drivers/misc/mic/card/mic_virtio.c     |   2 +-
 drivers/remoteproc/remoteproc_virtio.c |   4 +-
 drivers/s390/kvm/kvm_virtio.c          |   2 +-
 drivers/s390/kvm/virtio_ccw.c          |   4 +-
 drivers/virtio/virtio_mmio.c           |   5 +-
 drivers/virtio/virtio_pci.c            |   3 +-
 drivers/virtio/virtio_ring.c           | 194 +++++++++++++++++++++++++++------
 include/linux/virtio_ring.h            |   1 +
 tools/virtio/linux/dma-mapping.h       |  17 +++
 tools/virtio/linux/virtio.h            |   1 +
 tools/virtio/virtio_test.c             |   2 +-
 tools/virtio/vringh_test.c             |   3 +-
 13 files changed, 198 insertions(+), 43 deletions(-)
 create mode 100644 tools/virtio/linux/dma-mapping.h

diff --git a/drivers/lguest/lguest_device.c b/drivers/lguest/lguest_device.c
index d0a1d8a45c81..f0eafbe82ed4 100644
--- a/drivers/lguest/lguest_device.c
+++ b/drivers/lguest/lguest_device.c
@@ -301,7 +301,8 @@ static struct virtqueue *lg_find_vq(struct virtio_device *vdev,
 	 * barriers.
 	 */
 	vq = vring_new_virtqueue(index, lvq->config.num, LGUEST_VRING_ALIGN, vdev,
-				 true, lvq->pages, lg_notify, callback, name);
+				 true, false, lvq->pages,
+				 lg_notify, callback, name);
 	if (!vq) {
 		err = -ENOMEM;
 		goto unmap;
diff --git a/drivers/misc/mic/card/mic_virtio.c b/drivers/misc/mic/card/mic_virtio.c
index f14b60080c21..d633964417b1 100644
--- a/drivers/misc/mic/card/mic_virtio.c
+++ b/drivers/misc/mic/card/mic_virtio.c
@@ -256,7 +256,7 @@ static struct virtqueue *mic_find_vq(struct virtio_device *vdev,
 	mvdev->vr[index] = va;
 	memset_io(va, 0x0, _vr_size);
 	vq = vring_new_virtqueue(index, le16_to_cpu(config.num),
-				 MIC_VIRTIO_RING_ALIGN, vdev, false,
+				 MIC_VIRTIO_RING_ALIGN, vdev, false, false,
 				 (void __force *)va, mic_notify, callback,
 				 name);
 	if (!vq) {
diff --git a/drivers/remoteproc/remoteproc_virtio.c b/drivers/remoteproc/remoteproc_virtio.c
index a34b50690b4e..e31f2fefa76e 100644
--- a/drivers/remoteproc/remoteproc_virtio.c
+++ b/drivers/remoteproc/remoteproc_virtio.c
@@ -107,8 +107,8 @@ static struct virtqueue *rp_find_vq(struct virtio_device *vdev,
 	 * Create the new vq, and tell virtio we're not interested in
 	 * the 'weak' smp barriers, since we're talking with a real device.
 	 */
-	vq = vring_new_virtqueue(id, len, rvring->align, vdev, false, addr,
-					rproc_virtio_notify, callback, name);
+	vq = vring_new_virtqueue(id, len, rvring->align, vdev, false, false,
+				 addr, rproc_virtio_notify, callback, name);
 	if (!vq) {
 		dev_err(dev, "vring_new_virtqueue %s failed\n", name);
 		rproc_free_vring(rvring);
diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
index a1349653c6d9..91abcdc196d0 100644
--- a/drivers/s390/kvm/kvm_virtio.c
+++ b/drivers/s390/kvm/kvm_virtio.c
@@ -206,7 +206,7 @@ static struct virtqueue *kvm_find_vq(struct virtio_device *vdev,
 		goto out;
 
 	vq = vring_new_virtqueue(index, config->num, KVM_S390_VIRTIO_RING_ALIGN,
-				 vdev, true, (void *) config->address,
+				 vdev, true, false, (void *) config->address,
 				 kvm_notify, callback, name);
 	if (!vq) {
 		err = -ENOMEM;
diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index d2c0b442bce5..2462a443358a 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -478,8 +478,8 @@ static struct virtqueue *virtio_ccw_setup_vq(struct virtio_device *vdev,
 	}
 
 	vq = vring_new_virtqueue(i, info->num, KVM_VIRTIO_CCW_RING_ALIGN, vdev,
-				 true, info->queue, virtio_ccw_kvm_notify,
-				 callback, name);
+				 true, false, info->queue,
+				 virtio_ccw_kvm_notify, callback, name);
 	if (!vq) {
 		/* For now, we fail if we can't get the requested size. */
 		dev_warn(&vcdev->cdev->dev, "no vq\n");
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index c600ccfd6922..693254e52a5d 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -366,8 +366,9 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
 			vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
 
 	/* Create the vring */
-	vq = vring_new_virtqueue(index, info->num, VIRTIO_MMIO_VRING_ALIGN, vdev,
-				 true, info->queue, vm_notify, callback, name);
+	vq = vring_new_virtqueue(index, info->num, VIRTIO_MMIO_VRING_ALIGN,
+				 vdev, true, false, info->queue,
+				 vm_notify, callback, name);
 	if (!vq) {
 		err = -ENOMEM;
 		goto error_new_virtqueue;
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 3d1463c6b120..a1f299fa4626 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -430,7 +430,8 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 
 	/* create the vring */
 	vq = vring_new_virtqueue(index, info->num, VIRTIO_PCI_VRING_ALIGN, vdev,
-				 true, info->queue, vp_notify, callback, name);
+				 true, false, info->queue,
+				 vp_notify, callback, name);
 	if (!vq) {
 		err = -ENOMEM;
 		goto out_activate_queue;
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 3b1f89b6e743..babcaea1afa0 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -24,6 +24,7 @@
 #include <linux/module.h>
 #include <linux/hrtimer.h>
 #include <linux/kmemleak.h>
+#include <linux/dma-mapping.h>
 
 #ifdef DEBUG
 /* For development, we want to crash whenever the ring is screwed. */
@@ -54,6 +55,12 @@
 #define END_USE(vq)
 #endif
 
+struct vring_desc_state
+{
+	void *data;			/* Data for callback. */
+	struct vring_desc *indir_desc;	/* Indirect descriptor, if any. */
+};
+
 struct vring_virtqueue
 {
 	struct virtqueue vq;
@@ -64,6 +71,9 @@ struct vring_virtqueue
 	/* Can we use weak barriers? */
 	bool weak_barriers;
 
+	/* Should we use the DMA API? */
+	bool use_dma_api;
+
 	/* Other side has made a mess, don't try any more. */
 	bool broken;
 
@@ -93,12 +103,81 @@ struct vring_virtqueue
 	ktime_t last_add_time;
 #endif
 
-	/* Tokens for callbacks. */
-	void *data[];
+	/* Per-descriptor state. */
+	struct vring_desc_state desc_state[];
 };
 
 #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
 
+/* Map one sg entry. */
+static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
+				   struct scatterlist *sg,
+				   enum dma_data_direction direction)
+{
+#ifdef CONFIG_HAS_DMA
+	/*
+	 * We can't use dma_map_sg, because we don't use scatterlists in
+	 * the way it expects (we sometimes use unterminated
+	 * scatterlists, and we don't guarantee that the scatterlist
+	 * will exist for the lifetime of the mapping).
+	 */
+	if (vq->use_dma_api)
+		return dma_map_page(vq->vq.vdev->dev.parent,
+				    sg_page(sg), sg->offset, sg->length,
+				    direction);
+#endif
+
+	return sg_phys(sg);
+}
+
+static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
+				   void *cpu_addr, size_t size,
+				   enum dma_data_direction direction)
+{
+#ifdef CONFIG_HAS_DMA
+	if (vq->use_dma_api)
+		return dma_map_single(vq->vq.vdev->dev.parent,
+				      cpu_addr, size,
+				      direction);
+#endif
+
+	/* avoid kmemleak false positive (hidden by virt_to_phys) */
+	kmemleak_ignore(cpu_addr);
+	return virt_to_phys(cpu_addr);
+}
+
+static void vring_unmap_one(const struct vring_virtqueue *vq,
+			    struct vring_desc *desc)
+{
+#ifdef CONFIG_HAS_DMA
+	if (!vq->use_dma_api)
+		return;		/* Nothing to do. */
+
+	if (desc->flags & VRING_DESC_F_INDIRECT) {
+		dma_unmap_single(vq->vq.vdev->dev.parent,
+				 desc->addr, desc->len,
+				 (desc->flags & VRING_DESC_F_WRITE) ?
+				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
+	} else {
+		dma_unmap_page(vq->vq.vdev->dev.parent,
+			       desc->addr, desc->len,
+			       (desc->flags & VRING_DESC_F_WRITE) ?
+			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
+	}
+#endif
+}
+
+static int vring_mapping_error(const struct vring_virtqueue *vq,
+			       dma_addr_t addr)
+{
+#ifdef CONFIG_HAS_DMA
+	return vq->use_dma_api &&
+		dma_mapping_error(vq->vq.vdev->dev.parent, addr);
+#else
+	return 0;
+#endif
+}
+
 static struct vring_desc *alloc_indirect(unsigned int total_sg, gfp_t gfp)
 {
 	struct vring_desc *desc;
@@ -131,7 +210,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 	struct vring_virtqueue *vq = to_vvq(_vq);
 	struct scatterlist *sg;
 	struct vring_desc *desc;
-	unsigned int i, n, avail, descs_used, uninitialized_var(prev);
+	unsigned int i, n, avail, descs_used, uninitialized_var(prev), err_idx;
 	int head;
 	bool indirect;
 
@@ -171,21 +250,25 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 
 	if (desc) {
 		/* Use a single buffer which doesn't continue */
+		indirect = true;
 		vq->vring.desc[head].flags = VRING_DESC_F_INDIRECT;
-		vq->vring.desc[head].addr = virt_to_phys(desc);
-		/* avoid kmemleak false positive (hidden by virt_to_phys) */
-		kmemleak_ignore(desc);
+		vq->vring.desc[head].addr = vring_map_single(
+			vq,
+			desc, total_sg * sizeof(struct vring_desc),
+			DMA_TO_DEVICE);
+		if (vring_mapping_error(vq, vq->vring.desc[head].addr))
+			goto unmap_free_indir;
+
 		vq->vring.desc[head].len = total_sg * sizeof(struct vring_desc);
 
 		/* Set up rest to use this indirect table. */
 		i = 0;
 		descs_used = 1;
-		indirect = true;
 	} else {
+		indirect = false;
 		desc = vq->vring.desc;
 		i = head;
 		descs_used = total_sg;
-		indirect = false;
 	}
 
 	if (vq->vq.num_free < descs_used) {
@@ -200,14 +283,13 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 		return -ENOSPC;
 	}
 
-	/* We're about to use some buffers from the free list. */
-	vq->vq.num_free -= descs_used;
-
 	for (n = 0; n < out_sgs; n++) {
 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
 			desc[i].flags = VRING_DESC_F_NEXT;
-			desc[i].addr = sg_phys(sg);
+			desc[i].addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
 			desc[i].len = sg->length;
+			if (vring_mapping_error(vq, desc[i].addr))
+				goto unmap_release;
 			prev = i;
 			i = desc[i].next;
 		}
@@ -215,8 +297,10 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 	for (; n < (out_sgs + in_sgs); n++) {
 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
 			desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
-			desc[i].addr = sg_phys(sg);
+			desc[i].addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
 			desc[i].len = sg->length;
+			if (vring_mapping_error(vq, desc[i].addr))
+				goto unmap_release;
 			prev = i;
 			i = desc[i].next;
 		}
@@ -224,14 +308,19 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 	/* Last one doesn't continue. */
 	desc[prev].flags &= ~VRING_DESC_F_NEXT;
 
+	/* We're using some buffers from the free list. */
+	vq->vq.num_free -= descs_used;
+
 	/* Update free pointer */
 	if (indirect)
 		vq->free_head = vq->vring.desc[head].next;
 	else
 		vq->free_head = i;
 
-	/* Set token. */
-	vq->data[head] = data;
+	/* Store token and indirect buffer state. */
+	vq->desc_state[head].data = data;
+	if (indirect)
+		vq->desc_state[head].indir_desc = desc;
 
 	/* Put entry in available array (but don't update avail->idx until they
 	 * do sync). */
@@ -253,6 +342,28 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 	END_USE(vq);
 
 	return 0;
+
+unmap_release:
+	err_idx = i;
+	i = head;
+
+	for (n = 0; n < total_sg; n++) {
+		if (i == err_idx)
+			break;
+		vring_unmap_one(vq, &desc[i]);
+		i = vq->vring.desc[i].next;
+	}
+
+	vq->vq.num_free += total_sg;
+
+	if (indirect)
+		vring_unmap_one(vq, desc);
+
+unmap_free_indir:
+	if (indirect)
+		kfree(desc);
+
+	return -EIO;
 }
 
 /**
@@ -423,27 +534,42 @@ EXPORT_SYMBOL_GPL(virtqueue_kick);
 
 static void detach_buf(struct vring_virtqueue *vq, unsigned int head)
 {
-	unsigned int i;
+	unsigned int i, j;
 
 	/* Clear data ptr. */
-	vq->data[head] = NULL;
+	vq->desc_state[head].data = NULL;
 
-	/* Put back on free list: find end */
+	/* Put back on free list: unmap first-level descriptors and find end */
 	i = head;
 
-	/* Free the indirect table */
-	if (vq->vring.desc[i].flags & VRING_DESC_F_INDIRECT)
-		kfree(phys_to_virt(vq->vring.desc[i].addr));
-
 	while (vq->vring.desc[i].flags & VRING_DESC_F_NEXT) {
+		vring_unmap_one(vq, &vq->vring.desc[i]);
 		i = vq->vring.desc[i].next;
 		vq->vq.num_free++;
 	}
 
+	vring_unmap_one(vq, &vq->vring.desc[i]);
 	vq->vring.desc[i].next = vq->free_head;
 	vq->free_head = head;
+
 	/* Plus final descriptor */
 	vq->vq.num_free++;
+
+	/* Free the indirect table, if any, now that it's unmapped. */
+	if (vq->desc_state[head].indir_desc) {
+		struct vring_desc *indir_desc = vq->desc_state[head].indir_desc;
+		u32 len = vq->vring.desc[head].len;
+
+		BUG_ON(!(vq->vring.desc[head].flags & VRING_DESC_F_INDIRECT));
+		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
+
+		if (vq->use_dma_api)
+			for (j = 0; j < len / sizeof(struct vring_desc); j++)
+				vring_unmap_one(vq, &indir_desc[j]);
+
+		kfree(vq->desc_state[head].indir_desc);
+		vq->desc_state[head].indir_desc = NULL;
+	}
 }
 
 static inline bool more_used(const struct vring_virtqueue *vq)
@@ -498,13 +624,13 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
 		BAD_RING(vq, "id %u out of range\n", i);
 		return NULL;
 	}
-	if (unlikely(!vq->data[i])) {
+	if (unlikely(!vq->desc_state[i].data)) {
 		BAD_RING(vq, "id %u is not a head!\n", i);
 		return NULL;
 	}
 
 	/* detach_buf clears data, so grab it now. */
-	ret = vq->data[i];
+	ret = vq->desc_state[i].data;
 	detach_buf(vq, i);
 	vq->last_used_idx++;
 	/* If we expect an interrupt for the next entry, tell host
@@ -665,10 +791,10 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
 	START_USE(vq);
 
 	for (i = 0; i < vq->vring.num; i++) {
-		if (!vq->data[i])
+		if (!vq->desc_state[i].data)
 			continue;
 		/* detach_buf clears data, so grab it now. */
-		buf = vq->data[i];
+		buf = vq->desc_state[i].data;
 		detach_buf(vq, i);
 		vq->vring.avail->idx--;
 		END_USE(vq);
@@ -707,6 +833,7 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
 				      unsigned int vring_align,
 				      struct virtio_device *vdev,
 				      bool weak_barriers,
+				      bool use_dma_api,
 				      void *pages,
 				      bool (*notify)(struct virtqueue *),
 				      void (*callback)(struct virtqueue *),
@@ -721,7 +848,13 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
 		return NULL;
 	}
 
-	vq = kmalloc(sizeof(*vq) + sizeof(void *)*num, GFP_KERNEL);
+#ifndef CONFIG_HAS_DMA
+	if (use_dma_api)
+		return NULL;
+#endif
+
+	vq = kmalloc(sizeof(*vq) + num * sizeof(struct vring_desc_state),
+		     GFP_KERNEL);
 	if (!vq)
 		return NULL;
 
@@ -733,6 +866,7 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
 	vq->vq.index = index;
 	vq->notify = notify;
 	vq->weak_barriers = weak_barriers;
+	vq->use_dma_api = use_dma_api;
 	vq->broken = false;
 	vq->last_used_idx = 0;
 	vq->num_added = 0;
@@ -751,11 +885,9 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
 
 	/* Put everything in free lists. */
 	vq->free_head = 0;
-	for (i = 0; i < num-1; i++) {
+	for (i = 0; i < num-1; i++)
 		vq->vring.desc[i].next = i+1;
-		vq->data[i] = NULL;
-	}
-	vq->data[i] = NULL;
+	memset(vq->desc_state, 0, num * sizeof(struct vring_desc_state));
 
 	return &vq->vq;
 }
diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index 67e06fe18c03..60f761a38a09 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -70,6 +70,7 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
 				      unsigned int vring_align,
 				      struct virtio_device *vdev,
 				      bool weak_barriers,
+				      bool use_dma_api,
 				      void *pages,
 				      bool (*notify)(struct virtqueue *vq),
 				      void (*callback)(struct virtqueue *vq),
diff --git a/tools/virtio/linux/dma-mapping.h b/tools/virtio/linux/dma-mapping.h
new file mode 100644
index 000000000000..4f93af89ae16
--- /dev/null
+++ b/tools/virtio/linux/dma-mapping.h
@@ -0,0 +1,17 @@
+#ifndef _LINUX_DMA_MAPPING_H
+#define _LINUX_DMA_MAPPING_H
+
+#ifdef CONFIG_HAS_DMA
+# error Virtio userspace code does not support CONFIG_HAS_DMA
+#endif
+
+#define PCI_DMA_BUS_IS_PHYS 1
+
+enum dma_data_direction {
+	DMA_BIDIRECTIONAL = 0,
+	DMA_TO_DEVICE = 1,
+	DMA_FROM_DEVICE = 2,
+	DMA_NONE = 3,
+};
+
+#endif
diff --git a/tools/virtio/linux/virtio.h b/tools/virtio/linux/virtio.h
index 5a2d1f0f6bc7..5d42dc6a6201 100644
--- a/tools/virtio/linux/virtio.h
+++ b/tools/virtio/linux/virtio.h
@@ -78,6 +78,7 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
 				      unsigned int vring_align,
 				      struct virtio_device *vdev,
 				      bool weak_barriers,
+				      bool use_dma_api,
 				      void *pages,
 				      bool (*notify)(struct virtqueue *vq),
 				      void (*callback)(struct virtqueue *vq),
diff --git a/tools/virtio/virtio_test.c b/tools/virtio/virtio_test.c
index 00ea679b3826..860cc89900a7 100644
--- a/tools/virtio/virtio_test.c
+++ b/tools/virtio/virtio_test.c
@@ -99,7 +99,7 @@ static void vq_info_add(struct vdev_info *dev, int num)
 	vring_init(&info->vring, num, info->ring, 4096);
 	info->vq = vring_new_virtqueue(info->idx,
 				       info->vring.num, 4096, &dev->vdev,
-				       true, info->ring,
+				       true, false, info->ring,
 				       vq_notify, vq_callback, "test");
 	assert(info->vq);
 	info->vq->priv = info;
diff --git a/tools/virtio/vringh_test.c b/tools/virtio/vringh_test.c
index 14a4f4cab5b9..67d3c3a1ba88 100644
--- a/tools/virtio/vringh_test.c
+++ b/tools/virtio/vringh_test.c
@@ -312,7 +312,8 @@ static int parallel_test(unsigned long features,
 		if (sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set))
 			err(1, "Could not set affinity to cpu %u", first_cpu);
 
-		vq = vring_new_virtqueue(0, RINGSIZE, ALIGN, &gvdev.vdev, true,
+		vq = vring_new_virtqueue(0, RINGSIZE, ALIGN, &gvdev.vdev,
+					 true, false,
 					 guest_map, fast_vringh ? no_notify_host
 					 : parallel_notify_host,
 					 never_callback_guest, "guest vq");
-- 
1.9.3

Andy Lutomirski

2014-Sep-17 05:22 UTC

[PATCH v5 2/3] virtio_pci: Use the DMA API for virtqueues when possible

On non-PPC systems, virtio_pci should use the DMA API.  This fixes
virtio_pci on Xen.  On PPC, using the DMA API would break things, so
we need to preserve the old behavior.

The big comment in this patch explains the considerations in more
detail.
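
As a rough illustration of the decision this patch makes, the heuristic
can be written as a pure function.  This is a hypothetical userspace
model: model_use_dma_api and its two parameters are illustrative names
that stand in for CONFIG_PPC and PCI_DMA_BUS_IS_PHYS, which are
compile-time properties in the real kernel.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the heuristic.
 *
 * is_ppc:      stands in for CONFIG_PPC being set.
 * bus_is_phys: stands in for PCI_DMA_BUS_IS_PHYS being nonzero.
 */
static bool model_use_dma_api(bool is_ppc, bool bus_is_phys)
{
	/* On PPC the old direct-physical behavior is preserved. */
	if (is_ppc)
		return false;

	/* Elsewhere, skip mapping only if bus addresses are physical. */
	return !bus_is_phys;
}
```

Under this model, a Xen guest on x86 corresponds to
model_use_dma_api(false, false), which selects the DMA API.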

Signed-off-by: Andy Lutomirski <luto at amacapital.net>
---
 drivers/virtio/virtio_pci.c | 90 ++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 81 insertions(+), 9 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index a1f299fa4626..8ddb0a641878 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -80,8 +80,10 @@ struct virtio_pci_vq_info
 	/* the number of entries in the queue */
 	int num;
 
-	/* the virtual address of the ring queue */
-	void *queue;
+	/* the ring queue */
+	void *queue;			/* virtual address */
+	dma_addr_t queue_dma_addr;	/* bus address */
+	bool use_dma_api;		/* are we using the DMA API? */
 
 	/* the list node for the virtqueues list */
 	struct list_head node;
@@ -388,6 +390,50 @@ static int vp_request_intx(struct virtio_device *vdev)
 	return err;
 }
 
+static bool vp_use_dma_api(void)
+{
+	/*
+	 * Due to limitations of the DMA API, we only have two choices:
+	 * use the DMA API (e.g. set up IOMMU mappings or apply Xen's
+	 * physical-to-machine translation) or use direct physical
+	 * addressing.  Furthermore, there's no sensible way yet for the
+	 * PCI bus code to tell us whether we're supposed to act like a
+	 * normal PCI device (and use the DMA API) or to do something
+	 * else.  So we're stuck with heuristics here.
+	 *
+	 * In general, we would prefer to use the DMA API, since we
+	 * might be driving a physical device, and such devices *must*
+	 * use the DMA API if there is an IOMMU involved.
+	 *
+	 * On x86, there are no physically-mapped emulated virtio PCI
+	 * devices that live behind an IOMMU.  On ARM, there don't seem
+	 * to be any hypervisors that use virtio_pci (as opposed to
+	 * virtio_mmio) that also emulate an IOMMU.  So using the DMA
+	 * API is safe.
+	 *
+	 * On PowerPC, it's the other way around.  There usually is an
+	 * IOMMU between us and the virtio PCI device, but the device is
+	 * probably emulated and ignores the IOMMU.  Unfortunately, we
+	 * can't tell whether we're talking to an emulated device or to
+	 * a physical device that really lives behind the IOMMU.  That
+	 * means that we're stuck with ignoring the DMA API.
+	 */
+
+#ifdef CONFIG_PPC
+	return false;
+#else
+	/*
+	 * Minor optimization: if the platform promises to have physical
+	 * PCI DMA, we turn off DMA mapping in virtio_ring.  If the
+	 * platform's DMA API implementation is well optimized, this
+	 * should have almost no effect, but we already have a branch in
+	 * the vring code, and we can avoid any further indirection with
+	 * very little effort.
+	 */
+	return !PCI_DMA_BUS_IS_PHYS;
+#endif
+}
+
 static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 				  void (*callback)(struct virtqueue *vq),
 				  const char *name,
@@ -416,21 +462,30 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 
 	info->num = num;
 	info->msix_vector = msix_vec;
+	info->use_dma_api = vp_use_dma_api();
 
-	size = PAGE_ALIGN(vring_size(num, VIRTIO_PCI_VRING_ALIGN));
-	info->queue = alloc_pages_exact(size, GFP_KERNEL|__GFP_ZERO);
+	size = vring_size(num, VIRTIO_PCI_VRING_ALIGN);
+	if (info->use_dma_api) {
+		info->queue = dma_zalloc_coherent(vdev->dev.parent, size,
+						  &info->queue_dma_addr,
+						  GFP_KERNEL);
+	} else {
+		info->queue = alloc_pages_exact(PAGE_ALIGN(size),
+						GFP_KERNEL|__GFP_ZERO);
+		info->queue_dma_addr = virt_to_phys(info->queue);
+	}
 	if (info->queue == NULL) {
 		err = -ENOMEM;
 		goto out_info;
 	}
 
 	/* activate the queue */
-	iowrite32(virt_to_phys(info->queue) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
+	iowrite32(info->queue_dma_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
 		  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
 
 	/* create the vring */
 	vq = vring_new_virtqueue(index, info->num, VIRTIO_PCI_VRING_ALIGN, vdev,
-				 true, false, info->queue,
+				 true, info->use_dma_api, info->queue,
 				 vp_notify, callback, name);
 	if (!vq) {
 		err = -ENOMEM;
@@ -463,7 +518,12 @@ out_assign:
 	vring_del_virtqueue(vq);
 out_activate_queue:
 	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
-	free_pages_exact(info->queue, size);
+	if (info->use_dma_api) {
+		dma_free_coherent(vdev->dev.parent, size,
+				  info->queue, info->queue_dma_addr);
+	} else {
+		free_pages_exact(info->queue, PAGE_ALIGN(size));
+	}
 out_info:
 	kfree(info);
 	return ERR_PTR(err);
@@ -493,8 +553,13 @@ static void vp_del_vq(struct virtqueue *vq)
 	/* Select and deactivate the queue */
 	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
 
-	size = PAGE_ALIGN(vring_size(info->num, VIRTIO_PCI_VRING_ALIGN));
-	free_pages_exact(info->queue, size);
+	size = vring_size(info->num, VIRTIO_PCI_VRING_ALIGN);
+	if (info->use_dma_api) {
+		dma_free_coherent(vq->vdev->dev.parent, size,
+				  info->queue, info->queue_dma_addr);
+	} else {
+		free_pages_exact(info->queue, PAGE_ALIGN(size));
+	}
 	kfree(info);
 }
 
@@ -713,6 +778,13 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
 	if (err)
 		goto out;
 
+	err = dma_set_mask_and_coherent(&pci_dev->dev, DMA_BIT_MASK(64));
+	if (err)
+		err = dma_set_mask_and_coherent(&pci_dev->dev,
+						DMA_BIT_MASK(32));
+	if (err)
+		dev_warn(&pci_dev->dev, "Failed to enable 64-bit or 32-bit DMA.  Trying to continue, but this might not work.\n");
+
 	err = pci_request_regions(pci_dev, "virtio-pci");
 	if (err)
 		goto out_enable_device;
-- 
1.9.3

Andy Lutomirski

2014-Sep-17 05:22 UTC

[PATCH v5 3/3] virtio_net: Stop doing DMA from the stack

Now that virtio supports real DMA, drivers should play by the rules.
For virtio_net, that means that DMA should be done to and from
dynamically-allocated memory, not the kernel stack.

This should have no effect on any performance-critical code paths.
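
The pattern used for the control command can be sketched in userspace
as follows.  The names model_ctrl_hdr, model_ctrl_buf, model_ctrl_alloc
and MODEL_NET_OK are hypothetical stand-ins for the virtio_net
structures and constants, and malloc() stands in for kmalloc(); this is
an illustration of the idea, not the driver code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the virtio_net control structures. */
struct model_ctrl_hdr {
	uint8_t cmd_class;
	uint8_t cmd;
};

struct model_ctrl_buf {
	struct model_ctrl_hdr ctrl;	/* read by the device */
	uint8_t status;			/* written by the device */
};

#define MODEL_NET_OK 0

/*
 * Allocate the header and the status byte together on the heap so
 * that neither lives on the caller's stack.  Returns NULL on
 * allocation failure; the caller frees the buffer once the device
 * has answered.
 */
static struct model_ctrl_buf *model_ctrl_alloc(uint8_t cmd_class, uint8_t cmd)
{
	struct model_ctrl_buf *buf = malloc(sizeof(*buf));

	if (!buf)
		return NULL;

	buf->ctrl.cmd_class = cmd_class;
	buf->ctrl.cmd = cmd;
	buf->status = (uint8_t)~0;	/* not yet answered, as in the patch */
	return buf;
}
```

Keeping the header and status in one allocation means a single free on
every exit path, which is why the patch funnels its returns through one
label.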

Signed-off-by: Andy Lutomirski <luto at amacapital.net>
---
 drivers/net/virtio_net.c | 53 ++++++++++++++++++++++++++++++++----------------
 1 file changed, 36 insertions(+), 17 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3d0ce4468ce6..0bda62be5fe0 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -971,31 +971,43 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
 				 struct scatterlist *out)
 {
 	struct scatterlist *sgs[4], hdr, stat;
-	struct virtio_net_ctrl_hdr ctrl;
-	virtio_net_ctrl_ack status = ~0;
+
+	struct {
+		struct virtio_net_ctrl_hdr ctrl;
+		virtio_net_ctrl_ack status;
+	} *buf;
+
 	unsigned out_num = 0, tmp;
+	bool ret;
 
 	/* Caller should know better */
 	BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ));
 
-	ctrl.class = class;
-	ctrl.cmd = cmd;
+	buf = kmalloc(sizeof(*buf), GFP_ATOMIC);
+	if (!buf)
+		return false;
+	buf->status = ~0;
+
+	buf->ctrl.class = class;
+	buf->ctrl.cmd = cmd;
 	/* Add header */
-	sg_init_one(&hdr, &ctrl, sizeof(ctrl));
+	sg_init_one(&hdr, &buf->ctrl, sizeof(buf->ctrl));
 	sgs[out_num++] = &hdr;
 
 	if (out)
 		sgs[out_num++] = out;
 
 	/* Add return status. */
-	sg_init_one(&stat, &status, sizeof(status));
+	sg_init_one(&stat, &buf->status, sizeof(buf->status));
 	sgs[out_num] = &stat;
 
 	BUG_ON(out_num + 1 > ARRAY_SIZE(sgs));
 	virtqueue_add_sgs(vi->cvq, sgs, out_num, 1, vi, GFP_ATOMIC);
 
-	if (unlikely(!virtqueue_kick(vi->cvq)))
-		return status == VIRTIO_NET_OK;
+	if (unlikely(!virtqueue_kick(vi->cvq))) {
+		ret = (buf->status == VIRTIO_NET_OK);
+		goto out;
+	}
 
 	/* Spin for a response, the kick causes an ioport write, trapping
 	 * into the hypervisor, so the request should be handled immediately.
@@ -1004,7 +1016,11 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
 	       !virtqueue_is_broken(vi->cvq))
 		cpu_relax();
 
-	return status == VIRTIO_NET_OK;
+	ret = (buf->status == VIRTIO_NET_OK);
+
+out:
+	kfree(buf);
+	return ret;
 }
 
 static int virtnet_set_mac_address(struct net_device *dev, void *p)
@@ -1145,7 +1161,7 @@ static void virtnet_set_rx_mode(struct net_device *dev)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
 	struct scatterlist sg[2];
-	u8 promisc, allmulti;
+	u8 *cmdbyte;
 	struct virtio_net_ctrl_mac *mac_data;
 	struct netdev_hw_addr *ha;
 	int uc_count;
@@ -1157,22 +1173,25 @@ static void virtnet_set_rx_mode(struct net_device *dev)
 	if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_RX))
 		return;
 
-	promisc = ((dev->flags & IFF_PROMISC) != 0);
-	allmulti = ((dev->flags & IFF_ALLMULTI) != 0);
+	cmdbyte = kmalloc(sizeof(*cmdbyte), GFP_ATOMIC);
+	if (!cmdbyte)
+		return;
 
-	sg_init_one(sg, &promisc, sizeof(promisc));
+	sg_init_one(sg, cmdbyte, sizeof(*cmdbyte));
 
+	*cmdbyte = ((dev->flags & IFF_PROMISC) != 0);
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX,
 				  VIRTIO_NET_CTRL_RX_PROMISC, sg))
 		dev_warn(&dev->dev, "Failed to %sable promisc mode.\n",
-			 promisc ? "en" : "dis");
-
-	sg_init_one(sg, &allmulti, sizeof(allmulti));
+			 *cmdbyte ? "en" : "dis");
 
+	*cmdbyte = ((dev->flags & IFF_ALLMULTI) != 0);
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX,
 				  VIRTIO_NET_CTRL_RX_ALLMULTI, sg))
 		dev_warn(&dev->dev, "Failed to %sable allmulti mode.\n",
-			 allmulti ? "en" : "dis");
+			 *cmdbyte ? "en" : "dis");
+
+	kfree(cmdbyte);
 
 	uc_count = netdev_uc_count(dev);
 	mc_count = netdev_mc_count(dev);
-- 
1.9.3

Benjamin Herrenschmidt

2014-Sep-17 12:02 UTC

[PATCH v5 2/3] virtio_pci: Use the DMA API for virtqueues when possible

On Tue, 2014-09-16 at 22:22 -0700, Andy Lutomirski wrote:
> On non-PPC systems, virtio_pci should use the DMA API.  This fixes
> virtio_pci on Xen.  On PPC, using the DMA API would break things, so
> we need to preserve the old behavior.
> 
> The big comment in this patch explains the considerations in more
> detail.

I still disagree with using CONFIG_PPC as a trigger here.

Fundamentally, the qemu implementation today bypasses IOMMUs on all
platforms as far as I can tell.

If that changes, we'll have a backward compatibility problem.

The virtio device should advertise whether it's using that bypass
mode of operation and virtio_pci should react accordingly.

There is a demand for being able to operate on top of an IOMMU on
powerpc as well for some embedded stuff using PCI as a transport so your
patch precludes that.
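
One way to picture the suggestion above is a feature bit that the driver checks instead of a compile-time platform test. This is only a sketch under stated assumptions: the bit name, its number, and its polarity are all hypothetical and are not defined anywhere in this thread. The polarity chosen here (bit absent means legacy bypass) is an assumption made so that devices which advertise nothing keep their current behavior.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * HYPOTHETICAL feature bit: the number is a placeholder, not a real
 * assignment.  When set, the device promises to go through the IOMMU
 * like an ordinary PCI device.
 */
#define SKETCH_F_HONORS_IOMMU 40

static bool sketch_has_feature(uint64_t features, unsigned int bit)
{
	return ((features >> bit) & 1) != 0;
}

/*
 * Use the DMA API only when the device says it honors the IOMMU.
 * A device that advertises nothing is treated as bypassing it.
 */
static bool sketch_use_dma_api(uint64_t device_features)
{
	return sketch_has_feature(device_features, SKETCH_F_HONORS_IOMMU);
}
```

With a check of this shape the decision follows the device rather than the architecture, so the same driver could serve both a bypassing emulated device and one that sits behind an IOMMU.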

Cheers,
Ben.
> 
> Signed-off-by: Andy Lutomirski <luto at amacapital.net>
> ---
>  drivers/virtio/virtio_pci.c | 90 ++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 81 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
> index a1f299fa4626..8ddb0a641878 100644
> --- a/drivers/virtio/virtio_pci.c
> +++ b/drivers/virtio/virtio_pci.c
> @@ -80,8 +80,10 @@ struct virtio_pci_vq_info
>  	/* the number of entries in the queue */
>  	int num;
>  
> -	/* the virtual address of the ring queue */
> -	void *queue;
> +	/* the ring queue */
> +	void *queue;			/* virtual address */
> +	dma_addr_t queue_dma_addr;	/* bus address */
> +	bool use_dma_api;		/* are we using the DMA API? */
>  
>  	/* the list node for the virtqueues list */
>  	struct list_head node;
> @@ -388,6 +390,50 @@ static int vp_request_intx(struct virtio_device *vdev)
>  	return err;
>  }
>  
> +static bool vp_use_dma_api(void)
> +{
> +	/*
> +	 * Due to limitations of the DMA API, we only have two choices:
> +	 * use the DMA API (e.g. set up IOMMU mappings or apply Xen's
> +	 * physical-to-machine translation) or use direct physical
> +	 * addressing.  Furthermore, there's no sensible way yet for the
> +	 * PCI bus code to tell us whether we're supposed to act like a
> +	 * normal PCI device (and use the DMA API) or to do something
> +	 * else.  So we're stuck with heuristics here.
> +	 *
> +	 * In general, we would prefer to use the DMA API, since we
> +	 * might be driving a physical device, and such devices *must*
> +	 * use the DMA API if there is an IOMMU involved.
> +	 *
> +	 * On x86, there are no physically-mapped emulated virtio PCI
> +	 * devices that live behind an IOMMU.  On ARM, there don't seem
> +	 * to be any hypervisors that use virtio_pci (as opposed to
> +	 * virtio_mmio) that also emulate an IOMMU.  So using the DMI
> +	 * API is safe.
> +	 *
> +	 * On PowerPC, it's the other way around.  There usually is an
> +	 * IOMMU between us and the virtio PCI device, but the device is
> +	 * probably emulated and ignores the IOMMU.  Unfortunately, we
> +	 * can't tell whether we're talking to an emulated device or to
> +	 * a physical device that really lives behind the IOMMU.  That
> +	 * means that we're stuck with ignoring the DMA API.
> +	 */
> +
> +#ifdef CONFIG_PPC
> +	return false;
> +#else
> +	/*
> +	 * Minor optimization: if the platform promises to have physical
> +	 * PCI DMA, we turn off DMA mapping in virtio_ring.  If the
> +	 * platform's DMA API implementation is well optimized, this
> +	 * should have almost no effect, but we already have a branch in
> +	 * the vring code, and we can avoid any further indirection with
> +	 * very little effort.
> +	 */
> +	return !PCI_DMA_BUS_IS_PHYS;
> +#endif
> +}
> +
>  static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
>  				  void (*callback)(struct virtqueue *vq),
>  				  const char *name,
> @@ -416,21 +462,30 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
>  
>  	info->num = num;
>  	info->msix_vector = msix_vec;
> +	info->use_dma_api = vp_use_dma_api();
>  
> -	size = PAGE_ALIGN(vring_size(num, VIRTIO_PCI_VRING_ALIGN));
> -	info->queue = alloc_pages_exact(size, GFP_KERNEL|__GFP_ZERO);
> +	size = vring_size(num, VIRTIO_PCI_VRING_ALIGN);
> +	if (info->use_dma_api) {
> +		info->queue = dma_zalloc_coherent(vdev->dev.parent, size,
> +						  &info->queue_dma_addr,
> +						  GFP_KERNEL);
> +	} else {
> +		info->queue = alloc_pages_exact(PAGE_ALIGN(size),
> +						GFP_KERNEL|__GFP_ZERO);
> +		info->queue_dma_addr = virt_to_phys(info->queue);
> +	}
>  	if (info->queue == NULL) {
>  		err = -ENOMEM;
>  		goto out_info;
>  	}
>  
>  	/* activate the queue */
> -	iowrite32(virt_to_phys(info->queue) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
> +	iowrite32(info->queue_dma_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
>  		  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
>  
>  	/* create the vring */
>  	vq = vring_new_virtqueue(index, info->num, VIRTIO_PCI_VRING_ALIGN, vdev,
> -				 true, false, info->queue,
> +				 true, info->use_dma_api, info->queue,
>  				 vp_notify, callback, name);
>  	if (!vq) {
>  		err = -ENOMEM;
> @@ -463,7 +518,12 @@ out_assign:
>  	vring_del_virtqueue(vq);
>  out_activate_queue:
>  	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> -	free_pages_exact(info->queue, size);
> +	if (info->use_dma_api) {
> +		dma_free_coherent(vdev->dev.parent, size,
> +				  info->queue, info->queue_dma_addr);
> +	} else {
> +		free_pages_exact(info->queue, PAGE_ALIGN(size));
> +	}
>  out_info:
>  	kfree(info);
>  	return ERR_PTR(err);
> @@ -493,8 +553,13 @@ static void vp_del_vq(struct virtqueue *vq)
>  	/* Select and deactivate the queue */
>  	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
>  
> -	size = PAGE_ALIGN(vring_size(info->num, VIRTIO_PCI_VRING_ALIGN));
> -	free_pages_exact(info->queue, size);
> +	size = vring_size(info->num, VIRTIO_PCI_VRING_ALIGN);
> +	if (info->use_dma_api) {
> +		dma_free_coherent(vq->vdev->dev.parent, size,
> +				  info->queue, info->queue_dma_addr);
> +	} else {
> +		free_pages_exact(info->queue, PAGE_ALIGN(size));
> +	}
>  	kfree(info);
>  }
>  
> @@ -713,6 +778,13 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
>  	if (err)
>  		goto out;
>  
> +	err = dma_set_mask_and_coherent(&pci_dev->dev, DMA_BIT_MASK(64));
> +	if (err)
> +		err = dma_set_mask_and_coherent(&pci_dev->dev,
> +						DMA_BIT_MASK(32));
> +	if (err)
> +		dev_warn(&pci_dev->dev, "Failed to enable 64-bit or 32-bit DMA.  Trying to continue, but this might not work.\n");
> +
>  	err = pci_request_regions(pci_dev, "virtio-pci");
>  	if (err)
>  		goto out_enable_device;
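
The probe hunk quoted above tries a 64-bit DMA mask first and falls back to 32 bits. The sketch below shows only that control flow in user space. `SKETCH_DMA_BIT_MASK` follows the shape of the kernel's `DMA_BIT_MASK()` macro, while `struct sketch_platform` and `sketch_set_mask()` are hypothetical stand-ins; how a real platform decides whether to accept a mask is not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* The low n bits set; n == 64 is special-cased to avoid an undefined shift. */
#define SKETCH_DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical platform: simply states which mask widths it accepts. */
struct sketch_platform {
	bool accepts_64;
	bool accepts_32;
};

/* Hypothetical stand-in for a mask-setting call: 0 on success, -1 otherwise. */
static int sketch_set_mask(const struct sketch_platform *p, unsigned int bits,
			   uint64_t *mask)
{
	bool ok = (bits == 64) ? p->accepts_64 :
		  (bits == 32) ? p->accepts_32 : false;

	if (!ok)
		return -1;
	*mask = SKETCH_DMA_BIT_MASK(bits);
	return 0;
}

/* Try 64-bit first, then 32-bit; report failure only if both are refused. */
static int sketch_pick_mask(const struct sketch_platform *p, uint64_t *mask)
{
	int err = sketch_set_mask(p, 64, mask);

	if (err)
		err = sketch_set_mask(p, 32, mask);
	return err;
}
```

Note that the patch itself only warns when both attempts fail and then continues with probing, whereas this sketch returns the error to the caller.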

Ira W. Snyder

2014-Sep-17 16:09 UTC

[PATCH v5 2/3] virtio_pci: Use the DMA API for virtqueues when possible

On Tue, Sep 16, 2014 at 10:22:27PM -0700, Andy Lutomirski wrote:
> On non-PPC systems, virtio_pci should use the DMA API.  This fixes
> virtio_pci on Xen.  On PPC, using the DMA API would break things, so
> we need to preserve the old behavior.
> 
> The big comment in this patch explains the considerations in more
> detail.
> 
> Signed-off-by: Andy Lutomirski <luto at amacapital.net>
> ---
>  drivers/virtio/virtio_pci.c | 90 ++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 81 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
> index a1f299fa4626..8ddb0a641878 100644
> --- a/drivers/virtio/virtio_pci.c
> +++ b/drivers/virtio/virtio_pci.c
> @@ -80,8 +80,10 @@ struct virtio_pci_vq_info
>  	/* the number of entries in the queue */
>  	int num;
>  
> -	/* the virtual address of the ring queue */
> -	void *queue;
> +	/* the ring queue */
> +	void *queue;			/* virtual address */
> +	dma_addr_t queue_dma_addr;	/* bus address */
> +	bool use_dma_api;		/* are we using the DMA API? */
>  
>  	/* the list node for the virtqueues list */
>  	struct list_head node;
> @@ -388,6 +390,50 @@ static int vp_request_intx(struct virtio_device *vdev)
>  	return err;
>  }
>  
> +static bool vp_use_dma_api(void)
> +{
> +	/*
> +	 * Due to limitations of the DMA API, we only have two choices:
> +	 * use the DMA API (e.g. set up IOMMU mappings or apply Xen's
> +	 * physical-to-machine translation) or use direct physical
> +	 * addressing.  Furthermore, there's no sensible way yet for the
> +	 * PCI bus code to tell us whether we're supposed to act like a
> +	 * normal PCI device (and use the DMA API) or to do something
> +	 * else.  So we're stuck with heuristics here.
> +	 *
> +	 * In general, we would prefer to use the DMA API, since we
> +	 * might be driving a physical device, and such devices *must*
> +	 * use the DMA API if there is an IOMMU involved.
> +	 *
> +	 * On x86, there are no physically-mapped emulated virtio PCI
> +	 * devices that live behind an IOMMU.  On ARM, there don't seem
> +	 * to be any hypervisors that use virtio_pci (as opposed to
> +	 * virtio_mmio) that also emulate an IOMMU.  So using the DMI
Hi,

I noticed a typo here. It should say "DMA" not "DMI". Just thought I'd
point it out.

Ira
> +	 * API is safe.
> +	 *
> +	 * On PowerPC, it's the other way around.  There usually is an
> +	 * IOMMU between us and the virtio PCI device, but the device is
> +	 * probably emulated and ignores the IOMMU.  Unfortunately, we
> +	 * can't tell whether we're talking to an emulated device or to
> +	 * a physical device that really lives behind the IOMMU.  That
> +	 * means that we're stuck with ignoring the DMA API.
> +	 */
> +
> +#ifdef CONFIG_PPC
> +	return false;
> +#else
> +	/*
> +	 * Minor optimization: if the platform promises to have physical
> +	 * PCI DMA, we turn off DMA mapping in virtio_ring.  If the
> +	 * platform's DMA API implementation is well optimized, this
> +	 * should have almost no effect, but we already have a branch in
> +	 * the vring code, and we can avoid any further indirection with
> +	 * very little effort.
> +	 */
> +	return !PCI_DMA_BUS_IS_PHYS;
> +#endif
> +}
> +
>  static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
>  				  void (*callback)(struct virtqueue *vq),
>  				  const char *name,
> @@ -416,21 +462,30 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
>  
>  	info->num = num;
>  	info->msix_vector = msix_vec;
> +	info->use_dma_api = vp_use_dma_api();
>  
> -	size = PAGE_ALIGN(vring_size(num, VIRTIO_PCI_VRING_ALIGN));
> -	info->queue = alloc_pages_exact(size, GFP_KERNEL|__GFP_ZERO);
> +	size = vring_size(num, VIRTIO_PCI_VRING_ALIGN);
> +	if (info->use_dma_api) {
> +		info->queue = dma_zalloc_coherent(vdev->dev.parent, size,
> +						  &info->queue_dma_addr,
> +						  GFP_KERNEL);
> +	} else {
> +		info->queue = alloc_pages_exact(PAGE_ALIGN(size),
> +						GFP_KERNEL|__GFP_ZERO);
> +		info->queue_dma_addr = virt_to_phys(info->queue);
> +	}
>  	if (info->queue == NULL) {
>  		err = -ENOMEM;
>  		goto out_info;
>  	}
>  
>  	/* activate the queue */
> -	iowrite32(virt_to_phys(info->queue) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
> +	iowrite32(info->queue_dma_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
>  		  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
>  
>  	/* create the vring */
>  	vq = vring_new_virtqueue(index, info->num, VIRTIO_PCI_VRING_ALIGN, vdev,
> -				 true, false, info->queue,
> +				 true, info->use_dma_api, info->queue,
>  				 vp_notify, callback, name);
>  	if (!vq) {
>  		err = -ENOMEM;
> @@ -463,7 +518,12 @@ out_assign:
>  	vring_del_virtqueue(vq);
>  out_activate_queue:
>  	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> -	free_pages_exact(info->queue, size);
> +	if (info->use_dma_api) {
> +		dma_free_coherent(vdev->dev.parent, size,
> +				  info->queue, info->queue_dma_addr);
> +	} else {
> +		free_pages_exact(info->queue, PAGE_ALIGN(size));
> +	}
>  out_info:
>  	kfree(info);
>  	return ERR_PTR(err);
> @@ -493,8 +553,13 @@ static void vp_del_vq(struct virtqueue *vq)
>  	/* Select and deactivate the queue */
>  	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
>  
> -	size = PAGE_ALIGN(vring_size(info->num, VIRTIO_PCI_VRING_ALIGN));
> -	free_pages_exact(info->queue, size);
> +	size = vring_size(info->num, VIRTIO_PCI_VRING_ALIGN);
> +	if (info->use_dma_api) {
> +		dma_free_coherent(vq->vdev->dev.parent, size,
> +				  info->queue, info->queue_dma_addr);
> +	} else {
> +		free_pages_exact(info->queue, PAGE_ALIGN(size));
> +	}
>  	kfree(info);
>  }
>  
> @@ -713,6 +778,13 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
>  	if (err)
>  		goto out;
>  
> +	err = dma_set_mask_and_coherent(&pci_dev->dev, DMA_BIT_MASK(64));
> +	if (err)
> +		err = dma_set_mask_and_coherent(&pci_dev->dev,
> +						DMA_BIT_MASK(32));
> +	if (err)
> +		dev_warn(&pci_dev->dev, "Failed to enable 64-bit or 32-bit DMA.  Trying to continue, but this might not work.\n");
> +
>  	err = pci_request_regions(pci_dev, "virtio-pci");
>  	if (err)
>  		goto out_enable_device;
> -- 
> 1.9.3
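
The setup_vq() hunk quoted above sizes the ring with vring_size(), page-aligns only the non-DMA allocation, and writes the ring's bus address to the device as a 32-bit page frame number. The sketch below works through that arithmetic in user space. It assumes the legacy split-ring layout (16-byte descriptors, 2-byte available-ring entries, 8-byte used elements), a 4096-byte page and ring alignment, and a shift of 12; all `sketch_`/`SKETCH_` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE        4096u /* assumed page size */
#define SKETCH_QUEUE_ADDR_SHIFT 12    /* assumed VIRTIO_PCI_QUEUE_ADDR_SHIFT */

/*
 * Ring size under the assumed legacy layout: the descriptor table and the
 * available ring are padded up to `align`, then the used ring follows.
 */
static size_t sketch_vring_size(unsigned int num, size_t align)
{
	size_t avail_end = (size_t)16 * num + (size_t)2 * (3 + num);
	size_t used = (size_t)2 * 3 + (size_t)8 * num;

	return ((avail_end + align - 1) & ~(align - 1)) + used;
}

/* Round a size up to a whole number of pages. */
static size_t sketch_page_align(size_t size)
{
	return (size + SKETCH_PAGE_SIZE - 1) & ~((size_t)SKETCH_PAGE_SIZE - 1);
}

/*
 * Convert a bus address to the page frame number written to the device.
 * Returns -1 if the address is not page aligned or the PFN needs more
 * than 32 bits, since the register write is 32 bits wide.
 */
static int sketch_queue_pfn(uint64_t dma_addr, uint32_t *pfn)
{
	uint64_t v = dma_addr >> SKETCH_QUEUE_ADDR_SHIFT;

	if ((dma_addr & (SKETCH_PAGE_SIZE - 1)) || v > UINT32_MAX)
		return -1;
	*pfn = (uint32_t)v;
	return 0;
}
```

Under these assumptions a 256-entry ring needs 10246 bytes, which rounds up to three pages. The range check in sketch_queue_pfn() is an addition of this sketch; the patch as posted shifts the address and writes it without such a check.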
> 

Konrad Rzeszutek Wilk

2014-Sep-19 18:25 UTC

[PATCH v5 0/3] virtio: Use the DMA API when appropriate

On Tue, Sep 16, 2014 at 10:22:25PM -0700, Andy Lutomirski wrote:
> This fixes virtio on Xen guests as well as on any other platform
> that uses virtio_pci on which physical addresses don't match bus
> addresses.
I can do 'Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>',
but I am not sure whose tree this patch should go through.

> 
> This can be tested with:
> 
>     virtme-run --xen xen --kimg arch/x86/boot/bzImage --console
> 
> using virtme from here:
> 
>     https://git.kernel.org/cgit/utils/kernel/virtme/virtme.git
> 
> Without these patches, the guest hangs forever.  With these patches,
> everything works.
> 
> This should be safe on all platforms that I'm aware of.  That
> doesn't mean that there isn't anything that I missed.
> 
> Applies to net-next.
> 
> Changes from v4:
>  - Rebased onto net-next.
>  - Dropped the sg end mark changes from virtio_net, as that issue
>    is solved differently in net-next.
>  - virtio_pci does not use the DMA API on powerpc, for reasons that
>    are explained in a detailed comment in virtio_ring.
> 
> Changes from v3:
>  - virtio_pci only asks virtio_ring to use the DMA API if
>    !PCI_DMA_BUS_IS_PHYS.
>  - Reduce tools/virtio breakage.  It's now merely as broken as before
>    instead of being even more broken.
>  - Drop the sg_next changes -- Rusty's version is better.
> 
> Changes from v2:
>  - Reordered patches.
>  - Fixed a virtio_net OOPS.
> 
> Changes from v1:
>  - Using the DMA API is optional now.  It would be nice to improve the
>    DMA API to the point that it could be used unconditionally, but s390
>    proves that we're not there yet.
>  - Includes patch 4, which fixes DMA debugging warnings from virtio_net.
> 
> Andy Lutomirski (3):
>   virtio_ring: Support DMA APIs if requested
>   virtio_pci: Use the DMA API for virtqueues when possible
>   virtio_net: Stop doing DMA from the stack
> 
>  drivers/lguest/lguest_device.c         |   3 +-
>  drivers/misc/mic/card/mic_virtio.c     |   2 +-
>  drivers/net/virtio_net.c               |  53 ++++++---
>  drivers/remoteproc/remoteproc_virtio.c |   4 +-
>  drivers/s390/kvm/kvm_virtio.c          |   2 +-
>  drivers/s390/kvm/virtio_ccw.c          |   4 +-
>  drivers/virtio/virtio_mmio.c           |   5 +-
>  drivers/virtio/virtio_pci.c            |  91 ++++++++++++++--
>  drivers/virtio/virtio_ring.c           | 194 +++++++++++++++++++++++++++------
>  include/linux/virtio_ring.h            |   1 +
>  tools/virtio/linux/dma-mapping.h       |  17 +++
>  tools/virtio/linux/virtio.h            |   1 +
>  tools/virtio/virtio_test.c             |   2 +-
>  tools/virtio/vringh_test.c             |   3 +-
>  14 files changed, 314 insertions(+), 68 deletions(-)
>  create mode 100644 tools/virtio/linux/dma-mapping.h
> 
> -- 
> 1.9.3
>
