This switches virtio to use the DMA API unconditionally. I'm sure it breaks things, but it seems to work on x86 using virtio-pci, with and without Xen, and using both the modern 1.0 variant and the legacy variant. Changes from v2: - Fix really embarrassing bug. This version actually works. Changes from v1: - Fix an endian conversion error causing a BUG to hit. - Fix a DMA ordering issue (swiotlb=force works now). - Minor cleanups. Andy Lutomirski (3): virtio_net: Stop doing DMA from the stack virtio_ring: Support DMA APIs virtio_pci: Use the DMA API drivers/net/virtio_net.c | 53 +++++++---- drivers/virtio/Kconfig | 2 +- drivers/virtio/virtio_pci_common.h | 3 +- drivers/virtio/virtio_pci_legacy.c | 19 +++- drivers/virtio/virtio_pci_modern.c | 34 +++++-- drivers/virtio/virtio_ring.c | 187 ++++++++++++++++++++++++++++++------- tools/virtio/linux/dma-mapping.h | 17 ++++ 7 files changed, 246 insertions(+), 69 deletions(-) create mode 100644 tools/virtio/linux/dma-mapping.h -- 2.4.3
Andy Lutomirski
2015-Oct-28 06:38 UTC
[PATCH v3 1/3] virtio_net: Stop doing DMA from the stack
From: Andy Lutomirski <luto at amacapital.net> Once virtio starts using the DMA API, we won't be able to safely DMA from the stack. virtio-net does a couple of config DMA requests from small stack buffers -- switch to using dynamically-allocated memory. This should have no effect on any performance-critical code paths. Cc: "Michael S. Tsirkin" <mst at redhat.com> Cc: virtualization at lists.linux-foundation.org Reviewed-by: Joerg Roedel <jroedel at suse.de> Signed-off-by: Andy Lutomirski <luto at kernel.org> --- Hi Michael and DaveM- This is a prerequisite for the virtio DMA fixing project. It works as a standalone patch, though. Would it make sense to apply it to an appropriate networking tree now? (This is unchanged from v2.) drivers/net/virtio_net.c | 53 ++++++++++++++++++++++++++++++++---------------- 1 file changed, 36 insertions(+), 17 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index d8838dedb7a4..4f10f8a58811 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -976,31 +976,43 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd, struct scatterlist *out) { struct scatterlist *sgs[4], hdr, stat; - struct virtio_net_ctrl_hdr ctrl; - virtio_net_ctrl_ack status = ~0; + + struct { + struct virtio_net_ctrl_hdr ctrl; + virtio_net_ctrl_ack status; + } *buf; + unsigned out_num = 0, tmp; + bool ret; /* Caller should know better */ BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)); - ctrl.class = class; - ctrl.cmd = cmd; + buf = kmalloc(sizeof(*buf), GFP_ATOMIC); + if (!buf) + return false; + buf->status = ~0; + + buf->ctrl.class = class; + buf->ctrl.cmd = cmd; /* Add header */ - sg_init_one(&hdr, &ctrl, sizeof(ctrl)); + sg_init_one(&hdr, &buf->ctrl, sizeof(buf->ctrl)); sgs[out_num++] = &hdr; if (out) sgs[out_num++] = out; /* Add return status. */ - sg_init_one(&stat, &status, sizeof(status)); + sg_init_one(&stat, &buf->status, sizeof(buf->status)); sgs[out_num] = &stat; BUG_ON(out_num + 1 > ARRAY_SIZE(sgs)); virtqueue_add_sgs(vi->cvq, sgs, out_num, 1, vi, GFP_ATOMIC); - if (unlikely(!virtqueue_kick(vi->cvq))) - return status == VIRTIO_NET_OK; + if (unlikely(!virtqueue_kick(vi->cvq))) { + ret = (buf->status == VIRTIO_NET_OK); + goto out; + } /* Spin for a response, the kick causes an ioport write, trapping * into the hypervisor, so the request should be handled immediately. @@ -1009,7 +1021,11 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd, !virtqueue_is_broken(vi->cvq)) cpu_relax(); - return status == VIRTIO_NET_OK; + ret = (buf->status == VIRTIO_NET_OK); + +out: + kfree(buf); + return ret; } static int virtnet_set_mac_address(struct net_device *dev, void *p) @@ -1151,7 +1167,7 @@ static void virtnet_set_rx_mode(struct net_device *dev) { struct virtnet_info *vi = netdev_priv(dev); struct scatterlist sg[2]; - u8 promisc, allmulti; + u8 *cmdbyte; struct virtio_net_ctrl_mac *mac_data; struct netdev_hw_addr *ha; int uc_count; @@ -1163,22 +1179,25 @@ static void virtnet_set_rx_mode(struct net_device *dev) if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_RX)) return; - promisc = ((dev->flags & IFF_PROMISC) != 0); - allmulti = ((dev->flags & IFF_ALLMULTI) != 0); + cmdbyte = kmalloc(sizeof(*cmdbyte), GFP_ATOMIC); + if (!cmdbyte) + return; - sg_init_one(sg, &promisc, sizeof(promisc)); + sg_init_one(sg, cmdbyte, sizeof(*cmdbyte)); + *cmdbyte = ((dev->flags & IFF_PROMISC) != 0); if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX, VIRTIO_NET_CTRL_RX_PROMISC, sg)) dev_warn(&dev->dev, "Failed to %sable promisc mode.\n", - promisc ? "en" : "dis"); - - sg_init_one(sg, &allmulti, sizeof(allmulti)); + *cmdbyte ? "en" : "dis"); + *cmdbyte = ((dev->flags & IFF_ALLMULTI) != 0); if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX, VIRTIO_NET_CTRL_RX_ALLMULTI, sg)) dev_warn(&dev->dev, "Failed to %sable allmulti mode.\n", - allmulti ? "en" : "dis"); + *cmdbyte ? "en" : "dis"); + + kfree(cmdbyte); uc_count = netdev_uc_count(dev); mc_count = netdev_mc_count(dev); -- 2.4.3
virtio_ring currently sends the device (usually a hypervisor) physical addresses of its I/O buffers. This is okay when DMA addresses and physical addresses are the same thing, but this isn't always the case. For example, this never works on Xen guests, and it is likely to fail if a physical "virtio" device ever ends up behind an IOMMU or swiotlb. The immediate use case for me is to enable virtio on Xen guests. For that to work, we need vring to support DMA address translation as well as a corresponding change to virtio_pci or to another driver. With this patch, if enabled, virtfs survives kmemleak and CONFIG_DMA_API_DEBUG. Signed-off-by: Andy Lutomirski <luto at kernel.org> --- drivers/virtio/Kconfig | 2 +- drivers/virtio/virtio_ring.c | 187 +++++++++++++++++++++++++++++++-------- tools/virtio/linux/dma-mapping.h | 17 ++++ 3 files changed, 169 insertions(+), 37 deletions(-) create mode 100644 tools/virtio/linux/dma-mapping.h diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig index cab9f3f63a38..77590320d44c 100644 --- a/drivers/virtio/Kconfig +++ b/drivers/virtio/Kconfig @@ -60,7 +60,7 @@ config VIRTIO_INPUT config VIRTIO_MMIO tristate "Platform bus driver for memory mapped virtio devices" - depends on HAS_IOMEM + depends on HAS_IOMEM && HAS_DMA select VIRTIO ---help--- This drivers provides support for memory mapped virtio diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 096b857e7b75..6962ea37ade0 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -24,6 +24,7 @@ #include <linux/module.h> #include <linux/hrtimer.h> #include <linux/kmemleak.h> +#include <linux/dma-mapping.h> #ifdef DEBUG /* For development, we want to crash whenever the ring is screwed. */ @@ -54,7 +55,14 @@ #define END_USE(vq) #endif -struct vring_virtqueue { +struct vring_desc_state +{ + void *data; /* Data for callback. */ + struct vring_desc *indir_desc; /* Indirect descriptor, if any. */ +}; + +struct vring_virtqueue +{ struct virtqueue vq; /* Actual memory layout for this queue */ @@ -92,12 +100,71 @@ struct vring_virtqueue { ktime_t last_add_time; #endif - /* Tokens for callbacks. */ - void *data[]; + /* Per-descriptor state. */ + struct vring_desc_state desc_state[]; }; #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq) +/* + * The DMA ops on various arches are rather gnarly right now, and + * making all of the arch DMA ops work on the vring device itself + * is a mess. For now, we use the parent device for DMA ops. + */ +struct device *vring_dma_dev(const struct vring_virtqueue *vq) +{ + return vq->vq.vdev->dev.parent; +} + +/* Map one sg entry. */ +static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq, + struct scatterlist *sg, + enum dma_data_direction direction) +{ + /* + * We can't use dma_map_sg, because we don't use scatterlists in + * the way it expects (we don't guarantee that the scatterlist + * will exist for the lifetime of the mapping). + */ + return dma_map_page(vring_dma_dev(vq), + sg_page(sg), sg->offset, sg->length, + direction); +} + +static dma_addr_t vring_map_single(const struct vring_virtqueue *vq, + void *cpu_addr, size_t size, + enum dma_data_direction direction) +{ + return dma_map_single(vring_dma_dev(vq), + cpu_addr, size, direction); +} + +static void vring_unmap_one(const struct vring_virtqueue *vq, + struct vring_desc *desc) +{ + u16 flags = virtio16_to_cpu(vq->vq.vdev, desc->flags); + + if (flags & VRING_DESC_F_INDIRECT) { + dma_unmap_single(vring_dma_dev(vq), + virtio64_to_cpu(vq->vq.vdev, desc->addr), + virtio32_to_cpu(vq->vq.vdev, desc->len), + (flags & VRING_DESC_F_WRITE) ? + DMA_FROM_DEVICE : DMA_TO_DEVICE); + } else { + dma_unmap_page(vring_dma_dev(vq), + virtio64_to_cpu(vq->vq.vdev, desc->addr), + virtio32_to_cpu(vq->vq.vdev, desc->len), + (flags & VRING_DESC_F_WRITE) ? + DMA_FROM_DEVICE : DMA_TO_DEVICE); + } +} + +static int vring_mapping_error(const struct vring_virtqueue *vq, + dma_addr_t addr) +{ + return dma_mapping_error(vring_dma_dev(vq), addr); +} + static struct vring_desc *alloc_indirect(struct virtqueue *_vq, unsigned int total_sg, gfp_t gfp) { @@ -131,7 +198,7 @@ static inline int virtqueue_add(struct virtqueue *_vq, struct vring_virtqueue *vq = to_vvq(_vq); struct scatterlist *sg; struct vring_desc *desc; - unsigned int i, n, avail, descs_used, uninitialized_var(prev); + unsigned int i, n, avail, descs_used, uninitialized_var(prev), err_idx; int head; bool indirect; @@ -171,21 +238,15 @@ static inline int virtqueue_add(struct virtqueue *_vq, if (desc) { /* Use a single buffer which doesn't continue */ - vq->vring.desc[head].flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_INDIRECT); - vq->vring.desc[head].addr = cpu_to_virtio64(_vq->vdev, virt_to_phys(desc)); - /* avoid kmemleak false positive (hidden by virt_to_phys) */ - kmemleak_ignore(desc); - vq->vring.desc[head].len = cpu_to_virtio32(_vq->vdev, total_sg * sizeof(struct vring_desc)); - + indirect = true; /* Set up rest to use this indirect table. */ i = 0; descs_used = 1; - indirect = true; } else { + indirect = false; desc = vq->vring.desc; i = head; descs_used = total_sg; - indirect = false; } if (vq->vq.num_free < descs_used) { @@ -200,14 +261,13 @@ static inline int virtqueue_add(struct virtqueue *_vq, return -ENOSPC; } - /* We're about to use some buffers from the free list. */ - vq->vq.num_free -= descs_used; - for (n = 0; n < out_sgs; n++) { for (sg = sgs[n]; sg; sg = sg_next(sg)) { desc[i].flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT); - desc[i].addr = cpu_to_virtio64(_vq->vdev, sg_phys(sg)); + desc[i].addr = cpu_to_virtio64(_vq->vdev, vring_map_one_sg(vq, sg, DMA_TO_DEVICE)); desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length); + if (vring_mapping_error(vq, desc[i].addr)) + goto unmap_release; prev = i; i = virtio16_to_cpu(_vq->vdev, desc[i].next); } @@ -215,8 +275,10 @@ static inline int virtqueue_add(struct virtqueue *_vq, for (; n < (out_sgs + in_sgs); n++) { for (sg = sgs[n]; sg; sg = sg_next(sg)) { desc[i].flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT | VRING_DESC_F_WRITE); - desc[i].addr = cpu_to_virtio64(_vq->vdev, sg_phys(sg)); + desc[i].addr = cpu_to_virtio64(_vq->vdev, vring_map_one_sg(vq, sg, DMA_FROM_DEVICE)); desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length); + if (vring_mapping_error(vq, desc[i].addr)) + goto unmap_release; prev = i; i = virtio16_to_cpu(_vq->vdev, desc[i].next); } @@ -224,14 +286,34 @@ static inline int virtqueue_add(struct virtqueue *_vq, /* Last one doesn't continue. */ desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT); + if (indirect) { + /* Now that the indirect table is filled in, map it. */ + dma_addr_t addr = vring_map_single( + vq, desc, total_sg * sizeof(struct vring_desc), + DMA_TO_DEVICE); + + if (vring_mapping_error(vq, addr)) + goto unmap_release; + + vq->vring.desc[head].flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_INDIRECT); + vq->vring.desc[head].addr = cpu_to_virtio64(_vq->vdev, addr); + + vq->vring.desc[head].len = cpu_to_virtio32(_vq->vdev, total_sg * sizeof(struct vring_desc)); + } + + /* We're using some buffers from the free list. */ + vq->vq.num_free -= descs_used; + /* Update free pointer */ if (indirect) vq->free_head = virtio16_to_cpu(_vq->vdev, vq->vring.desc[head].next); else vq->free_head = i; - /* Set token. */ - vq->data[head] = data; + /* Store token and indirect buffer state. */ + vq->desc_state[head].data = data; + if (indirect) + vq->desc_state[head].indir_desc = desc; /* Put entry in available array (but don't update avail->idx until they * do sync). */ @@ -253,6 +335,24 @@ static inline int virtqueue_add(struct virtqueue *_vq, virtqueue_kick(_vq); return 0; + +unmap_release: + err_idx = i; + i = head; + + for (n = 0; n < total_sg; n++) { + if (i == err_idx) + break; + vring_unmap_one(vq, &desc[i]); + i = vq->vring.desc[i].next; + } + + vq->vq.num_free += total_sg; + + if (indirect) + kfree(desc); + + return -EIO; } /** @@ -423,27 +523,43 @@ EXPORT_SYMBOL_GPL(virtqueue_kick); static void detach_buf(struct vring_virtqueue *vq, unsigned int head) { - unsigned int i; + unsigned int i, j; + u16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT); /* Clear data ptr. */ - vq->data[head] = NULL; + vq->desc_state[head].data = NULL; - /* Put back on free list: find end */ + /* Put back on free list: unmap first-level descriptors and find end */ i = head; - /* Free the indirect table */ - if (vq->vring.desc[i].flags & cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)) - kfree(phys_to_virt(virtio64_to_cpu(vq->vq.vdev, vq->vring.desc[i].addr))); - - while (vq->vring.desc[i].flags & cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT)) { + while (vq->vring.desc[i].flags & nextflag) { + vring_unmap_one(vq, &vq->vring.desc[i]); i = virtio16_to_cpu(vq->vq.vdev, vq->vring.desc[i].next); vq->vq.num_free++; } + vring_unmap_one(vq, &vq->vring.desc[i]); vq->vring.desc[i].next = cpu_to_virtio16(vq->vq.vdev, vq->free_head); vq->free_head = head; + /* Plus final descriptor */ vq->vq.num_free++; + + /* Free the indirect table, if any, now that it's unmapped. */ + if (vq->desc_state[head].indir_desc) { + struct vring_desc *indir_desc = vq->desc_state[head].indir_desc; + u32 len = vq->vring.desc[head].len; + + BUG_ON(!(vq->vring.desc[head].flags & + cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT))); + BUG_ON(len == 0 || len % sizeof(struct vring_desc)); + + for (j = 0; j < len / sizeof(struct vring_desc); j++) + vring_unmap_one(vq, &indir_desc[j]); + + kfree(vq->desc_state[head].indir_desc); + vq->desc_state[head].indir_desc = NULL; + } } static inline bool more_used(const struct vring_virtqueue *vq) @@ -498,13 +614,13 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len) BAD_RING(vq, "id %u out of range\n", i); return NULL; } - if (unlikely(!vq->data[i])) { + if (unlikely(!vq->desc_state[i].data)) { BAD_RING(vq, "id %u is not a head!\n", i); return NULL; } /* detach_buf clears data, so grab it now. */ - ret = vq->data[i]; + ret = vq->desc_state[i].data; detach_buf(vq, i); vq->last_used_idx++; /* If we expect an interrupt for the next entry, tell host @@ -665,10 +781,10 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq) START_USE(vq); for (i = 0; i < vq->vring.num; i++) { - if (!vq->data[i]) + if (!vq->desc_state[i].data) continue; /* detach_buf clears data, so grab it now. */ - buf = vq->data[i]; + buf = vq->desc_state[i].data; detach_buf(vq, i); vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, virtio16_to_cpu(_vq->vdev, vq->vring.avail->idx) - 1); END_USE(vq); @@ -721,7 +837,8 @@ struct virtqueue *vring_new_virtqueue(unsigned int index, return NULL; } - vq = kmalloc(sizeof(*vq) + sizeof(void *)*num, GFP_KERNEL); + vq = kmalloc(sizeof(*vq) + num * sizeof(struct vring_desc_state), + GFP_KERNEL); if (!vq) return NULL; @@ -751,11 +868,9 @@ struct virtqueue *vring_new_virtqueue(unsigned int index, /* Put everything in free lists. */ vq->free_head = 0; - for (i = 0; i < num-1; i++) { + for (i = 0; i < num-1; i++) vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1); - vq->data[i] = NULL; - } - vq->data[i] = NULL; + memset(vq->desc_state, 0, num * sizeof(struct vring_desc_state)); return &vq->vq; } diff --git a/tools/virtio/linux/dma-mapping.h b/tools/virtio/linux/dma-mapping.h new file mode 100644 index 000000000000..4f93af89ae16 --- /dev/null +++ b/tools/virtio/linux/dma-mapping.h @@ -0,0 +1,17 @@ +#ifndef _LINUX_DMA_MAPPING_H +#define _LINUX_DMA_MAPPING_H + +#ifdef CONFIG_HAS_DMA +# error Virtio userspace code does not support CONFIG_HAS_DMA +#endif + +#define PCI_DMA_BUS_IS_PHYS 1 + +enum dma_data_direction { + DMA_BIDIRECTIONAL = 0, + DMA_TO_DEVICE = 1, + DMA_FROM_DEVICE = 2, + DMA_NONE = 3, +}; + +#endif -- 2.4.3
This fixes virtio-pci on platforms and busses that have IOMMUs. This will break the experimental QEMU Q35 IOMMU support until QEMU is fixed. In exchange, it fixes physical virtio hardware as well as virtio-pci running under Xen. We should clean up the virtqueue API to do its own allocation and teach virtqueue_get_avail and virtqueue_get_used to return DMA addresses directly. Signed-off-by: Andy Lutomirski <luto at kernel.org> --- drivers/virtio/virtio_pci_common.h | 3 ++- drivers/virtio/virtio_pci_legacy.c | 19 +++++++++++++++---- drivers/virtio/virtio_pci_modern.c | 34 ++++++++++++++++++++++++---------- 3 files changed, 41 insertions(+), 15 deletions(-) diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h index b976d968e793..cd6196b513ad 100644 --- a/drivers/virtio/virtio_pci_common.h +++ b/drivers/virtio/virtio_pci_common.h @@ -38,8 +38,9 @@ struct virtio_pci_vq_info { /* the number of entries in the queue */ int num; - /* the virtual address of the ring queue */ + /* the ring queue */ void *queue; + dma_addr_t queue_dma_addr; /* bus address */ /* the list node for the virtqueues list */ struct list_head node; diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c index 48bc9797e530..b5293e5f2af4 100644 --- a/drivers/virtio/virtio_pci_legacy.c +++ b/drivers/virtio/virtio_pci_legacy.c @@ -135,12 +135,14 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev, info->msix_vector = msix_vec; size = PAGE_ALIGN(vring_size(num, VIRTIO_PCI_VRING_ALIGN)); - info->queue = alloc_pages_exact(size, GFP_KERNEL|__GFP_ZERO); + info->queue = dma_zalloc_coherent(&vp_dev->pci_dev->dev, size, + &info->queue_dma_addr, + GFP_KERNEL); if (info->queue == NULL) return ERR_PTR(-ENOMEM); /* activate the queue */ - iowrite32(virt_to_phys(info->queue) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT, + iowrite32(info->queue_dma_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN); /* create the vring */ @@ -169,7 +171,8 @@ out_assign: vring_del_virtqueue(vq); out_activate_queue: iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN); - free_pages_exact(info->queue, size); + dma_free_coherent(&vp_dev->pci_dev->dev, size, + info->queue, info->queue_dma_addr); return ERR_PTR(err); } @@ -194,7 +197,8 @@ static void del_vq(struct virtio_pci_vq_info *info) iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN); size = PAGE_ALIGN(vring_size(info->num, VIRTIO_PCI_VRING_ALIGN)); - free_pages_exact(info->queue, size); + dma_free_coherent(&vp_dev->pci_dev->dev, size, + info->queue, info->queue_dma_addr); } static const struct virtio_config_ops virtio_pci_config_ops = { @@ -227,6 +231,13 @@ int virtio_pci_legacy_probe(struct virtio_pci_device *vp_dev) return -ENODEV; } + rc = dma_set_mask_and_coherent(&pci_dev->dev, DMA_BIT_MASK(64)); + if (rc) + rc = dma_set_mask_and_coherent(&pci_dev->dev, + DMA_BIT_MASK(32)); + if (rc) + dev_warn(&pci_dev->dev, "Failed to enable 64-bit or 32-bit DMA. Trying to continue, but this might not work.\n"); + rc = pci_request_region(pci_dev, 0, "virtio-pci-legacy"); if (rc) return rc; diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c index 8e5cf194cc0b..fbe0bd1c4881 100644 --- a/drivers/virtio/virtio_pci_modern.c +++ b/drivers/virtio/virtio_pci_modern.c @@ -293,14 +293,16 @@ static size_t vring_pci_size(u16 num) return PAGE_ALIGN(vring_size(num, SMP_CACHE_BYTES)); } -static void *alloc_virtqueue_pages(int *num) +static void *alloc_virtqueue_pages(struct virtio_pci_device *vp_dev, + int *num, dma_addr_t *dma_addr) { void *pages; /* TODO: allocate each queue chunk individually */ for (; *num && vring_pci_size(*num) > PAGE_SIZE; *num /= 2) { - pages = alloc_pages_exact(vring_pci_size(*num), - GFP_KERNEL|__GFP_ZERO|__GFP_NOWARN); + pages = dma_zalloc_coherent( + &vp_dev->pci_dev->dev, vring_pci_size(*num), + dma_addr, GFP_KERNEL|__GFP_NOWARN); if (pages) return pages; } @@ -309,7 +311,9 @@ static void *alloc_virtqueue_pages(int *num) return NULL; /* Try to get a single page. You are my only hope! */ - return alloc_pages_exact(vring_pci_size(*num), GFP_KERNEL|__GFP_ZERO); + return dma_zalloc_coherent( + &vp_dev->pci_dev->dev, vring_pci_size(*num), + dma_addr, GFP_KERNEL); } static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev, @@ -346,7 +350,8 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev, info->num = num; info->msix_vector = msix_vec; - info->queue = alloc_virtqueue_pages(&info->num); + info->queue = alloc_virtqueue_pages(vp_dev, &info->num, + &info->queue_dma_addr); if (info->queue == NULL) return ERR_PTR(-ENOMEM); @@ -361,11 +366,11 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev, /* activate the queue */ vp_iowrite16(num, &cfg->queue_size); - vp_iowrite64_twopart(virt_to_phys(info->queue), + vp_iowrite64_twopart(info->queue_dma_addr, &cfg->queue_desc_lo, &cfg->queue_desc_hi); - vp_iowrite64_twopart(virt_to_phys(virtqueue_get_avail(vq)), + vp_iowrite64_twopart(info->queue_dma_addr + ((char *)virtqueue_get_avail(vq) - (char *)info->queue), &cfg->queue_avail_lo, &cfg->queue_avail_hi); - vp_iowrite64_twopart(virt_to_phys(virtqueue_get_used(vq)), + vp_iowrite64_twopart(info->queue_dma_addr + ((char *)virtqueue_get_used(vq) - (char *)info->queue), &cfg->queue_used_lo, &cfg->queue_used_hi); if (vp_dev->notify_base) { @@ -411,7 +416,8 @@ err_assign_vector: err_map_notify: vring_del_virtqueue(vq); err_new_queue: - free_pages_exact(info->queue, vring_pci_size(info->num)); + dma_free_coherent(&vp_dev->pci_dev->dev, vring_pci_size(info->num), + info->queue, info->queue_dma_addr); return ERR_PTR(err); } @@ -457,7 +463,8 @@ static void del_vq(struct virtio_pci_vq_info *info) vring_del_virtqueue(vq); - free_pages_exact(info->queue, vring_pci_size(info->num)); + dma_free_coherent(&vp_dev->pci_dev->dev, vring_pci_size(info->num), + info->queue, info->queue_dma_addr); } static const struct virtio_config_ops virtio_pci_config_nodev_ops = { @@ -641,6 +648,13 @@ int virtio_pci_modern_probe(struct virtio_pci_device *vp_dev) return -EINVAL; } + err = dma_set_mask_and_coherent(&pci_dev->dev, DMA_BIT_MASK(64)); + if (err) + err = dma_set_mask_and_coherent(&pci_dev->dev, + DMA_BIT_MASK(32)); + if (err) + dev_warn(&pci_dev->dev, "Failed to enable 64-bit or 32-bit DMA. Trying to continue, but this might not work.\n"); + /* Device capability is only mandatory for devices that have * device-specific configuration. */ -- 2.4.3
On Tue, 2015-10-27 at 23:38 -0700, Andy Lutomirski wrote:> > Changes from v2: > - Fix really embarrassing bug. This version actually works.So embarrassing you didn't want to tell us what it was? ... --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -292,7 +292,7 @@ static inline int virtqueue_add(struct virtqueue *_vq, vq, desc, total_sg * sizeof(struct vring_desc), DMA_TO_DEVICE); - if (vring_mapping_error(vq, vq->vring.desc[head].addr)) + if (vring_mapping_error(vq, addr)) goto unmap_release; vq->vring.desc[head].flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_INDIRECT); That wasn't going to be the reason for Christian's failure, was it? -- dwmw2 -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 5691 bytes Desc: not available URL: <http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20151028/e486074b/attachment.bin>
Michael S. Tsirkin
2015-Oct-28 07:08 UTC
[PATCH v3 1/3] virtio_net: Stop doing DMA from the stack
On Tue, Oct 27, 2015 at 11:38:58PM -0700, Andy Lutomirski wrote:> From: Andy Lutomirski <luto at amacapital.net> > > Once virtio starts using the DMA API, we won't be able to safely DMA > from the stack. virtio-net does a couple of config DMA requests > from small stack buffers -- switch to using dynamically-allocated > memory. > > This should have no effect on any performance-critical code paths. > > Cc: "Michael S. Tsirkin" <mst at redhat.com> > Cc: virtualization at lists.linux-foundation.org > Reviewed-by: Joerg Roedel <jroedel at suse.de> > Signed-off-by: Andy Lutomirski <luto at kernel.org>Same issues as v2 (I only saw v3 now). I've proposed an alternative patch.> --- > > Hi Michael and DaveM- > > This is a prerequisite for the virtio DMA fixing project. It works > as a standalone patch, though. Would it make sense to apply it to > an appropriate networking tree now? > > (This is unchanged from v2.) > > drivers/net/virtio_net.c | 53 ++++++++++++++++++++++++++++++++---------------- > 1 file changed, 36 insertions(+), 17 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index d8838dedb7a4..4f10f8a58811 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -976,31 +976,43 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd, > struct scatterlist *out) > { > struct scatterlist *sgs[4], hdr, stat; > - struct virtio_net_ctrl_hdr ctrl; > - virtio_net_ctrl_ack status = ~0; > + > + struct { > + struct virtio_net_ctrl_hdr ctrl; > + virtio_net_ctrl_ack status; > + } *buf; > + > unsigned out_num = 0, tmp; > + bool ret; > > /* Caller should know better */ > BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)); > > - ctrl.class = class; > - ctrl.cmd = cmd; > + buf = kmalloc(sizeof(*buf), GFP_ATOMIC); > + if (!buf) > + return false; > + buf->status = ~0; > + > + buf->ctrl.class = class; > + buf->ctrl.cmd = cmd; > /* Add header */ > - sg_init_one(&hdr, &ctrl, sizeof(ctrl)); > + sg_init_one(&hdr, &buf->ctrl, sizeof(buf->ctrl)); > sgs[out_num++] = &hdr; > > if (out) > sgs[out_num++] = out; > > /* Add return status. */ > - sg_init_one(&stat, &status, sizeof(status)); > + sg_init_one(&stat, &buf->status, sizeof(buf->status)); > sgs[out_num] = &stat; > > BUG_ON(out_num + 1 > ARRAY_SIZE(sgs)); > virtqueue_add_sgs(vi->cvq, sgs, out_num, 1, vi, GFP_ATOMIC); > > - if (unlikely(!virtqueue_kick(vi->cvq))) > - return status == VIRTIO_NET_OK; > + if (unlikely(!virtqueue_kick(vi->cvq))) { > + ret = (buf->status == VIRTIO_NET_OK); > + goto out; > + } > > /* Spin for a response, the kick causes an ioport write, trapping > * into the hypervisor, so the request should be handled immediately. > @@ -1009,7 +1021,11 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd, > !virtqueue_is_broken(vi->cvq)) > cpu_relax(); > > - return status == VIRTIO_NET_OK; > + ret = (buf->status == VIRTIO_NET_OK); > + > +out: > + kfree(buf); > + return ret; > } > > static int virtnet_set_mac_address(struct net_device *dev, void *p) > @@ -1151,7 +1167,7 @@ static void virtnet_set_rx_mode(struct net_device *dev) > { > struct virtnet_info *vi = netdev_priv(dev); > struct scatterlist sg[2]; > - u8 promisc, allmulti; > + u8 *cmdbyte; > struct virtio_net_ctrl_mac *mac_data; > struct netdev_hw_addr *ha; > int uc_count; > @@ -1163,22 +1179,25 @@ static void virtnet_set_rx_mode(struct net_device *dev) > if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_RX)) > return; > > - promisc = ((dev->flags & IFF_PROMISC) != 0); > - allmulti = ((dev->flags & IFF_ALLMULTI) != 0); > + cmdbyte = kmalloc(sizeof(*cmdbyte), GFP_ATOMIC); > + if (!cmdbyte) > + return; > > - sg_init_one(sg, &promisc, sizeof(promisc)); > + sg_init_one(sg, cmdbyte, sizeof(*cmdbyte)); > > + *cmdbyte = ((dev->flags & IFF_PROMISC) != 0); > if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX, > VIRTIO_NET_CTRL_RX_PROMISC, sg)) > dev_warn(&dev->dev, "Failed to %sable promisc mode.\n", > - promisc ? "en" : "dis"); > - > - sg_init_one(sg, &allmulti, sizeof(allmulti)); > + *cmdbyte ? "en" : "dis"); > > + *cmdbyte = ((dev->flags & IFF_ALLMULTI) != 0); > if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX, > VIRTIO_NET_CTRL_RX_ALLMULTI, sg)) > dev_warn(&dev->dev, "Failed to %sable allmulti mode.\n", > - allmulti ? "en" : "dis"); > + *cmdbyte ? "en" : "dis"); > + > + kfree(cmdbyte); > > uc_count = netdev_uc_count(dev); > mc_count = netdev_mc_count(dev); > -- > 2.4.3
On Tue, Oct 27, 2015 at 11:53 PM, David Woodhouse <dwmw2 at infradead.org> wrote:> On Tue, 2015-10-27 at 23:38 -0700, Andy Lutomirski wrote: >> >> Changes from v2: >> - Fix really embarrassing bug. This version actually works. > > So embarrassing you didn't want to tell us what it was? ...Shhh, it's a secret! I somehow managed to test-boot a different kernel than I thought I was booting.> > --- a/drivers/virtio/virtio_ring.c > +++ b/drivers/virtio/virtio_ring.c > @@ -292,7 +292,7 @@ static inline int virtqueue_add(struct virtqueue *_vq, > vq, desc, total_sg * sizeof(struct vring_desc), > DMA_TO_DEVICE); > > - if (vring_mapping_error(vq, vq->vring.desc[head].addr)) > + if (vring_mapping_error(vq, addr)) > goto unmap_release; > > vq->vring.desc[head].flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_INDIRECT); > > That wasn't going to be the reason for Christian's failure, was it? >Not obviously, but it's possible. Now that I'm staring at it, I have some more big-endian issues, so there'll be a v4. I'll also play with Michael's thing. Expect a long delay, though -- my flight's about to leave. The readme notwithstanding, virtme (https://github.com/amluto/virtme) actually has s390x support, so I can try to debug when I get home. I'm not about to try doing this on a laptop :) --Andy
On Tue, Oct 27, 2015 at 11:38:57PM -0700, Andy Lutomirski wrote:> This switches virtio to use the DMA API unconditionally. I'm sure > it breaks things, but it seems to work on x86 using virtio-pci, with > and without Xen, and using both the modern 1.0 variant and the > legacy variant.I'm very glad to see work on this making progress. I suspect we'll have to find a way to make this optional though, and keep doing the non-DMA API thing with old devices. And I've been debating with myself whether a pci specific thing or a feature bit is preferable. Thoughts?> Changes from v2: > - Fix really embarrassing bug. This version actually works. > > Changes from v1: > - Fix an endian conversion error causing a BUG to hit. > - Fix a DMA ordering issue (swiotlb=force works now). > - Minor cleanups. > > Andy Lutomirski (3): > virtio_net: Stop doing DMA from the stack > virtio_ring: Support DMA APIs > virtio_pci: Use the DMA API > > drivers/net/virtio_net.c | 53 +++++++---- > drivers/virtio/Kconfig | 2 +- > drivers/virtio/virtio_pci_common.h | 3 +- > drivers/virtio/virtio_pci_legacy.c | 19 +++- > drivers/virtio/virtio_pci_modern.c | 34 +++++-- > drivers/virtio/virtio_ring.c | 187 ++++++++++++++++++++++++++++++------- > tools/virtio/linux/dma-mapping.h | 17 ++++ > 7 files changed, 246 insertions(+), 69 deletions(-) > create mode 100644 tools/virtio/linux/dma-mapping.h > > -- > 2.4.3
Am 28.10.2015 um 16:17 schrieb Michael S. Tsirkin:> On Tue, Oct 27, 2015 at 11:38:57PM -0700, Andy Lutomirski wrote: >> This switches virtio to use the DMA API unconditionally. I'm sure >> it breaks things, but it seems to work on x86 using virtio-pci, with >> and without Xen, and using both the modern 1.0 variant and the >> legacy variant. > > I'm very glad to see work on this making progress. > > I suspect we'll have to find a way to make this optional though, and > keep doing the non-DMA API thing with old devices. And I've been > debating with myself whether a pci specific thing or a feature bit is > preferable. >We have discussed that at kernel summit. I will try to implement a dummy dma_ops for s390 that does 1:1 mapping and Ben will look into doing some quirk to handle "old" code in addition to also make it possible to mark devices as iommu bypass (IIRC, via device tree, Ben?) Christian> Thoughts? > >> Changes from v2: >> - Fix really embarrassing bug. This version actually works. >> >> Changes from v1: >> - Fix an endian conversion error causing a BUG to hit. >> - Fix a DMA ordering issue (swiotlb=force works now). >> - Minor cleanups. >> >> Andy Lutomirski (3): >> virtio_net: Stop doing DMA from the stack >> virtio_ring: Support DMA APIs >> virtio_pci: Use the DMA API >> >> drivers/net/virtio_net.c | 53 +++++++---- >> drivers/virtio/Kconfig | 2 +- >> drivers/virtio/virtio_pci_common.h | 3 +- >> drivers/virtio/virtio_pci_legacy.c | 19 +++- >> drivers/virtio/virtio_pci_modern.c | 34 +++++-- >> drivers/virtio/virtio_ring.c | 187 ++++++++++++++++++++++++++++++------- >> tools/virtio/linux/dma-mapping.h | 17 ++++ >> 7 files changed, 246 insertions(+), 69 deletions(-) >> create mode 100644 tools/virtio/linux/dma-mapping.h >> >> -- >> 2.4.3 >
On Tue, Oct 27, 2015 at 11:38:57PM -0700, Andy Lutomirski wrote:> This switches virtio to use the DMA API unconditionally. I'm sure > it breaks things, but it seems to work on x86 using virtio-pci, with > and without Xen, and using both the modern 1.0 variant and the > legacy variant.So thinking hard about it, I don't see any real drawbacks to making this conditional on a new feature bit, that Xen can then set. As a bonus, host can distinguish between old and new guests using the feature bit, even though making driver *control* whether IOMMU is bypassed makes userspace drivers unsafe, so might not be a good idea. A tiny bit more code but not by much, and we clearly won't be breaking anything that's not already broken, and we will be able to drop the extra code later if we think it's a good idea. I'll run this by the virtio TC on OASIS next week so we can reserve a feature bit.> Changes from v2: > - Fix really embarrassing bug. This version actually works. > > Changes from v1: > - Fix an endian conversion error causing a BUG to hit. > - Fix a DMA ordering issue (swiotlb=force works now). > - Minor cleanups. > > Andy Lutomirski (3): > virtio_net: Stop doing DMA from the stack > virtio_ring: Support DMA APIs > virtio_pci: Use the DMA API > > drivers/net/virtio_net.c | 53 +++++++---- > drivers/virtio/Kconfig | 2 +- > drivers/virtio/virtio_pci_common.h | 3 +- > drivers/virtio/virtio_pci_legacy.c | 19 +++- > drivers/virtio/virtio_pci_modern.c | 34 +++++-- > drivers/virtio/virtio_ring.c | 187 ++++++++++++++++++++++++++++++------- > tools/virtio/linux/dma-mapping.h | 17 ++++ > 7 files changed, 246 insertions(+), 69 deletions(-) > create mode 100644 tools/virtio/linux/dma-mapping.h > > -- > 2.4.3
On Nov 19, 2015 5:45 AM, "Michael S. Tsirkin" <mst at redhat.com> wrote:> > On Tue, Oct 27, 2015 at 11:38:57PM -0700, Andy Lutomirski wrote: > > This switches virtio to use the DMA API unconditionally. I'm sure > > it breaks things, but it seems to work on x86 using virtio-pci, with > > and without Xen, and using both the modern 1.0 variant and the > > legacy variant. > > So thinking hard about it, I don't see any real drawbacks to making this > conditional on a new feature bit, that Xen can then set..Can you elaborate? If I run QEMU, hosting Xen, hosting Linux, and the virtio device is provided by QEMU, then how does Xen set the bit? Similarly, how would Xen set the bit for a real physical device? --Andy