Hi all,

this series adds multiqueue support to the virtio-scsi driver, based
on Jason Wang's work on virtio-net.  It uses a simple queue steering
algorithm that expects one queue per CPU.  LUNs in the same target always
use the same queue (so that commands are not reordered); queue switching
occurs when the request being queued is the only one for the target.
Also based on Jason's patches, the virtqueue affinity is set so that
each CPU is associated to one virtqueue.

I tested the patches with fio, using up to 32 virtio-scsi disks backed
by tmpfs on the host, and 1 LUN per target.

FIO configuration
-----------------
[global]
rw=read
bsrange=4k-64k
ioengine=libaio
direct=1
iodepth=4
loops=20

overall bandwidth (MB/s)
------------------------

# of targets    single-queue    multi-queue, 4 VCPUs    multi-queue, 8 VCPUs
 1                   540                 626                     599
 2                   795                 965                     925
 4                   997                1376                    1500
 8                  1136                2130                    2060
16                  1440                2269                    2474
24                  1408                2179                    2436
32                  1515                1978                    2319

(These numbers for single-queue are with 4 VCPUs, but the impact of adding
more VCPUs is very limited.)

avg bandwidth per LUN (MB/s)
----------------------------

# of targets    single-queue    multi-queue, 4 VCPUs    multi-queue, 8 VCPUs
 1                   540                 626                     599
 2                   397                 482                     462
 4                   249                 344                     375
 8                   142                 266                     257
16                    90                 141                     154
24                    58                  90                     101
32                    47                  61                      72

Testing this may require an irqbalance daemon that is built from git,
due to http://code.google.com/p/irqbalance/issues/detail?id=37.
Alternatively you can just set the affinity manually in /proc.

Rusty, can you please give your Acked-by to the first two patches?

Jason Wang (2):
  virtio-ring: move queue_index to vring_virtqueue
  virtio: introduce an API to set affinity for a virtqueue

Paolo Bonzini (3):
  virtio-scsi: allocate target pointers in a separate memory block
  virtio-scsi: pass struct virtio_scsi to virtqueue completion function
  virtio-scsi: introduce multiqueue support

 drivers/lguest/lguest_device.c         |    1 +
 drivers/remoteproc/remoteproc_virtio.c |    1 +
 drivers/s390/kvm/kvm_virtio.c          |    1 +
 drivers/scsi/virtio_scsi.c             |  200 ++++++++++++++++++++++++--------
 drivers/virtio/virtio_mmio.c           |   11 +-
 drivers/virtio/virtio_pci.c            |   58 ++++++++-
 drivers/virtio/virtio_ring.c           |   17 +++
 include/linux/virtio.h                 |    4 +
 include/linux/virtio_config.h          |   21 ++++
 9 files changed, 253 insertions(+), 61 deletions(-)
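[Editorial note: for readers skimming the series, the steering rule above can
be sketched as follows.  This is an illustration only, with simplified names
and no locking; the real code is in patch 5/5.  A target keeps its current
queue while it has requests in flight, and may be re-steered to the
submitting CPU's queue only on an idle-to-busy transition.]

	/* Hypothetical sketch of the queue steering policy; not patch code.
	 * One request virtqueue per CPU is assumed, locking is omitted. */
	struct tgt_sketch {
		atomic_t reqs;		/* requests in flight for this target */
		unsigned int req_vq;	/* index of the target's current queue */
	};

	static unsigned int steer_queue(struct tgt_sketch *tgt,
					unsigned int this_cpu,
					unsigned int num_queues)
	{
		/* Only the idle->busy transition may switch queues, so all
		 * outstanding commands of a target stay FIFO on one queue. */
		if (atomic_inc_return(&tgt->reqs) == 1)
			tgt->req_vq = this_cpu % num_queues;
		return tgt->req_vq;
	}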
Paolo Bonzini
2012-Aug-28 11:54 UTC
[PATCH 1/5] virtio-ring: move queue_index to vring_virtqueue
From: Jason Wang <jasowang at redhat.com>

Instead of storing the queue index in transport-specific virtio structs,
this patch moves it to vring_virtqueue and introduces a helper to get
the value.  This lets drivers simplify their management and tracing of
virtqueues.

Signed-off-by: Jason Wang <jasowang at redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini at redhat.com>
---
	I fixed the problems in Jason's v5 (posted at
	http://permalink.gmane.org/gmane.linux.kernel.virtualization/15910)
	and switched from virtio_set_queue_index to a new argument of
	vring_new_virtqueue.  This breaks at compile-time any virtio
	transport that is not updated.

 drivers/lguest/lguest_device.c         |    2 +-
 drivers/remoteproc/remoteproc_virtio.c |    2 +-
 drivers/s390/kvm/kvm_virtio.c          |    2 +-
 drivers/virtio/virtio_mmio.c           |   12 ++++--------
 drivers/virtio/virtio_pci.c            |   13 +++++--------
 drivers/virtio/virtio_ring.c           |   14 +++++++++++++-
 include/linux/virtio.h                 |    2 ++
 include/linux/virtio_ring.h            |    3 ++-
 8 files changed, 29 insertions(+), 21 deletions(-)

diff --git a/drivers/lguest/lguest_device.c b/drivers/lguest/lguest_device.c
index 9e8388e..ccb7dfb 100644
--- a/drivers/lguest/lguest_device.c
+++ b/drivers/lguest/lguest_device.c
@@ -296,7 +296,7 @@ static struct virtqueue *lg_find_vq(struct virtio_device *vdev,
	 * to 'true': the host just a(nother) SMP CPU, so we only need inter-cpu
	 * barriers.
	 */
-	vq = vring_new_virtqueue(lvq->config.num, LGUEST_VRING_ALIGN, vdev,
+	vq = vring_new_virtqueue(index, lvq->config.num, LGUEST_VRING_ALIGN, vdev,
				 true, lvq->pages, lg_notify, callback, name);
 	if (!vq) {
 		err = -ENOMEM;
diff --git a/drivers/remoteproc/remoteproc_virtio.c b/drivers/remoteproc/remoteproc_virtio.c
index 3541b44..343c194 100644
--- a/drivers/remoteproc/remoteproc_virtio.c
+++ b/drivers/remoteproc/remoteproc_virtio.c
@@ -103,7 +103,7 @@ static struct virtqueue *rp_find_vq(struct virtio_device *vdev,
	 * Create the new vq, and tell virtio we're not interested in
	 * the 'weak' smp barriers, since we're talking with a real device.
	 */
-	vq = vring_new_virtqueue(len, rvring->align, vdev, false, addr,
+	vq = vring_new_virtqueue(id, len, rvring->align, vdev, false, addr,
				 rproc_virtio_notify, callback, name);
 	if (!vq) {
 		dev_err(dev, "vring_new_virtqueue %s failed\n", name);
diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
index 47cccd5..5565af2 100644
--- a/drivers/s390/kvm/kvm_virtio.c
+++ b/drivers/s390/kvm/kvm_virtio.c
@@ -198,7 +198,7 @@ static struct virtqueue *kvm_find_vq(struct virtio_device *vdev,
 	if (err)
 		goto out;

-	vq = vring_new_virtqueue(config->num, KVM_S390_VIRTIO_RING_ALIGN,
+	vq = vring_new_virtqueue(index, config->num, KVM_S390_VIRTIO_RING_ALIGN,
				 vdev, true, (void *) config->address,
				 kvm_notify, callback, name);
 	if (!vq) {
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 453db0c..008bf58 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -131,9 +131,6 @@ struct virtio_mmio_vq_info {
 	/* the number of entries in the queue */
 	unsigned int num;

-	/* the index of the queue */
-	int queue_index;
-
 	/* the virtual address of the ring queue */
 	void *queue;

@@ -225,11 +222,10 @@ static void vm_reset(struct virtio_device *vdev)
 static void vm_notify(struct virtqueue *vq)
 {
 	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq->vdev);
-	struct virtio_mmio_vq_info *info = vq->priv;

 	/* We write the queue's selector into the notification register to
 	 * signal the other end */
-	writel(info->queue_index, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
+	writel(virtqueue_get_queue_index(vq), vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
 }

 /* Notify all virtqueues on an interrupt. */
@@ -270,6 +266,7 @@ static void vm_del_vq(struct virtqueue *vq)
 	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq->vdev);
 	struct virtio_mmio_vq_info *info = vq->priv;
 	unsigned long flags, size;
+	unsigned int index = virtqueue_get_queue_index(vq);

 	spin_lock_irqsave(&vm_dev->lock, flags);
 	list_del(&info->node);
@@ -278,7 +275,7 @@ static void vm_del_vq(struct virtqueue *vq)
 	vring_del_virtqueue(vq);

 	/* Select and deactivate the queue */
-	writel(info->queue_index, vm_dev->base + VIRTIO_MMIO_QUEUE_SEL);
+	writel(index, vm_dev->base + VIRTIO_MMIO_QUEUE_SEL);
 	writel(0, vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);

 	size = PAGE_ALIGN(vring_size(info->num, VIRTIO_MMIO_VRING_ALIGN));
@@ -324,7 +321,6 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
 		err = -ENOMEM;
 		goto error_kmalloc;
 	}
-	info->queue_index = index;

 	/* Allocate pages for the queue - start with a queue as big as
 	 * possible (limited by maximum size allowed by device), drop down
@@ -356,7 +352,7 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
 			vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);

 	/* Create the vring */
-	vq = vring_new_virtqueue(info->num, VIRTIO_MMIO_VRING_ALIGN, vdev,
+	vq = vring_new_virtqueue(index, info->num, VIRTIO_MMIO_VRING_ALIGN, vdev,
				 true, info->queue, vm_notify, callback, name);
 	if (!vq) {
 		err = -ENOMEM;
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 2e03d41..d902464 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -79,9 +79,6 @@ struct virtio_pci_vq_info
 	/* the number of entries in the queue */
 	int num;

-	/* the index of the queue */
-	int queue_index;
-
 	/* the virtual address of the ring queue */
 	void *queue;

@@ -202,11 +199,11 @@ static void vp_reset(struct virtio_device *vdev)
 static void vp_notify(struct virtqueue *vq)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
-	struct virtio_pci_vq_info *info = vq->priv;

 	/* we write the queue's selector into the notification register to
 	 * signal the other end */
-	iowrite16(info->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
+	iowrite16(virtqueue_get_queue_index(vq),
+		  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
 }

 /* Handle a configuration change: Tell driver if it wants to know. */
@@ -402,7 +399,6 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 	if (!info)
 		return ERR_PTR(-ENOMEM);

-	info->queue_index = index;
 	info->num = num;
 	info->msix_vector = msix_vec;

@@ -418,7 +414,7 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 		  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);

 	/* create the vring */
-	vq = vring_new_virtqueue(info->num, VIRTIO_PCI_VRING_ALIGN, vdev,
+	vq = vring_new_virtqueue(index, info->num, VIRTIO_PCI_VRING_ALIGN, vdev,
				 true, info->queue, vp_notify, callback, name);
 	if (!vq) {
 		err = -ENOMEM;
@@ -467,7 +463,8 @@ static void vp_del_vq(struct virtqueue *vq)
 	list_del(&info->node);
 	spin_unlock_irqrestore(&vp_dev->lock, flags);

-	iowrite16(info->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
+	iowrite16(virtqueue_get_queue_index(vq),
+		  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);

 	if (vp_dev->msix_enabled) {
 		iowrite16(VIRTIO_MSI_NO_VECTOR,
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 5aa43c3..e639584 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -106,6 +106,9 @@ struct vring_virtqueue
 	/* How to notify other side. FIXME: commonalize hcalls! */
 	void (*notify)(struct virtqueue *vq);

+	/* Index of the queue */
+	int queue_index;
+
 #ifdef DEBUG
 	/* They're supposed to lock for us. */
 	unsigned int in_use;
@@ -171,6 +174,13 @@ static int vring_add_indirect(struct vring_virtqueue *vq,
 	return head;
 }

+int virtqueue_get_queue_index(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	return vq->queue_index;
+}
+EXPORT_SYMBOL_GPL(virtqueue_get_queue_index);
+
 /**
  * virtqueue_add_buf - expose buffer to other end
  * @vq: the struct virtqueue we're talking about.
@@ -616,7 +626,8 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
 }
 EXPORT_SYMBOL_GPL(vring_interrupt);

-struct virtqueue *vring_new_virtqueue(unsigned int num,
+struct virtqueue *vring_new_virtqueue(unsigned int index,
+				      unsigned int num,
 				      unsigned int vring_align,
 				      struct virtio_device *vdev,
 				      bool weak_barriers,
@@ -647,6 +658,7 @@ struct virtqueue *vring_new_virtqueue(unsigned int num,
 	vq->broken = false;
 	vq->last_used_idx = 0;
 	vq->num_added = 0;
+	vq->queue_index = index;
 	list_add_tail(&vq->vq.list, &vdev->vqs);
 #ifdef DEBUG
 	vq->in_use = false;
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index a1ba8bb..533b115 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -50,6 +50,8 @@ void *virtqueue_detach_unused_buf(struct virtqueue *vq);

 unsigned int virtqueue_get_vring_size(struct virtqueue *vq);

+int virtqueue_get_queue_index(struct virtqueue *vq);
+
 /**
  * virtio_device - representation of a device using virtio
  * @index: unique position on the virtio bus
diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index e338730..c2d793a 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -165,7 +165,8 @@ static inline int vring_need_event(__u16 event_idx, __u16 new_idx, __u16 old)
 struct virtio_device;
 struct virtqueue;

-struct virtqueue *vring_new_virtqueue(unsigned int num,
+struct virtqueue *vring_new_virtqueue(unsigned int index,
+				      unsigned int num,
 				      unsigned int vring_align,
 				      struct virtio_device *vdev,
 				      bool weak_barriers,
-- 
1.7.1
Paolo Bonzini
2012-Aug-28 11:54 UTC
[PATCH 2/5] virtio: introduce an API to set affinity for a virtqueue
From: Jason Wang <jasowang at redhat.com>

Sometimes a virtio device needs to configure the irq affinity hint to
maximize performance.  Instead of just exposing the irq of a virtqueue,
this patch introduces an API to set the affinity for a virtqueue.

The API is best-effort: the affinity hint may not be set as expected due
to platform support, irq sharing or irq type.  Currently only the PCI
method is implemented, and we set the affinity according to:

- if the device uses INTx, we just ignore the request
- if the device has a per-vq vector, we force the affinity hint
- if the virtqueues share MSI, we make the affinity the OR over all
  affinities requested

Signed-off-by: Jason Wang <jasowang at redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini at redhat.com>
---
 drivers/virtio/virtio_pci.c   |   46 +++++++++++++++++++++++++++++++++++++++++
 include/linux/virtio_config.h |   21 ++++++++++++++++++
 2 files changed, 67 insertions(+), 0 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index adb24f2..2ff0451 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -48,6 +48,7 @@ struct virtio_pci_device
 	int msix_enabled;
 	int intx_enabled;
 	struct msix_entry *msix_entries;
+	cpumask_var_t *msix_affinity_masks;
 	/* Name strings for interrupts. This size should be enough,
 	 * and I'm too lazy to allocate each name separately. */
 	char (*msix_names)[256];
@@ -276,6 +277,10 @@ static void vp_free_vectors(struct virtio_device *vdev)
 	for (i = 0; i < vp_dev->msix_used_vectors; ++i)
 		free_irq(vp_dev->msix_entries[i].vector, vp_dev);

+	for (i = 0; i < vp_dev->msix_vectors; i++)
+		if (vp_dev->msix_affinity_masks[i])
+			free_cpumask_var(vp_dev->msix_affinity_masks[i]);
+
 	if (vp_dev->msix_enabled) {
 		/* Disable the vector used for configuration */
 		iowrite16(VIRTIO_MSI_NO_VECTOR,
@@ -293,6 +298,8 @@ static void vp_free_vectors(struct virtio_device *vdev)
 	vp_dev->msix_names = NULL;
 	kfree(vp_dev->msix_entries);
 	vp_dev->msix_entries = NULL;
+	kfree(vp_dev->msix_affinity_masks);
+	vp_dev->msix_affinity_masks = NULL;
 }

 static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
@@ -311,6 +318,15 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
 				     GFP_KERNEL);
 	if (!vp_dev->msix_names)
 		goto error;
+	vp_dev->msix_affinity_masks
+		= kzalloc(nvectors * sizeof *vp_dev->msix_affinity_masks,
+			  GFP_KERNEL);
+	if (!vp_dev->msix_affinity_masks)
+		goto error;
+	for (i = 0; i < nvectors; ++i)
+		if (!alloc_cpumask_var(&vp_dev->msix_affinity_masks[i],
+					GFP_KERNEL))
+			goto error;

 	for (i = 0; i < nvectors; ++i)
 		vp_dev->msix_entries[i].entry = i;
@@ -607,6 +623,35 @@ static const char *vp_bus_name(struct virtio_device *vdev)
 	return pci_name(vp_dev->pci_dev);
 }

+/* Setup the affinity for a virtqueue:
+ * - force the affinity for per vq vector
+ * - OR over all affinities for shared MSI
+ * - ignore the affinity request if we're using INTX
+ */
+static int vp_set_vq_affinity(struct virtqueue *vq, int cpu)
+{
+	struct virtio_device *vdev = vq->vdev;
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_pci_vq_info *info = vq->priv;
+	struct cpumask *mask;
+	unsigned int irq;
+
+	if (!vq->callback)
+		return -EINVAL;
+
+	if (vp_dev->msix_enabled) {
+		mask = vp_dev->msix_affinity_masks[info->msix_vector];
+		irq = vp_dev->msix_entries[info->msix_vector].vector;
+		if (cpu == -1)
+			irq_set_affinity_hint(irq, NULL);
+		else {
+			cpumask_set_cpu(cpu, mask);
+			irq_set_affinity_hint(irq, mask);
+		}
+	}
+	return 0;
+}
+
 static struct virtio_config_ops virtio_pci_config_ops = {
 	.get		= vp_get,
 	.set		= vp_set,
@@ -618,6 +663,7 @@ static struct virtio_config_ops virtio_pci_config_ops = {
 	.get_features	= vp_get_features,
 	.finalize_features = vp_finalize_features,
 	.bus_name	= vp_bus_name,
+	.set_vq_affinity = vp_set_vq_affinity,
 };

 static void virtio_pci_release_dev(struct device *_d)
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index fc457f4..2c4a989 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -98,6 +98,7 @@
  *	vdev: the virtio_device
  *	This returns a pointer to the bus name a la pci_name from which
  *	the caller can then copy.
+ * @set_vq_affinity: set the affinity for a virtqueue.
  */
 typedef void vq_callback_t(struct virtqueue *);
 struct virtio_config_ops {
@@ -116,6 +117,7 @@ struct virtio_config_ops {
 	u32 (*get_features)(struct virtio_device *vdev);
 	void (*finalize_features)(struct virtio_device *vdev);
 	const char *(*bus_name)(struct virtio_device *vdev);
+	int (*set_vq_affinity)(struct virtqueue *vq, int cpu);
 };

 /* If driver didn't advertise the feature, it will never appear. */
@@ -190,5 +192,24 @@ const char *virtio_bus_name(struct virtio_device *vdev)
 	return vdev->config->bus_name(vdev);
 }

+/**
+ * virtqueue_set_affinity - setting affinity for a virtqueue
+ * @vq: the virtqueue
+ * @cpu: the cpu no.
+ *
+ * Pay attention the function are best-effort: the affinity hint may not be set
+ * due to config support, irq type and sharing.
+ *
+ */
+static inline
+int virtqueue_set_affinity(struct virtqueue *vq, int cpu)
+{
+	struct virtio_device *vdev = vq->vdev;
+	if (vdev->config->set_vq_affinity)
+		return vdev->config->set_vq_affinity(vq, cpu);
+	return 0;
+}
+
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_VIRTIO_CONFIG_H */
-- 
1.7.1
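[Editorial note: as a usage illustration (not part of the patch), a driver
that creates one virtqueue per CPU could spread the affinity hints as below.
The 'vqs' array and 'num_vqs' count are assumed to be the driver's own
bookkeeping from find_vqs(); the call is best-effort by design, so with INTx
it is simply a no-op.]

	/* Hedged sketch: request one CPU per virtqueue interrupt.  Assumes
	 * CPU ids are dense (0..n-1), which is only an approximation. */
	static void assign_vq_affinity(struct virtqueue *vqs[],
				       unsigned int num_vqs)
	{
		unsigned int i;

		for (i = 0; i < num_vqs; i++)
			virtqueue_set_affinity(vqs[i], i % num_online_cpus());
	}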
Paolo Bonzini
2012-Aug-28 11:54 UTC
[PATCH 3/5] virtio-scsi: allocate target pointers in a separate memory block
We will place the request virtqueues in the flexible array member.

Refining the virtqueue API would let us drop the sglist copy, at which
point the pointer-to-array-of-pointers can become a simple
pointer-to-array.  It would both simplify the allocation and remove a
dereference in several hot paths.

Signed-off-by: Paolo Bonzini <pbonzini at redhat.com>
---
 drivers/scsi/virtio_scsi.c |   23 +++++++++++++++--------
 1 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 595af1a..62fec04 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -77,7 +77,7 @@ struct virtio_scsi {
 	/* Get some buffers ready for event vq */
 	struct virtio_scsi_event_node event_list[VIRTIO_SCSI_EVENT_LEN];

-	struct virtio_scsi_target_state *tgt[];
+	struct virtio_scsi_target_state **tgt;
 };

 static struct kmem_cache *virtscsi_cmd_cache;
@@ -615,10 +615,13 @@ static void virtscsi_remove_vqs(struct virtio_device *vdev)
 	/* Stop all the virtqueues. */
 	vdev->config->reset(vdev);

-	num_targets = sh->max_id;
-	for (i = 0; i < num_targets; i++) {
-		kfree(vscsi->tgt[i]);
-		vscsi->tgt[i] = NULL;
+	if (vscsi->tgt) {
+		num_targets = sh->max_id;
+		for (i = 0; i < num_targets; i++) {
+			kfree(vscsi->tgt[i]);
+			vscsi->tgt[i] = NULL;
+		}
+		kfree(vscsi->tgt);
 	}

 	vdev->config->del_vqs(vdev);
@@ -660,6 +663,12 @@ static int virtscsi_init(struct virtio_device *vdev,
 	/* We need to know how many segments before we allocate. */
 	sg_elems = virtscsi_config_get(vdev, seg_max) ?: 1;

+	vscsi->tgt = kmalloc(num_targets *
+			     sizeof(struct virtio_scsi_target_state *), GFP_KERNEL);
+	if (!vscsi->tgt) {
+		err = -ENOMEM;
+		goto out;
+	}
 	for (i = 0; i < num_targets; i++) {
 		vscsi->tgt[i] = virtscsi_alloc_tgt(vdev, sg_elems);
 		if (!vscsi->tgt[i]) {
@@ -685,9 +694,7 @@ static int __devinit virtscsi_probe(struct virtio_device *vdev)

 	/* Allocate memory and link the structs together. */
 	num_targets = virtscsi_config_get(vdev, max_target) + 1;
-	shost = scsi_host_alloc(&virtscsi_host_template,
-		sizeof(*vscsi)
-		+ num_targets * sizeof(struct virtio_scsi_target_state));
+	shost = scsi_host_alloc(&virtscsi_host_template, sizeof(*vscsi));
 	if (!shost)
 		return -ENOMEM;
-- 
1.7.1
Paolo Bonzini
2012-Aug-28 11:54 UTC
[PATCH 4/5] virtio-scsi: pass struct virtio_scsi to virtqueue completion function
This will be needed soon in order to retrieve the per-target struct.

Signed-off-by: Paolo Bonzini <pbonzini at redhat.com>
---
 drivers/scsi/virtio_scsi.c |   17 +++++++++--------
 1 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 62fec04..6414ea0 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -107,7 +107,7 @@ static void virtscsi_compute_resid(struct scsi_cmnd *sc, u32 resid)
  *
  * Called with vq_lock held.
  */
-static void virtscsi_complete_cmd(void *buf)
+static void virtscsi_complete_cmd(struct virtio_scsi *vscsi, void *buf)
 {
 	struct virtio_scsi_cmd *cmd = buf;
 	struct scsi_cmnd *sc = cmd->sc;
@@ -168,7 +168,8 @@ static void virtscsi_complete_cmd(void *buf)
 	sc->scsi_done(sc);
 }

-static void virtscsi_vq_done(struct virtqueue *vq, void (*fn)(void *buf))
+static void virtscsi_vq_done(struct virtio_scsi *vscsi, struct virtqueue *vq,
+			     void (*fn)(struct virtio_scsi *vscsi, void *buf))
 {
 	void *buf;
 	unsigned int len;
@@ -176,7 +177,7 @@ static void virtscsi_vq_done(struct virtqueue *vq, void (*fn)(void *buf))
 	do {
 		virtqueue_disable_cb(vq);
 		while ((buf = virtqueue_get_buf(vq, &len)) != NULL)
-			fn(buf);
+			fn(vscsi, buf);
 	} while (!virtqueue_enable_cb(vq));
 }

@@ -187,11 +188,11 @@ static void virtscsi_req_done(struct virtqueue *vq)
 	unsigned long flags;

 	spin_lock_irqsave(&vscsi->req_vq.vq_lock, flags);
-	virtscsi_vq_done(vq, virtscsi_complete_cmd);
+	virtscsi_vq_done(vscsi, vq, virtscsi_complete_cmd);
 	spin_unlock_irqrestore(&vscsi->req_vq.vq_lock, flags);
 };

-static void virtscsi_complete_free(void *buf)
+static void virtscsi_complete_free(struct virtio_scsi *vscsi, void *buf)
 {
 	struct virtio_scsi_cmd *cmd = buf;

@@ -208,7 +209,7 @@ static void virtscsi_ctrl_done(struct virtqueue *vq)
 	unsigned long flags;

 	spin_lock_irqsave(&vscsi->ctrl_vq.vq_lock, flags);
-	virtscsi_vq_done(vq, virtscsi_complete_free);
+	virtscsi_vq_done(vscsi, vq, virtscsi_complete_free);
 	spin_unlock_irqrestore(&vscsi->ctrl_vq.vq_lock, flags);
 };

@@ -331,7 +332,7 @@ static void virtscsi_handle_event(struct work_struct *work)
 	virtscsi_kick_event(vscsi, event_node);
 }

-static void virtscsi_complete_event(void *buf)
+static void virtscsi_complete_event(struct virtio_scsi *vscsi, void *buf)
 {
 	struct virtio_scsi_event_node *event_node = buf;

@@ -346,7 +347,7 @@ static void virtscsi_event_done(struct virtqueue *vq)
 	unsigned long flags;

 	spin_lock_irqsave(&vscsi->event_vq.vq_lock, flags);
-	virtscsi_vq_done(vq, virtscsi_complete_event);
+	virtscsi_vq_done(vscsi, vq, virtscsi_complete_event);
 	spin_unlock_irqrestore(&vscsi->event_vq.vq_lock, flags);
 };
-- 
1.7.1
Paolo Bonzini
2012-Aug-28 11:54 UTC
[PATCH 5/5] virtio-scsi: introduce multiqueue support
This patch adds queue steering to virtio-scsi.  When a target is sent
multiple requests, we always drive them to the same queue so that FIFO
processing order is kept.  However, if a target was idle, we can choose
a queue arbitrarily.  In this case the queue is chosen according to the
current VCPU, so the driver expects the number of request queues to be
equal to the number of VCPUs.  This makes it easy and fast to select
the queue, and also lets the driver optimize the IRQ affinity for the
virtqueues (each virtqueue's affinity is set to the CPU that "owns"
the queue).

Signed-off-by: Paolo Bonzini <pbonzini at redhat.com>
---
 drivers/scsi/virtio_scsi.c |  162 +++++++++++++++++++++++++++++++++++---------
 1 files changed, 130 insertions(+), 32 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 6414ea0..0c4b096 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -26,6 +26,7 @@

 #define VIRTIO_SCSI_MEMPOOL_SZ 64
 #define VIRTIO_SCSI_EVENT_LEN 8
+#define VIRTIO_SCSI_VQ_BASE 2

 /* Command queue element */
 struct virtio_scsi_cmd {
@@ -59,9 +60,13 @@ struct virtio_scsi_vq {

 /* Per-target queue state */
 struct virtio_scsi_target_state {
-	/* Protects sg.  Lock hierarchy is tgt_lock -> vq_lock. */
+	/* Protects sg, req_vq.  Lock hierarchy is tgt_lock -> vq_lock. */
 	spinlock_t tgt_lock;

+	struct virtio_scsi_vq *req_vq;
+
+	atomic_t reqs;
+
 	/* For sglist construction when adding commands to the virtqueue.  */
 	struct scatterlist sg[];
 };
@@ -70,14 +75,15 @@ struct virtio_scsi_target_state {
 struct virtio_scsi {
 	struct virtio_device *vdev;

-	struct virtio_scsi_vq ctrl_vq;
-	struct virtio_scsi_vq event_vq;
-	struct virtio_scsi_vq req_vq;
-
 	/* Get some buffers ready for event vq */
 	struct virtio_scsi_event_node event_list[VIRTIO_SCSI_EVENT_LEN];

+	u32 num_queues;
 	struct virtio_scsi_target_state **tgt;
+
+	struct virtio_scsi_vq ctrl_vq;
+	struct virtio_scsi_vq event_vq;
+	struct virtio_scsi_vq req_vqs[];
 };

 static struct kmem_cache *virtscsi_cmd_cache;
@@ -112,6 +118,9 @@ static void virtscsi_complete_cmd(struct virtio_scsi *vscsi, void *buf)
 	struct virtio_scsi_cmd *cmd = buf;
 	struct scsi_cmnd *sc = cmd->sc;
 	struct virtio_scsi_cmd_resp *resp = &cmd->resp.cmd;
+	struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id];
+
+	atomic_dec(&tgt->reqs);

 	dev_dbg(&sc->device->sdev_gendev,
 		"cmd %p response %u status %#02x sense_len %u\n",
@@ -185,11 +194,13 @@ static void virtscsi_req_done(struct virtqueue *vq)
 {
 	struct Scsi_Host *sh = virtio_scsi_host(vq->vdev);
 	struct virtio_scsi *vscsi = shost_priv(sh);
+	int index = virtqueue_get_queue_index(vq) - VIRTIO_SCSI_VQ_BASE;
+	struct virtio_scsi_vq *req_vq = &vscsi->req_vqs[index];
 	unsigned long flags;

-	spin_lock_irqsave(&vscsi->req_vq.vq_lock, flags);
+	spin_lock_irqsave(&req_vq->vq_lock, flags);
 	virtscsi_vq_done(vscsi, vq, virtscsi_complete_cmd);
-	spin_unlock_irqrestore(&vscsi->req_vq.vq_lock, flags);
+	spin_unlock_irqrestore(&req_vq->vq_lock, flags);
 };

 static void virtscsi_complete_free(struct virtio_scsi *vscsi, void *buf)
@@ -429,10 +440,10 @@ static int virtscsi_kick_cmd(struct virtio_scsi_target_state *tgt,
 	return ret;
 }

-static int virtscsi_queuecommand(struct Scsi_Host *sh, struct scsi_cmnd *sc)
+static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
+				 struct virtio_scsi_target_state *tgt,
+				 struct scsi_cmnd *sc)
 {
-	struct virtio_scsi *vscsi = shost_priv(sh);
-	struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id];
 	struct virtio_scsi_cmd *cmd;
 	int ret;

@@ -466,7 +477,7 @@ static int virtscsi_queuecommand(struct Scsi_Host *sh, struct scsi_cmnd *sc)
 	BUG_ON(sc->cmd_len > VIRTIO_SCSI_CDB_SIZE);
 	memcpy(cmd->req.cmd.cdb, sc->cmnd, sc->cmd_len);

-	if (virtscsi_kick_cmd(tgt, &vscsi->req_vq, cmd,
+	if (virtscsi_kick_cmd(tgt, tgt->req_vq, cmd,
 			      sizeof cmd->req.cmd, sizeof cmd->resp.cmd,
 			      GFP_ATOMIC) >= 0)
 		ret = 0;
@@ -475,6 +486,38 @@ out:
 	return ret;
 }

+static int virtscsi_queuecommand_single(struct Scsi_Host *sh,
+					struct scsi_cmnd *sc)
+{
+	struct virtio_scsi *vscsi = shost_priv(sh);
+	struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id];
+
+	atomic_inc(&tgt->reqs);
+	return virtscsi_queuecommand(vscsi, tgt, sc);
+}
+
+static int virtscsi_queuecommand_multi(struct Scsi_Host *sh,
+				       struct scsi_cmnd *sc)
+{
+	struct virtio_scsi *vscsi = shost_priv(sh);
+	struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id];
+	unsigned long flags;
+	u32 queue_num;
+
+	/* Using an atomic_t for tgt->reqs lets the virtqueue handler
+	 * decrement it without taking the spinlock.
+	 */
+	spin_lock_irqsave(&tgt->tgt_lock, flags);
+	if (atomic_inc_return(&tgt->reqs) == 1) {
+		queue_num = smp_processor_id();
+		while (unlikely(queue_num >= vscsi->num_queues))
+			queue_num -= vscsi->num_queues;
+		tgt->req_vq = &vscsi->req_vqs[queue_num];
+	}
+	spin_unlock_irqrestore(&tgt->tgt_lock, flags);
+	return virtscsi_queuecommand(vscsi, tgt, sc);
+}
+
 static int virtscsi_tmf(struct virtio_scsi *vscsi, struct virtio_scsi_cmd *cmd)
 {
 	DECLARE_COMPLETION_ONSTACK(comp);
@@ -544,12 +585,26 @@ static int virtscsi_abort(struct scsi_cmnd *sc)
 	return virtscsi_tmf(vscsi, cmd);
 }

-static struct scsi_host_template virtscsi_host_template = {
+static struct scsi_host_template virtscsi_host_template_single = {
 	.module = THIS_MODULE,
 	.name = "Virtio SCSI HBA",
 	.proc_name = "virtio_scsi",
-	.queuecommand = virtscsi_queuecommand,
 	.this_id = -1,
+	.queuecommand = virtscsi_queuecommand_single,
+	.eh_abort_handler = virtscsi_abort,
+	.eh_device_reset_handler = virtscsi_device_reset,
+
+	.can_queue = 1024,
+	.dma_boundary = UINT_MAX,
+	.use_clustering = ENABLE_CLUSTERING,
+};
+
+static struct scsi_host_template virtscsi_host_template_multi = {
+	.module = THIS_MODULE,
+	.name = "Virtio SCSI HBA",
+	.proc_name = "virtio_scsi",
+	.this_id = -1,
+	.queuecommand = virtscsi_queuecommand_multi,
 	.eh_abort_handler = virtscsi_abort,
 	.eh_device_reset_handler = virtscsi_device_reset,

@@ -575,15 +630,19 @@ static struct scsi_host_template virtscsi_host_template = {
 				  &__val, sizeof(__val)); \
 	})

+
 static void virtscsi_init_vq(struct virtio_scsi_vq *virtscsi_vq,
-			     struct virtqueue *vq)
+			     struct virtqueue *vq, bool affinity)
 {
 	spin_lock_init(&virtscsi_vq->vq_lock);
 	virtscsi_vq->vq = vq;
+	if (affinity)
+		virtqueue_set_affinity(vq, virtqueue_get_queue_index(vq) -
+				       VIRTIO_SCSI_VQ_BASE);
 }

 static struct virtio_scsi_target_state *virtscsi_alloc_tgt(
-	struct virtio_device *vdev, int sg_elems)
+	struct virtio_scsi *vscsi, u32 sg_elems)
 {
 	struct virtio_scsi_target_state *tgt;
 	gfp_t gfp_mask = GFP_KERNEL;
@@ -597,6 +656,13 @@ static struct virtio_scsi_target_state *virtscsi_alloc_tgt(

 	spin_lock_init(&tgt->tgt_lock);
 	sg_init_table(tgt->sg, sg_elems + 2);
+	atomic_set(&tgt->reqs, 0);
+
+	/*
+	 * The default is unused for multiqueue, but with a single queue
+	 * or target we use it in virtscsi_queuecommand.
+	 */
+	tgt->req_vq = &vscsi->req_vqs[0];
 	return tgt;
 }

@@ -632,28 +698,41 @@ static int virtscsi_init(struct virtio_device *vdev,
 			 struct virtio_scsi *vscsi, int num_targets)
 {
 	int err;
-	struct virtqueue *vqs[3];
 	u32 i, sg_elems;
+	u32 num_vqs;
+	vq_callback_t **callbacks;
+	const char **names;
+	struct virtqueue **vqs;

-	vq_callback_t *callbacks[] = {
-		virtscsi_ctrl_done,
-		virtscsi_event_done,
-		virtscsi_req_done
-	};
-	const char *names[] = {
-		"control",
-		"event",
-		"request"
-	};
+	num_vqs = vscsi->num_queues + VIRTIO_SCSI_VQ_BASE;
+	vqs = kmalloc(num_vqs * sizeof(struct virtqueue *), GFP_KERNEL);
+	callbacks = kmalloc(num_vqs * sizeof(vq_callback_t *), GFP_KERNEL);
+	names = kmalloc(num_vqs * sizeof(char *), GFP_KERNEL);
+
+	if (!callbacks || !vqs || !names) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	callbacks[0] = virtscsi_ctrl_done;
+	callbacks[1] = virtscsi_event_done;
+	names[0] = "control";
+	names[1] = "event";
+	for (i = VIRTIO_SCSI_VQ_BASE; i < num_vqs; i++) {
+		callbacks[i] = virtscsi_req_done;
+		names[i] = "request";
+	}

 	/* Discover virtqueues and write information to configuration.  */
-	err = vdev->config->find_vqs(vdev, 3, vqs, callbacks, names);
+	err = vdev->config->find_vqs(vdev, num_vqs, vqs, callbacks, names);
 	if (err)
 		return err;

-	virtscsi_init_vq(&vscsi->ctrl_vq, vqs[0]);
-	virtscsi_init_vq(&vscsi->event_vq, vqs[1]);
-	virtscsi_init_vq(&vscsi->req_vq, vqs[2]);
+	virtscsi_init_vq(&vscsi->ctrl_vq, vqs[0], false);
+	virtscsi_init_vq(&vscsi->event_vq, vqs[1], false);
+	for (i = VIRTIO_SCSI_VQ_BASE; i < num_vqs; i++)
+		virtscsi_init_vq(&vscsi->req_vqs[i - VIRTIO_SCSI_VQ_BASE],
+				 vqs[i], vscsi->num_queues > 1);

 	virtscsi_config_set(vdev, cdb_size, VIRTIO_SCSI_CDB_SIZE);
 	virtscsi_config_set(vdev, sense_size, VIRTIO_SCSI_SENSE_SIZE);
@@ -671,7 +750,7 @@ static int virtscsi_init(struct virtio_device *vdev,
 		goto out;
 	}
 	for (i = 0; i < num_targets; i++) {
-		vscsi->tgt[i] = virtscsi_alloc_tgt(vdev, sg_elems);
+		vscsi->tgt[i] = virtscsi_alloc_tgt(vscsi, sg_elems);
 		if (!vscsi->tgt[i]) {
 			err = -ENOMEM;
 			goto out;
@@ -680,6 +759,9 @@ static int virtscsi_init(struct virtio_device *vdev,

 	err = 0;
 out:
+	kfree(names);
+	kfree(callbacks);
+	kfree(vqs);
 	if (err)
 		virtscsi_remove_vqs(vdev);
 	return err;
@@ -692,11 +774,26 @@ static int __devinit virtscsi_probe(struct virtio_device *vdev)
 	int err;
 	u32 sg_elems, num_targets;
 	u32 cmd_per_lun;
+	u32 num_queues;
+	struct scsi_host_template *hostt;
+
+	/* We need to know how many queues before we allocate. */
+	num_queues = virtscsi_config_get(vdev, num_queues) ?: 1;

 	/* Allocate memory and link the structs together.  */
 	num_targets = virtscsi_config_get(vdev, max_target) + 1;

-	shost = scsi_host_alloc(&virtscsi_host_template, sizeof(*vscsi));
+	/* Multiqueue is not beneficial with a single target.  */
+	if (num_targets == 1)
+		num_queues = 1;
+
+	if (num_queues == 1)
+		hostt = &virtscsi_host_template_single;
+	else
+		hostt = &virtscsi_host_template_multi;
+
+	shost = scsi_host_alloc(hostt,
+		sizeof(*vscsi) + sizeof(vscsi->req_vqs[0]) * num_queues);
 	if (!shost)
 		return -ENOMEM;

@@ -704,6 +801,7 @@ static int __devinit virtscsi_probe(struct virtio_device *vdev)
 	shost->sg_tablesize = sg_elems;
 	vscsi = shost_priv(shost);
 	vscsi->vdev = vdev;
+	vscsi->num_queues = num_queues;
 	vdev->priv = shost;

 	err = virtscsi_init(vdev, vscsi, num_targets);
-- 
1.7.1
On Tue, Aug 28, 2012 at 01:54:12PM +0200, Paolo Bonzini wrote:
> this series adds multiqueue support to the virtio-scsi driver, based
> on Jason Wang's work on virtio-net.  It uses a simple queue steering
> algorithm that expects one queue per CPU.  LUNs in the same target always
> use the same queue (so that commands are not reordered); queue switching
> occurs when the request being queued is the only one for the target.
> Also based on Jason's patches, the virtqueue affinity is set so that
> each CPU is associated to one virtqueue.

Reviewed-by: Stefan Hajnoczi <stefanha at linux.vnet.ibm.com>
On Tue, Aug 28, 2012 at 01:54:12PM +0200, Paolo Bonzini wrote:
> Hi all,
>
> this series adds multiqueue support to the virtio-scsi driver, based
> on Jason Wang's work on virtio-net.  It uses a simple queue steering
> algorithm that expects one queue per CPU.  LUNs in the same target always
> use the same queue (so that commands are not reordered); queue switching
> occurs when the request being queued is the only one for the target.
> Also based on Jason's patches, the virtqueue affinity is set so that
> each CPU is associated to one virtqueue.

Is there a spec patch? I did not see one.

> I tested the patches with fio, using up to 32 virtio-scsi disks backed
> by tmpfs on the host, and 1 LUN per target.
>
> FIO configuration
> -----------------
> [global]
> rw=read
> bsrange=4k-64k
> ioengine=libaio
> direct=1
> iodepth=4
> loops=20
>
> overall bandwidth (MB/s)
> ------------------------
>
> # of targets    single-queue    multi-queue, 4 VCPUs    multi-queue, 8 VCPUs
>  1                   540                 626                     599
>  2                   795                 965                     925
>  4                   997                1376                    1500
>  8                  1136                2130                    2060
> 16                  1440                2269                    2474
> 24                  1408                2179                    2436
> 32                  1515                1978                    2319
>
> (These numbers for single-queue are with 4 VCPUs, but the impact of adding
> more VCPUs is very limited.)
>
> avg bandwidth per LUN (MB/s)
> ----------------------------
>
> # of targets    single-queue    multi-queue, 4 VCPUs    multi-queue, 8 VCPUs
>  1                   540                 626                     599
>  2                   397                 482                     462
>  4                   249                 344                     375
>  8                   142                 266                     257
> 16                    90                 141                     154
> 24                    58                  90                     101
> 32                    47                  61                      72
>
> Testing this may require an irqbalance daemon that is built from git,
> due to http://code.google.com/p/irqbalance/issues/detail?id=37.
> Alternatively you can just set the affinity manually in /proc.
>
> Rusty, can you please give your Acked-by to the first two patches?
>
> Jason Wang (2):
>   virtio-ring: move queue_index to vring_virtqueue
>   virtio: introduce an API to set affinity for a virtqueue
>
> Paolo Bonzini (3):
>   virtio-scsi: allocate target pointers in a separate memory block
>   virtio-scsi: pass struct virtio_scsi to virtqueue completion function
>   virtio-scsi: introduce multiqueue support
>
>  drivers/lguest/lguest_device.c         |    1 +
>  drivers/remoteproc/remoteproc_virtio.c |    1 +
>  drivers/s390/kvm/kvm_virtio.c          |    1 +
>  drivers/scsi/virtio_scsi.c             |  200 ++++++++++++++++++++++++--------
>  drivers/virtio/virtio_mmio.c           |   11 +-
>  drivers/virtio/virtio_pci.c            |   58 ++++++++-
>  drivers/virtio/virtio_ring.c           |   17 +++
>  include/linux/virtio.h                 |    4 +
>  include/linux/virtio_config.h          |   21 ++++
>  9 files changed, 253 insertions(+), 61 deletions(-)
>
> _______________________________________________
> Virtualization mailing list
> Virtualization at lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Michael S. Tsirkin
2012-Sep-04 13:35 UTC
[PATCH 5/5] virtio-scsi: introduce multiqueue support
On Tue, Sep 04, 2012 at 01:18:31PM +0200, Paolo Bonzini wrote:
> Il 04/09/2012 13:09, Michael S. Tsirkin ha scritto:
> >> > queuecommand on CPU #0          queuecommand #2 on CPU #1
> >> > --------------------------------------------------------------
> >> > atomic_inc_return(...) == 1
> >> >                                 atomic_inc_return(...) == 2
> >> >                                 virtscsi_queuecommand to queue #1
> >> > tgt->req_vq = queue #0
> >> > virtscsi_queuecommand to queue #0
> >> >
> >> > then two requests are issued to different queues without a quiescent
> >> > point in the middle.
> > What happens then? Does this break correctness?
>
> Yes, requests to the same target should be processed in FIFO order, or
> you have things like a flush issued before the write it was supposed to
> flush.  This is why I can only change the queue when there is no request
> pending.
>
> Paolo

I see. I guess you can rewrite this as:

	atomic_inc
	if (atomic_read() == 1)

which is a bit cheaper, and makes explicit the fact that you do not need
the increment and the read to be atomic together.

Another simple idea: store the last processor id in the target; if it is
unchanged, there is no need to play with req_vq or take the spinlock.

Also - some kind of comment explaining why a similar race can not happen
with this lock in place would be nice: I see why this specific race can
not trigger, but since the lock is dropped later, before you submit the
command, I have a hard time convincing myself what exactly guarantees
that the vq is never switched before or even while the command is
submitted.

-- 
MST
Il 04/09/2012 16:19, Michael S. Tsirkin ha scritto:
> > > Also - some kind of comment explaining why a similar race can not happen
> > > with this lock in place would be nice: I see why this specific race can
> > > not trigger, but since the lock is dropped later, before you submit the
> > > command, I have a hard time convincing myself what exactly guarantees
> > > that the vq is never switched before or even while the command is
> > > submitted.
> >
> > Because tgt->reqs will never become zero (which is a necessary condition
> > for tgt->req_vq to change), as long as one request is executing
> > virtscsi_queuecommand.
>
> Yes but this logic would apparently imply the lock is not necessary, and
> it actually is. I am not saying anything is wrong just that it
> looks scary.

Ok, I get the misunderstanding.

For the logic to hold, you need a serialization point after which
tgt->req_vq is not changed.  The lock provides one such serialization
point: after you unlock tgt->tgt_lock, nothing else will change
tgt->req_vq until your request completes.  Without the lock, there could
always be a thread that is in the "then" branch but has been scheduled
out, and when rescheduled it will change tgt->req_vq.

Perhaps the confusion comes from the atomic_inc_return, and that was
what my "why is this atomic" wanted to clear.  **tgt->reqs is only
atomic to avoid taking a spinlock in the ISR.**  If you read the code
with the lock, but with tgt->reqs as a regular non-atomic int, it should
be much easier to reason on the code.  I can split the patch if needed.

Paolo
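[Editorial note: to make the invariant above concrete, here is a slightly
simplified restatement of virtscsi_queuecommand_multi's steering step, with
the reasoning folded in as comments.  The modulo stands in for the patch's
subtraction loop; this is an illustration, not proposed patch text.]

	static void steer_and_pin(struct virtio_scsi *vscsi,
				  struct virtio_scsi_target_state *tgt)
	{
		unsigned long flags;

		spin_lock_irqsave(&tgt->tgt_lock, flags);
		if (atomic_inc_return(&tgt->reqs) == 1) {
			/* The target was idle, so no request is in flight
			 * and re-steering cannot reorder anything. */
			tgt->req_vq = &vscsi->req_vqs[smp_processor_id() %
						      vscsi->num_queues];
		}
		/* From here until this request completes, tgt->reqs >= 1, so
		 * no later submission can see a 0->1 transition and change
		 * tgt->req_vq.  The lock serializes concurrent 0->1 racers;
		 * the atomic lets the completion path decrement without
		 * taking tgt_lock. */
		spin_unlock_irqrestore(&tgt->tgt_lock, flags);
	}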
Michael S. Tsirkin
2012-Sep-04 15:03 UTC
[PATCH 5/5] virtio-scsi: introduce multiqueue support
On Tue, Sep 04, 2012 at 04:55:56PM +0200, Paolo Bonzini wrote:
> Il 04/09/2012 16:47, Michael S. Tsirkin ha scritto:
> >> >  static void virtscsi_init_vq(struct virtio_scsi_vq *virtscsi_vq,
> >> > -			     struct virtqueue *vq)
> >> > +			     struct virtqueue *vq, bool affinity)
> >> >  {
> >> >  	spin_lock_init(&virtscsi_vq->vq_lock);
> >> >  	virtscsi_vq->vq = vq;
> >> > +	if (affinity)
> >> > +		virtqueue_set_affinity(vq, virtqueue_get_queue_index(vq) -
> >> > +				       VIRTIO_SCSI_VQ_BASE);
> >> >  }
> >
> > This means in practice if you have fewer virtqueues than CPUs,
> > things are not going to work well, will they?
>
> Not particularly.  It could be better or worse than single queue
> depending on the workload.

Well, interrupts will go to a CPU different from the one that sends the
commands, so ...

> > Any idea what to do?
>
> Two possibilities:
>
> 1) Add a stride argument to virtqueue_set_affinity, and make it equal to
>    the number of queues.
>
> 2) Make multiqueue the default in QEMU, and make the default number of
>    queues equal to the number of VCPUs.
>
> I was going for (2).
>
> Paolo

3) Use a per-target queue if there are fewer targets than CPUs?

-- 
MST
Rusty Russell
2012-Sep-05 23:32 UTC
[PATCH 2/5] virtio: introduce an API to set affinity for a virtqueue
Paolo Bonzini <pbonzini at redhat.com> writes:
> From: Jason Wang <jasowang at redhat.com>
>
> Sometimes a virtio device needs to configure the irq affinity hint to
> maximize performance.  Instead of just exposing the irq of a virtqueue,
> this patch introduces an API to set the affinity for a virtqueue.
>
> The API is best-effort: the affinity hint may not be set as expected due
> to platform support, irq sharing or irq type.  Currently only the PCI
> method is implemented, and we set the affinity according to:
>
> - if the device uses INTx, we just ignore the request
> - if the device has a per-vq vector, we force the affinity hint
> - if the virtqueues share MSI, we make the affinity the OR over all
>   affinities requested
>
> Signed-off-by: Jason Wang <jasowang at redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini at redhat.com>

Applied, thanks.

Acked-by: Rusty Russell <rusty at rustcorp.com.au>

Cheers,
Rusty.