Stefan Hajnoczi
2021-Jul-13 16:19 UTC
[RFC 0/3] cpuidle: add poll_source API and virtio vq polling
These patches are not polished yet but I would like request feedback on this approach and share performance results with you. Idle CPUs tentatively enter a busy wait loop before halting when the cpuidle haltpoll driver is enabled inside a virtual machine. This reduces wakeup latency for events that occur soon after the vCPU becomes idle. This patch series extends the cpuidle busy wait loop with the new poll_source API so drivers can participate in polling. Such polling-aware drivers disable their device's irq during the busy wait loop to avoid the cost of interrupts. This reduces latency further than regular cpuidle haltpoll, which still relies on irqs. Virtio drivers are modified to use the poll_source API so all virtio device types get this feature. The following virtio-blk fio benchmark results show the improvement: IOPS (numjobs=4, iodepth=1, 4 virtqueues) before poll_source io_poll 4k randread 167102 186049 (+11%) 186654 (+11%) 4k randwrite 162204 181214 (+11%) 181850 (+12%) 4k randrw 159520 177071 (+11%) 177928 (+11%) The comparison against io_poll shows that cpuidle poll_source achieves equivalent performance to the block layer's io_poll feature (which I implemented in a separate patch series [1]). The advantage of poll_source is that applications do not need to explicitly set the RWF_HIPRI I/O request flag. The poll_source approach is attractive because few applications actually use RWF_HIPRI and it takes advantage of CPU cycles we would have spent in cpuidle haltpoll anyway. The current series does not improve virtio-net. I haven't investigated deeply, but it is possible that NAPI and poll_source do not combine. See the final patch for a starting point on making the two work together. I have not tried this on bare metal but it might help there too. The cost of disabling a device's irq must be less than the savings from avoiding irq handling for this optimization to make sense. [1] https://lore.kernel.org/linux-block/20210520141305.355961-1-stefanha at redhat.com/ Stefan Hajnoczi (3): cpuidle: add poll_source API virtio: add poll_source virtqueue polling softirq: participate in cpuidle polling drivers/cpuidle/Makefile | 1 + drivers/virtio/virtio_pci_common.h | 7 ++ include/linux/interrupt.h | 2 + include/linux/poll_source.h | 53 +++++++++++++++ include/linux/virtio.h | 2 + include/linux/virtio_config.h | 2 + drivers/cpuidle/poll_source.c | 102 +++++++++++++++++++++++++++++ drivers/cpuidle/poll_state.c | 6 ++ drivers/virtio/virtio.c | 34 ++++++++++ drivers/virtio/virtio_pci_common.c | 86 ++++++++++++++++++++++++ drivers/virtio/virtio_pci_modern.c | 2 + kernel/softirq.c | 14 ++++ 12 files changed, 311 insertions(+) create mode 100644 include/linux/poll_source.h create mode 100644 drivers/cpuidle/poll_source.c -- 2.31.1
Introduce an API for adding cpuidle poll callbacks: struct poll_source_ops { void (*start)(struct poll_source *src); void (*stop)(struct poll_source *src); void (*poll)(struct poll_source *src); }; int poll_source_register(struct poll_source *src); int poll_source_unregister(struct poll_source *src); When cpuidle enters the poll state it invokes ->start() and then invokes ->poll() repeatedly from the busy wait loop. Finally ->stop() is invoked when the busy wait loop finishes. The ->poll() function should check for activity and cause TIF_NEED_RESCHED to be set in order to stop the busy wait loop. This API is intended to be used by drivers that can cheaply poll for events. Participating in cpuidle polling allows them to avoid interrupt latencies during periods where the CPU is going to poll anyway. Note that each poll_source is bound to a particular CPU. The API is mainly intended to by used by drivers that have multiple queues with irq affinity. Signed-off-by: Stefan Hajnoczi <stefanha at redhat.com> --- drivers/cpuidle/Makefile | 1 + include/linux/poll_source.h | 53 +++++++++++++++++++ drivers/cpuidle/poll_source.c | 99 +++++++++++++++++++++++++++++++++++ drivers/cpuidle/poll_state.c | 6 +++ 4 files changed, 159 insertions(+) create mode 100644 include/linux/poll_source.h create mode 100644 drivers/cpuidle/poll_source.c diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile index 26bbc5e74123..994f72d6fe95 100644 --- a/drivers/cpuidle/Makefile +++ b/drivers/cpuidle/Makefile @@ -7,6 +7,7 @@ obj-y += cpuidle.o driver.o governor.o sysfs.o governors/ obj-$(CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED) += coupled.o obj-$(CONFIG_DT_IDLE_STATES) += dt_idle_states.o obj-$(CONFIG_ARCH_HAS_CPU_RELAX) += poll_state.o +obj-$(CONFIG_ARCH_HAS_CPU_RELAX) += poll_source.o obj-$(CONFIG_HALTPOLL_CPUIDLE) += cpuidle-haltpoll.o ################################################################################## diff --git a/include/linux/poll_source.h b/include/linux/poll_source.h new file mode 100644 index 000000000000..ccfb424e170b --- /dev/null +++ b/include/linux/poll_source.h @@ -0,0 +1,53 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * poll_source.h - cpuidle busy waiting API + */ +#ifndef __LINUX_POLLSOURCE_H__ +#define __LINUX_POLLSOURCE_H__ + +#include <linux/list.h> + +struct poll_source; + +struct poll_source_ops { + void (*start)(struct poll_source *src); + void (*stop)(struct poll_source *src); + void (*poll)(struct poll_source *src); +}; + +struct poll_source { + const struct poll_source_ops *ops; + struct list_head node; + int cpu; +}; + +/** + * poll_source_register - Add a poll_source for a CPU + */ +#if defined(CONFIG_CPU_IDLE) && defined(CONFIG_ARCH_HAS_CPU_RELAX) +int poll_source_register(struct poll_source *src); +#else +static inline int poll_source_register(struct poll_source *src) +{ + return 0; +} +#endif + +/** + * poll_source_unregister - Remove a previously registered poll_source + */ +#if defined(CONFIG_CPU_IDLE) && defined(CONFIG_ARCH_HAS_CPU_RELAX) +int poll_source_unregister(struct poll_source *src); +#else +static inline int poll_source_unregister(struct poll_source *src) +{ + return 0; +} +#endif + +/* Used by the cpuidle driver */ +void poll_source_start(void); +void poll_source_run_once(void); +void poll_source_stop(void); + +#endif /* __LINUX_POLLSOURCE_H__ */ diff --git a/drivers/cpuidle/poll_source.c b/drivers/cpuidle/poll_source.c new file mode 100644 index 000000000000..46100e5a71e4 --- /dev/null +++ b/drivers/cpuidle/poll_source.c @@ -0,0 +1,99 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * poll_source.c - cpuidle busy waiting API + */ + +#include <linux/lockdep.h> +#include <linux/percpu.h> +#include <linux/poll_source.h> + +/* The per-cpu list of registered poll sources */ +DEFINE_PER_CPU(struct list_head, poll_source_list); + +/* Called from idle task with TIF_POLLING_NRFLAG set and irqs enabled */ +void poll_source_start(void) +{ + struct poll_source *src; + + list_for_each_entry(src, this_cpu_ptr(&poll_source_list), node) + src->ops->start(src); +} + +/* Called from idle task with TIF_POLLING_NRFLAG set and irqs enabled */ +void poll_source_run_once(void) +{ + struct poll_source *src; + + list_for_each_entry(src, this_cpu_ptr(&poll_source_list), node) + src->ops->poll(src); +} + +/* Called from idle task with TIF_POLLING_NRFLAG set and irqs enabled */ +void poll_source_stop(void) +{ + struct poll_source *src; + + list_for_each_entry(src, this_cpu_ptr(&poll_source_list), node) + src->ops->stop(src); +} + +static void poll_source_register_this_cpu(void *opaque) +{ + struct poll_source *src = opaque; + + lockdep_assert_irqs_disabled(); + + list_add_tail(&src->node, this_cpu_ptr(&poll_source_list)); +} + +int poll_source_register(struct poll_source *src) +{ + if (!list_empty(&src->node)) + return -EBUSY; + + /* + * There is no race with src->cpu iterating over poll_source_list + * because smp_call_function_single() just sets TIF_NEED_RESCHED + * instead of sending an IPI during idle. + */ + /* TODO but what happens if the flag isn't set yet when smp_call_function_single() is invoked? */ + return smp_call_function_single(src->cpu, + poll_source_register_this_cpu, + src, + 1); +} +EXPORT_SYMBOL_GPL(poll_source_register); + +static void poll_source_unregister_this_cpu(void *opaque) +{ + struct poll_source *src = opaque; + + lockdep_assert_irqs_disabled(); + + /* + * See comment in poll_source_register() about why this does not race + * with the idle CPU iterating over poll_source_list. + */ + list_del_init(&src->node); +} + +int poll_source_unregister(struct poll_source *src) +{ + return smp_call_function_single(src->cpu, + poll_source_unregister_this_cpu, + src, + 1); +} +EXPORT_SYMBOL_GPL(poll_source_unregister); + +/* TODO what happens when a CPU goes offline? */ +static int __init poll_source_init(void) +{ + int i; + + for_each_possible_cpu(i) + INIT_LIST_HEAD(&per_cpu(poll_source_list, i)); + + return 0; +} +core_initcall(poll_source_init); diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c index f7e83613ae94..aa26870034ac 100644 --- a/drivers/cpuidle/poll_state.c +++ b/drivers/cpuidle/poll_state.c @@ -7,6 +7,7 @@ #include <linux/sched.h> #include <linux/sched/clock.h> #include <linux/sched/idle.h> +#include <linux/poll_source.h> #define POLL_IDLE_RELAX_COUNT 200 @@ -22,9 +23,12 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev, unsigned int loop_count = 0; u64 limit; + poll_source_start(); + limit = cpuidle_poll_time(drv, dev); while (!need_resched()) { + poll_source_run_once(); cpu_relax(); if (loop_count++ < POLL_IDLE_RELAX_COUNT) continue; @@ -35,6 +39,8 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev, break; } } + + poll_source_stop(); } current_clr_polling(); -- 2.31.1
VIRTIO drivers can cheaply disable interrupts by setting RING_EVENT_FLAGS_DISABLE in the packed virtqueues's Driver Event Suppression structure. See "2.7.10 Driver and Device Event Suppression" in the VIRTIO 1.1 specification for details (https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html). Add a per-virtqueue poll_source that disables events in ->start(), processes completed virtqueue buffers in ->poll(), and re-enables events in ->stop(). This optimization avoids interrupt injection during cpuidle polling. Workloads that submit requests and then halt the vCPU until completion benefit from this. To enable this optimization: 1. Enable the cpuidle haltpoll driver: https://www.kernel.org/doc/html/latest/virt/guest-halt-polling.html 2. Enable poll_source on the virtio device: # echo -n 1 > /sys/block/vda/device/poll_source Note that this feature currently as no effect on split virtqueues when VIRTIO_F_EVENT_IDX is negotiated. It may be possible to tweak virtqueue_disable_cb_split() but I haven't attempted it here. This optimization has been benchmarked successfully with virtio-blk devices. Currently it does not improve virtio-net performance, probably because it doesn't combine with NAPI polling. Signed-off-by: Stefan Hajnoczi <stefanha at redhat.com> --- drivers/virtio/virtio_pci_common.h | 7 +++ include/linux/virtio.h | 2 + include/linux/virtio_config.h | 2 + drivers/virtio/virtio.c | 34 ++++++++++++ drivers/virtio/virtio_pci_common.c | 86 ++++++++++++++++++++++++++++++ drivers/virtio/virtio_pci_modern.c | 2 + 6 files changed, 133 insertions(+) diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h index beec047a8f8d..630746043183 100644 --- a/drivers/virtio/virtio_pci_common.h +++ b/drivers/virtio/virtio_pci_common.h @@ -21,6 +21,7 @@ #include <linux/pci.h> #include <linux/slab.h> #include <linux/interrupt.h> +#include <linux/poll_source.h> #include <linux/virtio.h> #include <linux/virtio_config.h> #include <linux/virtio_ring.h> @@ -38,6 +39,9 @@ struct virtio_pci_vq_info { /* MSI-X vector (or none) */ unsigned msix_vector; + + /* the cpuidle poll_source */ + struct poll_source poll_source; }; /* Our device structure */ @@ -102,6 +106,9 @@ static struct virtio_pci_device *to_vp_device(struct virtio_device *vdev) return container_of(vdev, struct virtio_pci_device, vdev); } +/* enable poll_source API for vq polling */ +int vp_enable_poll_source(struct virtio_device *vdev, bool enable); + /* wait for pending irq handlers */ void vp_synchronize_vectors(struct virtio_device *vdev); /* the notify function used when creating a virt queue */ diff --git a/include/linux/virtio.h b/include/linux/virtio.h index b1894e0323fa..baaa3c8fadb1 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -93,6 +93,7 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *vq); * virtio_device - representation of a device using virtio * @index: unique position on the virtio bus * @failed: saved value for VIRTIO_CONFIG_S_FAILED bit (for restore) + * @poll_source_enabled: poll_source API enabled for vq polling * @config_enabled: configuration change reporting enabled * @config_change_pending: configuration change reported while disabled * @config_lock: protects configuration change reporting @@ -107,6 +108,7 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *vq); struct virtio_device { int index; bool failed; + bool poll_source_enabled; bool config_enabled; bool config_change_pending; spinlock_t config_lock; diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index 8519b3ae5d52..5fb78d56fd44 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -72,6 +72,7 @@ struct virtio_shm_region { * @set_vq_affinity: set the affinity for a virtqueue (optional). * @get_vq_affinity: get the affinity for a virtqueue (optional). * @get_shm_region: get a shared memory region based on the index. + * @enable_poll_source: enable/disable poll_source API vq polling (optional). */ typedef void vq_callback_t(struct virtqueue *); struct virtio_config_ops { @@ -97,6 +98,7 @@ struct virtio_config_ops { int index); bool (*get_shm_region)(struct virtio_device *vdev, struct virtio_shm_region *region, u8 id); + int (*enable_poll_source)(struct virtio_device *dev, bool enable); }; /* If driver didn't advertise the feature, it will never appear. */ diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c index 4b15c00c0a0a..22fee71bbe34 100644 --- a/drivers/virtio/virtio.c +++ b/drivers/virtio/virtio.c @@ -59,12 +59,44 @@ static ssize_t features_show(struct device *_d, } static DEVICE_ATTR_RO(features); +static ssize_t poll_source_show(struct device *_d, + struct device_attribute *attr, char *buf) +{ + struct virtio_device *dev = dev_to_virtio(_d); + return sprintf(buf, "%d\n", dev->poll_source_enabled); +} + +static ssize_t poll_source_store(struct device *_d, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct virtio_device *dev = dev_to_virtio(_d); + bool val; + int rc; + + rc = kstrtobool(buf, &val); + if (rc) + return rc; + + if (val == dev->poll_source_enabled) + return count; + if (!dev->config->enable_poll_source) + return -ENOTSUPP; + + rc = dev->config->enable_poll_source(dev, val); + if (rc) + return rc; + return count; +} +static DEVICE_ATTR_RW(poll_source); + static struct attribute *virtio_dev_attrs[] = { &dev_attr_device.attr, &dev_attr_vendor.attr, &dev_attr_status.attr, &dev_attr_modalias.attr, &dev_attr_features.attr, + &dev_attr_poll_source.attr, NULL, }; ATTRIBUTE_GROUPS(virtio_dev); @@ -343,6 +375,8 @@ int register_virtio_device(struct virtio_device *dev) dev->index = err; dev_set_name(&dev->dev, "virtio%u", dev->index); + dev->poll_source_enabled = false; + spin_lock_init(&dev->config_lock); dev->config_enabled = false; dev->config_change_pending = false; diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c index 222d630c41fc..6de372e12afb 100644 --- a/drivers/virtio/virtio_pci_common.c +++ b/drivers/virtio/virtio_pci_common.c @@ -24,6 +24,83 @@ MODULE_PARM_DESC(force_legacy, "Force legacy mode for transitional virtio 1 devices"); #endif +static void vp_poll_source_start(struct poll_source *src) +{ + struct virtio_pci_vq_info *info + container_of(src, struct virtio_pci_vq_info, poll_source); + + /* This API does not require a lock */ + virtqueue_disable_cb(info->vq); +} + +static void vp_poll_source_stop(struct poll_source *src) +{ + struct virtio_pci_vq_info *info + container_of(src, struct virtio_pci_vq_info, poll_source); + + /* Poll one last time in case */ + /* TODO allow driver to provide spinlock */ + if (!virtqueue_enable_cb(info->vq)) + vring_interrupt(0 /* ignored */, info->vq); +} + +static void vp_poll_source_poll(struct poll_source *src) +{ + struct virtio_pci_vq_info *info + container_of(src, struct virtio_pci_vq_info, poll_source); + + vring_interrupt(0 /* ignored */, info->vq); +} + +static const struct poll_source_ops vp_poll_source_ops = { + .start = vp_poll_source_start, + .stop = vp_poll_source_stop, + .poll = vp_poll_source_poll, +}; + +/* call this when irq affinity changes to update cpuidle poll_source */ +/* TODO this function waits for a smp_call_function_single() completion, is that allowed in all caller contexts? */ +/* TODO this function is not thread-safe, do all callers hold the same lock? */ +static int vp_update_poll_source(struct virtio_device *vdev, int index) +{ + struct virtio_pci_device *vp_dev = to_vp_device(vdev); + struct poll_source *src = &vp_dev->vqs[index]->poll_source; + const struct cpumask *mask; + + if (!list_empty(&src->node)) + poll_source_unregister(src); + + if (!vdev->poll_source_enabled) + return 0; + + mask = vp_get_vq_affinity(vdev, index); + if (!mask) + return -ENOTTY; + + /* Update the poll_source cpu */ + src->cpu = cpumask_first(mask); + + /* Don't use poll_source if interrupts are handled on multiple CPUs */ + if (cpumask_next(src->cpu, mask) < nr_cpu_ids) + return 0; + + return poll_source_register(src); +} + +/* TODO add this to virtio_pci_legacy config ops? */ +int vp_enable_poll_source(struct virtio_device *vdev, bool enable) +{ + struct virtqueue *vq; + + vdev->poll_source_enabled = enable; + + /* TODO locking */ + list_for_each_entry(vq, &vdev->vqs, list) { + vp_update_poll_source(vdev, vq->index); /* TODO handle errors? */ + } + return 0; +} + /* wait for pending irq handlers */ void vp_synchronize_vectors(struct virtio_device *vdev) { @@ -186,6 +263,9 @@ static struct virtqueue *vp_setup_vq(struct virtio_device *vdev, unsigned index, if (!info) return ERR_PTR(-ENOMEM); + info->poll_source.ops = &vp_poll_source_ops; + INIT_LIST_HEAD(&info->poll_source.node); + vq = vp_dev->setup_vq(vp_dev, info, index, callback, name, ctx, msix_vec); if (IS_ERR(vq)) @@ -237,6 +317,7 @@ void vp_del_vqs(struct virtio_device *vdev) int irq = pci_irq_vector(vp_dev->pci_dev, v); irq_set_affinity_hint(irq, NULL); + vp_update_poll_source(vdev, vq->index); free_irq(irq, vq); } } @@ -342,6 +423,9 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs, vqs[i]); if (err) goto error_find; + + if (desc) + vp_update_poll_source(vdev, i); } return 0; @@ -440,6 +524,8 @@ int vp_set_vq_affinity(struct virtqueue *vq, const struct cpumask *cpu_mask) cpumask_copy(mask, cpu_mask); irq_set_affinity_hint(irq, mask); } + + vp_update_poll_source(vdev, vq->index); } return 0; } diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c index 30654d3a0b41..417d568590f2 100644 --- a/drivers/virtio/virtio_pci_modern.c +++ b/drivers/virtio/virtio_pci_modern.c @@ -394,6 +394,7 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = { .set_vq_affinity = vp_set_vq_affinity, .get_vq_affinity = vp_get_vq_affinity, .get_shm_region = vp_get_shm_region, + .enable_poll_source = vp_enable_poll_source, }; static const struct virtio_config_ops virtio_pci_config_ops = { @@ -411,6 +412,7 @@ static const struct virtio_config_ops virtio_pci_config_ops = { .set_vq_affinity = vp_set_vq_affinity, .get_vq_affinity = vp_get_vq_affinity, .get_shm_region = vp_get_shm_region, + .enable_poll_source = vp_enable_poll_source, }; /* the PCI probing function */ -- 2.31.1
Normally softirqs are invoked when exiting irqs. When polling in the cpuidle driver there may be no irq. Therefore pending softirqs go unnoticed and polling continues without invoking them. Add a softirq_poll() function to explicitly check for and invoke softirqs. Signed-off-by: Stefan Hajnoczi <stefanha at redhat.com> --- This commit is not needed for virtio-blk. I added it when I realized virtio-net's NAPI scheduling might not be detected by the cpuidle busy wait loop because it is unaware of softirqs. However, even after doing this virtio-net's NAPI polling doesn't combine with cpuidle haltpoll. Perhaps this patch is still desirable for cpuidle poll_state in case a softirq is raised? --- include/linux/interrupt.h | 2 ++ drivers/cpuidle/poll_source.c | 3 +++ kernel/softirq.c | 14 ++++++++++++++ 3 files changed, 19 insertions(+) diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h index 4777850a6dc7..9bfdcc466ba8 100644 --- a/include/linux/interrupt.h +++ b/include/linux/interrupt.h @@ -573,6 +573,8 @@ struct softirq_action asmlinkage void do_softirq(void); asmlinkage void __do_softirq(void); +extern void softirq_poll(void); + extern void open_softirq(int nr, void (*action)(struct softirq_action *)); extern void softirq_init(void); extern void __raise_softirq_irqoff(unsigned int nr); diff --git a/drivers/cpuidle/poll_source.c b/drivers/cpuidle/poll_source.c index 46100e5a71e4..ed200feb0daa 100644 --- a/drivers/cpuidle/poll_source.c +++ b/drivers/cpuidle/poll_source.c @@ -6,6 +6,7 @@ #include <linux/lockdep.h> #include <linux/percpu.h> #include <linux/poll_source.h> +#include <linux/interrupt.h> /* The per-cpu list of registered poll sources */ DEFINE_PER_CPU(struct list_head, poll_source_list); @@ -26,6 +27,8 @@ void poll_source_run_once(void) list_for_each_entry(src, this_cpu_ptr(&poll_source_list), node) src->ops->poll(src); + + softirq_poll(); } /* Called from idle task with TIF_POLLING_NRFLAG set and irqs enabled */ diff --git a/kernel/softirq.c b/kernel/softirq.c index 4992853ef53d..f45bf0204218 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -611,6 +611,20 @@ void irq_enter(void) irq_enter_rcu(); } +/** + * softirq_poll() - invoke pending softirqs + * + * Normally it is not necessary to explicitly poll for softirqs, but in the + * cpuidle driver a polling function may have raised a softirq with no irq exit + * to invoke it. Therefore it is necessary to poll for pending softirqs and + * invoke them explicitly. + */ +void softirq_poll(void) +{ + if (!in_interrupt() && local_softirq_pending()) + invoke_softirq(); +} + static inline void tick_irq_exit(void) { #ifdef CONFIG_NO_HZ_COMMON -- 2.31.1
Jason Wang
2021-Jul-21 03:29 UTC
[RFC 0/3] cpuidle: add poll_source API and virtio vq polling
? 2021/7/14 ??12:19, Stefan Hajnoczi ??:> These patches are not polished yet but I would like request feedback on this > approach and share performance results with you. > > Idle CPUs tentatively enter a busy wait loop before halting when the cpuidle > haltpoll driver is enabled inside a virtual machine. This reduces wakeup > latency for events that occur soon after the vCPU becomes idle. > > This patch series extends the cpuidle busy wait loop with the new poll_source > API so drivers can participate in polling. Such polling-aware drivers disable > their device's irq during the busy wait loop to avoid the cost of interrupts. > This reduces latency further than regular cpuidle haltpoll, which still relies > on irqs. > > Virtio drivers are modified to use the poll_source API so all virtio device > types get this feature. The following virtio-blk fio benchmark results show the > improvement: > > IOPS (numjobs=4, iodepth=1, 4 virtqueues) > before poll_source io_poll > 4k randread 167102 186049 (+11%) 186654 (+11%) > 4k randwrite 162204 181214 (+11%) 181850 (+12%) > 4k randrw 159520 177071 (+11%) 177928 (+11%) > > The comparison against io_poll shows that cpuidle poll_source achieves > equivalent performance to the block layer's io_poll feature (which I > implemented in a separate patch series [1]). > > The advantage of poll_source is that applications do not need to explicitly set > the RWF_HIPRI I/O request flag. The poll_source approach is attractive because > few applications actually use RWF_HIPRI and it takes advantage of CPU cycles we > would have spent in cpuidle haltpoll anyway. > > The current series does not improve virtio-net. I haven't investigated deeply, > but it is possible that NAPI and poll_source do not combine. See the final > patch for a starting point on making the two work together. > > I have not tried this on bare metal but it might help there too. The cost of > disabling a device's irq must be less than the savings from avoiding irq > handling for this optimization to make sense. > > [1] https://lore.kernel.org/linux-block/20210520141305.355961-1-stefanha at redhat.com/Hi Stefan: Some questions: 1) What's the advantages of introducing polling at virtio level instead of doing it at each subsystems? Polling in virtio level may only work well if all (or most) of the devices are virtio 2) What's the advantages of using cpuidle instead of using a thread (and leverage the scheduler)? 3) Any reason it's virtio_pci specific not a general virtio one? Thanks (Btw, do we need to cc scheduler guys?)> > Stefan Hajnoczi (3): > cpuidle: add poll_source API > virtio: add poll_source virtqueue polling > softirq: participate in cpuidle polling > > drivers/cpuidle/Makefile | 1 + > drivers/virtio/virtio_pci_common.h | 7 ++ > include/linux/interrupt.h | 2 + > include/linux/poll_source.h | 53 +++++++++++++++ > include/linux/virtio.h | 2 + > include/linux/virtio_config.h | 2 + > drivers/cpuidle/poll_source.c | 102 +++++++++++++++++++++++++++++ > drivers/cpuidle/poll_state.c | 6 ++ > drivers/virtio/virtio.c | 34 ++++++++++ > drivers/virtio/virtio_pci_common.c | 86 ++++++++++++++++++++++++ > drivers/virtio/virtio_pci_modern.c | 2 + > kernel/softirq.c | 14 ++++ > 12 files changed, 311 insertions(+) > create mode 100644 include/linux/poll_source.h > create mode 100644 drivers/cpuidle/poll_source.c >