David Hildenbrand
2023-Jun-27 11:22 UTC
[PATCH v1 0/5] mm/memory_hotplug: make offline_and_remove_memory() timeout instead of failing on fatal signals
As raised by John Hubbard [1], offline_and_remove_memory() failing on fatal signals can be sub-optimal for out-of-tree drivers: dying user space might be the last one holding a device node open. As that device node gets closed, the driver might unplug the device and trigger offline_and_remove_memory() to unplug previously hotplugged device memory. This, however, will fail reliably when fatal signals are pending on the dying process, turning the device unusable until the machine gets rebooted. That can be optizied easily by ignoring fatal signals. In fact, checking for fatal signals in the case of offline_and_remove_memory() doesn't make too much sense; the check makes sense when offlining is triggered directly via sysfs. However, we actually do want a way to not end up stuck in offline_and_remove_memory() forever. What offline_and_remove_memory() users actually want is fail after some given timeout and not care about fatal signals. So let's implement that, optimizing virtio-mem along the way. Cc: Andrew Morton <akpm at linux-foundation.org> Cc: "Michael S. Tsirkin" <mst at redhat.com> Cc: John Hubbard <jhubbard at nvidia.com> Cc: Oscar Salvador <osalvador at suse.de> Cc: Michal Hocko <mhocko at suse.com> Cc: Jason Wang <jasowang at redhat.com> Cc: Xuan Zhuo <xuanzhuo at linux.alibaba.com> [1] https://lkml.kernel.org/r/20230620011719.155379-1-jhubbard at nvidia.com David Hildenbrand (5): mm/memory_hotplug: check for fatal signals only in offline_pages() virtio-mem: convert most offline_and_remove_memory() errors to -EBUSY mm/memory_hotplug: make offline_and_remove_memory() timeout instead of failing on fatal signals virtio-mem: set the timeout for offline_and_remove_memory() to 10 seconds virtio-mem: check if the config changed before (fake) offlining memory drivers/virtio/virtio_mem.c | 22 +++++++++++++-- include/linux/memory_hotplug.h | 2 +- mm/memory_hotplug.c | 50 ++++++++++++++++++++++++++++++++-- 3 files changed, 68 insertions(+), 6 deletions(-) base-commit: 6995e2de6891c724bfeb2db33d7b87775f913ad1 -- 2.40.1
David Hildenbrand
2023-Jun-27 11:22 UTC
[PATCH v1 1/5] mm/memory_hotplug: check for fatal signals only in offline_pages()
Let's check for fatal signals only. That looks cleaner and still keeps the documented use case for manual user-space triggered memory offlining working. From Documentation/admin-guide/mm/memory-hotplug.rst: % timeout $TIMEOUT offline_block | failure_handling In fact, we even document there: "the offlining context can be terminated by sending a fatal signal". Signed-off-by: David Hildenbrand <david at redhat.com> --- mm/memory_hotplug.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 8e0fa209d533..0d2151df4ee1 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1879,7 +1879,7 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages, do { pfn = start_pfn; do { - if (signal_pending(current)) { + if (fatal_signal_pending(current)) { ret = -EINTR; reason = "signal backoff"; goto failed_removal_isolated; -- 2.40.1
David Hildenbrand
2023-Jun-27 11:22 UTC
[PATCH v1 2/5] virtio-mem: convert most offline_and_remove_memory() errors to -EBUSY
Let's prepare for offline_and_remove_memory() to return other error codes that effectively translate to -EBUSY, such as -ETIMEDOUT. Signed-off-by: David Hildenbrand <david at redhat.com> --- drivers/virtio/virtio_mem.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c index 835f6cc2fb66..cb8bc6f6aa90 100644 --- a/drivers/virtio/virtio_mem.c +++ b/drivers/virtio/virtio_mem.c @@ -750,7 +750,15 @@ static int virtio_mem_offline_and_remove_memory(struct virtio_mem *vm, dev_dbg(&vm->vdev->dev, "offlining and removing memory failed: %d\n", rc); } - return rc; + + switch (rc) { + case 0: + case -ENOMEM: + case -EINVAL: + return rc; + default: + return -EBUSY; + } } /* -- 2.40.1
David Hildenbrand
2023-Jun-27 11:22 UTC
[PATCH v1 3/5] mm/memory_hotplug: make offline_and_remove_memory() timeout instead of failing on fatal signals
John Hubbard writes [1]: Some device drivers add memory to the system via memory hotplug. When the driver is unloaded, that memory is hot-unplugged. However, memory hot unplug can fail. And these days, it fails a little too easily, with respect to the above case. Specifically, if a signal is pending on the process, hot unplug fails. [...] So in this case, other things (unmovable pages, un-splittable huge pages) can also cause the above problem. However, those are demonstrably less common than simply having a pending signal. I've got bug reports from users who can trivially reproduce this by killing their process with a "kill -9", for example. Especially with ZONE_MOVABLE, offlining is supposed to work in most cases when offlining actually hotplugged (not boot) memory, and only fail in rare corner cases (e.g., some driver holds a reference to a page in ZONE_MOVABLE, turning it unmovable). In these corner cases we really don't want to be stuck forever in offline_and_remove_memory(). But in the general cases, we really want to do our best to make memory offlining succeed -- in a reasonable timeframe. Reliably failing in the described case when there is a fatal signal pending is sub-optimal. The pending signal check is mostly only relevant when user space explicitly triggers offlining of memory using sysfs device attributes ("state" or "online" attribute), but not when coming via offline_and_remove_memory(). So let's use a timer instead and ignore fatal signals, because they are not really expressive for offline_and_remove_memory() users. Let's default to 30 seconds if no timeout was specified, and limit the timeout to 120 seconds. This change is also valuable for virtio-mem in BBM (Big Block Mode) with "bbm_safe_unplug=off", to avoid endless loops when stuck forever in offline_and_remove_memory(). While at it, drop the "extern" from offline_and_remove_memory() to make it fit into a single line. [1] https://lkml.kernel.org/r/20230620011719.155379-1-jhubbard at nvidia.com Signed-off-by: David Hildenbrand <david at redhat.com> --- drivers/virtio/virtio_mem.c | 2 +- include/linux/memory_hotplug.h | 2 +- mm/memory_hotplug.c | 50 ++++++++++++++++++++++++++++++++-- 3 files changed, 50 insertions(+), 4 deletions(-) diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c index cb8bc6f6aa90..f8792223f1db 100644 --- a/drivers/virtio/virtio_mem.c +++ b/drivers/virtio/virtio_mem.c @@ -738,7 +738,7 @@ static int virtio_mem_offline_and_remove_memory(struct virtio_mem *vm, "offlining and removing memory: 0x%llx - 0x%llx\n", addr, addr + size - 1); - rc = offline_and_remove_memory(addr, size); + rc = offline_and_remove_memory(addr, size, 0); if (!rc) { atomic64_sub(size, &vm->offline_size); /* diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 9fcbf5706595..d5f9e8b5a4a4 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -307,7 +307,7 @@ extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages, struct zone *zone, struct memory_group *group); extern int remove_memory(u64 start, u64 size); extern void __remove_memory(u64 start, u64 size); -extern int offline_and_remove_memory(u64 start, u64 size); +int offline_and_remove_memory(u64 start, u64 size, unsigned int timeout_ms); #else static inline void try_offline_node(int nid) {} diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 0d2151df4ee1..ca635121644a 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -152,6 +152,22 @@ void put_online_mems(void) bool movable_node_enabled = false; +/* + * Protected by the device hotplug lock: offline_and_remove_memory() + * will activate a timer such that offlining cannot be stuck forever. + * + * With an active timer, fatal signals will be ignored, because they can be + * counter-productive when dying user space triggers device unplug/driver + * unloading that ends up offlining+removing device memory. + */ +static bool mhp_offlining_timer_active; +static atomic_t mhp_offlining_timer_expired; + +static void mhp_offline_timer_fn(struct timer_list *unused) +{ + atomic_set(&mhp_offlining_timer_expired, 1); +} + #ifndef CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE int mhp_default_online_type = MMOP_OFFLINE; #else @@ -1879,7 +1895,18 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages, do { pfn = start_pfn; do { - if (fatal_signal_pending(current)) { + /* + * If a timer is active, we're coming via + * offline_and_remove_memory() and want to ignore even + * fatal signals. + */ + if (mhp_offlining_timer_active) { + if (atomic_read(&mhp_offlining_timer_expired)) { + ret = -ETIMEDOUT; + reason = "timeout"; + goto failed_removal_isolated; + } + } else if (fatal_signal_pending(current)) { ret = -EINTR; reason = "signal backoff"; goto failed_removal_isolated; @@ -2232,11 +2259,17 @@ static int try_reonline_memory_block(struct memory_block *mem, void *arg) * memory is still in use. Primarily useful for memory devices that logically * unplugged all memory (so it's no longer in use) and want to offline + remove * that memory. + * + * offline_and_remove_memory() will not fail on fatal signals. Instead, it will + * fail once the timeout has been reached and offlining was not completed. If + * no timeout was specified, it will timeout after 30 seconds. The timeout is + * limited to 120 seconds. */ -int offline_and_remove_memory(u64 start, u64 size) +int offline_and_remove_memory(u64 start, u64 size, unsigned int timeout_ms) { const unsigned long mb_count = size / memory_block_size_bytes(); uint8_t *online_types, *tmp; + struct timer_list timer; int rc; if (!IS_ALIGNED(start, memory_block_size_bytes()) || @@ -2261,9 +2294,22 @@ int offline_and_remove_memory(u64 start, u64 size) lock_device_hotplug(); + if (!timeout_ms) + timeout_ms = 30 * MSEC_PER_SEC; + timeout_ms = min_t(unsigned int, timeout_ms, 120 * MSEC_PER_SEC); + + timer_setup_on_stack(&timer, mhp_offline_timer_fn, 0); + mod_timer(&timer, jiffies + msecs_to_jiffies(timeout_ms)); + mhp_offlining_timer_active = true; + tmp = online_types; rc = walk_memory_blocks(start, size, &tmp, try_offline_memory_block); + timer_delete_sync(&timer); + atomic_set(&mhp_offlining_timer_expired, 0); + mhp_offlining_timer_active = false; + destroy_timer_on_stack(&timer); + /* * In case we succeeded to offline all memory, remove it. * This cannot fail as it cannot get onlined in the meantime. -- 2.40.1
David Hildenbrand
2023-Jun-27 11:22 UTC
[PATCH v1 4/5] virtio-mem: set the timeout for offline_and_remove_memory() to 10 seconds
Currently we use the default (30 seconds), let's reduce it to 10 seconds. In BBM, we barely deal with blocks larger than 1/2 GiB, and after 10 seconds it's most probably best to give up on that memory block and try another one (or retry this one later). In the common fake-offline case where we effectively fake-offline memory using alloc_contig_range() first (SBM or BBM with bbm_safe_unplug=on), we expect offline_and_remove_memory() to be blazingly fast and never take anywhere close to 10seconds -- so this should only affect BBM with bbm_safe_unplug=off. While at it, update the parameter description and the relationship to unmovable pages. Signed-off-by: David Hildenbrand <david at redhat.com> --- drivers/virtio/virtio_mem.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c index f8792223f1db..7468b4a907e3 100644 --- a/drivers/virtio/virtio_mem.c +++ b/drivers/virtio/virtio_mem.c @@ -41,7 +41,7 @@ MODULE_PARM_DESC(bbm_block_size, static bool bbm_safe_unplug = true; module_param(bbm_safe_unplug, bool, 0444); MODULE_PARM_DESC(bbm_safe_unplug, - "Use a safe unplug mechanism in BBM, avoiding long/endless loops"); + "Use a safe/fast unplug mechanism in BBM, failing faster on unmovable pages"); /* * virtio-mem currently supports the following modes of operation: @@ -738,7 +738,7 @@ static int virtio_mem_offline_and_remove_memory(struct virtio_mem *vm, "offlining and removing memory: 0x%llx - 0x%llx\n", addr, addr + size - 1); - rc = offline_and_remove_memory(addr, size, 0); + rc = offline_and_remove_memory(addr, size, 10 * MSEC_PER_SEC); if (!rc) { atomic64_sub(size, &vm->offline_size); /* -- 2.40.1
David Hildenbrand
2023-Jun-27 11:22 UTC
[PATCH v1 5/5] virtio-mem: check if the config changed before (fake) offlining memory
If we repeatedly fail to (fake) offline memory, we won't be sending any unplug requests to the device. However, we only check if the config changed when sending such (un)plug requests. So we could end up trying for a long time to offline memory, even though the config changed already and we're not supposed to unplug memory anymore. Let's optimize for that case, identified while testing the offline_and_remove() memory timeout and simulating it repeatedly running into the timeout. Signed-off-by: David Hildenbrand <david at redhat.com> --- drivers/virtio/virtio_mem.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c index 7468b4a907e3..247fb3e0ce61 100644 --- a/drivers/virtio/virtio_mem.c +++ b/drivers/virtio/virtio_mem.c @@ -1922,6 +1922,10 @@ static int virtio_mem_sbm_unplug_sb_online(struct virtio_mem *vm, unsigned long start_pfn; int rc; + /* Stop fake offlining attempts if the config changed. */ + if (atomic_read(&vm->config_changed)) + return -EAGAIN; + start_pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) + sb_id * vm->sbm.sb_size); @@ -2233,6 +2237,10 @@ static int virtio_mem_bbm_unplug_request(struct virtio_mem *vm, uint64_t diff) virtio_mem_bbm_for_each_bb_rev(vm, bb_id, VIRTIO_MEM_BBM_BB_ADDED) { cond_resched(); + /* Stop (fake) offlining attempts if the config changed. */ + if (atomic_read(&vm->config_changed)) + return -EAGAIN; + /* * As we're holding no locks, these checks are racy, * but we don't care. -- 2.40.1