David Hildenbrand
2023-Jun-27 11:22 UTC
[PATCH v1 0/5] mm/memory_hotplug: make offline_and_remove_memory() timeout instead of failing on fatal signals
As raised by John Hubbard [1], offline_and_remove_memory() failing on
fatal signals can be sub-optimal for out-of-tree drivers: dying user space
might be the last one holding a device node open.
As that device node gets closed, the driver might unplug the device
and trigger offline_and_remove_memory() to unplug previously
hotplugged device memory. This, however, will fail reliably when fatal
signals are pending on the dying process, turning the device unusable until
the machine gets rebooted.
That can be optizied easily by ignoring fatal signals. In fact, checking
for fatal signals in the case of offline_and_remove_memory() doesn't
make too much sense; the check makes sense when offlining is triggered
directly via sysfs. However, we actually do want a way to not end up
stuck in offline_and_remove_memory() forever.
What offline_and_remove_memory() users actually want is fail after some
given timeout and not care about fatal signals.
So let's implement that, optimizing virtio-mem along the way.
Cc: Andrew Morton <akpm at linux-foundation.org>
Cc: "Michael S. Tsirkin" <mst at redhat.com>
Cc: John Hubbard <jhubbard at nvidia.com>
Cc: Oscar Salvador <osalvador at suse.de>
Cc: Michal Hocko <mhocko at suse.com>
Cc: Jason Wang <jasowang at redhat.com>
Cc: Xuan Zhuo <xuanzhuo at linux.alibaba.com>
[1] https://lkml.kernel.org/r/20230620011719.155379-1-jhubbard at nvidia.com
David Hildenbrand (5):
mm/memory_hotplug: check for fatal signals only in offline_pages()
virtio-mem: convert most offline_and_remove_memory() errors to -EBUSY
mm/memory_hotplug: make offline_and_remove_memory() timeout instead of
failing on fatal signals
virtio-mem: set the timeout for offline_and_remove_memory() to 10
seconds
virtio-mem: check if the config changed before (fake) offlining memory
drivers/virtio/virtio_mem.c | 22 +++++++++++++--
include/linux/memory_hotplug.h | 2 +-
mm/memory_hotplug.c | 50 ++++++++++++++++++++++++++++++++--
3 files changed, 68 insertions(+), 6 deletions(-)
base-commit: 6995e2de6891c724bfeb2db33d7b87775f913ad1
--
2.40.1
David Hildenbrand
2023-Jun-27 11:22 UTC
[PATCH v1 1/5] mm/memory_hotplug: check for fatal signals only in offline_pages()
Let's check for fatal signals only. That looks cleaner and still keeps
the documented use case for manual user-space triggered memory offlining
working. From Documentation/admin-guide/mm/memory-hotplug.rst:
% timeout $TIMEOUT offline_block | failure_handling
In fact, we even document there: "the offlining context can be terminated
by sending a fatal signal".
Signed-off-by: David Hildenbrand <david at redhat.com>
---
mm/memory_hotplug.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 8e0fa209d533..0d2151df4ee1 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1879,7 +1879,7 @@ int __ref offline_pages(unsigned long start_pfn, unsigned
long nr_pages,
do {
pfn = start_pfn;
do {
- if (signal_pending(current)) {
+ if (fatal_signal_pending(current)) {
ret = -EINTR;
reason = "signal backoff";
goto failed_removal_isolated;
--
2.40.1
David Hildenbrand
2023-Jun-27 11:22 UTC
[PATCH v1 2/5] virtio-mem: convert most offline_and_remove_memory() errors to -EBUSY
Let's prepare for offline_and_remove_memory() to return other error
codes that effectively translate to -EBUSY, such as -ETIMEDOUT.
Signed-off-by: David Hildenbrand <david at redhat.com>
---
drivers/virtio/virtio_mem.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 835f6cc2fb66..cb8bc6f6aa90 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -750,7 +750,15 @@ static int virtio_mem_offline_and_remove_memory(struct
virtio_mem *vm,
dev_dbg(&vm->vdev->dev,
"offlining and removing memory failed: %d\n", rc);
}
- return rc;
+
+ switch (rc) {
+ case 0:
+ case -ENOMEM:
+ case -EINVAL:
+ return rc;
+ default:
+ return -EBUSY;
+ }
}
/*
--
2.40.1
David Hildenbrand
2023-Jun-27 11:22 UTC
[PATCH v1 3/5] mm/memory_hotplug: make offline_and_remove_memory() timeout instead of failing on fatal signals
John Hubbard writes [1]:
Some device drivers add memory to the system via memory hotplug.
When the driver is unloaded, that memory is hot-unplugged.
However, memory hot unplug can fail. And these days, it fails a
little too easily, with respect to the above case. Specifically, if
a signal is pending on the process, hot unplug fails.
[...]
So in this case, other things (unmovable pages, un-splittable huge
pages) can also cause the above problem. However, those are
demonstrably less common than simply having a pending signal. I've
got bug reports from users who can trivially reproduce this by
killing their process with a "kill -9", for example.
Especially with ZONE_MOVABLE, offlining is supposed to work in most
cases when offlining actually hotplugged (not boot) memory, and only fail
in rare corner cases (e.g., some driver holds a reference to a page in
ZONE_MOVABLE, turning it unmovable).
In these corner cases we really don't want to be stuck forever in
offline_and_remove_memory(). But in the general cases, we really want to
do our best to make memory offlining succeed -- in a reasonable
timeframe.
Reliably failing in the described case when there is a fatal signal pending
is sub-optimal. The pending signal check is mostly only relevant when user
space explicitly triggers offlining of memory using sysfs device attributes
("state" or "online" attribute), but not when coming via
offline_and_remove_memory().
So let's use a timer instead and ignore fatal signals, because they are
not really expressive for offline_and_remove_memory() users. Let's default
to 30 seconds if no timeout was specified, and limit the timeout to 120
seconds.
This change is also valuable for virtio-mem in BBM (Big Block Mode) with
"bbm_safe_unplug=off", to avoid endless loops when stuck forever in
offline_and_remove_memory().
While at it, drop the "extern" from offline_and_remove_memory() to
make
it fit into a single line.
[1] https://lkml.kernel.org/r/20230620011719.155379-1-jhubbard at nvidia.com
Signed-off-by: David Hildenbrand <david at redhat.com>
---
drivers/virtio/virtio_mem.c | 2 +-
include/linux/memory_hotplug.h | 2 +-
mm/memory_hotplug.c | 50 ++++++++++++++++++++++++++++++++--
3 files changed, 50 insertions(+), 4 deletions(-)
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index cb8bc6f6aa90..f8792223f1db 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -738,7 +738,7 @@ static int virtio_mem_offline_and_remove_memory(struct
virtio_mem *vm,
"offlining and removing memory: 0x%llx - 0x%llx\n", addr,
addr + size - 1);
- rc = offline_and_remove_memory(addr, size);
+ rc = offline_and_remove_memory(addr, size, 0);
if (!rc) {
atomic64_sub(size, &vm->offline_size);
/*
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 9fcbf5706595..d5f9e8b5a4a4 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -307,7 +307,7 @@ extern int offline_pages(unsigned long start_pfn, unsigned
long nr_pages,
struct zone *zone, struct memory_group *group);
extern int remove_memory(u64 start, u64 size);
extern void __remove_memory(u64 start, u64 size);
-extern int offline_and_remove_memory(u64 start, u64 size);
+int offline_and_remove_memory(u64 start, u64 size, unsigned int timeout_ms);
#else
static inline void try_offline_node(int nid) {}
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0d2151df4ee1..ca635121644a 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -152,6 +152,22 @@ void put_online_mems(void)
bool movable_node_enabled = false;
+/*
+ * Protected by the device hotplug lock: offline_and_remove_memory()
+ * will activate a timer such that offlining cannot be stuck forever.
+ *
+ * With an active timer, fatal signals will be ignored, because they can be
+ * counter-productive when dying user space triggers device unplug/driver
+ * unloading that ends up offlining+removing device memory.
+ */
+static bool mhp_offlining_timer_active;
+static atomic_t mhp_offlining_timer_expired;
+
+static void mhp_offline_timer_fn(struct timer_list *unused)
+{
+ atomic_set(&mhp_offlining_timer_expired, 1);
+}
+
#ifndef CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE
int mhp_default_online_type = MMOP_OFFLINE;
#else
@@ -1879,7 +1895,18 @@ int __ref offline_pages(unsigned long start_pfn, unsigned
long nr_pages,
do {
pfn = start_pfn;
do {
- if (fatal_signal_pending(current)) {
+ /*
+ * If a timer is active, we're coming via
+ * offline_and_remove_memory() and want to ignore even
+ * fatal signals.
+ */
+ if (mhp_offlining_timer_active) {
+ if (atomic_read(&mhp_offlining_timer_expired)) {
+ ret = -ETIMEDOUT;
+ reason = "timeout";
+ goto failed_removal_isolated;
+ }
+ } else if (fatal_signal_pending(current)) {
ret = -EINTR;
reason = "signal backoff";
goto failed_removal_isolated;
@@ -2232,11 +2259,17 @@ static int try_reonline_memory_block(struct memory_block
*mem, void *arg)
* memory is still in use. Primarily useful for memory devices that logically
* unplugged all memory (so it's no longer in use) and want to offline +
remove
* that memory.
+ *
+ * offline_and_remove_memory() will not fail on fatal signals. Instead, it will
+ * fail once the timeout has been reached and offlining was not completed. If
+ * no timeout was specified, it will timeout after 30 seconds. The timeout is
+ * limited to 120 seconds.
*/
-int offline_and_remove_memory(u64 start, u64 size)
+int offline_and_remove_memory(u64 start, u64 size, unsigned int timeout_ms)
{
const unsigned long mb_count = size / memory_block_size_bytes();
uint8_t *online_types, *tmp;
+ struct timer_list timer;
int rc;
if (!IS_ALIGNED(start, memory_block_size_bytes()) ||
@@ -2261,9 +2294,22 @@ int offline_and_remove_memory(u64 start, u64 size)
lock_device_hotplug();
+ if (!timeout_ms)
+ timeout_ms = 30 * MSEC_PER_SEC;
+ timeout_ms = min_t(unsigned int, timeout_ms, 120 * MSEC_PER_SEC);
+
+ timer_setup_on_stack(&timer, mhp_offline_timer_fn, 0);
+ mod_timer(&timer, jiffies + msecs_to_jiffies(timeout_ms));
+ mhp_offlining_timer_active = true;
+
tmp = online_types;
rc = walk_memory_blocks(start, size, &tmp, try_offline_memory_block);
+ timer_delete_sync(&timer);
+ atomic_set(&mhp_offlining_timer_expired, 0);
+ mhp_offlining_timer_active = false;
+ destroy_timer_on_stack(&timer);
+
/*
* In case we succeeded to offline all memory, remove it.
* This cannot fail as it cannot get onlined in the meantime.
--
2.40.1
David Hildenbrand
2023-Jun-27 11:22 UTC
[PATCH v1 4/5] virtio-mem: set the timeout for offline_and_remove_memory() to 10 seconds
Currently we use the default (30 seconds), let's reduce it to 10
seconds. In BBM, we barely deal with blocks larger than 1/2 GiB, and
after 10 seconds it's most probably best to give up on that memory block
and try another one (or retry this one later).
In the common fake-offline case where we effectively fake-offline memory
using alloc_contig_range() first (SBM or BBM with bbm_safe_unplug=on),
we expect offline_and_remove_memory() to be blazingly fast and never take
anywhere close to 10seconds -- so this should only affect BBM with
bbm_safe_unplug=off.
While at it, update the parameter description and the relationship to
unmovable pages.
Signed-off-by: David Hildenbrand <david at redhat.com>
---
drivers/virtio/virtio_mem.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index f8792223f1db..7468b4a907e3 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -41,7 +41,7 @@ MODULE_PARM_DESC(bbm_block_size,
static bool bbm_safe_unplug = true;
module_param(bbm_safe_unplug, bool, 0444);
MODULE_PARM_DESC(bbm_safe_unplug,
- "Use a safe unplug mechanism in BBM, avoiding long/endless
loops");
+ "Use a safe/fast unplug mechanism in BBM, failing faster on
unmovable pages");
/*
* virtio-mem currently supports the following modes of operation:
@@ -738,7 +738,7 @@ static int virtio_mem_offline_and_remove_memory(struct
virtio_mem *vm,
"offlining and removing memory: 0x%llx - 0x%llx\n", addr,
addr + size - 1);
- rc = offline_and_remove_memory(addr, size, 0);
+ rc = offline_and_remove_memory(addr, size, 10 * MSEC_PER_SEC);
if (!rc) {
atomic64_sub(size, &vm->offline_size);
/*
--
2.40.1
David Hildenbrand
2023-Jun-27 11:22 UTC
[PATCH v1 5/5] virtio-mem: check if the config changed before (fake) offlining memory
If we repeatedly fail to (fake) offline memory, we won't be sending
any unplug requests to the device. However, we only check if the config
changed when sending such (un)plug requests.
So we could end up trying for a long time to offline memory, even though
the config changed already and we're not supposed to unplug memory
anymore.
Let's optimize for that case, identified while testing the
offline_and_remove() memory timeout and simulating it repeatedly running
into the timeout.
Signed-off-by: David Hildenbrand <david at redhat.com>
---
drivers/virtio/virtio_mem.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 7468b4a907e3..247fb3e0ce61 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -1922,6 +1922,10 @@ static int virtio_mem_sbm_unplug_sb_online(struct
virtio_mem *vm,
unsigned long start_pfn;
int rc;
+ /* Stop fake offlining attempts if the config changed. */
+ if (atomic_read(&vm->config_changed))
+ return -EAGAIN;
+
start_pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
sb_id * vm->sbm.sb_size);
@@ -2233,6 +2237,10 @@ static int virtio_mem_bbm_unplug_request(struct
virtio_mem *vm, uint64_t diff)
virtio_mem_bbm_for_each_bb_rev(vm, bb_id, VIRTIO_MEM_BBM_BB_ADDED) {
cond_resched();
+ /* Stop (fake) offlining attempts if the config changed. */
+ if (atomic_read(&vm->config_changed))
+ return -EAGAIN;
+
/*
* As we're holding no locks, these checks are racy,
* but we don't care.
--
2.40.1