Richard W.M. Jones
2023-Mar-09 08:44 UTC
[Libguestfs] [PATCH libnbd v3] lib/errors.c: Fix assert fail in exit path in multi-threaded code
This version simply removes the call to pthread_key_destroy. It fixes the problem and allows us to leave the asserts alone so we can still catch real errors. Of course this leaks pthread_key_t in the case where you dlclose() the library. glibc has a limit of 128 thread-specific keys (and the first 32 are somehow "fast" in a way I could quite follow), so that's a thing. Rich.
Richard W.M. Jones
2023-Mar-09 08:44 UTC
[Libguestfs] [PATCH libnbd v3] lib/errors.c: Fix assert fail in exit path in multi-threaded code
When a highly multi-threaded program such as nbdcopy encounters an
error, there is a race condition in the library which can cause an
assertion failure and thus a core dump:
(1) An error occurs on one of the threads. nbdcopy calls exit(3).
(2) In lib/errors.c, the destructor calls pthread_key_delete.
(3) Another thread which is still running also encounters an error,
and inside libnbd the library calls set_error().
(4) The call to set_error() calls pthread_getspecific which returns
NULL (since the key has already been destroyed in step (2)), and this
causes us to call pthread_setspecific which returns EINVAL because
glibc is able to detect invalid use of a pthread_key_t after it has
been destroyed. In any case, the error message is lost, and any
subsequent call to nbd_get_error() will return NULL.
(5) We enter the %DEAD state, where there is an assertion that
nbd_get_error() is not NULL. This assertion is supposed to be for
checking that the code called set_error() before entering the %DEAD
state. Although we did call set_error(), the error message was lost
at step (4) and nbd_get_error() will always return NULL. This
assertion failure causes a crash.
There aren't any good ways to fix this. I chose to leak the
pthread_key_t on the exit path.
---
lib/errors.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/lib/errors.c b/lib/errors.c
index 8b77650ef3..6fbfaacd34 100644
--- a/lib/errors.c
+++ b/lib/errors.c
@@ -69,7 +69,11 @@ errors_key_destroy (void)
free (last_error->error);
free (last_error);
}
- pthread_key_delete (errors_key);
+
+ /* We could do this, but that causes a race condition described here:
+ * https://listman.redhat.com/archives/libguestfs/2023-March/031002.html
+ */
+ //pthread_key_delete (errors_key);
}
/* This is called when a thread exits, to free the thread-local data
--
2.39.2
Seemingly Similar Threads
- [PATCH libnbd v3] lib/errors.c: Fix assert fail in exit path in multi-threaded code
- [libnbd PATCH v2] lib/errors.c: Fix assert fail in exit path in multi-threaded code
- [PATCH libnbd v4] lib/errors.c: Fix assert fail in exit path in multi-threaded code
- [PATCH libnbd] lib/errors.c: Fix assert fail in exit path in multi-threaded code
- [PATCH libnbd v3] lib/errors.c: Fix assert fail in exit path in multi-threaded code