Richard W.M. Jones
2023-Mar-09 09:50 UTC
[Libguestfs] [PATCH libnbd v4] lib/errors.c: Fix assert fail in exit path in multi-threaded code
When a highly multi-threaded program such as nbdcopy encounters an error, there is a race condition in the library which can cause an assertion failure and thus a core dump: (1) An error occurs on one of the threads. nbdcopy calls exit(3). (2) In lib/errors.c, the destructor calls pthread_key_delete. (3) Another thread which is still running also encounters an error, and inside libnbd the library calls set_error(). (4) The call to set_error() calls pthread_getspecific which returns NULL (since the key has already been destroyed in step (2)), and this causes us to call pthread_setspecific which returns EINVAL because glibc is able to detect invalid use of a pthread_key_t after it has been destroyed. In any case, the error message is lost, and any subsequent call to nbd_get_error() will return NULL. (5) We enter the %DEAD state, where there is an assertion that nbd_get_error() is not NULL. This assertion is supposed to be for checking that the code called set_error() before entering the %DEAD state. Although we did call set_error(), the error message was lost at step (4) and nbd_get_error() will always return NULL. This assertion failure causes a crash. There aren't any good ways to fix this, so I chose the same method as used by libvirt in this commit: https://gitlab.com/libvirt/libvirt/-/commit/8e44e5593eb9b89fbc0b54fde15f130707a0d81e (a) Use '-z nodelete' to prevent the library from being unloaded on dlclose(). (b) Do not call pthread_key_destroy (thus leaking the key). (c) When threads exit they are still able to call free_errors_key because it is still present in memory. Thanks: Daniel P. Berrang?, Eric Blake --- configure.ac | 9 +++++++++ lib/Makefile.am | 1 + lib/errors.c | 6 +++++- 3 files changed, 15 insertions(+), 1 deletion(-) diff --git a/configure.ac b/configure.ac index b6d60c3df6..f495926eff 100644 --- a/configure.ac +++ b/configure.ac @@ -607,6 +607,14 @@ AS_IF([test "x$enable_golang" != "xno"],[ ],[GOLANG=no]) AM_CONDITIONAL([HAVE_GOLANG],[test "x$GOLANG" != "xno"]) +AC_MSG_CHECKING([for how to mark DSO non-deletable at runtime]) +NODELETE+`$LD --help 2>&1 | grep -- "-z nodelete" >/dev/null` && \ + NODELETE="-Wl,-z -Wl,nodelete" +AC_MSG_RESULT([$NODELETE]) +AC_SUBST([NODELETE]) + +AC_MSG_CHECKING([for how to set DSO symbol versions]) case $host_os in darwin*) VERSION_SCRIPT@@ -615,6 +623,7 @@ case $host_os in VERSION_SCRIPT="-Wl,--version-script=${srcdir}/libnbd.syms" ;; esac +AC_MSG_RESULT([$VERSION_SCRIPT]) AC_SUBST([VERSION_SCRIPT]) dnl Produce output files. diff --git a/lib/Makefile.am b/lib/Makefile.am index 52b525819b..c886be7da0 100644 --- a/lib/Makefile.am +++ b/lib/Makefile.am @@ -78,6 +78,7 @@ libnbd_la_LIBADD = \ $(NULL) libnbd_la_LDFLAGS = \ $(PTHREAD_LIBS) \ + $(NODELETE) \ $(VERSION_SCRIPT) \ -version-info 0:0:0 \ $(NULL) diff --git a/lib/errors.c b/lib/errors.c index 8b77650ef3..6fbfaacd34 100644 --- a/lib/errors.c +++ b/lib/errors.c @@ -69,7 +69,11 @@ errors_key_destroy (void) free (last_error->error); free (last_error); } - pthread_key_delete (errors_key); + + /* We could do this, but that causes a race condition described here: + * https://listman.redhat.com/archives/libguestfs/2023-March/031002.html + */ + //pthread_key_delete (errors_key); } /* This is called when a thread exits, to free the thread-local data -- 2.39.2
Daniel P. Berrangé
2023-Mar-09 11:11 UTC
[Libguestfs] [PATCH libnbd v4] lib/errors.c: Fix assert fail in exit path in multi-threaded code
On Thu, Mar 09, 2023 at 09:50:00AM +0000, Richard W.M. Jones wrote:> When a highly multi-threaded program such as nbdcopy encounters an > error, there is a race condition in the library which can cause an > assertion failure and thus a core dump: > > (1) An error occurs on one of the threads. nbdcopy calls exit(3). > > (2) In lib/errors.c, the destructor calls pthread_key_delete. > > (3) Another thread which is still running also encounters an error, > and inside libnbd the library calls set_error(). > > (4) The call to set_error() calls pthread_getspecific which returns > NULL (since the key has already been destroyed in step (2)), and this > causes us to call pthread_setspecific which returns EINVAL because > glibc is able to detect invalid use of a pthread_key_t after it has > been destroyed. In any case, the error message is lost, and any > subsequent call to nbd_get_error() will return NULL. > > (5) We enter the %DEAD state, where there is an assertion that > nbd_get_error() is not NULL. This assertion is supposed to be for > checking that the code called set_error() before entering the %DEAD > state. Although we did call set_error(), the error message was lost > at step (4) and nbd_get_error() will always return NULL. This > assertion failure causes a crash. > > There aren't any good ways to fix this, so I chose the same method as > used by libvirt in this commit: > > https://gitlab.com/libvirt/libvirt/-/commit/8e44e5593eb9b89fbc0b54fde15f130707a0d81e > > (a) Use '-z nodelete' to prevent the library from being unloaded on > dlclose(). > > (b) Do not call pthread_key_destroy (thus leaking the key). > > (c) When threads exit they are still able to call free_errors_key > because it is still present in memory. > > Thanks: Daniel P. Berrang?, Eric Blake > --- > configure.ac | 9 +++++++++ > lib/Makefile.am | 1 + > lib/errors.c | 6 +++++- > 3 files changed, 15 insertions(+), 1 deletion(-)Reviewed-by: Daniel P. Berrang? <berrange at redhat.com> With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Reasonably Related Threads
- [PATCH libnbd v4] lib/errors.c: Fix assert fail in exit path in multi-threaded code
- [PATCH libnbd v3] lib/errors.c: Fix assert fail in exit path in multi-threaded code
- [PATCH libnbd v3] lib/errors.c: Fix assert fail in exit path in multi-threaded code
- [libnbd PATCH v2] lib/errors.c: Fix assert fail in exit path in multi-threaded code
- [PATCH libnbd] lib/errors.c: Fix assert fail in exit path in multi-threaded code