Richard W.M. Jones
2023-Mar-09 08:44 UTC
[Libguestfs] [PATCH libnbd v3] lib/errors.c: Fix assert fail in exit path in multi-threaded code
When a highly multi-threaded program such as nbdcopy encounters an error, there is a race condition in the library which can cause an assertion failure and thus a core dump: (1) An error occurs on one of the threads. nbdcopy calls exit(3). (2) In lib/errors.c, the destructor calls pthread_key_delete. (3) Another thread which is still running also encounters an error, and inside libnbd the library calls set_error(). (4) The call to set_error() calls pthread_getspecific which returns NULL (since the key has already been destroyed in step (2)), and this causes us to call pthread_setspecific which returns EINVAL because glibc is able to detect invalid use of a pthread_key_t after it has been destroyed. In any case, the error message is lost, and any subsequent call to nbd_get_error() will return NULL. (5) We enter the %DEAD state, where there is an assertion that nbd_get_error() is not NULL. This assertion is supposed to be for checking that the code called set_error() before entering the %DEAD state. Although we did call set_error(), the error message was lost at step (4) and nbd_get_error() will always return NULL. This assertion failure causes a crash. There aren't any good ways to fix this. I chose to leak the pthread_key_t on the exit path. --- lib/errors.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/lib/errors.c b/lib/errors.c index 8b77650ef3..6fbfaacd34 100644 --- a/lib/errors.c +++ b/lib/errors.c @@ -69,7 +69,11 @@ errors_key_destroy (void) free (last_error->error); free (last_error); } - pthread_key_delete (errors_key); + + /* We could do this, but that causes a race condition described here: + * https://listman.redhat.com/archives/libguestfs/2023-March/031002.html + */ + //pthread_key_delete (errors_key); } /* This is called when a thread exits, to free the thread-local data -- 2.39.2
Richard W.M. Jones
2023-Mar-09 09:12 UTC
[Libguestfs] [PATCH libnbd v3] lib/errors.c: Fix assert fail in exit path in multi-threaded code
This causes a different problem. We actually hit the problem in this test: https://gitlab.com/nbdkit/libnbd/-/blob/master/tests/dlopen.c (1) The thread calls dlclose(), unmapping the library but leaving the pthread_key_t in place, which contains a pointer to free_errors_key in the unmapped library. (2) The thread exits, and glibc calls free_errors_key, causing a crash. Maybe -z nodelete really is the only answer. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html
Daniel P. Berrangé
2023-Mar-09 09:13 UTC
[Libguestfs] [PATCH libnbd v3] lib/errors.c: Fix assert fail in exit path in multi-threaded code
On Thu, Mar 09, 2023 at 08:44:51AM +0000, Richard W.M. Jones wrote:> When a highly multi-threaded program such as nbdcopy encounters an > error, there is a race condition in the library which can cause an > assertion failure and thus a core dump: > > (1) An error occurs on one of the threads. nbdcopy calls exit(3). > > (2) In lib/errors.c, the destructor calls pthread_key_delete. > > (3) Another thread which is still running also encounters an error, > and inside libnbd the library calls set_error(). > > (4) The call to set_error() calls pthread_getspecific which returns > NULL (since the key has already been destroyed in step (2)), and this > causes us to call pthread_setspecific which returns EINVAL because > glibc is able to detect invalid use of a pthread_key_t after it has > been destroyed. In any case, the error message is lost, and any > subsequent call to nbd_get_error() will return NULL. > > (5) We enter the %DEAD state, where there is an assertion that > nbd_get_error() is not NULL. This assertion is supposed to be for > checking that the code called set_error() before entering the %DEAD > state. Although we did call set_error(), the error message was lost > at step (4) and nbd_get_error() will always return NULL. This > assertion failure causes a crash. > > There aren't any good ways to fix this. I chose to leak the > pthread_key_t on the exit path. > --- > lib/errors.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/lib/errors.c b/lib/errors.c > index 8b77650ef3..6fbfaacd34 100644 > --- a/lib/errors.c > +++ b/lib/errors.c > @@ -69,7 +69,11 @@ errors_key_destroy (void) > free (last_error->error); > free (last_error); > } > - pthread_key_delete (errors_key); > + > + /* We could do this, but that causes a race condition described here: > + * https://listman.redhat.com/archives/libguestfs/2023-March/031002.html > + */ > + //pthread_key_delete (errors_key); > }While I think this fixes the problem with pthread_setspecific crashing, I believe this then opens apps up to the problem hit by libvirt with destructors being run which don't exist in memory. Consider a thread is running that has done something with libnbd which caused allocate_last_error_on_demand() to run, which populates the 'errors_key' data for that thread. This thread is no longer doing anything with libnbd, but the 'errors_key' data remains populated none the less because that's just how thread locals work. Now nothing at all is using libnbd.so, so some thread calls dlclose(), causing libnbd.so to be unmapped from memory. Since we've not called pthread_key_delete, the thread local still exists and, crucially, so does its registered destructor function. The thread mentioned in the previous paragraph now terminates and this causes the destructor function 'free_errors_key' to be run.... except this function isn't mapped in memory any more. This will cause the program to crash. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Possibly Parallel Threads
- [PATCH libnbd v3] lib/errors.c: Fix assert fail in exit path in multi-threaded code
- [libnbd PATCH v2] lib/errors.c: Fix assert fail in exit path in multi-threaded code
- [PATCH libnbd] lib/errors.c: Fix assert fail in exit path in multi-threaded code
- [PATCH libnbd v4] lib/errors.c: Fix assert fail in exit path in multi-threaded code
- [PATCH libnbd v3] lib/errors.c: Fix assert fail in exit path in multi-threaded code