ThreadSanitizer [1] pointed out that in the nbd plugin, nbd_close() can attempt close() in the main thread while the worker thread is still attempting to start a read(). Normally, if the read() loses the race, it will get a harmless EBADF that exits the worker thread (which is what we want, as we are closing the connection anyway); but if another connection happens to start in that window, we could end up read()ing from the fd opened by the new connection, with disastrous results on the second connection. [1] ./configure CXFLAGS=-fsanitize=thread LDFLAGS=-fsanitize=thread Commits c70616f8 and 430f8141 tried to clean up deadlock during shutdown, but missed that without some sort of locking, a close-before-read was still possible. Swap lines so that pthread_join() now serves as the locking to ensure close is not attempted while another thread may be about to use the fd. Thanks: Richard W.M. Jones Signed-off-by: Eric Blake <eblake@redhat.com> --- It took me a while to decipher how ThreadSanitizer actually tests this race, which gets reported as a Write guarded by a mutex [caused by close()] racing with an earlier Read [caused by read()]. I finally realized that linking with libtsan installs wrappers around the syscalls for read(), close(), etc. where the wrappers create an underlying mutex and read/write operations on sentinel memory, so that it can then reuse its memory race analysis it has for more typical data races. The wrappers thus cause odd-looking reports for fd races (the report ends up claiming that Thread 1 performing close() lost a race to Thread 2 performing read() - even though the ACTUAL data race is only a bug when Thread 2 loses the race and read()s on an fd close()d by Thread 1 and possibly reused by Thread 3 in the meantime). plugins/nbd/nbd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c index b9a4523..9130642 100644 --- a/plugins/nbd/nbd.c +++ b/plugins/nbd/nbd.c @@ -575,9 +575,9 @@ nbd_close (void *handle) nbd_request_raw (h, 0, NBD_CMD_DISC, 0, 0, 0, NULL); shutdown (h->fd, SHUT_WR); } - close (h->fd); if ((errno = pthread_join (h->reader, NULL))) nbdkit_debug ("failed to join reader thread: %m"); + close (h->fd); pthread_mutex_destroy (&h->write_lock); pthread_mutex_destroy (&h->trans_lock); free (h); -- 2.17.2
Richard W.M. Jones
2018-Nov-08 10:50 UTC
Re: [Libguestfs] [nbdkit PATCH] nbd: Fix race during close
On Wed, Nov 07, 2018 at 08:22:30PM -0600, Eric Blake wrote:> ThreadSanitizer [1] pointed out that in the nbd plugin, nbd_close() can > attempt close() in the main thread while the worker thread is still > attempting to start a read(). Normally, if the read() loses the race, > it will get a harmless EBADF that exits the worker thread (which is what > we want, as we are closing the connection anyway); but if another > connection happens to start in that window, we could end up read()ing > from the fd opened by the new connection, with disastrous results on the > second connection. > > [1] ./configure CXFLAGS=-fsanitize=thread LDFLAGS=-fsanitize=thread > > Commits c70616f8 and 430f8141 tried to clean up deadlock during > shutdown, but missed that without some sort of locking, a > close-before-read was still possible. Swap lines so that pthread_join() > now serves as the locking to ensure close is not attempted while > another thread may be about to use the fd. > > Thanks: Richard W.M. Jones > Signed-off-by: Eric Blake <eblake@redhat.com> > --- > > It took me a while to decipher how ThreadSanitizer actually tests > this race, which gets reported as a Write guarded by a mutex [caused > by close()] racing with an earlier Read [caused by read()]. I > finally realized that linking with libtsan installs wrappers around > the syscalls for read(), close(), etc. where the wrappers create an > underlying mutex and read/write operations on sentinel memory, so that > it can then reuse its memory race analysis it has for more typical > data races. The wrappers thus cause odd-looking reports for fd races > (the report ends up claiming that Thread 1 performing close() lost a > race to Thread 2 performing read() - even though the ACTUAL data race > is only a bug when Thread 2 loses the race and read()s on an fd > close()d by Thread 1 and possibly reused by Thread 3 in the meantime). > > plugins/nbd/nbd.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c > index b9a4523..9130642 100644 > --- a/plugins/nbd/nbd.c > +++ b/plugins/nbd/nbd.c > @@ -575,9 +575,9 @@ nbd_close (void *handle) > nbd_request_raw (h, 0, NBD_CMD_DISC, 0, 0, 0, NULL); > shutdown (h->fd, SHUT_WR); > } > - close (h->fd); > if ((errno = pthread_join (h->reader, NULL))) > nbdkit_debug ("failed to join reader thread: %m"); > + close (h->fd); > pthread_mutex_destroy (&h->write_lock); > pthread_mutex_destroy (&h->trans_lock); > free (h); > --Well spotted :-) Thanks. ACK. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org
Possibly Parallel Threads
- [nbdkit PATCH] Experiment: nbd: Use ppoll() instead of pipe-to-self
- [nbdkit PATCH v2 0/4] enable parallel nbd forwarding
- [nbdkit PATCH v2 2/2] nbd: Split reading into separate thread
- [nbdkit PATCH] nbd: Drop nbd-standalone fallback
- [nbdkit PATCH 2/2] nbd: Use nbdkit aio_*_notify variants