Eric Blake
2023-Feb-23 20:40 UTC
[Libguestfs] [nbdkit PATCH] server: Don't assert on send if client hangs up early
libnbd's copy/copy-nbd-error.sh was triggering an assertion failure in
nbdkit:
$ nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M
error-pread-rate=0.5 ] null:
...
nbdkit: pattern.2: debug: error-inject: pread count=262144 offset=4718592
nbdkit: pattern.2: debug: pattern: pread count=262144 offset=4718592
nbdkit: pattern.1: debug: error-inject: pread count=262144 offset=4456448
nbdkit: pattern.1: error: injecting EIO error into pread
nbdkit: pattern.1: debug: sending error reply: Input/output error
nbdkit: pattern.3: debug: error-inject: pread count=262144 offset=4980736
nbdkit: pattern.3: debug: pattern: pread count=262144 offset=4980736
nbdkit: pattern.4: error: write data: NBD_CMD_READ: Broken pipe
nbdkit: pattern.4: debug: exiting worker thread pattern.4
nbdkit: connections.c:402: raw_send_socket: Assertion `sock >= 0' failed.
When there are multiple queued requests, and the client hangs up
abruptly as soon as the first error response is seen (rather than
waiting until all other queued responses are flushed then sending
NBD_CMD_DISC), another thread that has a queued response ready to send
sees EPIPE (the client is no longer reading) and closes the socket,
then the third thread triggers the assertion failure because it not
expecting to attempt a send to a closed socket. Thus, my claim that
sock would always be non-negative after collecting data from the
plugin was wrong. The fix is to gracefully handle an abrupt client
disconnect, by not attempting to send on an already-closed socket.
For the nbdcopy example, --exit-with-parent means it's just an
annoyance (nbdcopy has already exited, and no further client will be
connecting); but for a longer-running nbdkit server accepting parallel
clients, it means any one client can trigger the SIGABRT by
intentionally queueing multiple NBD_CMD_READ then disconnecting early,
and thereby kill nbdkit and starve other clients. Whether it rises to
the level of CVE depends on whether you consider one client being able
to starve others a privilege escalation (if you are not using TLS,
there are other ways for a bad client to starve peers; if you are
using TLS, then the starved client has the same credentials as the
client that caused the SIGABRT so there is no privilege boundary
escalation).
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2173054
Fixes: daef505e ("server: Give client EOF when we are done writing",
v1.32.4)
---
server/connections.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/server/connections.c b/server/connections.c
index 4d776f2a..1b63eb73 100644
--- a/server/connections.c
+++ b/server/connections.c
@@ -1,5 +1,5 @@
/* nbdkit
- * Copyright (C) 2013-2022 Red Hat Inc.
+ * Copyright (C) 2013-2023 Red Hat Inc.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are
@@ -400,7 +400,10 @@ raw_send_socket (const void *vbuf, size_t len, int flags)
ssize_t r;
int f = 0;
- assert (sock >= 0);
+ if (sock < 0) {
+ errno = EBADF;
+ return -1;
+ }
#ifdef MSG_MORE
if (flags & SEND_MORE)
f |= MSG_MORE;
--
2.39.2
Laszlo Ersek
2023-Feb-24 05:41 UTC
[Libguestfs] [nbdkit PATCH] server: Don't assert on send if client hangs up early
On 2/23/23 21:40, Eric Blake wrote:> libnbd's copy/copy-nbd-error.sh was triggering an assertion failure in > nbdkit: > > $ nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M error-pread-rate=0.5 ] null: > ... > nbdkit: pattern.2: debug: error-inject: pread count=262144 offset=4718592 > nbdkit: pattern.2: debug: pattern: pread count=262144 offset=4718592 > nbdkit: pattern.1: debug: error-inject: pread count=262144 offset=4456448 > nbdkit: pattern.1: error: injecting EIO error into pread > nbdkit: pattern.1: debug: sending error reply: Input/output error > nbdkit: pattern.3: debug: error-inject: pread count=262144 offset=4980736 > nbdkit: pattern.3: debug: pattern: pread count=262144 offset=4980736 > nbdkit: pattern.4: error: write data: NBD_CMD_READ: Broken pipe > nbdkit: pattern.4: debug: exiting worker thread pattern.4 > nbdkit: connections.c:402: raw_send_socket: Assertion `sock >= 0' failed. > > When there are multiple queued requests, and the client hangs up > abruptly as soon as the first error response is seen (rather than > waiting until all other queued responses are flushed then sending > NBD_CMD_DISC), another thread that has a queued response ready to send > sees EPIPE (the client is no longer reading) and closes the socket, > then the third thread triggers the assertion failure because it not > expecting to attempt a send to a closed socket. Thus, my claim that > sock would always be non-negative after collecting data from the > plugin was wrong. The fix is to gracefully handle an abrupt client > disconnect, by not attempting to send on an already-closed socket. > > For the nbdcopy example, --exit-with-parent means it's just an > annoyance (nbdcopy has already exited, and no further client will be > connecting); but for a longer-running nbdkit server accepting parallel > clients, it means any one client can trigger the SIGABRT by > intentionally queueing multiple NBD_CMD_READ then disconnecting early, > and thereby kill nbdkit and starve other clients. Whether it rises to > the level of CVE depends on whether you consider one client being able > to starve others a privilege escalation (if you are not using TLS, > there are other ways for a bad client to starve peers; if you are > using TLS, then the starved client has the same credentials as the > client that caused the SIGABRT so there is no privilege boundary > escalation). > > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2173054 > Fixes: daef505e ("server: Give client EOF when we are done writing", v1.32.4) > --- > server/connections.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/server/connections.c b/server/connections.c > index 4d776f2a..1b63eb73 100644 > --- a/server/connections.c > +++ b/server/connections.c > @@ -1,5 +1,5 @@ > /* nbdkit > - * Copyright (C) 2013-2022 Red Hat Inc. > + * Copyright (C) 2013-2023 Red Hat Inc. > * > * Redistribution and use in source and binary forms, with or without > * modification, are permitted provided that the following conditions are > @@ -400,7 +400,10 @@ raw_send_socket (const void *vbuf, size_t len, int flags) > ssize_t r; > int f = 0; > > - assert (sock >= 0); > + if (sock < 0) { > + errno = EBADF; > + return -1; > + } > #ifdef MSG_MORE > if (flags & SEND_MORE) > f |= MSG_MORE;Is it possible that, after the second thread sees EPIPE and closes the socket, a new client connects, the original file descriptor is (effectively) repurposed, and the third thread now attempts to send on a valid, but *wrong* socket? I don't know if "multiple clients for a single nbdkit server" is a thing, but I believe that at least "multiconn" could lead to the same phenomenon. If this problem exists, then I guess the socket abstraction (?) would have to be extended with a refcount, under the protection of a mutex. Whenever sending fails (and note that EPIPE is a persistent error, so once one thread sees EPIPE, the others will keep seeing that too!), the refcount is decreased, and the one thread that reaches refcount 0 closes the socket. This is just brainstorming, could be totally disconnected from nbdkit's current behavior. Laszlo
Possibly Parallel Threads
- [nbdkit PATCH v2 0/2] Reduce network overhead with MSG_MORE/corking
- [nbdkit PATCH v2 2/2] server: Group related transmission send()s
- [PATCH libnbd] copy: Allowing copying from NBD server to NBD server.
- [PATCH libnbd v4] lib/errors.c: Fix assert fail in exit path in multi-threaded code
- [PATCH libnbd v3] lib/errors.c: Fix assert fail in exit path in multi-threaded code