There's enough here to need a review; some of it probably needs backporting to stable-1.12. This probably breaks tests on Haiku or other platforms that have not been as on-the-ball about atomic CLOEXEC; feel free to report issues that arise, and I'll help come up with workarounds (even if we end up leaving a rare fd leak on less-capable systems). Meanwhile, I'm still working on my promise to add an nbdkit_nanosleep for use in the delay and stat filters, and which makes nbdkit more responsive to ^C where it currently waits for a full delay to expire before exiting. (Nothing like finding several other pre-requisite bugs to fix first before getting to my real goal...) Eric Blake (8): rate: Pass through delay failures server: Don't leave uninit variable on failure server: Add test for nbdkit_read_password Revert "RHEL 5: Define O_CLOEXEC and SOCK_CLOEXEC." cow, cache: Better mkostemp fallback server: Atomically set CLOEXEC on all fds filters: Set CLOEXEC on files opened during .config rate: Atomically set CLOEXEC on fds server/internal.h | 8 ---- filters/cache/blk.c | 19 ++++++++- filters/cow/blk.c | 19 ++++++++- filters/log/log.c | 20 ++++++++- filters/rate/rate.c | 26 ++++++++---- filters/stats/stats.c | 18 +++++++- filters/xz/xzfile.c | 4 -- plugins/example2/example2.c | 4 -- plugins/file/file.c | 4 -- plugins/streaming/streaming.c | 4 -- server/connections.c | 1 + server/quit.c | 5 ++- server/sockets.c | 7 ++-- server/test-utils.c | 79 +++++++++++++++++++++++++++++++++-- server/utils.c | 10 ++++- 15 files changed, 182 insertions(+), 46 deletions(-) -- 2.20.1
Eric Blake
2019-Jul-31 21:31 UTC
[Libguestfs] [nbdkit PATCH 1/8] rate: Pass through delay failures
If nanosleep fails (typically with EINTR because we handled a signal), we end up not sleeping long enough. In practice, this patch seldom makes a difference: directing signals at the nbdkit process tends to get serviced by the thread running poll(), and when that happens, the thread blocked in nanosleep() continues uninterrupted. As a result, nbdkit stalls in exiting because it is waiting for the filter plugin to quit servicing commands. But if you get lucky (or if you direct the kill to the specific thread stuck in nanosleep rather than the process as a whole), then this patch causes us to report early failure with EINTR mapped to EIO over the wire, at which point, we are back to practice: the only signals we handle trigger nbdkit to shutdown, so we may not even have a chance to get our unusual error reported over the wire. In theory, we could resume nanosleep where it left off by passing a non-null second argument, but that only matters if we have a signal handler whose action doesn't intend to request nbdkit to shut down soon. A later patch will add a utility function so that the server can do a better job, both for preventing short sleeps by resuming on EINTR (assuming the server ever adds support for a non-fatal signal, whether that be support for SIGHUP re-reading configuration or support for SIGUSR1/SIGINFO in providing server statistics so far), and for terminating sleeps with a better error than EINTR the moment we know a particular client is shutting down (whether because a signal is terminating all of nbdkit, or we got NBD_CMD_DISC or detected EOF on this particular connection). At that point, we'll only have to tweak the delay function to use that utility function in place of nanosleep. Signed-off-by: Eric Blake <eblake@redhat.com> --- filters/rate/rate.c | 24 +++++++++++++++++------- 1 file changed, 17 insertions(+), 7 deletions(-) diff --git a/filters/rate/rate.c b/filters/rate/rate.c index 30e093ee..f8dda0b0 100644 --- a/filters/rate/rate.c +++ b/filters/rate/rate.c @@ -228,8 +228,9 @@ maybe_adjust (const char *file, struct bucket *bucket, pthread_mutex_t *lock) old_rate, new_rate); } -static inline void -maybe_sleep (struct bucket *bucket, pthread_mutex_t *lock, uint32_t count) +static inline int +maybe_sleep (struct bucket *bucket, pthread_mutex_t *lock, uint32_t count, + int *err) { struct timespec ts; uint64_t bits; @@ -248,8 +249,13 @@ maybe_sleep (struct bucket *bucket, pthread_mutex_t *lock, uint32_t count) } if (bits > 0) - nanosleep (&ts, NULL); + if (nanosleep (&ts, NULL) == -1) { + nbdkit_error ("nanosleep: %m"); + *err = errno; + return -1; + } } + return 0; } /* Read data. */ @@ -261,9 +267,11 @@ rate_pread (struct nbdkit_next_ops *next_ops, void *nxdata, struct rate_handle *h = handle; maybe_adjust (rate_file, &read_bucket, &read_bucket_lock); - maybe_sleep (&read_bucket, &read_bucket_lock, count); + if (maybe_sleep (&read_bucket, &read_bucket_lock, count, err)) + return -1; maybe_adjust (connection_rate_file, &h->read_bucket, &h->read_bucket_lock); - maybe_sleep (&h->read_bucket, &h->read_bucket_lock, count); + if (maybe_sleep (&h->read_bucket, &h->read_bucket_lock, count, err)) + return -1; return next_ops->pread (nxdata, buf, count, offset, flags, err); } @@ -278,9 +286,11 @@ rate_pwrite (struct nbdkit_next_ops *next_ops, void *nxdata, struct rate_handle *h = handle; maybe_adjust (rate_file, &write_bucket, &write_bucket_lock); - maybe_sleep (&write_bucket, &write_bucket_lock, count); + if (maybe_sleep (&write_bucket, &write_bucket_lock, count, err)) + return -1; maybe_adjust (connection_rate_file, &h->write_bucket, &h->write_bucket_lock); - maybe_sleep (&h->write_bucket, &h->write_bucket_lock, count); + if (maybe_sleep (&h->write_bucket, &h->write_bucket_lock, count, err)) + return -1; return next_ops->pwrite (nxdata, buf, count, offset, flags, err); } -- 2.20.1
Eric Blake
2019-Jul-31 21:31 UTC
[Libguestfs] [nbdkit PATCH 2/8] server: Don't leave uninit variable on failure
It is not good to leak uninitialized variables back to the user on failure paths. Signed-off-by: Eric Blake <eblake@redhat.com> --- server/utils.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/server/utils.c b/server/utils.c index 57910f07..5a2d471a 100644 --- a/server/utils.c +++ b/server/utils.c @@ -201,6 +201,8 @@ nbdkit_read_password (const char *value, char **password) size_t n; FILE *fp; + *password = NULL; + /* Read from stdin. */ if (strcmp (value, "-") == 0) { printf ("password: "); -- 2.20.1
Eric Blake
2019-Jul-31 21:31 UTC
[Libguestfs] [nbdkit PATCH 3/8] server: Add test for nbdkit_read_password
Testing interactive passwords is difficult (we'd have to spawn a pty), and left for another day. But reading passwords from a file or as a direct argument is easy. Signed-off-by: Eric Blake <eblake@redhat.com> --- server/test-utils.c | 79 +++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 76 insertions(+), 3 deletions(-) diff --git a/server/test-utils.c b/server/test-utils.c index d6bdf465..264b1793 100644 --- a/server/test-utils.c +++ b/server/test-utils.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2018 Red Hat Inc. + * Copyright (C) 2018-2019 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -35,6 +35,8 @@ #include <stdio.h> #include <inttypes.h> #include <stdbool.h> +#include <string.h> +#include <unistd.h> #include "internal.h" @@ -137,11 +139,82 @@ test_nbdkit_parse_size (void) return pass; } +static bool +test_nbdkit_read_password (void) +{ + bool pass = true; + char template[] = "+/tmp/nbdkit_testpw_XXXXXX"; + char *pw = template; + int fd; + + /* Test expected failure - no such file */ + error_flagged = false; + if (nbdkit_read_password ("+/nosuch", &pw) != -1) { + fprintf (stderr, "Failed to diagnose failed password file\n"); + pass = false; + } + else if (pw != NULL) { + fprintf (stderr, "Failed to set password to NULL on failure\n"); + pass = false; + } + else if (!error_flagged) { + fprintf (stderr, "Wrong error message handling\n"); + pass = false; + } + error_flagged = false; + + /* Test direct password */ + if (nbdkit_read_password ("abc", &pw) != 0) { + fprintf (stderr, "Failed to reuse direct password\n"); + pass = false; + } + else if (strcmp (pw, "abc") != 0) { + fprintf (stderr, "Wrong direct password, expected 'abc' got '%s'\n", pw); + pass = false; + } + free (pw); + pw = NULL; + + /* Test reading password from file */ + fd = mkstemp (&template[1]); + if (fd < 0) { + perror ("mkstemp"); + pass = false; + } + else if (write (fd, "abc\n", 4) != 4) { + fprintf (stderr, "Failed to write to file %s\n", &template[1]); + pass = false; + } + else if (nbdkit_read_password (template, &pw) != 0) { + fprintf (stderr, "Failed to read password from file %s\n", &template[1]); + pass = false; + } + else if (strcmp (pw, "abc") != 0) { + fprintf (stderr, "Wrong file password, expected 'abc' got '%s'\n", pw); + pass = false; + } + free (pw); + + if (fd >= 0) { + close (fd); + unlink (&template[1]); + } + + if (error_flagged) { + fprintf (stderr, "Wrong error message handling\n"); + pass = false; + } + + /* XXX Testing reading from stdin would require setting up a pty */ + return pass; +} + int main (int argc, char *argv[]) { bool pass = true; pass &= test_nbdkit_parse_size (); - /* XXX add tests for nbdkit_absolute_path, nbdkit_read_password */ - return pass ? 0 : 1; + pass &= test_nbdkit_read_password (); + /* XXX add tests for nbdkit_absolute_path */ + return pass ? EXIT_SUCCESS : EXIT_FAILURE; } -- 2.20.1
Eric Blake
2019-Jul-31 21:31 UTC
[Libguestfs] [nbdkit PATCH 4/8] Revert "RHEL 5: Define O_CLOEXEC and SOCK_CLOEXEC."
This reverts commit 25206df20275aeff346d9b86adf5e9be99cc9e43. An upcoming patch wants to ensure no leaked fds from the server to a child process. POSIX has required O_CLOEXEC since 2008, and although current POSIX doesn't yet specify full atomic interfaces everywhere such as SOCK_CLOEXEC, it does have an open bug since 2014 [1] recommending the full set of interfaces that will be mandatory in the next revision of POSIX. Most modern OS have caught up to that (RHEL 6 and FreeBSD 10 support SOCK_CLOEXEC, for example), and we're doing ourselves a disservice by using silent fallback instead of actively detecting with compile failure on systems that are still behind. [1] http://austingroupbugs.net/view.php?id=411 Conflicts: filters/xz/xzfile.c - The fallback moved from plugins/xz/xzfile.c (and in fact has been dead code since commit c879d310 made it a filter) plugins/file/file.c - context changes in the meantime src/internal.h - Moved to server/internal.h --- server/internal.h | 8 -------- filters/xz/xzfile.c | 4 ---- plugins/example2/example2.c | 4 ---- plugins/file/file.c | 4 ---- plugins/streaming/streaming.c | 4 ---- 5 files changed, 24 deletions(-) diff --git a/server/internal.h b/server/internal.h index 80ab879c..6207f0cf 100644 --- a/server/internal.h +++ b/server/internal.h @@ -50,14 +50,6 @@ #define UNIX_PATH_MAX 108 #endif -#ifndef O_CLOEXEC -#define O_CLOEXEC 0 -#endif - -#ifndef SOCK_CLOEXEC -#define SOCK_CLOEXEC 0 -#endif - #if HAVE_VALGRIND # include <valgrind.h> /* http://valgrind.org/docs/manual/faq.html#faq.unhelpful */ diff --git a/filters/xz/xzfile.c b/filters/xz/xzfile.c index ebe7f29c..ee4af713 100644 --- a/filters/xz/xzfile.c +++ b/filters/xz/xzfile.c @@ -52,10 +52,6 @@ #include "xzfile.h" -#ifndef O_CLOEXEC -#define O_CLOEXEC 0 -#endif - #define XZ_HEADER_MAGIC "\xfd" "7zXZ\0" #define XZ_HEADER_MAGIC_LEN 6 #define XZ_FOOTER_MAGIC "YZ" diff --git a/plugins/example2/example2.c b/plugins/example2/example2.c index d0bce155..6adc100a 100644 --- a/plugins/example2/example2.c +++ b/plugins/example2/example2.c @@ -48,10 +48,6 @@ #include <nbdkit-plugin.h> -#ifndef O_CLOEXEC -#define O_CLOEXEC 0 -#endif - static char *filename = NULL; /* A debug flag which can be set on the command line using diff --git a/plugins/file/file.c b/plugins/file/file.c index 52d330f7..9df5001d 100644 --- a/plugins/file/file.c +++ b/plugins/file/file.c @@ -61,10 +61,6 @@ #include "cleanup.h" #include "isaligned.h" -#ifndef O_CLOEXEC -#define O_CLOEXEC 0 -#endif - #ifndef HAVE_FDATASYNC #define fdatasync fsync #endif diff --git a/plugins/streaming/streaming.c b/plugins/streaming/streaming.c index 4ca3e765..4ab73b8b 100644 --- a/plugins/streaming/streaming.c +++ b/plugins/streaming/streaming.c @@ -43,10 +43,6 @@ #include <nbdkit-plugin.h> -#ifndef O_CLOEXEC -#define O_CLOEXEC 0 -#endif - static char *filename = NULL; static int fd = -1; -- 2.20.1
Eric Blake
2019-Jul-31 21:31 UTC
[Libguestfs] [nbdkit PATCH 5/8] cow, cache: Better mkostemp fallback
Haiku is seriously behind in adding support for atomic CLOEXEC; in 2018, we added fallback support since it does not yet provide mkostemp as requested by an upcoming POSIX addition [1]. However, the fallback we added is rather brain-dead - if mkstemp() fails, we blindly call fcntl(-1) (which probably changes the errno later reported against "mkostemp: %s: %m"); and although it is historically unlikely that any other bits will be set, F_SETFD should be used in a read-modify-write pattern rather than a blind overwrite pattern. The fact that our fallback is non-atomic is not a problem in practice, since we are only using this code during .load (where we are still more-or-less single-threaded), rather than while another thread may be actively inside a plugin (such as sh) using fork(). [1] http://austingroupbugs.net/view.php?id=411 Fixes: 60076fbcfd Fixes: b962272a56 Signed-off-by: Eric Blake <eblake@redhat.com> --- filters/cache/blk.c | 19 ++++++++++++++++++- filters/cow/blk.c | 19 ++++++++++++++++++- 2 files changed, 36 insertions(+), 2 deletions(-) diff --git a/filters/cache/blk.c b/filters/cache/blk.c index cf7145d8..1fa9ea43 100644 --- a/filters/cache/blk.c +++ b/filters/cache/blk.c @@ -109,8 +109,25 @@ blk_init (void) #ifdef HAVE_MKOSTEMP fd = mkostemp (template, O_CLOEXEC); #else + /* Not atomic, but this is only invoked during .load, so the race + * won't affect any plugin actions trying to fork + */ fd = mkstemp (template); - fcntl (fd, F_SETFD, FD_CLOEXEC); + if (fd >= 0) { + int f = fcntl (fd, F_GETFD); + if (f == -1) { + nbdkit_error ("fcntl F_GETFD: %s: %m", template); + close (fd); + unlink (template); + return -1; + } + if (fcntl (fd, F_SETFD, f | FD_CLOEXEC) == -1) { + nbdkit_error ("fcntl F_SETFD: %s: %m", template); + close (fd); + unlink (template); + return -1; + } + } #endif if (fd == -1) { nbdkit_error ("mkostemp: %s: %m", tmpdir); diff --git a/filters/cow/blk.c b/filters/cow/blk.c index be43f2ff..12ad7820 100644 --- a/filters/cow/blk.c +++ b/filters/cow/blk.c @@ -114,8 +114,25 @@ blk_init (void) #ifdef HAVE_MKOSTEMP fd = mkostemp (template, O_CLOEXEC); #else + /* Not atomic, but this is only invoked during .load, so the race + * won't affect any plugin actions trying to fork + */ fd = mkstemp (template); - fcntl (fd, F_SETFD, FD_CLOEXEC); + if (fd >= 0) { + int f = fcntl (fd, F_GETFD); + if (f == -1) { + nbdkit_error ("fcntl F_GETFD: %s: %m", template); + close (fd); + unlink (template); + return -1; + } + if (fcntl (fd, F_SETFD, f | FD_CLOEXEC) == -1) { + nbdkit_error ("fcntl F_SETFD: %s: %m", template); + close (fd); + unlink (template); + return -1; + } + } #endif if (fd == -1) { nbdkit_error ("mkostemp: %s: %m", tmpdir); -- 2.20.1
Eric Blake
2019-Jul-31 21:31 UTC
[Libguestfs] [nbdkit PATCH 6/8] server: Atomically set CLOEXEC on all fds
The sh plugin makes it easy to test whether we leak fds into a plugin that fork()s as part of handling a client request: ./nbdkit -U - sh - --run 'qemu-io -r -f raw -c "r 0 1" $nbd' <<\EOF case $1 in get_size) echo 1m;; pread) ls -l /proc/$$/fd >/dev/tty dd iflag=count_bytes count=$3 if=/dev/zero || exit 1 ;; *) exit 2 ;; esac EOF Pre-patch, this demonstrates that we are leaking internal pipe and socket fds (the listing should only include fds 0, 1, 2, and whatever sh has open on the temporary script file). [Another patch will deal with leaks in several filters, also observable in this manner.] What's more, we need CLOEXEC to be set atomically for anything that can happen in parallel with plugin processing (in our case, during accept_connection); otherwise, there is a race window: client1 client2 fd1 = accept pread fd2 = accept fork fcntl(F_SETFD, FD_CLOEXEC) where the fd2 socket for the second client leaks into the fork for handling the first client's pread callback. Use of accept4() is not yet in POSIX, but has been documented [1] for several years now, and many implementations already support it (RHEL 6 and FreeBSD 10, for example). We can worry about a racy fallback for outliers if we hit compilation failure. It is less important whether quit.c use pipe2 or falls back to pipe/fcntl, as that code is never run in parallel with plugin code, but as long as we are assuming modern interfaces, fewer lines of code is nice. [1] http://austingroupbugs.net/view.php?id=411 For nbdkit_read_password, it is harder to test whether fopen("re") is supported on all platforms: we won't get a compilation failure. But if our unit tests start failing on platforms that have not caught up to POSIX (my first guess? Haiku, given that the previous patch shows that it also lacks mkostemp), then we can either fall back to fdopen(open(O_CLOEXEC)) or just simply document that nbdkit_read_password is only safe during .config when there are no other threads running in parallel to worry about a window where the fd can be leaked. Signed-off-by: Eric Blake <eblake@redhat.com> --- server/connections.c | 1 + server/quit.c | 5 +++-- server/sockets.c | 7 ++++--- server/utils.c | 4 ++-- 4 files changed, 10 insertions(+), 7 deletions(-) diff --git a/server/connections.c b/server/connections.c index f56590c2..b1a15f23 100644 --- a/server/connections.c +++ b/server/connections.c @@ -39,6 +39,7 @@ #include <string.h> #include <unistd.h> #include <sys/socket.h> +#include <fcntl.h> #include "internal.h" diff --git a/server/quit.c b/server/quit.c index c2ac11ef..537c4e47 100644 --- a/server/quit.c +++ b/server/quit.c @@ -37,6 +37,7 @@ #include <stdarg.h> #include <string.h> #include <unistd.h> +#include <fcntl.h> #include "internal.h" @@ -54,8 +55,8 @@ set_up_quit_pipe (void) { int fds[2]; - if (pipe (fds) < 0) { - perror ("pipe"); + if (pipe2 (fds, O_CLOEXEC) < 0) { + perror ("pipe2"); exit (EXIT_FAILURE); } quit_fd = fds[0]; diff --git a/server/sockets.c b/server/sockets.c index b25405cb..587763cf 100644 --- a/server/sockets.c +++ b/server/sockets.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2018 Red Hat Inc. + * Copyright (C) 2013-2019 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -286,8 +286,9 @@ accept_connection (int listen_sock) thread_data->instance_num = instance_num++; thread_data->addrlen = sizeof thread_data->addr; again: - thread_data->sock = accept (listen_sock, - &thread_data->addr, &thread_data->addrlen); + thread_data->sock = accept4 (listen_sock, + &thread_data->addr, &thread_data->addrlen, + SOCK_CLOEXEC); if (thread_data->sock == -1) { if (errno == EINTR || errno == EAGAIN) goto again; diff --git a/server/utils.c b/server/utils.c index 5a2d471a..eabef200 100644 --- a/server/utils.c +++ b/server/utils.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2018 Red Hat Inc. + * Copyright (C) 2013-2019 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -237,7 +237,7 @@ nbdkit_read_password (const char *value, char **password) /* Read password from a file. */ else if (value[0] == '+') { - fp = fopen (&value[1], "r"); + fp = fopen (&value[1], "re"); if (fp == NULL) { nbdkit_error ("open %s: %m", &value[1]); return -1; -- 2.20.1
Eric Blake
2019-Jul-31 21:31 UTC
[Libguestfs] [nbdkit PATCH 7/8] filters: Set CLOEXEC on files opened during .config
Any file descriptor opened during .config must eventually be marked CLOEXEC; otherwise, if our plugin fork()s (such as the sh plugin), we leak an fd into that plugin (easy enough to observe by modifying the example in the previous patch to add --filter=log or --filter=stats). However, we need not worry about atomic CLOEXEC, as config is effectively single-threaded and before any plugin code in another thread will be attempting a fork(); put another way, we don't have to worry about whether fopen("we") is univerally supported yet. Signed-off-by: Eric Blake <eblake@redhat.com> --- filters/log/log.c | 20 +++++++++++++++++++- filters/stats/stats.c | 18 +++++++++++++++++- 2 files changed, 36 insertions(+), 2 deletions(-) diff --git a/filters/log/log.c b/filters/log/log.c index 9c0f35cd..71ea4d63 100644 --- a/filters/log/log.c +++ b/filters/log/log.c @@ -42,6 +42,7 @@ #include <pthread.h> #include <sys/time.h> #include <assert.h> +#include <fcntl.h> #include <nbdkit-filter.h> @@ -88,13 +89,30 @@ log_config (nbdkit_next_config *next, void *nxdata, static int log_config_complete (nbdkit_next_config_complete *next, void *nxdata) { + int f; + if (!logfilename) { nbdkit_error ("missing logfile= parameter for the log filter"); return -1; } + /* Using fopen("ae"/"we") would be more convenient, but as .config + * is called before other threads can be executing plugin code, + * using the older racy paradigm for portability doesn't hurt. + */ logfile = fopen (logfilename, append ? "a" : "w"); if (!logfile) { - nbdkit_error ("fopen: %m"); + nbdkit_error ("fopen: %s: %m", logfilename); + return -1; + } + f = fcntl (fileno (logfile), F_GETFD); + if (f == -1) { + nbdkit_error ("fcntl: %s: %m", logfilename); + fclose (logfile); + return -1; + } + if (fcntl (fileno (logfile), F_SETFD, f | FD_CLOEXEC) == -1) { + nbdkit_error ("fcntl: %s: %m", logfilename); + fclose (logfile); return -1; } diff --git a/filters/stats/stats.c b/filters/stats/stats.c index 9631cb39..17eb9431 100644 --- a/filters/stats/stats.c +++ b/filters/stats/stats.c @@ -39,6 +39,7 @@ #include <inttypes.h> #include <string.h> #include <sys/time.h> +#include <fcntl.h> #include <pthread.h> @@ -139,16 +140,31 @@ stats_config (nbdkit_next_config *next, void *nxdata, static int stats_config_complete (nbdkit_next_config_complete *next, void *nxdata) { + int f; + if (filename == NULL) { nbdkit_error ("stats filter requires statsfile parameter"); return -1; } + /* Using fopen("ae"/"we") would be more convenient, but as .config + * is called before other threads can be executing plugin code, + * using the older racy paradigm for portability doesn't hurt. + */ fp = fopen (filename, append ? "a" : "w"); if (fp == NULL) { - nbdkit_error ("%s: %m", filename); + nbdkit_error ("fopen: %s: %m", filename); return -1; } + f = fcntl (fileno (fp), F_GETFD); + if (f == -1) { + nbdkit_error ("fcntl: %s: %m", filename); + fclose (fp); + } + if (fcntl (fileno (fp), F_SETFD, f | FD_CLOEXEC) == -1) { + nbdkit_error ("fcntl: %s: %m", filename); + fclose (fp); + } gettimeofday (&start_t, NULL); -- 2.20.1
Eric Blake
2019-Jul-31 21:31 UTC
[Libguestfs] [nbdkit PATCH 8/8] rate: Atomically set CLOEXEC on fds
The rate filter is potentially opening fds in one thread while another thread is processing a fork() in the plugin. Although the file is not open for long, we MUST atomically use CLOEXEC to avoid fd leaks. This one is a bit harder to observe using only the sh plugin, because the window is small; you'll have better success at catching the leak by using gdb or recompiling code to insert strategic sleeps. Although fopen("re") is not required by POSIX yet, it has been documented for several years [1] and many platforms already support it. Failure to support it will not be seen at compile time, and it's hard to state whether our unit tests will catch it; thus, I decided to also add a comment to another use of fopen("re") that is much more likely to show if we hit a problematic system. [1] http://austingroupbugs.net/view.php?id=411 Signed-off-by: Eric Blake <eblake@redhat.com> --- filters/rate/rate.c | 2 +- server/utils.c | 4 ++++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/filters/rate/rate.c b/filters/rate/rate.c index f8dda0b0..9476a27f 100644 --- a/filters/rate/rate.c +++ b/filters/rate/rate.c @@ -204,7 +204,7 @@ maybe_adjust (const char *file, struct bucket *bucket, pthread_mutex_t *lock) if (!file) return; - fp = fopen (file, "r"); + fp = fopen (file, "re"); if (fp == NULL) return; /* this is not an error */ diff --git a/server/utils.c b/server/utils.c index eabef200..d7e202ec 100644 --- a/server/utils.c +++ b/server/utils.c @@ -236,6 +236,10 @@ nbdkit_read_password (const char *value, char **password) } /* Read password from a file. */ + /* Note: If the unit test for this fails because fopen("re") is + * unsupported by a given system, you must also fix the rate filter + * to use a similar workaround. + */ else if (value[0] == '+') { fp = fopen (&value[1], "re"); if (fp == NULL) { -- 2.20.1
Eric Blake
2019-Jul-31 21:42 UTC
Re: [Libguestfs] [nbdkit PATCH 4/8] Revert "RHEL 5: Define O_CLOEXEC and SOCK_CLOEXEC."
On 7/31/19 4:31 PM, Eric Blake wrote:> This reverts commit 25206df20275aeff346d9b86adf5e9be99cc9e43. > > An upcoming patch wants to ensure no leaked fds from the server to a > child process. POSIX has required O_CLOEXEC since 2008, and although > current POSIX doesn't yet specify full atomic interfaces everywhere > such as SOCK_CLOEXEC, it does have an open bug since 2014 [1] > recommending the full set of interfaces that will be mandatory in the > next revision of POSIX. Most modern OS have caught up to that (RHEL 6 > and FreeBSD 10 support SOCK_CLOEXEC, for example), and we're doing > ourselves a disservice by using silent fallback instead of actively > detecting with compile failure on systems that are still behind. > > [1] http://austingroupbugs.net/view.php?id=411 > > Conflicts: > filters/xz/xzfile.c - The fallback moved from plugins/xz/xzfile.c (and > in fact has been dead code since commit c879d310 made it a filter) > plugins/file/file.c - context changes in the meantime > src/internal.h - Moved to server/internal.h > ---I plan to also squash in this: (not in the original 25206df2 in 1.1.13, but instead added by copy-and-paste in 1.1.20): diff --git i/plugins/split/split.c w/plugins/split/split.c index 1b8e69a2..68682b29 100644 --- i/plugins/split/split.c +++ w/plugins/split/split.c @@ -45,10 +45,6 @@ #include <nbdkit-plugin.h> -#ifndef O_CLOEXEC -#define O_CLOEXEC 0 -#endif - /* The files. */ static char **filenames = NULL; static size_t nr_files = 0; -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2019-Jul-31 22:01 UTC
Re: [Libguestfs] [nbdkit PATCH 8/8] rate: Atomically set CLOEXEC on fds
On 7/31/19 4:31 PM, Eric Blake wrote:> The rate filter is potentially opening fds in one thread while another > thread is processing a fork() in the plugin. Although the file is not > open for long, we MUST atomically use CLOEXEC to avoid fd leaks. This > one is a bit harder to observe using only the sh plugin, because the > window is small; you'll have better success at catching the leak by > using gdb or recompiling code to insert strategic sleeps.In fact, I have to tweak this commit message: you CAN'T observe this one with the sh plugin unless you recompile it to use #define THREAD_MODEL NBDKIT_THREAD_MODEL_SERIALIZE_REQUESTS, as well as introducing the timing hacks mentioned above (that's because with our current SERIALIZE_ALL_REQUESTS, there is never more than one thread in filter/plugin code at a time). But it does raise an interesting point - if we hit platforms that are unable to support atomic CLOEXEC, one possibility is a patch that forces SERIALIZE_ALL_REQUESTS as the maximum parallelism allowed on that platform (while remaining at our goal of PARALLEL on more competent systems) - once we do that, the lacking systems will be serialized to the point that there is no race window where one thread can fork() while another is obtaining an fd. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2019-Jul-31 22:08 UTC
[Libguestfs] [nbdkit PATCH 9/8] sh: Document CLOEXEC considerations
As long as we have SERIALIZE_ALL_REQUESTS, we don't have to worry about one thread processing a fork() while another thread might be opening an fd. But add comments to the code to remind us to fix things if we ever decide to add more parallelism. Signed-off-by: Eric Blake <eblake@redhat.com> --- plugins/sh/call.c | 4 ++++ plugins/sh/sh.c | 3 +++ 2 files changed, 7 insertions(+) diff --git a/plugins/sh/call.c b/plugins/sh/call.c index 871de5c6..da2651d4 100644 --- a/plugins/sh/call.c +++ b/plugins/sh/call.c @@ -94,6 +94,10 @@ call3 (const char *wbuf, size_t wbuflen, /* sent to stdin */ *rbuflen = *ebuflen = 0; rbufalloc = ebufalloc = 0; + /* As long as we use NBDKIT_THREAD_MODEL_SERIALIZE_ALL_REQUESTS, we + * don't have to worry about CLOEXEC, because we know no other + * thread is competing to fork at the same time as this one. + */ if (pipe (in_fd) == -1) { nbdkit_error ("%s: pipe: %m", script); goto error; diff --git a/plugins/sh/sh.c b/plugins/sh/sh.c index aeb01918..1e000b11 100644 --- a/plugins/sh/sh.c +++ b/plugins/sh/sh.c @@ -257,6 +257,9 @@ sh_config_complete (void) } } +/* Do not change this to be more parallel without first fixing + * potential CLOEXEC races in call.c. + */ #define THREAD_MODEL NBDKIT_THREAD_MODEL_SERIALIZE_ALL_REQUESTS static int -- 2.20.1
Richard W.M. Jones
2019-Aug-01 09:06 UTC
Re: [Libguestfs] [nbdkit PATCH 4/8] Revert "RHEL 5: Define O_CLOEXEC and SOCK_CLOEXEC."
On Wed, Jul 31, 2019 at 04:31:32PM -0500, Eric Blake wrote:> This reverts commit 25206df20275aeff346d9b86adf5e9be99cc9e43. > > An upcoming patch wants to ensure no leaked fds from the server to a > child process. POSIX has required O_CLOEXEC since 2008, and although > current POSIX doesn't yet specify full atomic interfaces everywhere > such as SOCK_CLOEXEC, it does have an open bug since 2014 [1] > recommending the full set of interfaces that will be mandatory in the > next revision of POSIX. Most modern OS have caught up to that (RHEL 6 > and FreeBSD 10 support SOCK_CLOEXEC, for example), and we're doing > ourselves a disservice by using silent fallback instead of actively > detecting with compile failure on systems that are still behind. > > [1] http://austingroupbugs.net/view.php?id=411 > > Conflicts: > filters/xz/xzfile.c - The fallback moved from plugins/xz/xzfile.c (and > in fact has been dead code since commit c879d310 made it a filter) > plugins/file/file.c - context changes in the meantime > src/internal.h - Moved to server/internal.h > --- > server/internal.h | 8 -------- > filters/xz/xzfile.c | 4 ---- > plugins/example2/example2.c | 4 ---- > plugins/file/file.c | 4 ---- > plugins/streaming/streaming.c | 4 ---- > 5 files changed, 24 deletions(-) > > diff --git a/server/internal.h b/server/internal.h > index 80ab879c..6207f0cf 100644 > --- a/server/internal.h > +++ b/server/internal.h > @@ -50,14 +50,6 @@ > #define UNIX_PATH_MAX 108 > #endif > > -#ifndef O_CLOEXEC > -#define O_CLOEXEC 0 > -#endif > - > -#ifndef SOCK_CLOEXEC > -#define SOCK_CLOEXEC 0 > -#endifAs far as I can see Haiku (hrev52698) has O_CLOEXEC but NOT SOCK_CLOEXEC. As this version is a little old I'll do an update and see if newer versions support it. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html
Richard W.M. Jones
2019-Aug-01 09:16 UTC
Re: [Libguestfs] [nbdkit PATCH 9/8] sh: Document CLOEXEC considerations
This patch series is fine, ACK. Unfortunately it likely breaks Haiku support. I'd like to hear from the Haiku folk if they are planning support for atomic CLOEXEC (SOCK_CLOEXEC, accept4, pipe2, etc.) Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html
Apparently Analagous Threads
- [nbdkit PATCH v2 04/17] Revert "RHEL 5: Define O_CLOEXEC and SOCK_CLOEXEC."
- Re: [nbdkit PATCH 4/8] Revert "RHEL 5: Define O_CLOEXEC and SOCK_CLOEXEC."
- Re: [nbdkit PATCH 4/8] Revert "RHEL 5: Define O_CLOEXEC and SOCK_CLOEXEC."
- Re: [nbdkit PATCH 4/8] Revert "RHEL 5: Define O_CLOEXEC and SOCK_CLOEXEC."
- [nbdkit PATCH 0/8] fd leak safety