Richard W.M. Jones
2021-May-14 09:06 UTC
[Libguestfs] [PATCH libnbd v2 0/3] fuse: Enable multithread support
v1 was here: https://listman.redhat.com/archives/libguestfs/2021-May/msg00081.html Compared to v1, patches 1/3 and 3/3 are identical. The new version of patch 2/3 cleans up parallel support. In v1 (https://listman.redhat.com/archives/libguestfs/2021-May/msg00083.html) the thread model was ... interesting. Each FUSE thread would race poll(2) calls on the same file descriptor against each other, and then when one succeeded it would grab a global lock, rerun poll on the file descriptor (to get the true picture of the fd's state), and run the NBD state machine within that global lock but on the FUSE thread. In this version there is a single background thread which dispatches NBD requests on the handle. FUSE threads only start the asynch command, then block waiting for the background thread to finish off their command. Perhaps not too surprisingly v1 is quite a bit faster than v2. v2 suffers from the overhead of coordination between the FUSE threads and the single thread performing the NBD operations (as well as there being only a single thread doing the work). But I think v2 is more maintainable. Multi-conn would help here and also be easier to add with v2. The FIO results presented in the commit messages are stable. Rich.
Richard W.M. Jones
2021-May-14 09:06 UTC
[Libguestfs] [PATCH libnbd v2 1/3] fuse: Enable multithread support
This was much simpler than I though, merely changing fuse_loop -> fuse_loop_mt. FUSE itself will call our operations in parallel, so we have to be careful about shared state. Testing with fio shows this is about the same or a tiny bit slower than before, which is not that surprising -- we are using the synchronous API. It should be possible to improve performance either by issuing commands in parallel on the same handle (complicated) or by using multi-conn. test.fio contains: [global] filename=test.img bs=4k direct=1 rw=read [job1] [job2] [job3] [job4] Run the test: $ touch test.img $ nbdfuse test.img [ nbdkit memory 1G ] & $ fio test.fio $ fusermount3 -u test.img Without this patch: READ: bw=239MiB/s (250MB/s), 59.6MiB/s-59.7MiB/s (62.5MB/s-62.6MB/s), io=4096MiB (4295MB), run=17166-17167msec With this patch: READ: bw=237MiB/s (248MB/s), 59.2MiB/s-59.2MiB/s (62.0MB/s-62.1MB/s), io=4096MiB (4295MB), run=17303-17310msec --- fuse/nbdfuse.c | 6 +++++- fuse/operations.c | 7 +++++++ 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/fuse/nbdfuse.c b/fuse/nbdfuse.c index 3deef75..fa35080 100644 --- a/fuse/nbdfuse.c +++ b/fuse/nbdfuse.c @@ -151,6 +151,7 @@ main (int argc, char *argv[]) int64_t ssize; const char *s; struct fuse_args fuse_args = FUSE_ARGS_INIT (0, NULL); + struct fuse_loop_config fuse_loop_config; FILE *fp; for (;;) { @@ -475,7 +476,10 @@ main (int argc, char *argv[]) } /* Enter the main loop. */ - r = fuse_loop (fuse); + memset (&fuse_loop_config, 0, sizeof fuse_loop_config); + fuse_loop_config.clone_fd = 0; + fuse_loop_config.max_idle_threads = 10; + r = fuse_loop_mt (fuse, &fuse_loop_config); if (r < 0) { errno = -r; perror ("fuse_loop"); diff --git a/fuse/operations.c b/fuse/operations.c index 2c38c6e..4da701e 100644 --- a/fuse/operations.c +++ b/fuse/operations.c @@ -16,6 +16,13 @@ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA */ +/* FUSE operations invoked by the kernel. + * + * Note these may be called in parallel from multiple threads, so any + * shared state needs to be read-only or else protected by mutexes. + * libnbd calls are OK. + */ + #include <config.h> #include <stdio.h> -- 2.31.1
Richard W.M. Jones
2021-May-14 09:06 UTC
[Libguestfs] [PATCH libnbd v2 2/3] fuse: Issue commands in parallel
On top of the previous commit which enabled multithreading but continued to use the synchronous libnbd API, this allows each thread to issue commands asynchronously. Because there is still a single handle, this introduces a single background thread to poll the file descriptor and dispatch the commands. This is only a little bit faster (compare to results in previous commit message): READ: bw=250MiB/s (262MB/s), 62.4MiB/s-62.4MiB/s (65.4MB/s-65.5MB/s), io=4096MiB (4295MB), run=16398-16411msec A future multi-conn version of nbdfuse would likely use multiple background threads (one per connection) to do the same job, but that is left for future work. --- fuse/nbdfuse.c | 5 ++ fuse/nbdfuse.h | 1 + fuse/operations.c | 181 +++++++++++++++++++++++++++++++++++++++++----- 3 files changed, 169 insertions(+), 18 deletions(-) diff --git a/fuse/nbdfuse.c b/fuse/nbdfuse.c index fa35080..f91ff7f 100644 --- a/fuse/nbdfuse.c +++ b/fuse/nbdfuse.c @@ -426,6 +426,11 @@ main (int argc, char *argv[]) if (nbd_is_read_only (nbd) > 0) readonly = true; + /* Create the background thread which is used to dispatch NBD + * operations. + */ + start_operations_thread (); + /* This is just used to give an unchanging time when they stat in * the mountpoint. */ diff --git a/fuse/nbdfuse.h b/fuse/nbdfuse.h index 1f8f703..016c325 100644 --- a/fuse/nbdfuse.h +++ b/fuse/nbdfuse.h @@ -36,5 +36,6 @@ extern char *filename; extern uint64_t size; extern struct fuse_operations nbdfuse_operations; +extern void start_operations_thread (void); #endif /* LIBNBD_NBDFUSE_H */ diff --git a/fuse/operations.c b/fuse/operations.c index 4da701e..1e81593 100644 --- a/fuse/operations.c +++ b/fuse/operations.c @@ -39,6 +39,7 @@ #include <assert.h> #include <sys/types.h> #include <sys/stat.h> +#include <pthread.h> #include <libnbd.h> @@ -47,14 +48,90 @@ #define MAX_REQUEST_SIZE (32 * 1024 * 1024) -/* Wraps calls to libnbd functions and automatically checks for error, - * returning errors in the format required by FUSE. It also prints - * out the full error message on stderr, so that we don't lose it. +/* Number of seconds to wait for commands to complete when closing the file. */ +#define RELEASE_TIMEOUT 5 + +/* This operations background thread runs while nbdfuse is running and + * is responsible for dispatching AIO commands. + * + * The commands themselves are initiated by the FUSE threads (by + * calling eg. nbd_aio_pread), and then those threads call + * wait_for_completion() which waits for the command to retire. + * + * A condition variable is signalled by any FUSE thread when it has + * started a new AIO command and wants the operations thread to start + * processing (if it isn't doing so already). To signal completion we + * use a completion callback which signals a per-thread completion + * condition. */ -#define CHECK_NBD_ERROR(CALL) \ - do { if ((CALL) == -1) return check_nbd_error (); } while (0) +static void *operations_thread (void *); + +void +start_operations_thread (void) +{ + int err; + pthread_t t; + + err = pthread_create (&t, NULL, operations_thread, NULL); + if (err != 0) { + errno = err; + perror ("nbdfuse: pthread_create"); + exit (EXIT_FAILURE); + } +} + +static pthread_mutex_t start_mutex = PTHREAD_MUTEX_INITIALIZER; +static pthread_cond_t start_cond = PTHREAD_COND_INITIALIZER; + +struct completion { + pthread_mutex_t mutex; + pthread_cond_t cond; + bool completed; +} completion; + +static void * +operations_thread (void *arg) +{ + while (1) { + /* Sleep until a command is in flight. */ + pthread_mutex_lock (&start_mutex); + while (nbd_aio_in_flight (nbd) == 0) + pthread_cond_wait (&start_cond, &start_mutex); + pthread_mutex_unlock (&start_mutex); + + /* Dispatch work while there are commands in flight. */ + while (nbd_aio_in_flight (nbd) > 0) + nbd_poll (nbd, -1); + } + + /*NOTREACHED*/ + return NULL; +} + +/* Completion callback - called from the operations thread when a + * command completes. + */ +static int +completion_callback (void *vp, int *error) +{ + struct completion *completion = vp; + + /* Mark the command as completed. */ + completion->completed = true; + + pthread_mutex_lock (&completion->mutex); + pthread_cond_signal (&completion->cond); + pthread_mutex_unlock (&completion->mutex); + + /* Don't retire the command. We want to get the error indication in + * the FUSE thread. + */ + return 0; +} + +/* Report an NBD error and return -errno. */ static int -check_nbd_error (void) +report_nbd_error (void) { int err; @@ -66,6 +143,55 @@ check_nbd_error (void) return -EIO; } +static int +wait_for_completion (struct completion *completion, int64_t cookie) +{ + int r; + + /* Signal to the operations thread to start work, in case it is sleeping. */ + pthread_mutex_lock (&start_mutex); + pthread_cond_signal (&start_cond); + pthread_mutex_unlock (&start_mutex); + + /* Wait until the completion_callback sets the completed flag. + * + * We cannot call nbd_aio_command_completed yet because that can + * lead to a possible deadlock where completion_callback holds the + * NBD handle lock and we try to acquire it by calling + * nbd_aio_command_completed. That is the reason for the + * completion.completed flag. + */ + pthread_mutex_lock (&completion->mutex); + while (!completion->completed) + pthread_cond_wait (&completion->cond, &completion->mutex); + pthread_mutex_unlock (&completion->mutex); + + /* nbd_aio_command_completed returns: + * 0 => command still in flight (should be impossible) + * 1 => completed successfully + * -1 => error + */ + r = nbd_aio_command_completed (nbd, cookie); + assert (r != 0); + return r; +} + +/* Wrap calls to any asynch command and check the error. */ +#define CHECK_NBD_ASYNC_ERROR(CALL) \ + do { \ + struct completion completion = \ + { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }; \ + nbd_completion_callback cb = \ + { .callback = completion_callback, .user_data = &completion }; \ + int64_t cookie = (CALL); \ + if (cookie == -1 || wait_for_completion (&completion, cookie) == -1) \ + return report_nbd_error (); \ + } while (0) + +/* Wraps calls to sync libnbd functions and check the error. */ +#define CHECK_NBD_SYNC_ERROR(CALL) \ + do { if ((CALL) == -1) return report_nbd_error (); } while (0) + static int nbdfuse_getattr (const char *path, struct stat *statbuf, struct fuse_file_info *fi) @@ -150,7 +276,7 @@ nbdfuse_read (const char *path, char *buf, if (offset + count > size) count = size - offset; - CHECK_NBD_ERROR (nbd_pread (nbd, buf, count, offset, 0)); + CHECK_NBD_ASYNC_ERROR (nbd_aio_pread (nbd, buf, count, offset, cb, 0)); return (int) count; } @@ -176,7 +302,7 @@ nbdfuse_write (const char *path, const char *buf, if (offset + count > size) count = size - offset; - CHECK_NBD_ERROR (nbd_pwrite (nbd, buf, count, offset, 0)); + CHECK_NBD_ASYNC_ERROR (nbd_aio_pwrite (nbd, buf, count, offset, cb, 0)); return (int) count; } @@ -191,21 +317,40 @@ nbdfuse_fsync (const char *path, int datasync, struct fuse_file_info *fi) * silently ignored. */ if (nbd_can_flush (nbd)) - CHECK_NBD_ERROR (nbd_flush (nbd, 0)); + CHECK_NBD_ASYNC_ERROR (nbd_aio_flush (nbd, cb, 0)); return 0; } -/* This is called on the last close of a file. We do a flush here to - * be on the safe side, but it's not strictly necessary. - */ +/* This is called on the last close of a file. */ static int nbdfuse_release (const char *path, struct fuse_file_info *fi) { - if (readonly) - return 0; + time_t st; - return nbdfuse_fsync (path, 0, fi); + /* We do a synchronous flush here to be on the safe side, but it's + * not strictly necessary. + */ + if (!readonly && nbd_can_flush (nbd)) + CHECK_NBD_SYNC_ERROR (nbd_flush (nbd, 0)); + + /* Wait until there are no more commands in flight or until a + * timeout is reached. + */ + time (&st); + while (1) { + if (nbd_aio_in_flight (nbd) == 0) + break; + if (time (NULL) - st > RELEASE_TIMEOUT) + break; + + /* Signal to the operations thread to work. */ + pthread_mutex_lock (&start_mutex); + pthread_cond_signal (&start_cond); + pthread_mutex_unlock (&start_mutex); + } + + return 0; } /* Punch a hole or write zeros. */ @@ -220,7 +365,7 @@ nbdfuse_fallocate (const char *path, int mode, off_t offset, off_t len, if (!nbd_can_trim (nbd)) return -EOPNOTSUPP; /* Trim not supported. */ else { - CHECK_NBD_ERROR (nbd_trim (nbd, len, offset, 0)); + CHECK_NBD_ASYNC_ERROR (nbd_aio_trim (nbd, len, offset, cb, 0)); return 0; } } @@ -236,13 +381,13 @@ nbdfuse_fallocate (const char *path, int mode, off_t offset, off_t len, while (len > 0) { off_t n = MIN (len, sizeof zerobuf); - CHECK_NBD_ERROR (nbd_pwrite (nbd, zerobuf, n, offset, 0)); + CHECK_NBD_ASYNC_ERROR (nbd_aio_pwrite (nbd, zerobuf, n, offset, cb, 0)); len -= n; } return 0; } else { - CHECK_NBD_ERROR (nbd_zero (nbd, len, offset, 0)); + CHECK_NBD_ASYNC_ERROR (nbd_aio_zero (nbd, len, offset, cb, 0)); return 0; } } -- 2.31.1
Richard W.M. Jones
2021-May-14 09:06 UTC
[Libguestfs] [PATCH libnbd v2 3/3] fuse: Update items in TODO file
--- TODO | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/TODO b/TODO index 510c219..65f68b9 100644 --- a/TODO +++ b/TODO @@ -38,10 +38,8 @@ nbdcopy: nbdfuse: - If you write beyond the end of the virtual file, it returns EIO. - - Implement trim/discard. - - Implement write_zeroes. - Implement block_status. - - Could be made multithreaded for improved performance. + - Improve multithread performance by using multi-conn. nbdinfo: - 'nbdinfo --map' needs --offset and --length parameters for mapping -- 2.31.1