Eric Blake
2021-May-07 01:59 UTC
[Libguestfs] [nbdkit PATCH 0/5] Use shared context to make ext2 filter parallel
I've been working on this one for a while, and finally got it working well enough to post. I'm probably still going to work on adding another unit test proving that you can open multiple NBD clients in parallel, and that they can interleave I/O, but my immediate test was setting up: nbdkit -f --filter=ext2 --filter=exportname file tests/ext2.img \ ext2file=exportname exportname-list=explicit exportname=hidden then running two simultaneous nbdsh, one visiting nbd://localhost:10809/manifest and the other nbd://localhost:10809/disks/disk.img. Pre-series, the second nbdsh stalls until the first exits, post-series, both can read their respective file out of the overall image at the same time. Eric Blake (5): filters: Tighter rules on .get_ready/.after_fork threadlocal: Add way to store current context filters: Allow nbdkit_next_context_open outside client connection api: Add .cleanup callback ext2: Support parallel requests and connections docs/nbdkit-filter.pod | 65 ++++++-- docs/nbdkit-plugin.pod | 33 +++- include/nbdkit-filter.h | 10 +- include/nbdkit-plugin.h | 4 +- server/internal.h | 14 +- server/backend.c | 48 +++++- server/filters.c | 49 +++--- server/main.c | 1 + server/plugins.c | 32 ++-- server/protocol-handshake.c | 2 +- server/public.c | 24 ++- server/test-public.c | 12 +- server/threadlocal.c | 36 ++++- filters/exitwhen/exitwhen.c | 9 +- filters/ext2/ext2.c | 225 ++++++++++++++++------------ filters/extentlist/extentlist.c | 5 +- filters/log/log.c | 9 +- filters/multi-conn/multi-conn.c | 5 +- filters/pause/pause.c | 4 +- filters/rate/rate.c | 5 +- filters/retry/retry.c | 2 +- filters/stats/stats.c | 5 +- filters/tls-fallback/tls-fallback.c | 5 +- tests/test-layers-filter.c | 22 ++- tests/test-layers-plugin.c | 9 +- tests/test-layers.c | 42 ++++-- 26 files changed, 453 insertions(+), 224 deletions(-) -- 2.31.1
Eric Blake
2021-May-07 01:59 UTC
[Libguestfs] [nbdkit PATCH 1/5] filters: Tighter rules on .get_ready/.after_fork
Allowing a filter to bypass the plugin's .get_ready or .after_fork makes no sense, and there are no other arguments to modify. Furthermore, now that filters have access to open a plugin context independently of a client connection, .after_fork would be an ideal place to do so with its existing nbdkit_backend argument (which we already documented was stable, although this test actually adds tests for that). Still, we will want to add a counterpart API (in a later patch) to tear that context down cleanly, where .unload is too late. But for that to work, we need to ensure that the plugin completes initialization before the filter is allowed to try to open a context into the plugin, which means swapping the visitation order to be inner-to-outer (similar to .prepare). --- docs/nbdkit-filter.pod | 24 +++++++++++------------ include/nbdkit-filter.h | 7 ++----- server/filters.c | 26 ++++++------------------- filters/exitwhen/exitwhen.c | 9 ++++----- filters/extentlist/extentlist.c | 5 ++--- filters/log/log.c | 9 ++++----- filters/multi-conn/multi-conn.c | 5 ++--- filters/pause/pause.c | 4 ++-- filters/rate/rate.c | 5 ++--- filters/stats/stats.c | 5 ++--- filters/tls-fallback/tls-fallback.c | 5 ++--- tests/test-layers-filter.c | 12 +++++------- tests/test-layers.c | 30 ++++++++++++++--------------- 13 files changed, 60 insertions(+), 86 deletions(-) diff --git a/docs/nbdkit-filter.pod b/docs/nbdkit-filter.pod index 97f32e10..9816f4e8 100644 --- a/docs/nbdkit-filter.pod +++ b/docs/nbdkit-filter.pod @@ -127,8 +127,7 @@ which is required. =head1 NEXT PLUGIN F<nbdkit-filter.h> defines some function types (C<nbdkit_next_config>, -C<nbdkit_next_config_complete>, C<nbdkit_next_get_ready>, -C<nbdkit_next_after_fork>, C<nbdkit_next_preconnect>, +C<nbdkit_next_config_complete>, C<nbdkit_next_preconnect>, C<nbdkit_next_list_exports>, C<nbdkit_next_default_export>, C<nbdkit_next_open>) and a structure called C<struct nbdkit_next_ops>. These abstract the next plugin or filter in the chain. There is also @@ -406,28 +405,29 @@ with an error message and return C<-1>. =head2 C<.get_ready> - int (*get_ready) (nbdkit_next_get_ready *next, void *nxdata, - int thread_model); + int (*get_ready) (int thread_model); -This intercepts the plugin C<.get_ready> method and can be used by the -filter to get ready to serve requests. +This optional callback is reached if the plugin C<.get_ready> method +succeeded (if the plugin failed, nbdkit has already exited), and can +be used by the filter to get ready to serve requests. The C<thread_model> parameter informs the filter about the final thread model chosen by nbdkit after considering the results of C<.thread_model> of all filters in the chain after -C<.config_complete>. This does not need to be passed on to C<next>, -as the model can no longer be altered at this point. +C<.config_complete>. If there is an error, C<.get_ready> should call C<nbdkit_error> with an error message and return C<-1>. =head2 C<.after_fork> - int (*after_fork) (nbdkit_next_after_fork *next, nbdkit_backend *nxdata); + int (*after_fork) (nbdkit_backend *backend); -This intercepts the plugin C<.after_fork> method and can be used by -the filter to start background threads (although these have limited -usefulness since they will not be able to access the plugin). +This optional callback is reached after the plugin C<.after_fork> +method has succeeded (if the plugin failed, nbdkit has already +exited), and can be used by the filter to start background threads. +The C<backend> parameter is valid until C<.unload>, for creating manual +contexts into the backend with C<nbdkit_next_context_open>. If there is an error, C<.after_fork> should call C<nbdkit_error> with an error message and return C<-1>. diff --git a/include/nbdkit-filter.h b/include/nbdkit-filter.h index 86c41dd7..662f52cb 100644 --- a/include/nbdkit-filter.h +++ b/include/nbdkit-filter.h @@ -66,8 +66,6 @@ typedef struct nbdkit_next_ops nbdkit_next; typedef int nbdkit_next_config (nbdkit_backend *nxdata, const char *key, const char *value); typedef int nbdkit_next_config_complete (nbdkit_backend *nxdata); -typedef int nbdkit_next_get_ready (nbdkit_backend *nxdata); -typedef int nbdkit_next_after_fork (nbdkit_backend *nxdata); typedef int nbdkit_next_preconnect (nbdkit_backend *nxdata, int readonly); typedef int nbdkit_next_list_exports (nbdkit_backend *nxdata, int readonly, struct nbdkit_exports *exports); @@ -197,9 +195,8 @@ struct nbdkit_filter { nbdkit_backend *nxdata); const char *config_help; int (*thread_model) (void); - int (*get_ready) (nbdkit_next_get_ready *next, nbdkit_backend *nxdata, - int thread_model); - int (*after_fork) (nbdkit_next_after_fork *next, nbdkit_backend *nxdata); + int (*get_ready) (int thread_model); + int (*after_fork) (nbdkit_backend *backend); int (*preconnect) (nbdkit_next_preconnect *next, nbdkit_backend *nxdata, int readonly); int (*list_exports) (nbdkit_next_list_exports *next, nbdkit_backend *nxdata, diff --git a/server/filters.c b/server/filters.c index 31cf5c7e..136fa672 100644 --- a/server/filters.c +++ b/server/filters.c @@ -171,33 +171,19 @@ filter_config_complete (struct backend *b) b->next->config_complete (b->next); } -static int -next_get_ready (struct backend *b) -{ - b->get_ready (b); - return 0; -} - static void filter_get_ready (struct backend *b) { struct backend_filter *f = container_of (b, struct backend_filter, backend); + b->next->get_ready (b->next); /* exits on failure */ + debug ("%s: get_ready thread_model=%d", b->name, thread_model); if (f->filter.get_ready) { - if (f->filter.get_ready (next_get_ready, b->next, thread_model) == -1) + if (f->filter.get_ready (thread_model) == -1) exit (EXIT_FAILURE); } - else - b->next->get_ready (b->next); -} - -static int -next_after_fork (struct backend *b) -{ - b->after_fork (b); - return 0; } static void @@ -205,14 +191,14 @@ filter_after_fork (struct backend *b) { struct backend_filter *f = container_of (b, struct backend_filter, backend); + b->next->after_fork (b->next); /* exits on failure */ + debug ("%s: after_fork", b->name); if (f->filter.after_fork) { - if (f->filter.after_fork (next_after_fork, b->next) == -1) + if (f->filter.after_fork (b->next) == -1) exit (EXIT_FAILURE); } - else - b->next->after_fork (b->next); } static int diff --git a/filters/exitwhen/exitwhen.c b/filters/exitwhen/exitwhen.c index 80836357..543af058 100644 --- a/filters/exitwhen/exitwhen.c +++ b/filters/exitwhen/exitwhen.c @@ -436,22 +436,21 @@ exitwhen_config (nbdkit_next_config *next, nbdkit_backend *nxdata, * then we exit immediately. */ static int -exitwhen_get_ready (nbdkit_next_get_ready *next, nbdkit_backend *nxdata, - int thread_model) +exitwhen_get_ready (int thread_model) { ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock); if (check_for_event ()) exit (EXIT_SUCCESS); - return next (nxdata); + return 0; } /* After forking, start the background thread. Initially it is * polling. */ static int -exitwhen_after_fork (nbdkit_next_after_fork *next, nbdkit_backend *nxdata) +exitwhen_after_fork (nbdkit_backend *nxdata) { int err; pthread_t thread; @@ -462,7 +461,7 @@ exitwhen_after_fork (nbdkit_next_after_fork *next, nbdkit_backend *nxdata) nbdkit_error ("pthread_create: %m"); return -1; } - return next (nxdata); + return 0; } static int diff --git a/filters/extentlist/extentlist.c b/filters/extentlist/extentlist.c index c609062c..7b7ba2f2 100644 --- a/filters/extentlist/extentlist.c +++ b/filters/extentlist/extentlist.c @@ -262,12 +262,11 @@ parse_extentlist (void) } static int -extentlist_get_ready (nbdkit_next_get_ready *next, nbdkit_backend *nxdata, - int thread_model) +extentlist_get_ready (int thread_model) { parse_extentlist (); - return next (nxdata); + return 0; } static int diff --git a/filters/log/log.c b/filters/log/log.c index 26358302..acc9d77e 100644 --- a/filters/log/log.c +++ b/filters/log/log.c @@ -103,8 +103,7 @@ log_config (nbdkit_next_config *next, nbdkit_backend *nxdata, /* Open the logfile. */ static int -log_get_ready (nbdkit_next_get_ready *next, nbdkit_backend *nxdata, - int thread_model) +log_get_ready (int thread_model) { int fd; @@ -133,17 +132,17 @@ log_get_ready (nbdkit_next_get_ready *next, nbdkit_backend *nxdata, saved_pid = getpid (); print (NULL, "Ready", "thread_model=%d", thread_model); - return next (nxdata); + return 0; } static int -log_after_fork (nbdkit_next_after_fork *next, nbdkit_backend *nxdata) +log_after_fork (nbdkit_backend *nxdata) { /* Only issue this message if we actually fork. */ if (getpid () != saved_pid) print (NULL, "Fork", ""); - return next (nxdata); + return 0; } /* List exports. */ diff --git a/filters/multi-conn/multi-conn.c b/filters/multi-conn/multi-conn.c index 1b1c9d8c..12cc9a8f 100644 --- a/filters/multi-conn/multi-conn.c +++ b/filters/multi-conn/multi-conn.c @@ -145,13 +145,12 @@ multi_conn_config (nbdkit_next_config *next, nbdkit_backend *nxdata, "multi-conn-exportname=<BOOL> true to limit emulation by export name.\n" static int -multi_conn_get_ready (nbdkit_next_get_ready *next, nbdkit_backend *nxdata, - int thread_model) +multi_conn_get_ready (int thread_model) { if (thread_model == NBDKIT_THREAD_MODEL_SERIALIZE_CONNECTIONS && mode == AUTO) mode = DISABLE; - return next (nxdata); + return 0; } static void * diff --git a/filters/pause/pause.c b/filters/pause/pause.c index 90511b93..e3745397 100644 --- a/filters/pause/pause.c +++ b/filters/pause/pause.c @@ -231,7 +231,7 @@ control_socket_thread (void *arg) /* Start the background thread after fork. */ static int -pause_after_fork (nbdkit_next_after_fork *next, nbdkit_backend *nxdata) +pause_after_fork (nbdkit_backend *nxdata) { int err; pthread_t thread; @@ -242,7 +242,7 @@ pause_after_fork (nbdkit_next_after_fork *next, nbdkit_backend *nxdata) nbdkit_error ("pthread_create: %m"); return -1; } - return next (nxdata); + return 0; } /* This is called before processing each NBD request. */ diff --git a/filters/rate/rate.c b/filters/rate/rate.c index 7b07b5bf..6838628f 100644 --- a/filters/rate/rate.c +++ b/filters/rate/rate.c @@ -147,14 +147,13 @@ rate_config (nbdkit_next_config *next, nbdkit_backend *nxdata, } static int -rate_get_ready (nbdkit_next_get_ready *next, nbdkit_backend *nxdata, - int thread_model) +rate_get_ready (int thread_model) { /* Initialize the global buckets. */ bucket_init (&read_bucket, rate, BUCKET_CAPACITY); bucket_init (&write_bucket, rate, BUCKET_CAPACITY); - return next (nxdata); + return 0; } #define rate_config_help \ diff --git a/filters/stats/stats.c b/filters/stats/stats.c index 7fd043a3..5f44bc37 100644 --- a/filters/stats/stats.c +++ b/filters/stats/stats.c @@ -212,8 +212,7 @@ stats_config_complete (nbdkit_next_config_complete *next, } static int -stats_get_ready (nbdkit_next_get_ready *next, nbdkit_backend *nxdata, - int thread_model) +stats_get_ready (int thread_model) { int fd; @@ -238,7 +237,7 @@ stats_get_ready (nbdkit_next_get_ready *next, nbdkit_backend *nxdata, gettimeofday (&start_t, NULL); - return next (nxdata); + return 0; } #define stats_config_help \ diff --git a/filters/tls-fallback/tls-fallback.c b/filters/tls-fallback/tls-fallback.c index ee3e6784..fab9e58b 100644 --- a/filters/tls-fallback/tls-fallback.c +++ b/filters/tls-fallback/tls-fallback.c @@ -63,15 +63,14 @@ tls_fallback_config (nbdkit_next_config *next, nbdkit_backend *nxdata, "tlsreadme=<MESSAGE> Alternative contents for the plaintext dummy export.\n" int -tls_fallback_get_ready (nbdkit_next_get_ready *next, nbdkit_backend *nxdata, - int thread_model) +tls_fallback_get_ready (int thread_model) { if (thread_model == NBDKIT_THREAD_MODEL_SERIALIZE_CONNECTIONS) { nbdkit_error ("the tls-fallback filter requires parallel connection " "support"); return -1; } - return next (nxdata); + return 0; } static int diff --git a/tests/test-layers-filter.c b/tests/test-layers-filter.c index 2d8fe879..25ecb585 100644 --- a/tests/test-layers-filter.c +++ b/tests/test-layers-filter.c @@ -89,20 +89,18 @@ test_layers_filter_thread_model (void) } static int -test_layers_filter_get_ready (nbdkit_next_get_ready *next, - nbdkit_backend *nxdata, int thread_model) +test_layers_filter_get_ready (int thread_model) { DEBUG_FUNCTION; - return next (nxdata); + return 0; } static int -test_layers_filter_after_fork (nbdkit_next_after_fork *next, - nbdkit_backend *nxdata) +test_layers_filter_after_fork (nbdkit_backend *backend) { DEBUG_FUNCTION; - saved_backend = nxdata; - return next (nxdata); + saved_backend = backend; + return 0; } static int diff --git a/tests/test-layers.c b/tests/test-layers.c index cd5d73bc..09edcc0a 100644 --- a/tests/test-layers.c +++ b/tests/test-layers.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2018-2020 Red Hat Inc. + * Copyright (C) 2018-2021 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -234,28 +234,28 @@ main (int argc, char *argv[]) "filter3: test_layers_filter_thread_model", NULL); - /* get_ready methods called in order. */ + /* get_ready methods called in inner-to-outer order. */ log_verify_seen_in_order - ("testlayersfilter3: get_ready", - "filter3: test_layers_filter_get_ready", - "testlayersfilter2: get_ready", - "filter2: test_layers_filter_get_ready", + ("testlayersplugin: get_ready", + "test_layers_plugin_get_ready", "testlayersfilter1: get_ready", "filter1: test_layers_filter_get_ready", - "testlayersplugin: get_ready", - "test_layers_plugin_get_ready", + "testlayersfilter2: get_ready", + "filter2: test_layers_filter_get_ready", + "testlayersfilter3: get_ready", + "filter3: test_layers_filter_get_ready", NULL); - /* after_fork methods called in order. */ + /* after_fork methods called in inner-to-outer order. */ log_verify_seen_in_order - ("testlayersfilter3: after_fork", - "filter3: test_layers_filter_after_fork", - "testlayersfilter2: after_fork", - "filter2: test_layers_filter_after_fork", + ("testlayersplugin: after_fork", + "test_layers_plugin_after_fork", "testlayersfilter1: after_fork", "filter1: test_layers_filter_after_fork", - "testlayersplugin: after_fork", - "test_layers_plugin_after_fork", + "testlayersfilter2: after_fork", + "filter2: test_layers_filter_after_fork", + "testlayersfilter3: after_fork", + "filter3: test_layers_filter_after_fork", NULL); /* preconnect methods called in outer-to-inner order, complete -- 2.31.1
Eric Blake
2021-May-07 01:59 UTC
[Libguestfs] [nbdkit PATCH 2/5] threadlocal: Add way to store current context
Although we've recently added nbdkit_next_context_open to make it easier to access a plugin from a filter, to date we have still been limited to opening a context while associated with a client connection, because backend_open uses GET_CONN which asserts there is an active connection. In order to lift that restriction, we will need to be able to allow a context that has no associated connection, and move the data that feeds nbdkit_export_name() and friends out of the connection and into the context. This patch starts the process, by updating the thread local storage to track the current context, as well as all calls into backend.c to keep this information up-to-date. As this is internal to the server, I did not think it proper to add this into common/utils/cleanup.h, but rather just placed the new macros in internal.h. --- server/internal.h | 7 +++++++ server/backend.c | 23 +++++++++++++++++++++++ server/threadlocal.c | 27 ++++++++++++++++++++++++--- 3 files changed, 54 insertions(+), 3 deletions(-) diff --git a/server/internal.h b/server/internal.h index 849edbf8..a8283195 100644 --- a/server/internal.h +++ b/server/internal.h @@ -527,6 +527,13 @@ extern void *threadlocal_buffer (size_t size); extern void threadlocal_set_conn (struct connection *conn); extern struct connection *threadlocal_get_conn (void); +extern struct context *threadlocal_push_context (struct context *ctx); +extern void threadlocal_pop_context (struct context **ctx); +#define CLEANUP_CONTEXT_POP __attribute__((cleanup (threadlocal_pop_context))) +#define PUSH_CONTEXT_FOR_SCOPE(ctx) \ + CLEANUP_CONTEXT_POP struct context *UNIQUE_VAR(_ctx) = \ + threadlocal_push_context (ctx) + /* Macro which sets local variable struct connection *conn from * thread-local storage, asserting that it is non-NULL. If you want * to check if conn could be NULL (eg. outside a connection context) diff --git a/server/backend.c b/server/backend.c index 69625597..2244df7a 100644 --- a/server/backend.c +++ b/server/backend.c @@ -238,6 +238,7 @@ backend_open (struct backend *b, int readonly, const char *exportname) { GET_CONN; struct context *c = malloc (sizeof *c); + PUSH_CONTEXT_FOR_SCOPE (c); if (c == NULL) { nbdkit_error ("malloc: %m"); @@ -294,6 +295,7 @@ backend_open (struct backend *b, int readonly, const char *exportname) int backend_prepare (struct context *c) { + PUSH_CONTEXT_FOR_SCOPE (c); struct backend *b = c->b; assert (c->handle); @@ -317,6 +319,7 @@ backend_prepare (struct context *c) int backend_finalize (struct context *c) { + PUSH_CONTEXT_FOR_SCOPE (c); struct backend *b = c->b; /* Call these in reverse order to .prepare above, starting from the @@ -344,6 +347,7 @@ backend_finalize (struct context *c) void backend_close (struct context *c) { + PUSH_CONTEXT_FOR_SCOPE (c); struct backend *b = c->b; struct context *c_next = c->c_next; @@ -370,6 +374,7 @@ backend_valid_range (struct context *c, uint64_t offset, uint32_t count) const char * backend_export_description (struct context *c) { + PUSH_CONTEXT_FOR_SCOPE (c); struct backend *b = c->b; const char *s; @@ -391,6 +396,7 @@ backend_export_description (struct context *c) int64_t backend_get_size (struct context *c) { + PUSH_CONTEXT_FOR_SCOPE (c); struct backend *b = c->b; assert (c->handle && (c->state & HANDLE_CONNECTED)); @@ -404,6 +410,7 @@ backend_get_size (struct context *c) int backend_can_write (struct context *c) { + PUSH_CONTEXT_FOR_SCOPE (c); struct backend *b = c->b; assert (c->handle && (c->state & HANDLE_CONNECTED)); @@ -417,6 +424,7 @@ backend_can_write (struct context *c) int backend_can_flush (struct context *c) { + PUSH_CONTEXT_FOR_SCOPE (c); struct backend *b = c->b; assert (c->handle && (c->state & HANDLE_CONNECTED)); @@ -430,6 +438,7 @@ backend_can_flush (struct context *c) int backend_is_rotational (struct context *c) { + PUSH_CONTEXT_FOR_SCOPE (c); struct backend *b = c->b; assert (c->handle && (c->state & HANDLE_CONNECTED)); @@ -443,6 +452,7 @@ backend_is_rotational (struct context *c) int backend_can_trim (struct context *c) { + PUSH_CONTEXT_FOR_SCOPE (c); struct backend *b = c->b; int r; @@ -462,6 +472,7 @@ backend_can_trim (struct context *c) int backend_can_zero (struct context *c) { + PUSH_CONTEXT_FOR_SCOPE (c); struct backend *b = c->b; int r; @@ -481,6 +492,7 @@ backend_can_zero (struct context *c) int backend_can_fast_zero (struct context *c) { + PUSH_CONTEXT_FOR_SCOPE (c); struct backend *b = c->b; int r; @@ -500,6 +512,7 @@ backend_can_fast_zero (struct context *c) int backend_can_extents (struct context *c) { + PUSH_CONTEXT_FOR_SCOPE (c); struct backend *b = c->b; assert (c->handle && (c->state & HANDLE_CONNECTED)); @@ -513,6 +526,7 @@ backend_can_extents (struct context *c) int backend_can_fua (struct context *c) { + PUSH_CONTEXT_FOR_SCOPE (c); struct backend *b = c->b; int r; @@ -532,6 +546,7 @@ backend_can_fua (struct context *c) int backend_can_multi_conn (struct context *c) { + PUSH_CONTEXT_FOR_SCOPE (c); struct backend *b = c->b; assert (c->handle && (c->state & HANDLE_CONNECTED)); @@ -545,6 +560,7 @@ backend_can_multi_conn (struct context *c) int backend_can_cache (struct context *c) { + PUSH_CONTEXT_FOR_SCOPE (c); struct backend *b = c->b; assert (c->handle && (c->state & HANDLE_CONNECTED)); @@ -560,6 +576,7 @@ backend_pread (struct context *c, void *buf, uint32_t count, uint64_t offset, uint32_t flags, int *err) { + PUSH_CONTEXT_FOR_SCOPE (c); struct backend *b = c->b; int r; @@ -580,6 +597,7 @@ backend_pwrite (struct context *c, const void *buf, uint32_t count, uint64_t offset, uint32_t flags, int *err) { + PUSH_CONTEXT_FOR_SCOPE (c); struct backend *b = c->b; bool fua = !!(flags & NBDKIT_FLAG_FUA); int r; @@ -603,6 +621,7 @@ int backend_flush (struct context *c, uint32_t flags, int *err) { + PUSH_CONTEXT_FOR_SCOPE (c); struct backend *b = c->b; int r; @@ -622,6 +641,7 @@ backend_trim (struct context *c, uint32_t count, uint64_t offset, uint32_t flags, int *err) { + PUSH_CONTEXT_FOR_SCOPE (c); struct backend *b = c->b; bool fua = !!(flags & NBDKIT_FLAG_FUA); int r; @@ -647,6 +667,7 @@ backend_zero (struct context *c, uint32_t count, uint64_t offset, uint32_t flags, int *err) { + PUSH_CONTEXT_FOR_SCOPE (c); struct backend *b = c->b; bool fua = !!(flags & NBDKIT_FLAG_FUA); bool fast = !!(flags & NBDKIT_FLAG_FAST_ZERO); @@ -681,6 +702,7 @@ backend_extents (struct context *c, uint32_t count, uint64_t offset, uint32_t flags, struct nbdkit_extents *extents, int *err) { + PUSH_CONTEXT_FOR_SCOPE (c); struct backend *b = c->b; int r; @@ -711,6 +733,7 @@ backend_cache (struct context *c, uint32_t count, uint64_t offset, uint32_t flags, int *err) { + PUSH_CONTEXT_FOR_SCOPE (c); struct backend *b = c->b; int r; diff --git a/server/threadlocal.c b/server/threadlocal.c index 90230028..644239b7 100644 --- a/server/threadlocal.c +++ b/server/threadlocal.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2020 Red Hat Inc. + * Copyright (C) 2013-2021 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -56,9 +56,10 @@ struct threadlocal { char *name; /* Can be NULL. */ size_t instance_num; /* Can be 0. */ int err; - void *buffer; + void *buffer; /* Can be NULL. */ size_t buffer_size; - struct connection *conn; + struct connection *conn; /* Can be NULL. */ + struct context *ctx; /* Can be NULL. */ }; static pthread_key_t threadlocal_key; @@ -227,3 +228,23 @@ threadlocal_get_conn (void) return threadlocal ? threadlocal->conn : NULL; } + +/* Set (or clear) the context using the current thread */ +struct context * +threadlocal_push_context (struct context *ctx) +{ + struct threadlocal *threadlocal = pthread_getspecific (threadlocal_key); + struct context *ret = NULL; + + if (threadlocal) { + ret = threadlocal->ctx; + threadlocal->ctx = ctx; + } + return ret; +} + +void +threadlocal_pop_context (struct context **ctx) +{ + threadlocal_push_context (*ctx); +} -- 2.31.1
Eric Blake
2021-May-07 01:59 UTC
[Libguestfs] [nbdkit PATCH 3/5] filters: Allow nbdkit_next_context_open outside client connection
Until now, attempts to create a new context outside a client connection (such as during .after_fork) triggered an assertion failure: backend_open insisted on having a valid connection. Even if we merely drop that assertion, there are other issues to worry about: nbdkit_is_tls() depends on the current connection to know if the client negotiated TLS; if the backend interns a string with nbdkit_string_intern() while there is a current connection, then we risk use-after-free since that string's lifetime ends when the client disconnects even if the context stays around; and so on. Since we never promised filter ABI stability, we can address this by adding a parameter to state the intent of whether the filter intends to share a single backend context among multiple client connections. Existing uses all pass false (the context lifetime is tied to the client connection, so life continues as before), but we now open the door to a filter that multiplexes connections, where shared connections now have a global interned string lifetime, a choice of TLS determined by the command line, and better documentation. The handling of nbdkit_is_tls is the most interesting aspect: if a user were to mix a multiplexing filter in front of the tls-fallback filter, we want to never expose data from the real plugin unless --tls=require was on the command line, because otherwise we'd have a lot more code to audit and touch to make sure the multiplexing filter doesn't accidentally leak data to an insecure client. But if the tls-fallback filter is in front of the multiplexing filter, things should just work: the multiplexing filter is not reached except by TLS clients. --- docs/nbdkit-filter.pod | 21 +++++++++++++++++++-- include/nbdkit-filter.h | 2 +- server/internal.h | 5 ++++- server/backend.c | 27 ++++++++++++++++++++------- server/filters.c | 10 +++++++--- server/plugins.c | 19 ++++++++++--------- server/protocol-handshake.c | 2 +- server/public.c | 24 ++++++++++++++++-------- server/test-public.c | 12 +++++++++++- server/threadlocal.c | 9 +++++++++ filters/retry/retry.c | 2 +- tests/test-layers-filter.c | 2 +- 12 files changed, 100 insertions(+), 35 deletions(-) diff --git a/docs/nbdkit-filter.pod b/docs/nbdkit-filter.pod index 9816f4e8..d2cdde13 100644 --- a/docs/nbdkit-filter.pod +++ b/docs/nbdkit-filter.pod @@ -216,7 +216,8 @@ backend pointer has a stable lifetime from the time of C<.after_fork> until C<.unload>. nbdkit_next *nbdkit_next_context_open (nbdkit_backend *backend, - int readonly, const char *exportname); + int readonly, const char *exportname, + int shared); This function attempts to open a new context into the plugin in relation to the filter's current C<backend>. The C<readonly> and @@ -227,6 +228,22 @@ C<nbdkit_context_set_next>. The filter should be careful to not violate any threading model restrictions of the plugin if it opens more than one context. +If C<shared> is false, this function must be called while servicing an +existing client connection, and the new context will share the same +connection details (export name, tls status, and shorter interned +string lifetimes) as the current connection, and thus should not be +used after the client connection ends. Conversely, if C<shared> is +true, this function may be called outside of a current client +connection (such as during C<.after_fork>), and the resulting context +may be freely shared among multiple client connections. In shared +mode, it will not be possible for the plugin to differentiate content +based on the client export name, the result of the plugin calling +nbdkit_is_tls() will depend solely whether C<--tls=require> was on the +command line, the lifetime of interned strings (via +C<nbdkit_strdup_intern> and friends) lasts for the life of the filter, +and the filter must take care to not expose potentially-secure +information from the backend to an insecure client. + void nbdkit_next_context_close (nbdkit_next *next); This function closes a context into the plugin. If the context has @@ -582,7 +599,7 @@ functionality can be obtained manually (other than error checking) by using the following: nbdkit_context_set_next (context, nbdkit_next_context_open - (nbdkit_context_get_backend (context), readonly, exportname)); + (nbdkit_context_get_backend (context), readonly, exportname, false)); The value of C<context> in this call has a lifetime that lasts until the counterpart C<.close>, and it is this value that may be passed to diff --git a/include/nbdkit-filter.h b/include/nbdkit-filter.h index 662f52cb..a91de88f 100644 --- a/include/nbdkit-filter.h +++ b/include/nbdkit-filter.h @@ -162,7 +162,7 @@ NBDKIT_EXTERN_DECL (nbdkit_backend *, nbdkit_context_get_backend, (nbdkit_context *context)); NBDKIT_EXTERN_DECL (nbdkit_next *, nbdkit_next_context_open, (nbdkit_backend *backend, int readonly, - const char *exportname)); + const char *exportname, int shared)); NBDKIT_EXTERN_DECL (void, nbdkit_next_context_close, (nbdkit_next *next)); NBDKIT_EXTERN_DECL (nbdkit_next *, nbdkit_context_set_next, (nbdkit_context *context, nbdkit_next *next)); diff --git a/server/internal.h b/server/internal.h index a8283195..d2ddf271 100644 --- a/server/internal.h +++ b/server/internal.h @@ -208,6 +208,7 @@ struct context { void *handle; /* Plugin or filter handle. */ struct backend *b; /* Backend that provided handle. */ struct context *c_next; /* Underlying context, only when b->next != NULL. */ + struct connection *conn; /* Active connection at context creation, if any. */ unsigned char state; /* Bitmask of HANDLE_* values */ @@ -418,7 +419,8 @@ extern const char *backend_default_export (struct backend *b, int readonly) * exportname if they need to refer to it later. */ extern struct context *backend_open (struct backend *b, - int readonly, const char *exportname) + int readonly, const char *exportname, + int shared) __attribute__((__nonnull__ (1, 3))); extern int backend_prepare (struct context *c) __attribute__((__nonnull__ (1))); @@ -526,6 +528,7 @@ extern int threadlocal_get_error (void); extern void *threadlocal_buffer (size_t size); extern void threadlocal_set_conn (struct connection *conn); extern struct connection *threadlocal_get_conn (void); +extern struct context *threadlocal_get_context (void); extern struct context *threadlocal_push_context (struct context *ctx); extern void threadlocal_pop_context (struct context **ctx); diff --git a/server/backend.c b/server/backend.c index 2244df7a..73994f56 100644 --- a/server/backend.c +++ b/server/backend.c @@ -234,24 +234,37 @@ static struct nbdkit_next_ops next_ops = { }; struct context * -backend_open (struct backend *b, int readonly, const char *exportname) +backend_open (struct backend *b, int readonly, const char *exportname, + int shared) { - GET_CONN; - struct context *c = malloc (sizeof *c); - PUSH_CONTEXT_FOR_SCOPE (c); + struct connection *conn = threadlocal_get_conn (); + bool using_tls; + struct context *c; + if (!shared && !conn) { + nbdkit_error ("attempt to open non-shared context outside a connection"); + return NULL; + } + if (shared) + using_tls = tls == 2; + else + using_tls = conn->using_tls; + + c = malloc (sizeof *c); if (c == NULL) { nbdkit_error ("malloc: %m"); return NULL; } + PUSH_CONTEXT_FOR_SCOPE (c); controlpath_debug ("%s: open readonly=%d exportname=\"%s\" tls=%d", - b->name, readonly, exportname, conn->using_tls); + b->name, readonly, exportname, using_tls); c->next = next_ops; c->handle = NULL; c->b = b; c->c_next = NULL; + c->conn = shared ? NULL : conn; c->state = 0; c->exportsize = -1; c->can_write = readonly ? 0 : -1; @@ -266,7 +279,7 @@ backend_open (struct backend *b, int readonly, const char *exportname) c->can_cache = -1; /* Determine the canonical name for default export */ - if (!*exportname) { + if (!*exportname && conn) { exportname = backend_default_export (b, readonly); if (exportname == NULL) { nbdkit_error ("default export (\"\") not permitted"); @@ -278,7 +291,7 @@ backend_open (struct backend *b, int readonly, const char *exportname) /* Most filters will call next_open first, resulting in * inner-to-outer ordering. */ - c->handle = b->open (c, readonly, exportname, conn->using_tls); + c->handle = b->open (c, readonly, exportname, using_tls); controlpath_debug ("%s: open returned handle %p", b->name, c->handle); if (c->handle == NULL) { diff --git a/server/filters.c b/server/filters.c index 136fa672..1df8e1c9 100644 --- a/server/filters.c +++ b/server/filters.c @@ -250,7 +250,8 @@ static int next_open (struct context *c, int readonly, const char *exportname) { struct backend *b = nbdkit_context_get_backend (c); - struct context *c_next = nbdkit_next_context_open (b, readonly, exportname); + struct context *c_next = nbdkit_next_context_open (b, readonly, exportname, + false); struct context *old; if (c_next == NULL) @@ -691,10 +692,13 @@ nbdkit_context_get_backend (struct context *c) struct context * nbdkit_next_context_open (struct backend *b, - int readonly, const char *exportname) + int readonly, const char *exportname, int shared) { + struct context *c = threadlocal_get_context (); + assert (b); - return backend_open (b, readonly, exportname); + assert (!c || b == c->b->next); + return backend_open (b, readonly, exportname, shared || !c || !c->conn); } void diff --git a/server/plugins.c b/server/plugins.c index b5cb5971..457c8c86 100644 --- a/server/plugins.c +++ b/server/plugins.c @@ -313,13 +313,11 @@ static void * plugin_open (struct context *c, int readonly, const char *exportname, int is_tls) { - GET_CONN; struct backend *b = c->b; struct backend_plugin *p = container_of (b, struct backend_plugin, backend); void *r; assert (p->plugin.open != NULL); - assert (conn->exportname == NULL); /* Save the exportname since the lifetime of the string passed in * here is likely to be brief. In addition this provides a place @@ -333,13 +331,16 @@ plugin_open (struct context *c, int readonly, const char *exportname, * will still need to save the export name in the handle because of * the lifetime issue. */ - conn->exportname = nbdkit_strdup_intern (exportname); - if (conn->exportname == NULL) - return NULL; + if (c->conn) { + assert (c->conn->exportname == NULL); + c->conn->exportname = nbdkit_strdup_intern (exportname); + if (c->conn->exportname == NULL) + return NULL; + } r = p->plugin.open (readonly); - if (r == NULL) - conn->exportname = NULL; + if (r == NULL && c->conn) + c->conn->exportname = NULL; return r; } @@ -362,14 +363,14 @@ plugin_finalize (struct context *c) static void plugin_close (struct context *c) { - GET_CONN; struct backend *b = c->b; struct backend_plugin *p = container_of (b, struct backend_plugin, backend); assert (c->handle); if (p->plugin.close) p->plugin.close (c->handle); - conn->exportname = NULL; + if (c->conn) + c->conn->exportname = NULL; } static const char * diff --git a/server/protocol-handshake.c b/server/protocol-handshake.c index e0183173..67762192 100644 --- a/server/protocol-handshake.c +++ b/server/protocol-handshake.c @@ -80,7 +80,7 @@ protocol_common_open (uint64_t *exportsize, uint16_t *flags, uint16_t eflags = NBD_FLAG_HAS_FLAGS; int fl; - conn->top_context = backend_open (top, read_only, exportname); + conn->top_context = backend_open (top, read_only, exportname, false); if (conn->top_context == NULL) return -1; diff --git a/server/public.c b/server/public.c index 9608600e..af55c4bc 100644 --- a/server/public.c +++ b/server/public.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2020 Red Hat Inc. + * Copyright (C) 2013-2021 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -759,14 +759,14 @@ nbdkit_nanosleep (unsigned sec, unsigned nsec) const char * nbdkit_export_name (void) { - struct connection *conn = threadlocal_get_conn (); + struct context *c = threadlocal_get_context (); - if (!conn) { + if (!c || !c->conn) { nbdkit_error ("no connection in this thread"); return NULL; } - return conn->exportname; + return c->conn->exportname; } /* This function will be deprecated for API V3 users. The preferred @@ -775,14 +775,21 @@ nbdkit_export_name (void) int nbdkit_is_tls (void) { - struct connection *conn = threadlocal_get_conn (); + struct context *c = threadlocal_get_context (); - if (!conn) { + if (!c) { nbdkit_error ("no connection in this thread"); return -1; } - return conn->using_tls; + if (!c->conn) { + /* If a filter opened this backend outside of a client connection, + * then we can only claim tls when the command line required it. + */ + return tls == 2; + } + + return c->conn->using_tls; } int @@ -980,7 +987,8 @@ free_interns (void) static const char * add_intern (char *str) { - struct connection *conn = threadlocal_get_conn (); + struct context *c = threadlocal_get_context (); + struct connection *conn = c ? c->conn : NULL; string_vector *list = conn ? &conn->interns : &global_interns; if (string_vector_append (list, str) == -1) { diff --git a/server/test-public.c b/server/test-public.c index 141e9eb5..fef12043 100644 --- a/server/test-public.c +++ b/server/test-public.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2018-2020 Red Hat Inc. + * Copyright (C) 2018-2021 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -75,6 +75,12 @@ threadlocal_get_conn (void) abort (); } +struct context * +threadlocal_get_context (void) +{ + abort (); +} + int connection_get_status (void) { @@ -87,6 +93,10 @@ backend_default_export (struct backend *b, int readonly) abort (); } +int tls; + +/* Unit tests. */ + static bool test_nbdkit_parse_size (void) { diff --git a/server/threadlocal.c b/server/threadlocal.c index 644239b7..ad2ffd09 100644 --- a/server/threadlocal.c +++ b/server/threadlocal.c @@ -229,6 +229,15 @@ threadlocal_get_conn (void) return threadlocal ? threadlocal->conn : NULL; } +/* Get the current context associated with this thread, if available */ +struct context * +threadlocal_get_context (void) +{ + struct threadlocal *threadlocal = pthread_getspecific (threadlocal_key); + + return threadlocal ? threadlocal->ctx : NULL; +} + /* Set (or clear) the context using the current thread */ struct context * threadlocal_push_context (struct context *ctx) diff --git a/filters/retry/retry.c b/filters/retry/retry.c index f1c5d37f..fb1d0526 100644 --- a/filters/retry/retry.c +++ b/filters/retry/retry.c @@ -227,7 +227,7 @@ do_retry (struct retry_handle *h, struct retry_data *data, /* Open a new connection. */ new_next = nbdkit_next_context_open (nbdkit_context_get_backend (h->context), h->readonly || force_readonly, - h->exportname); + h->exportname, false); if (new_next == NULL) { *err = ESHUTDOWN; goto again; diff --git a/tests/test-layers-filter.c b/tests/test-layers-filter.c index 25ecb585..565b5616 100644 --- a/tests/test-layers-filter.c +++ b/tests/test-layers-filter.c @@ -152,7 +152,7 @@ test_layers_filter_open (nbdkit_next_open *next, nbdkit_context *nxdata, backend = nbdkit_context_get_backend (nxdata); assert (backend != NULL); - n = nbdkit_next_context_open (backend, readonly, exportname); + n = nbdkit_next_context_open (backend, readonly, exportname, 0); if (n == NULL) { free (h); return NULL; -- 2.31.1
Eric Blake
2021-May-07 01:59 UTC
[Libguestfs] [nbdkit PATCH 4/5] api: Add .cleanup callback
The existing .unload callback is currently called inner-to-outer, which is not suitable for use in late cleanups where a filter wants to make one final call into the underlying plugin. We were careful to state that unload does not have a specific ordering, but switching that ordering may not be helpful. Late cleanup may also want one final chance to open a context into the plugin, which implies one final access to the backend pointer from filters (in contrast to .unload taking no parameters). So, I found it easier to introduce a new .cleanup callback, specifically stated to be the counterpart of .after_fork. This leaves .load/.unload as a pair for handling things at the dlopen barrier (affecting only the local filter or plugin, but no interaction with other modules), and .after_fork/.cleanup for handling things related to the first and last points at which it is safe to interact with the underlying plugin. The new callback has no return value, as by the time we are cleaning up, any failures are not going to change the fact we are shutting down anyway. --- docs/nbdkit-filter.pod | 22 +++++++++++++++++++--- docs/nbdkit-plugin.pod | 33 ++++++++++++++++++++++++++++++--- include/nbdkit-filter.h | 1 + include/nbdkit-plugin.h | 4 +++- server/internal.h | 2 ++ server/filters.c | 13 +++++++++++++ server/main.c | 1 + server/plugins.c | 13 +++++++++++++ tests/test-layers-filter.c | 8 ++++++++ tests/test-layers-plugin.c | 9 ++++++++- tests/test-layers.c | 12 ++++++++++++ 11 files changed, 110 insertions(+), 8 deletions(-) diff --git a/docs/nbdkit-filter.pod b/docs/nbdkit-filter.pod index d2cdde13..aaa7706d 100644 --- a/docs/nbdkit-filter.pod +++ b/docs/nbdkit-filter.pod @@ -147,7 +147,7 @@ initialization outside any connections. The value of C<backend> passed to C<.after_fork> also occurs without connections, but is shared with C<.preconnect>, C<.list_exports>, and C<.default_export>, and can also be obtained from the C<context> passed to C<.open>, and -has a lifetime that lasts to C<.unload> for use by +has a lifetime that lasts to C<.cleanup> for use by C<nbdkit_next_context_open>. In turn, the value of C<context> passed to C<.open> has a lifetime that lasts until the matching C<.close> for use by C<nbdkit_context_get_backend> and C<nbdkit_context_set_next>. @@ -213,7 +213,7 @@ Obtains the backend pointer from the C<context> parameter to C<.open>, matching the backend pointer available to C<.after_fork>, C<.preconnect>, C<.list_exports>, and C<.default_export>. This backend pointer has a stable lifetime from the time of C<.after_fork> -until C<.unload>. +until C<.cleanup>. nbdkit_next *nbdkit_next_context_open (nbdkit_backend *backend, int readonly, const char *exportname, @@ -443,12 +443,28 @@ an error message and return C<-1>. This optional callback is reached after the plugin C<.after_fork> method has succeeded (if the plugin failed, nbdkit has already exited), and can be used by the filter to start background threads. -The C<backend> parameter is valid until C<.unload>, for creating manual +The C<backend> parameter is valid until C<.cleanup>, for creating manual contexts into the backend with C<nbdkit_next_context_open>. If there is an error, C<.after_fork> should call C<nbdkit_error> with an error message and return C<-1>. +=head2 C<.cleanup> + + int (cleanup) (nbdkit_backend *backend); + +This optional callback is reached once after all client connections +have been closed, but before the underlying plugin C<.cleanup> or any +C<.unload> callbacks. It can be used by the filter to gracefully +close any background threads created during C<.after_fork>, as well as +close any manual contexts into C<backend> previously opened with +C<nbdkit_next_context_open>. + +Note that it's not guaranteed that C<.cleanup> will always be called +(eg. the server might be killed or segfault), so you should try to +make the filter as robust as possible by not requiring cleanup. See +also L<nbdkit-plugin(3)/SHUTDOWN>. + =head2 C<.preconnect> int (*preconnect) (nbdkit_next_preconnect *next, nbdkit_backend *nxdata, diff --git a/docs/nbdkit-plugin.pod b/docs/nbdkit-plugin.pod index 89374f55..9a36c01d 100644 --- a/docs/nbdkit-plugin.pod +++ b/docs/nbdkit-plugin.pod @@ -176,6 +176,9 @@ the plugin: ? before nbdkit exits ? ???????????????????? + ? cleanup ? + ???????????????????? + ???????????????????? ? unload ? ???????????????????? @@ -281,9 +284,20 @@ The sequence C<.preconnect>, C<.open> ... C<.close> can be called repeatedly over the lifetime of the plugin, and can be called in parallel (depending on the thread model). +=item C<.cleanup> + +is called once after all connections have been closed, prior to +unloading the plugin from memory. This is only called if +C<.after_fork> succeeded earlier (even in cases where nbdkit did not +fork but is running in the foreground), which makes it a good place to +gracefully end any background threads. + =item C<.unload> -is called once just before the plugin is unloaded from memory. +is called once just before the plugin is unloaded from memory. This +is called even when nbdkit did not need to use C<.after_fork> (such as +when using C<--dump-plugin> to display documentation about the +plugin). =back @@ -644,6 +658,18 @@ above. If there is an error, C<.after_fork> should call C<nbdkit_error> with an error message and return C<-1>. +=head2 C<.cleanup> + + void cleanup (void); + +This optional callback is called after the server has closed all +connections and is preparing to unload. It is only reached in the +same cases that the C<.after_fork> callback was used, making it a good +place to clean up any background threads. However, it is not +guaranteed that this callback will be reached, so you should try to +make the plugin as robust as possible by not requiring cleanup. See +also L</SHUTDOWN> below. + =head2 C<.preconnect> int preconnect (int readonly); @@ -1290,8 +1316,9 @@ plugin having to call C<nbdkit_set_error>. When nbdkit receives certain signals it will shut down (see L<nbdkit(1)/SIGNALS>). The server will wait for any currently running -plugin callbacks to finish and also call the C<.unload> callback -before unloading the plugin. +plugin callbacks to finish, call C<.close> on those connections, then +call the C<.cleanup> and C<.unload> callbacks before unloading the +plugin. Note that it's not guaranteed this can always happen (eg. the server might be killed by C<SIGKILL> or segfault). diff --git a/include/nbdkit-filter.h b/include/nbdkit-filter.h index a91de88f..633d34e7 100644 --- a/include/nbdkit-filter.h +++ b/include/nbdkit-filter.h @@ -197,6 +197,7 @@ struct nbdkit_filter { int (*thread_model) (void); int (*get_ready) (int thread_model); int (*after_fork) (nbdkit_backend *backend); + void (*cleanup) (nbdkit_backend *backend); int (*preconnect) (nbdkit_next_preconnect *next, nbdkit_backend *nxdata, int readonly); int (*list_exports) (nbdkit_next_list_exports *next, nbdkit_backend *nxdata, diff --git a/include/nbdkit-plugin.h b/include/nbdkit-plugin.h index db010a5a..611987c7 100644 --- a/include/nbdkit-plugin.h +++ b/include/nbdkit-plugin.h @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2020 Red Hat Inc. + * Copyright (C) 2013-2021 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -144,6 +144,8 @@ struct nbdkit_plugin { struct nbdkit_exports *exports); const char * (*default_export) (int readonly, int is_tls); const char * (*export_description) (void *handle); + + void (*cleanup) (void); }; NBDKIT_EXTERN_DECL (void, nbdkit_set_error, (int err)); diff --git a/server/internal.h b/server/internal.h index d2ddf271..94eb4a9c 100644 --- a/server/internal.h +++ b/server/internal.h @@ -359,6 +359,8 @@ struct backend { const char *(*magic_config_key) (struct backend *); void (*get_ready) (struct backend *); void (*after_fork) (struct backend *); + void (*cleanup) (struct backend *); + int (*preconnect) (struct backend *, int readonly); int (*list_exports) (struct backend *, int readonly, int is_tls, struct nbdkit_exports *exports); diff --git a/server/filters.c b/server/filters.c index 1df8e1c9..d0dd31e5 100644 --- a/server/filters.c +++ b/server/filters.c @@ -201,6 +201,18 @@ filter_after_fork (struct backend *b) } } +static void +filter_cleanup (struct backend *b) +{ + struct backend_filter *f = container_of (b, struct backend_filter, backend); + + debug ("%s: cleanup", b->name); + if (f->filter.cleanup) + f->filter.cleanup (b->next); + + b->next->cleanup (b->next); +} + static int filter_preconnect (struct backend *b, int readonly) { @@ -599,6 +611,7 @@ static struct backend filter_functions = { .magic_config_key = plugin_magic_config_key, .get_ready = filter_get_ready, .after_fork = filter_after_fork, + .cleanup = filter_cleanup, .preconnect = filter_preconnect, .list_exports = filter_list_exports, .default_export = filter_default_export, diff --git a/server/main.c b/server/main.c index 7200647a..74bdf040 100644 --- a/server/main.c +++ b/server/main.c @@ -735,6 +735,7 @@ main (int argc, char *argv[]) start_serving (); + top->cleanup (top); top->free (top); top = NULL; diff --git a/server/plugins.c b/server/plugins.c index 457c8c86..6576a846 100644 --- a/server/plugins.c +++ b/server/plugins.c @@ -164,6 +164,7 @@ plugin_dump_fields (struct backend *b) HAS (thread_model); HAS (get_ready); HAS (after_fork); + HAS (cleanup); HAS (preconnect); HAS (list_exports); HAS (default_export); @@ -273,6 +274,17 @@ plugin_after_fork (struct backend *b) exit (EXIT_FAILURE); } +static void +plugin_cleanup (struct backend *b) +{ + struct backend_plugin *p = container_of (b, struct backend_plugin, backend); + + debug ("%s: cleanup", b->name); + + if (p->plugin.cleanup) + p->plugin.cleanup (); +} + static int plugin_preconnect (struct backend *b, int readonly) { @@ -802,6 +814,7 @@ static struct backend plugin_functions = { .magic_config_key = plugin_magic_config_key, .get_ready = plugin_get_ready, .after_fork = plugin_after_fork, + .cleanup = plugin_cleanup, .preconnect = plugin_preconnect, .list_exports = plugin_list_exports, .default_export = plugin_default_export, diff --git a/tests/test-layers-filter.c b/tests/test-layers-filter.c index 565b5616..380e3cea 100644 --- a/tests/test-layers-filter.c +++ b/tests/test-layers-filter.c @@ -103,6 +103,13 @@ test_layers_filter_after_fork (nbdkit_backend *backend) return 0; } +static void +test_layers_filter_cleanup (nbdkit_backend *backend) +{ + assert (backend == saved_backend); + DEBUG_FUNCTION; +} + static int test_layers_filter_preconnect (nbdkit_next_preconnect *next, nbdkit_backend *nxdata, int readonly) @@ -423,6 +430,7 @@ static struct nbdkit_filter filter = { .thread_model = test_layers_filter_thread_model, .get_ready = test_layers_filter_get_ready, .after_fork = test_layers_filter_after_fork, + .cleanup = test_layers_filter_cleanup, .preconnect = test_layers_filter_preconnect, .list_exports = test_layers_filter_list_exports, .default_export = test_layers_filter_default_export, diff --git a/tests/test-layers-plugin.c b/tests/test-layers-plugin.c index 856e4031..1716b5b9 100644 --- a/tests/test-layers-plugin.c +++ b/tests/test-layers-plugin.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2018-2020 Red Hat Inc. + * Copyright (C) 2018-2021 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -93,6 +93,12 @@ test_layers_plugin_after_fork (void) return 0; } +static void +test_layers_plugin_cleanup (void) +{ + DEBUG_FUNCTION; +} + static int test_layers_plugin_preconnect (int readonly) { @@ -277,6 +283,7 @@ static struct nbdkit_plugin plugin = { .thread_model = test_layers_plugin_thread_model, .get_ready = test_layers_plugin_get_ready, .after_fork = test_layers_plugin_after_fork, + .cleanup = test_layers_plugin_cleanup, .preconnect = test_layers_plugin_preconnect, .list_exports = test_layers_plugin_list_exports, .default_export = test_layers_plugin_default_export, diff --git a/tests/test-layers.c b/tests/test-layers.c index 09edcc0a..13c82289 100644 --- a/tests/test-layers.c +++ b/tests/test-layers.c @@ -609,6 +609,18 @@ main (int argc, char *argv[]) "test_layers_plugin_close", NULL); + /* cleanup methods called in outer-to-inner order. */ + log_verify_seen_in_order + ("testlayersfilter3: cleanup", + "filter3: test_layers_filter_cleanup", + "testlayersfilter2: cleanup", + "filter2: test_layers_filter_cleanup", + "testlayersfilter1: cleanup", + "filter1: test_layers_filter_cleanup", + "testlayersplugin: cleanup", + "test_layers_plugin_cleanup", + NULL); + /* unload methods should be run in any order. */ log_verify_seen ("test_layers_plugin_unload"); log_verify_seen ("filter1: test_layers_filter_unload"); -- 2.31.1
Eric Blake
2021-May-07 01:59 UTC
[Libguestfs] [nbdkit PATCH 5/5] ext2: Support parallel requests and connections
Instead of serializing connections (which in turn serializes requests) because it is unsafe to open more than one handle into the filesystem at the same time, use the recently-added APIs to open the filesystem during .after_fork, then use the fact that ext2 code then supports parallel access to files within the single system handle. --- filters/ext2/ext2.c | 225 ++++++++++++++++++++++++++------------------ 1 file changed, 131 insertions(+), 94 deletions(-) diff --git a/filters/ext2/ext2.c b/filters/ext2/ext2.c index cc80458f..5b047246 100644 --- a/filters/ext2/ext2.c +++ b/filters/ext2/ext2.c @@ -36,6 +36,7 @@ #include <stdlib.h> #include <inttypes.h> #include <errno.h> +#include <stdbool.h> /* Inlining is broken in the ext2fs header file. Disable it by * defining the following: @@ -55,6 +56,14 @@ */ static const char *file; +/* Filesystem handle, shared between all client connections. */ +static ext2_filsys fs; + +/* Plugin access shared between all client connections, and doubles as + * the "name" parameter for ext2fs_open. + */ +static nbdkit_next *plugin; + static void ext2_load (void) { @@ -89,7 +98,8 @@ ext2_config_complete (nbdkit_next_config_complete *next, nbdkit_backend *nxdata) if (strcmp (file, "exportname") == 0) file = NULL; else if (file[0] != '/') { - nbdkit_error ("the file parameter must refer to an absolute path"); + nbdkit_error ("the file parameter must be 'exportname' or refer to " + "an absolute path"); return -1; } @@ -100,13 +110,89 @@ ext2_config_complete (nbdkit_next_config_complete *next, nbdkit_backend *nxdata) "ext2file=<FILENAME> (required) Absolute name of file to serve inside the\n" \ " disk image, or 'exportname' for client choice." +/* Opening more than one instance of the filesystem in parallel is a + * recipe for disaster, so instead we open a single instance during + * after_fork to share among all client connections. If the + * underlying plugin supports parallel requests, then we do too since + * ext2 code is re-entrant through our one open handle. + */ +static int +ext2_after_fork (nbdkit_backend *nxdata) +{ + CLEANUP_FREE char *name = NULL; + int fs_flags; + int64_t r; + errcode_t err; + + /* It would be desirable for ?nbdkit -r? to behave the same way as + * ?mount -o ro?. But we don't know the state of the readonly flag + * until ext2_open is called, so for now we can't do that. We could + * add a knob during .config if desired; but until then, we blindly + * request write access to the underlying plugin, for journal + * replay. + * + * Similarly, there is no sane way to pass the client's exportname + * through to the plugin (whether or not .config was set to serve a + * single file or to let the client choose by exportname), so + * blindly ask for "" and rely on the plugin's default. + */ + plugin = nbdkit_next_context_open (nxdata, 0, "", true); + if (plugin == NULL) { + nbdkit_error ("unable to open plugin"); + return -1; + } + if (plugin->prepare (plugin) == -1) + goto fail; + + fs_flags = 0; +#ifdef EXT2_FLAG_64BITS + fs_flags |= EXT2_FLAG_64BITS; +#endif + r = plugin->get_size (plugin); + if (r == -1) + goto fail; + /* XXX See note above about a knob for read-only */ + r = plugin->can_write (plugin); + if (r == -1) + goto fail; + if (r == 1) + fs_flags |= EXT2_FLAG_RW; + + name = nbdkit_io_encode (plugin); + if (!name) { + nbdkit_error ("nbdkit_io_encode: %m"); + goto fail; + } + + err = ext2fs_open (name, fs_flags, 0, 0, nbdkit_io_manager, &fs); + if (err != 0) { + nbdkit_error ("open: %s", error_message (err)); + goto fail; + } + + return 0; + + fail: + plugin->finalize (plugin); + nbdkit_next_context_close (plugin); + return -1; +} + +static void +ext2_cleanup (nbdkit_backend *nxdata) +{ + if (fs) + ext2fs_close (fs); + plugin->finalize (plugin); + nbdkit_next_context_close (plugin); +} + /* The per-connection handle. */ struct handle { const char *exportname; /* Client export name. */ - ext2_filsys fs; /* Filesystem handle. */ ext2_ino_t ino; /* Inode of open file. */ ext2_file_t file; /* File handle. */ - nbdkit_next *next; /* "name" parameter to ext2fs_open. */ + nbdkit_context *context; }; /* Export list. */ @@ -117,18 +203,17 @@ ext2_list_exports (nbdkit_next_list_exports *next, nbdkit_backend *nxdata, /* If we are honoring export names, the default export "" won't * work, and we must not leak export names from the underlying * plugin. Advertising all filenames within the ext2 image could be - * huge, and even if we wanted to, it would require that we could - * open the plugin prior to the client reaching our .open. So leave - * the list empty instead. + * huge, although we could do it if we wanted to since the + * filesystem was already opened in .after_fork. */ if (!file) return 0; /* If we are serving a specific ext2file, we don't care what export - * name the user passes, but the underlying plugin might; there's no - * harm in advertising that list. + * name the user passes, but it's too late to pass that on to the + * underlying plugin, so advertise just "". */ - return next (nxdata, readonly, exports); + return nbdkit_use_default_export (exports); } /* Default export. */ @@ -170,17 +255,7 @@ ext2_open (nbdkit_next_open *next, nbdkit_context *nxdata, return NULL; } - /* If file == NULL (ie. using exportname) then don't - * pass the client exportname to the lower layers. - */ - exportname = file ? exportname : ""; - - /* Request write access to the underlying plugin, for journal replay. */ - if (next (nxdata, 0, exportname) == -1) { - free (h); - return NULL; - } - + h->context = nxdata; return h; } @@ -189,36 +264,12 @@ ext2_prepare (nbdkit_next *next, void *handle, int readonly) { struct handle *h = handle; errcode_t err; - int fs_flags; int file_flags; struct ext2_inode inode; - int64_t r; CLEANUP_FREE char *name = NULL; const char *fname = file ?: h->exportname; CLEANUP_FREE char *absname = NULL; - - fs_flags = 0; -#ifdef EXT2_FLAG_64BITS - fs_flags |= EXT2_FLAG_64BITS; -#endif - r = next->get_size (next); - if (r == -1) - return -1; - r = next->can_write (next); - if (r == -1) - return -1; - if (r == 0) - readonly = 1; - - if (!readonly) - fs_flags |= EXT2_FLAG_RW; - - h->next = next; - name = nbdkit_io_encode (next); - if (!name) { - nbdkit_error ("nbdkit_io_encode: %m"); - return -1; - } + nbdkit_next *old; if (fname[0] != '/') { if (asprintf (&absname, "/%s", fname) < 0) { @@ -228,53 +279,58 @@ ext2_prepare (nbdkit_next *next, void *handle, int readonly) fname = absname; } - err = ext2fs_open (name, fs_flags, 0, 0, nbdkit_io_manager, &h->fs); - if (err != 0) { - nbdkit_error ("open: %s", error_message (err)); - goto err0; - } - if (strcmp (fname, "/") == 0) /* probably gonna fail, but we'll catch it later */ h->ino = EXT2_ROOT_INO; else { - err = ext2fs_namei (h->fs, EXT2_ROOT_INO, EXT2_ROOT_INO, + err = ext2fs_namei (fs, EXT2_ROOT_INO, EXT2_ROOT_INO, &fname[1], &h->ino); if (err != 0) { nbdkit_error ("%s: namei: %s", fname, error_message (err)); - goto err1; + return -1; } } /* Check that fname is a regular file. * XXX This won't follow symlinks, we'd have to do that manually. */ - err = ext2fs_read_inode (h->fs, h->ino, &inode); + err = ext2fs_read_inode (fs, h->ino, &inode); if (err != 0) { nbdkit_error ("%s: inode: %s", fname, error_message (err)); - goto err1; + return -1; } if (!LINUX_S_ISREG (inode.i_mode)) { nbdkit_error ("%s: must be a regular file in the disk image", fname); - goto err1; + return -1; } file_flags = 0; if (!readonly) file_flags |= EXT2_FILE_WRITE; - err = ext2fs_file_open2 (h->fs, h->ino, NULL, file_flags, &h->file); + err = ext2fs_file_open2 (fs, h->ino, NULL, file_flags, &h->file); if (err != 0) { nbdkit_error ("%s: open: %s", fname, error_message (err)); - goto err1; + return -1; } + /* Associate our shared backend with this connection, so we don't + * have to override every single callback function. + */ + old = nbdkit_context_set_next (h->context, plugin); + assert (old == NULL); return 0; +} - err1: - ext2fs_close (h->fs); - h->fs = NULL; - err0: - return -1; +static int +ext2_finalize (nbdkit_next *next, void *handle) +{ + struct handle *h = handle; + nbdkit_next *old; + + /* Ensure our shared plugin handle survives past this connection. */ + old = nbdkit_context_set_next (h->context, NULL); + assert (old == next); + return 0; } /* Free up the per-connection handle. */ @@ -283,10 +339,8 @@ ext2_close (void *handle) { struct handle *h = handle; - if (h->fs) { + if (h->file) ext2fs_file_close (h->file); - ext2fs_close (h->fs); - } free (h); } @@ -306,12 +360,10 @@ ext2_can_cache (nbdkit_next *next, void *handle) static int ext2_can_multi_conn (nbdkit_next *next, void *handle) { - /* Since we do not permit parallel connections, it does not matter - * what we advertise here, and we could just as easily inherit the - * plugin's .can_multi_conn. But realistically, if we adjust - * .thread_model, we cannot advertise support unless .flush is - * consistent, and that would require inspecting the ext2 source - * code, so for now, we hard-code a safe answer. + /* Although we permit parallel client connections multiplexed into + * the single shared filesystem handle, we would have to audit the + * ext2 source to learn if it is cache-consistent. So for now, + * hard-code a safe answer. */ return 0; } @@ -324,7 +376,7 @@ ext2_can_flush (nbdkit_next *next, void *handle) * plugin ability, since ext2 wants to flush the filesystem into * permanent storage when possible. */ - if (next->can_flush (next) == -1) + if (plugin->can_flush (plugin) == -1) return -1; return 1; } @@ -339,7 +391,7 @@ ext2_can_zero (nbdkit_next *next, void *handle) /* For now, tell nbdkit to call .pwrite instead of any optimization. * However, we also want to cache the underlying plugin support. */ - if (next->can_zero (next) == -1) + if (plugin->can_zero (plugin) == -1) return -1; return NBDKIT_ZERO_EMULATE; } @@ -350,28 +402,11 @@ ext2_can_trim (nbdkit_next *next, void *handle) /* For now, tell nbdkit to never call .trim. However, we also want * to cache the underlying plugin support. */ - if (next->can_trim (next) == -1) + if (plugin->can_trim (plugin) == -1) return -1; return 0; } -/* It might be possible to relax this, but it's complicated. - * - * It's desirable for ?nbdkit -r? to behave the same way as - * ?mount -o ro?. But we don't know the state of the readonly flag - * until ext2_open is called (because the NBD client can also request - * a readonly connection). So we could not set the "ro" flag if we - * opened the filesystem any earlier (eg in ext2_config). - * - * So out of necessity we have one ext2_filsys handle per connection, - * but if we allowed parallel work on those handles then we would get - * data corruption, so we need to serialize connections. - */ -static int ext2_thread_model (void) -{ - return NBDKIT_THREAD_MODEL_SERIALIZE_CONNECTIONS; -} - /* Description. */ static const char * ext2_export_description (nbdkit_next *next, void *handle) @@ -379,7 +414,7 @@ ext2_export_description (nbdkit_next *next, void *handle) struct handle *h = handle; const char *fname = file ?: h->exportname; const char *slash = fname[0] == '/' ? "" : "/"; - const char *base = next->export_description (next); + const char *base = plugin->export_description (plugin); if (!base) return NULL; @@ -506,11 +541,13 @@ static struct nbdkit_filter filter = { .config = ext2_config, .config_complete = ext2_config_complete, .config_help = ext2_config_help, - .thread_model = ext2_thread_model, + .after_fork = ext2_after_fork, + .cleanup = ext2_cleanup, .list_exports = ext2_list_exports, .default_export = ext2_default_export, .open = ext2_open, .prepare = ext2_prepare, + .finalize = ext2_finalize, .close = ext2_close, .can_fua = ext2_can_fua, .can_cache = ext2_can_cache, -- 2.31.1