Eric Blake
2021-Feb-18 23:23 UTC
[Libguestfs] [nbdkit PATCH] cache: Advertise flush/fua/multi_conn if cache=unsafe
When using cache=unsafe, we are already stating that the plugin will not necessarily ever get our cached data written back, but locally everything is already consistent (we grab a shared lock before consulting the cache), so we can behave as if client flush requests always works in a multi-conn safe manner, even without bothering with fdatasync() on our cache file. When using cache=writethrough, our behavior under multiple connections is at the mercy of the plugin (if connection A and B both write, but only B flushes, we can only guarantee that the data from A was flushed if the plugin advertises multi-conn; as otherwise, the write by connection A may still be in a cache that was not flushed by B's command). But when using cache=writeback, we are guaranteed that only one connection is ever writing back to the server at a time (that is, once we flush, we are flushing data from ALL connections), so we can always advertise multi-conn. --- This is fallout from IRC discussion we had earlier today about whether it was safe to enable multi-conn on the ssh plugin. After re-reading https://sourceforge.net/p/nbd/mailman/nbd-general/?viewmonth=201610&viewday=3, my take is that the original goal for MULTI_CONN is that when clear, if connection A and B both write, then both must FLUSH to guarantee that their writes reached persistent storage (if only one flushes, the other connection may still have cached data that remains unflushed). But when MULTI_CONN is set, only one of the two connections has to issue a flush after both connections have received replies to all writes that are intended to be made persistent. And the way we share our cache for cache=unsafe and cache=writeback meets that goal. filters/cache/cache.c | 56 +++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 54 insertions(+), 2 deletions(-) diff --git a/filters/cache/cache.c b/filters/cache/cache.c index b979f256..9e5a3e80 100644 --- a/filters/cache/cache.c +++ b/filters/cache/cache.c @@ -257,6 +257,53 @@ cache_can_fast_zero (struct nbdkit_next_ops *next_ops, void *nxdata, return 1; } +/* Override the plugin's .can_flush, if we are cache=unsafe */ +static int +cache_can_flush (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle) +{ + if (cache_mode == CACHE_MODE_UNSAFE) + return 1; + return next_ops->can_flush (nxdata); +} + + +/* Override the plugin's .can_fua, if we are cache=unsafe */ +static int +cache_can_fua (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle) +{ + if (cache_mode == CACHE_MODE_UNSAFE) + return NBDKIT_FUA_NATIVE; + return next_ops->can_fua (nxdata); +} + +/* Override the plugin's .can_multi_conn, if we are not cache=writethrough */ +static int +cache_can_multi_conn (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle) +{ + /* For CACHE_MODE_NONE, we always advertise a no-op flush because + * our local cache access is consistent between connections, and we + * don't care about persisting the data to the underlying plugin. + * + * For CACHE_MODE_WRITEBACK, things are more subtle: we only write + * to the plugin during NBD_CMD_FLUSH, at which point that one + * connection writes back ALL cached blocks regardless of which + * connection originally wrote them, so a client can be assured that + * blocks from all connections have reached the plugin's permanent + * storage with only one connection having to send a flush. + * + * But for CACHE_MODE_WRITETHROUGH, we are at the mercy of the + * plugin; data written by connection A is not guaranteed to be made + * persistent by a flush from connection B unless the plugin itself + * supports multi-conn. + */ + if (cache_mode !== CACHE_MODE_WRITETHROUGH) + return 1; + return next_ops->can_multi_conn (nxdata); +} + /* Read data. */ static int cache_pread (struct nbdkit_next_ops *next_ops, void *nxdata, @@ -352,7 +399,8 @@ cache_pwrite (struct nbdkit_next_ops *next_ops, void *nxdata, } if ((flags & NBDKIT_FLAG_FUA) && - next_ops->can_fua (nxdata) == NBDKIT_FUA_EMULATE) { + (cache_mode == CACHE_MODE_UNSAFE || + next_ops->can_fua (nxdata) == NBDKIT_FUA_EMULATE)) { flags &= ~NBDKIT_FLAG_FUA; need_flush = true; } @@ -442,7 +490,8 @@ cache_zero (struct nbdkit_next_ops *next_ops, void *nxdata, flags &= ~NBDKIT_FLAG_MAY_TRIM; if ((flags & NBDKIT_FLAG_FUA) && - next_ops->can_fua (nxdata) == NBDKIT_FUA_EMULATE) { + (cache_mode == CACHE_MODE_UNSAFE || + next_ops->can_fua (nxdata) == NBDKIT_FUA_EMULATE)) { flags &= ~NBDKIT_FLAG_FUA; need_flush = true; } @@ -639,6 +688,9 @@ static struct nbdkit_filter filter = { .get_size = cache_get_size, .can_cache = cache_can_cache, .can_fast_zero = cache_can_fast_zero, + .can_flush = cache_can_flush, + .can_fua = cache_can_fua, + .can_multi_conn = cache_can_multi_conn, .pread = cache_pread, .pwrite = cache_pwrite, .zero = cache_zero, -- 2.30.1
Richard W.M. Jones
2021-Feb-19 19:37 UTC
[Libguestfs] [nbdkit PATCH] cache: Advertise flush/fua/multi_conn if cache=unsafe
On Thu, Feb 18, 2021 at 05:23:20PM -0600, Eric Blake wrote:> When using cache=unsafe, we are already stating that the plugin will > not necessarily ever get our cached data written back, but locally > everything is already consistent (we grab a shared lock before > consulting the cache), so we can behave as if client flush requests > always works in a multi-conn safe manner, even without bothering with > fdatasync() on our cache file. > > When using cache=writethrough, our behavior under multiple connections > is at the mercy of the plugin (if connection A and B both write, but > only B flushes, we can only guarantee that the data from A was flushed > if the plugin advertises multi-conn; as otherwise, the write by > connection A may still be in a cache that was not flushed by B's > command). But when using cache=writeback, we are guaranteed that only > one connection is ever writing back to the server at a time (that is, > once we flush, we are flushing data from ALL connections), so we can > always advertise multi-conn. > --- > > This is fallout from IRC discussion we had earlier today about whether > it was safe to enable multi-conn on the ssh plugin. > > After re-reading > https://sourceforge.net/p/nbd/mailman/nbd-general/?viewmonth=201610&viewday=3, > my take is that the original goal for MULTI_CONN is that when clear, > if connection A and B both write, then both must FLUSH to guarantee > that their writes reached persistent storage (if only one flushes, the > other connection may still have cached data that remains unflushed). > But when MULTI_CONN is set, only one of the two connections has to > issue a flush after both connections have received replies to all > writes that are intended to be made persistent. And the way we share > our cache for cache=unsafe and cache=writeback meets that goal. > > filters/cache/cache.c | 56 +++++++++++++++++++++++++++++++++++++++++-- > 1 file changed, 54 insertions(+), 2 deletions(-) > > diff --git a/filters/cache/cache.c b/filters/cache/cache.c > index b979f256..9e5a3e80 100644 > --- a/filters/cache/cache.c > +++ b/filters/cache/cache.c > @@ -257,6 +257,53 @@ cache_can_fast_zero (struct nbdkit_next_ops *next_ops, void *nxdata, > return 1; > } > > +/* Override the plugin's .can_flush, if we are cache=unsafe */ > +static int > +cache_can_flush (struct nbdkit_next_ops *next_ops, void *nxdata, > + void *handle) > +{ > + if (cache_mode == CACHE_MODE_UNSAFE) > + return 1; > + return next_ops->can_flush (nxdata); > +} > + > + > +/* Override the plugin's .can_fua, if we are cache=unsafe */ > +static int > +cache_can_fua (struct nbdkit_next_ops *next_ops, void *nxdata, > + void *handle) > +{ > + if (cache_mode == CACHE_MODE_UNSAFE) > + return NBDKIT_FUA_NATIVE; > + return next_ops->can_fua (nxdata); > +} > + > +/* Override the plugin's .can_multi_conn, if we are not cache=writethrough */ > +static int > +cache_can_multi_conn (struct nbdkit_next_ops *next_ops, void *nxdata, > + void *handle) > +{ > + /* For CACHE_MODE_NONE, we always advertise a no-op flush because > + * our local cache access is consistent between connections, and we > + * don't care about persisting the data to the underlying plugin. > + * > + * For CACHE_MODE_WRITEBACK, things are more subtle: we only write > + * to the plugin during NBD_CMD_FLUSH, at which point that one > + * connection writes back ALL cached blocks regardless of which > + * connection originally wrote them, so a client can be assured that > + * blocks from all connections have reached the plugin's permanent > + * storage with only one connection having to send a flush. > + * > + * But for CACHE_MODE_WRITETHROUGH, we are at the mercy of the > + * plugin; data written by connection A is not guaranteed to be made > + * persistent by a flush from connection B unless the plugin itself > + * supports multi-conn. > + */ > + if (cache_mode !== CACHE_MODE_WRITETHROUGH) > + return 1; > + return next_ops->can_multi_conn (nxdata); > +} > + > /* Read data. */ > static int > cache_pread (struct nbdkit_next_ops *next_ops, void *nxdata, > @@ -352,7 +399,8 @@ cache_pwrite (struct nbdkit_next_ops *next_ops, void *nxdata, > } > > if ((flags & NBDKIT_FLAG_FUA) && > - next_ops->can_fua (nxdata) == NBDKIT_FUA_EMULATE) { > + (cache_mode == CACHE_MODE_UNSAFE || > + next_ops->can_fua (nxdata) == NBDKIT_FUA_EMULATE)) { > flags &= ~NBDKIT_FLAG_FUA; > need_flush = true; > } > @@ -442,7 +490,8 @@ cache_zero (struct nbdkit_next_ops *next_ops, void *nxdata, > > flags &= ~NBDKIT_FLAG_MAY_TRIM; > if ((flags & NBDKIT_FLAG_FUA) && > - next_ops->can_fua (nxdata) == NBDKIT_FUA_EMULATE) { > + (cache_mode == CACHE_MODE_UNSAFE || > + next_ops->can_fua (nxdata) == NBDKIT_FUA_EMULATE)) { > flags &= ~NBDKIT_FLAG_FUA; > need_flush = true; > } > @@ -639,6 +688,9 @@ static struct nbdkit_filter filter = { > .get_size = cache_get_size, > .can_cache = cache_can_cache, > .can_fast_zero = cache_can_fast_zero, > + .can_flush = cache_can_flush, > + .can_fua = cache_can_fua, > + .can_multi_conn = cache_can_multi_conn, > .pread = cache_pread, > .pwrite = cache_pwrite, > .zero = cache_zero, > -- > 2.30.1ACK Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW