Richard W.M. Jones
2019-Jan-05 07:42 UTC
Re: [Libguestfs] [PATCH nbdkit 0/7] server: Implement NBD_FLAG_CAN_MULTI_CONN.
On Fri, Jan 04, 2019 at 05:26:07PM -0600, Eric Blake wrote:
> On 1/4/19 4:08 PM, Richard W.M. Jones wrote:
> > First thing to say is that I need to do a *lot* more testing on this,
> > so this is just an early peek.  In particular, although it passed
> > ‘make check && make check-valgrind’ I have *not* tested it against a
> > multi-conn-aware client such as the Linux kernel >= 4.9.
> >
> > This implements NBD_FLAG_CAN_MULTI_CONN, described in the protocol
> > doc as:
> >
> >   "NBD_FLAG_CAN_MULTI_CONN: Indicates that the server operates
> >   entirely without cache, or that the cache it uses is shared among
> >   all connections to the given device.  In particular, if this flag
> >   is present, then the effects of NBD_CMD_FLUSH and NBD_CMD_FLAG_FUA
> >   MUST be visible across all connections when the server sends its
> >   reply to that command to the client.  In the absence of this flag,
> >   clients SHOULD NOT multiplex their commands over more than one
> >   connection to the export."
> >
> > This is necessary to support the Linux nbd client -C option.
>
> Sounds promising.  And reminds me that I have not even tried to
> implement this option for qemu yet, but probably should.

I've yet to do testing, but it seems to be necessary if we're going to
do scaling with the Linux client.

> > The only plugin which sets the flag so far is file.  The ocaml, sh
> > and nbd plugins allow the flag to be controlled or passed through.
> >
> > Notable also is that the blocksize filter has to filter out this
> > flag, because I'm not convinced that the bounce buffer is safe.
> > However I believe the other filters *are* safe (although not really
> > certain about the fua filter, and the new cache filter is tricky
> > too).
> >
> > My feeling is that we should set the flag unconditionally for all
> > readonly connections, but I can think of nasty corner cases where it
> > might not work.  We should most probably set it in all plugins that
> > are readonly (eg. nbdkit-pattern-plugin).
>
> Should the callback include a bool readonly parameter that lets the
> filter/plugin know for sure whether it is answering the question on a
> read-only connection vs. on a read-write connection, as the answer may
> differ between those two?

Yes, I think this is a good idea.  Will try that for v2.

> Should we automatically set the bit for any plugin that has
> fully-serialized operation but does not provide the callback?  That
> is, if a plugin uses SERIALIZE_CONNECTIONS, you can never have
> multiple clients at the same time anyways (so the flag is trivially
> ignorable - a client can try to set up a second connection for speed,
> but it won't help).  A plugin that uses SERIALIZE_ALL_REQUESTS is
> also trivially supported - any flush request will complete
> (regardless of which connected client made it) prior to any other
> client being able to make progress on a request that would need to
> see the results of the flush.  Only when we get to SERIALIZE_REQUESTS
> or PARALLEL can we have the situation where one client's request can
> outpace the flush operation in another client.

I believe the answer here is _no_.  Imagine a plugin which serves
different data on different connections, ie. completely different
data, such as a "roulette" plugin which selects a different disk for
each handle.  Such a plugin is of course insane, and it's not
something that any of _our_ plugins do, but it's something which is
possible.
Currently a roulette plugin works with well-behaved clients because
they will only open a single connection and thus will always see a
single view of the virtual disk.  However if this plugin advertised
NBD_FLAG_CAN_MULTI_CONN then clients could connect multiple times, and
would see completely different data on each connection, which would
break those clients horribly.  Note this argument is not affected by
the thread model or the readonly flag.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
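(For concreteness, the following is a minimal sketch of a plugin for
which advertising the flag is safe.  It is not code from the patch
series: it assumes the callback ends up as a plain
"int can_multi_conn (void *handle)" as proposed, and the "example"
names are invented.  Every connection shares one in-memory disk and
there is no per-connection cache, so flushes are trivially consistent
across connections.)

/* example-multi-conn.c: hypothetical sketch, not from the series. */
#include <stdint.h>
#include <string.h>

#define NBDKIT_API_VERSION 2
#include <nbdkit-plugin.h>

#define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL

static char disk[1024 * 1024];  /* one disk, shared by all connections */

static void *
example_open (int readonly)
{
  static int handle;            /* no per-connection state is needed */
  return &handle;
}

static int64_t
example_get_size (void *handle)
{
  return sizeof disk;
}

/* Every connection sees the same 'disk' array and there is no cache
 * in front of it, so NBD_FLAG_CAN_MULTI_CONN is safe to advertise.
 */
static int
example_can_multi_conn (void *handle)
{
  return 1;
}

static int
example_pread (void *handle, void *buf, uint32_t count, uint64_t offset,
               uint32_t flags)
{
  memcpy (buf, disk + offset, count);
  return 0;
}

static int
example_pwrite (void *handle, const void *buf, uint32_t count,
                uint64_t offset, uint32_t flags)
{
  /* Concurrent overlapping writes simply race, which NBD permits. */
  memcpy (disk + offset, buf, count);
  return 0;
}

static struct nbdkit_plugin plugin = {
  .name           = "example-multi-conn",
  .open           = example_open,
  .get_size       = example_get_size,
  .can_multi_conn = example_can_multi_conn,
  .pread          = example_pread,
  .pwrite         = example_pwrite,
};

NBDKIT_REGISTER_PLUGIN (plugin)

(The roulette plugin is the exact opposite case: its .open would pick
a different backing disk per handle, so its .can_multi_conn must
return false regardless of thread model, which is why the flag cannot
be deduced automatically.)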
Richard W.M. Jones
2019-Jan-05 09:22 UTC
Re: [Libguestfs] [PATCH nbdkit 0/7] server: Implement NBD_FLAG_CAN_MULTI_CONN.
On Sat, Jan 05, 2019 at 07:42:56AM +0000, Richard W.M. Jones wrote:
> > Should the callback include a bool readonly parameter that lets the
> > filter/plugin know for sure whether it is answering the question on
> > a read-only connection vs. on a read-write connection, as the
> > answer may differ between those two?
>
> Yes, I think this is a good idea.  Will try that for v2.

This leads to a subtle layering violation.  I'm assuming that we don't
pass the same readonly flag that we pass to ‘.open’, but instead we
would pass conn->readonly, which is the computed readonly status after
considering the whole stack of filters + plugin ‘.can_write’.

This means that a filter which enables write would cause a plugin
which only understands readonly connections to be called with
‘.can_multi_conn (readonly=0)’.  But a plugin which thinks it is a
readonly plugin may reasonably assume that it can ignore the readonly
parameter of ‘.can_multi_conn’ and return true regardless.

Now having said that, I don't think Bad Things can happen, assuming
the filter also filters ‘.can_multi_conn’ correctly.  But it makes me
think two things:

(1) It is possible for plugins to save the readonly flag from ‘.open’
in the handle, so we don't need to handle this corner case.

(2) Or we might address this by having an nbdkit API function which
returns the computed readonly status of the whole server.  (Similarly
we could have one which returns the computed thread model of the whole
server.)  These functions might be more generally useful than passing
flags to specific callbacks on a case-by-case basis, and we can caveat
them properly in the documentation by noting that these functions
return the status of the whole stack rather than the current layer.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
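(A sketch of option (1), with invented my_* names: the plugin captures
the readonly argument of .open in its own handle and consults that in
.can_multi_conn, so it needs neither a readonly parameter nor any
knowledge of what the filters above it decided.)

#include <stdlib.h>

#define NBDKIT_API_VERSION 2
#include <nbdkit-plugin.h>

struct my_handle {
  int readonly;                 /* saved from .open */
  /* ... other per-connection state ... */
};

static void *
my_open (int readonly)
{
  struct my_handle *h = malloc (sizeof *h);

  if (h == NULL) {
    nbdkit_error ("malloc: %m");
    return NULL;
  }
  h->readonly = readonly;
  return h;
}

static void
my_close (void *handle)
{
  free (handle);
}

/* Consult the flag saved at .open time: read-only connections are
 * safe to multiplex; writable ones only if flushes are coherent
 * across connections (not the case for this hypothetical plugin).
 */
static int
my_can_multi_conn (void *handle)
{
  struct my_handle *h = handle;

  return h->readonly;
}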
Richard W.M. Jones
2019-Jan-05 11:15 UTC
Re: [Libguestfs] [PATCH nbdkit 0/7] server: Implement NBD_FLAG_CAN_MULTI_CONN.
On Sat, Jan 05, 2019 at 07:42:56AM +0000, Richard W.M. Jones wrote:
> I've yet to do testing, but it seems to be necessary if we're going
> to do scaling with the Linux client.

Some preliminary numbers on this.  I've enabled multi-conn for both
the file and memory plugins, and I'm using fio with 8 threads to test
this against the Linux kernel client.

Without multi-conn:

  memory:
     read: IOPS=52.8k, BW=206MiB/s (216MB/s)(24.2GiB/120002msec)
    write: IOPS=52.8k, BW=206MiB/s (216MB/s)(24.2GiB/120002msec)
  file:
     read: IOPS=48.3k, BW=189MiB/s (198MB/s)(22.1GiB/120001msec)
    write: IOPS=48.3k, BW=189MiB/s (198MB/s)(22.1GiB/120001msec)

With multi-conn (-C 8):

  memory:
     read: IOPS=103k, BW=401MiB/s (420MB/s)(46.0GiB/120002msec)
    write: IOPS=103k, BW=401MiB/s (420MB/s)(46.0GiB/120002msec)
  file:
     read: IOPS=49.2k, BW=192MiB/s (202MB/s)(22.5GiB/120001msec)
    write: IOPS=49.2k, BW=192MiB/s (202MB/s)(22.5GiB/120001msec)

So you can see that the file plugin doesn't move at all, which is not
too surprising since perf shows that it is entirely blocked on Linux
filesystem calls.  IOW something in the kernel seems to be the
problem.  The memory plugin still has known scalability problems
because it uses the global sparse array, but at least it's moving in
the right direction.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v
Richard W.M. Jones
2019-Jan-05 13:19 UTC
Re: [Libguestfs] [PATCH nbdkit 0/7] server: Implement NBD_FLAG_CAN_MULTI_CONN.
Here are some more interesting numbers, concentrating on the memory
plugin and RAM disks.  All are done using 8 threads and multi-conn, on
a single unloaded machine with 16 cores, using a Unix domain socket.

(1) The memory plugin using the sparse array, as implemented upstream
in 1.9.8:

   read: IOPS=103k, BW=401MiB/s (420MB/s)(46.0GiB/120002msec)
  write: IOPS=103k, BW=401MiB/s (420MB/s)(46.0GiB/120002msec)

(2) I moved the locking to around calls to the sparse array code, and
changed the thread model to parallel:

   read: IOPS=112k, BW=437MiB/s (458MB/s)(51.2GiB/120001msec)
  write: IOPS=112k, BW=437MiB/s (458MB/s)(51.2GiB/120001msec)

(3) I reimplemented the memory plugin using a simple malloc, which is
how it used to be in nbdkit <= 1.5.8:

   read: IOPS=133k, BW=518MiB/s (544MB/s)(60.7GiB/120002msec)
  write: IOPS=133k, BW=518MiB/s (543MB/s)(60.7GiB/120002msec)

(4) I ran fio directly against /dev/shm to get some idea of how much
performance we are losing by using NBD at all:

   read: IOPS=1018k, BW=3978MiB/s (4171MB/s)(466GiB/120001msec)
  write: IOPS=1018k, BW=3979MiB/s (4172MB/s)(466GiB/120001msec)

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
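(Experiment (2) corresponds roughly to the fragment below.  The
sparse_array_* declarations are stand-ins for nbdkit's common sparse
array code, whose real declarations live in the nbdkit source tree;
the point is that only the array accesses are serialized, while
request parsing, socket I/O and replies run in parallel across
connections, which is where the extra IOPS come from.)

#include <stdint.h>
#include <pthread.h>

#define NBDKIT_API_VERSION 2
#include <nbdkit-plugin.h>

#define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL

/* Stand-ins for nbdkit's common sparse array code. */
struct sparse_array;
extern struct sparse_array *sa;           /* shared by all connections */
extern void sparse_array_read (struct sparse_array *, void *,
                               uint32_t, uint64_t);
extern int sparse_array_write (struct sparse_array *, const void *,
                               uint32_t, uint64_t);

/* One global lock, held only around the sparse array calls. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static int
memory_pread (void *handle, void *buf, uint32_t count, uint64_t offset,
              uint32_t flags)
{
  pthread_mutex_lock (&lock);
  sparse_array_read (sa, buf, count, offset);
  pthread_mutex_unlock (&lock);
  return 0;
}

static int
memory_pwrite (void *handle, const void *buf, uint32_t count,
               uint64_t offset, uint32_t flags)
{
  int r;

  pthread_mutex_lock (&lock);
  r = sparse_array_write (sa, buf, count, offset);
  pthread_mutex_unlock (&lock);
  return r;
}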
Eric Blake
2019-Jan-05 16:08 UTC
Re: [Libguestfs] [PATCH nbdkit 0/7] server: Implement NBD_FLAG_CAN_MULTI_CONN.
On 1/5/19 3:22 AM, Richard W.M. Jones wrote:
> On Sat, Jan 05, 2019 at 07:42:56AM +0000, Richard W.M. Jones wrote:
>>> Should the callback include a bool readonly parameter that lets the
>>> filter/plugin know for sure whether it is answering the question on
>>> a read-only connection vs. on a read-write connection, as the
>>> answer may differ between those two?
>>
>> Yes, I think this is a good idea.  Will try that for v2.
>
> This leads to a subtle layering violation.  I'm assuming that we
> don't pass the same readonly flag that we pass to ‘.open’, but
> instead we would pass conn->readonly which is the computed readonly
> status after considering the whole stack of filters + plugin
> ‘.can_write’.
>
> This means that a filter which enables write would cause a plugin
> which only understands readonly connections to be called with
> ‘.can_multi_conn (readonly=0)’.  But a plugin which thinks it is a
> readonly plugin may reasonably assume that it can ignore the
> readonly parameter of ‘.can_multi_conn’ and return true regardless.
>
> Now having said that I don't think Bad Things can happen, assuming
> the filter also filters ‘.can_multi_conn’ correctly.  But it makes
> me think two things:
>
> (1) It is possible for plugins to save the readonly flag from
> ‘.open’ in the handle.  So we don't need to handle this corner case.

At least the nbd plugin already saves the readonly flag from .open, so
I'm fine if we just document that (1) is the preferred approach.

> (2) Or we might address this by having an nbdkit API function which
> returns the computed readonly status of the whole server.
> (Similarly we could have one which returns the computed thread model
> of the whole server.)  These functions might be more generally
> useful than passing flags to specific callbacks on a case-by-case
> basis.  And we can caveat them properly in the documentation by
> noting that these functions return the status of the whole stack
> rather than the current layer.

This gets awkward, because filters change the global state (on
purpose), and the plugin should see the way the filter will use it,
not the way the client will use it.  Classic example: the cow filter
lets the client see read-write even though the plugin is opened
readonly; the plugin's response should NOT be based on the fact that
the client can write.  So I think that's an argument against (2).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
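(A fragment illustrating the point above, assuming the
nbdkit_next_ops-style filter API of that era, with invented example_*
names: a cow-like filter that makes the export writable has to answer
.can_multi_conn for its own behaviour, masking the plugin's answer
when its overlay, or a bounce buffer as in the blocksize filter, is
not known to be coherent across connections.)

#include <nbdkit-filter.h>

/* The filter makes the export writable even when the plugin was
 * opened read-only; writes would go to a filter-private overlay,
 * elided here.
 */
static int
example_can_write (struct nbdkit_next_ops *next_ops, void *nxdata,
                   void *handle)
{
  return 1;
}

/* Answer for the filter's behaviour, not the plugin's: if the overlay
 * is not coherent across connections, do not let the plugin's "yes"
 * leak through to clients, whatever next_ops would report.
 */
static int
example_can_multi_conn (struct nbdkit_next_ops *next_ops, void *nxdata,
                        void *handle)
{
  return 0;
}

static struct nbdkit_filter filter = {
  .name           = "example",
  .can_write      = example_can_write,
  .can_multi_conn = example_can_multi_conn,
};

NBDKIT_REGISTER_FILTER (filter)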