Nir Soffer
2021-Aug-03 18:18 UTC
[Libguestfs] [PATCH virt-v2v 1/2] v2v: -o rhv-upload: Split plugin functions from create/finalize
On Tue, Aug 3, 2021 at 8:48 PM Richard W.M. Jones <rjones at redhat.com> wrote:
>
> On Tue, Aug 03, 2021 at 07:51:03PM +0300, Nir Soffer wrote:
> > On Tue, Aug 3, 2021 at 3:10 PM Richard W.M. Jones <rjones at redhat.com> wrote:
> > >
> > > The existing plugin created a new disk in open() and tried to finalize
> > > it in close().  A correct NBD client may make several connections
> > > (eg. connecting just to query for multi-conn or other metadata).  This
> > > could result in multiple disks being created.
> > >
> > > Let's fix this by using Nir's suggestion to split out the create and
> > > finalize functions.  The nbdkit plugin becomes a much more
> > > straightforward plugin - it no longer depends on the oVirt SDK at all
> > > so could in theory be rewritten in another programming language.  The
> > > other functions have been moved into new scripts ("transfer",
> > > "finalize", "cancel") which are run separately.
> > >
> > > Some subtler points about this commit:
> > >
> > > - We no longer use or need the @failing decorator.  I have extended
> > >   the cleanup code so as well as deleting disks it also cancels the
> > >   transfers.  Any failure detected by qemu-img convert will result in
> > >   virt-v2v exiting and cancelling the transfer, which is much cleaner
> > >   and more robust.
> >
> > Canceling the transfer on errors is fine, but deleting a disk *used* by
> > a transfer will fail since the transfer owns the disk.  Once the transfer
> > has been created there is no need to delete the disk; the transfer will
> > delete the disk on errors.
>
> (Commented on below)
>
> > > - We still require the HTTP pool.  The reason is a bit subtle: Since
> > >   we are using the parallel thread model, pread and pwrite (etc) can
> > >   be called at any time, even overlapping, so storing an HTTP handle
> > >   per NBD connection does not work as the handle gets reused (which
> > >   results in a fatal error).
> >
> > Do we need the parallel threading model now?  If we create a python
> > instance per connection, one http connection to imageio is best.
>
> I did consider this, but I thought it would be better to keep the
> parallel thread model and multiplex the requests down to a fixed number
> of HTTP connections (in the pool).  In the eventual multi-conn case it
> would look like:
>
>          +----------+        +---------+
>     -----| nbdkit  p|--------| imageio |
>     -----|         o|--------|         |
>     -----|         o|  (8)   |         |
>     -----|         l|--------|         |
>          +----------+        +---------+
>     4 multi-conn        HTTP
>     NBD connections
>
> On the left you'd have up to 4 x 64 NBD requests coming in.  In the
> middle (assuming no HTTP pipelining is possible) the requests are
> issued by multiple threads on to the 8 HTTP connections to imageio.
>
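For illustration, a queue-based pool along these lines might look like the
following (a simplified sketch, not the actual plugin code; it assumes "url"
is a parsed urllib URL and it omits TLS setup and error handling):

    import queue
    import http.client

    MAX_CONNECTIONS = 8

    def create_http_pool(url, options):
        # One pool shared by the whole nbdkit process; any worker thread
        # borrows a connection, sends one request, and returns it.
        pool = queue.Queue()
        count = min(options["max_readers"],
                    options["max_writers"],
                    MAX_CONNECTIONS)
        for _ in range(count):
            pool.put(http.client.HTTPSConnection(url.hostname, url.port))
        return pool

    def http_request(pool, method, path, body=None):
        # Borrow a connection for the duration of a single request.
        conn = pool.get()
        try:
            conn.request(method, path, body=body)
            resp = conn.getresponse()
            resp.read()
            return resp.status
        finally:
            pool.put(conn)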
> > > diff --git a/v2v/rhv-upload-deletedisks.py b/v2v/rhv-upload-cancel.py
> > > similarity index 80%
> > > rename from v2v/rhv-upload-deletedisks.py
> > > rename to v2v/rhv-upload-cancel.py
> > > index 8ed8cb88b..f08713b8c 100644
> > > --- a/v2v/rhv-upload-deletedisks.py
> > > +++ b/v2v/rhv-upload-cancel.py
> > > @@ -1,6 +1,6 @@
> > >  # -*- python -*-
> > > -# oVirt or RHV upload delete disks used by ‘virt-v2v -o rhv-upload’
> > > -# Copyright (C) 2019 Red Hat Inc.
> > > +# oVirt or RHV upload cancel used by ‘virt-v2v -o rhv-upload’
> > > +# Copyright (C) 2019-2021 Red Hat Inc.
> > >  #
> > >  # This program is free software; you can redistribute it and/or modify
> > >  # it under the terms of the GNU General Public License as published by
> > > @@ -57,13 +57,22 @@ connection = sdk.Connection(
> > >  )
> >
> > You should close the connection (helps engine manage its sessions).
> > The best way is using:
> >
> >     from contextlib import closing
> >
> >     with closing(connection):
> >         code using the connection here...
> >
> > I'm not sure if connection.close() does more than socket.close(), but it
> > is better to be explicit.
>
> OK I'll add this.
>
> It's worth noting that in my local copy I'm now closing the HTTP
> connections in the plugin on exit, which I think wasn't in the patch
> posted.  It does unfortunately require a small change to nbdkit to do
> this (https://gitlab.com/nbdkit/nbdkit/-/commit/f2fe99e4b0f54467ab8028eaf2d039cf918b2961).
>
> > >  system_service = connection.system_service()
> > > +image_transfers_service = system_service.image_transfers_service()
> > > +
> > > +# Try to cancel the transfers.  This will probably delete the disk.
> >
> > This *will* delete the disk, not probably.
> >
> > > +for id in params['transfer_ids']:
> > > +    try:
> > > +        transfer_service = image_transfers_service.image_transfer_service(id)
> > > +        transfer_service.cancel()
> > > +    except sdk.NotFoundError:
> > > +        pass
> >
> > This can fail for other reasons too, and in that case we will fail on the
> > first failure instead of trying to cancel all the transfers.  Previously
> > we always cancelled all transfers on all errors.
> >
> > I think this should be:
> >
> >     for id in params['transfer_ids']:
> >         try:
> >             cancel the transfer...
> >         except sdk.NotFoundError:
> >             log warning - we don't expect a non-existing transfer
> >         except Exception:
> >             log traceback with the transfer id, someone will have to do
> >             manual cleanup using this id
>
> OK I'll see if I can fix this.
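A concrete version of that suggestion could look like this (an illustrative
sketch only, using the Python logging module; it is not the code from the
patch):

    import logging
    import ovirtsdk4 as sdk

    def cancel_transfers(connection, transfer_ids):
        # Try to cancel every transfer; log failures instead of stopping
        # at the first error so all transfers get a chance to be cancelled.
        transfers_service = connection.system_service().image_transfers_service()
        for transfer_id in transfer_ids:
            transfer_service = transfers_service.image_transfer_service(transfer_id)
            try:
                transfer_service.cancel()
            except sdk.NotFoundError:
                logging.warning("transfer %s not found, skipping", transfer_id)
            except Exception:
                logging.exception(
                    "cannot cancel transfer %s, manual cleanup may be required",
                    transfer_id)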
> > Note that cancel is async, so the transfer is still in the cleanup
> > flow when we reach this line, so...
> >
> > >  disks_service = system_service.disks_service()
> > >
> > > +# As a last resort, try to remove the disks.
> > >  for uuid in params['disk_uuids']:
> > > -    # Try to get and remove the disk, however do not fail
> > > -    # if it does not exist (maybe removed in the meanwhile).
> > >      try:
> > >          disk_service = disks_service.disk_service(uuid)
> > >          disk_service.remove()
> > > -    except sdk.NotFoundError:
> > > +    except (sdk.NotFoundError, sdk.Error):
> > >          pass
> >
> > This will likely fail since the transfer still holds a lock on the disk.
> >
> > We don't need to delete the disks; the transfers will delete their disks.
>
> Right, but I believe there is a window between creating the disk and
> adding it to the transfer, and therefore this is an attempt to have a
> "backup plan" to delete the disk.
>
> I did discover that the transfer deletes the disk, which was why I
> changed the exception to ignore sdk.Error.
>
> ...
>
> > >  # Using version 2 supporting the buffer protocol for better performance.
> > >  API_VERSION = 2
> > >
> > > @@ -43,31 +34,47 @@ API_VERSION = 2
> > >  # client, this give best performance.
> > >  MAX_CONNECTIONS = 4
> >
> > This is not needed now since we use multi-conn.  It can also cause a
> > deadlock, since the underlying qemu-nbd for this transfer supports up
> > to 8 clients (--shared 8).  If we create 4 http connections per nbdkit
> > connection, we will have 16 connections.  8 of these will be blocked
> > in imageio on qemu-nbd, which can lead to deadlock or make some
> > nbdkit connections ineffective.
> >
> > To simplify this huge refactoring, I would use MAX_CONNECTIONS = 1
> > for now.
>
> Since the pool is a global (there is only one plugin instance, but
> open() is called 4 times), I believe this actually limits the pool to
> 4, unless max_readers/writers overrides it.

I see, so if I understand how it works correctly, we have:

after fork:
- called once
- create pool of connections

open:
- called once per nbd connection
- does nothing since we already opened the connections

write/zero:
- called once per nbd command
- pick random connection and send the request

flush:
- called once per nbd connection
- flush all connections

close:
- called once per nbd connection
- close all connections on first call
- does nothing on later calls

So we have some issues:
- calling flush 16 times instead of 4 times

Can we disable the parallel threading model with multi-conn, or must we
use it to have concurrent calls?

> > > +def after_fork():
> > > +    global options, pool
> >
> > This works, but hiding the globals here makes it hard to understand the
> > code.  The globals should be initialized at the top of the module.
>
> OK.
>
> PS. The way "global" works in Python is dumb!
>
> ...
>
> > > -def create_http_pool(url, host, options):
> > > +# Connection pool.
> > > +def create_http_pool(url, options):
> > >      pool = queue.Queue()
> > >
> > >      count = min(options["max_readers"],
> > >                  options["max_writers"],
> > >                  MAX_CONNECTIONS)
> >
> > Note that this logic must move to virt-v2v now.  If imageio does not
> > support multiple writers, you should not use multi-conn.
> >
> > Since we create the transfer with format="raw", we should always
> > support 8 writers, but for correctness this must be enforced using the
> > max_writers option.
>
> I don't understand this comment.

In create_transfer we create the image transfer with:

    format="raw"

This enables the nbd backend in imageio.  The system starts qemu-nbd,
and imageio connects to qemu-nbd instead of using the file backend
opening the actual disk.  The nbd backend supports multiple writers,
but the file backend does not.

> ...
>
> > > +# Parameters are passed in via a JSON doc from the OCaml code.
> > > +# Because this Python code ships embedded inside virt-v2v there
> > > +# is no formal API here.
> > > +params = None
> > > +
> > > +if len(sys.argv) != 2:
> > > +    raise RuntimeError("incorrect number of parameters")
> > > +
> > > +# Parameters are passed in via a JSON document.
> > > +with open(sys.argv[1], 'r') as fp:
> > > +    params = json.load(fp)
> >
> > This works, but why not read the json from stdin in the same way
> > we write the output to stdout?
>
> It's a bit complicated to have two pipes to an external process, and
> the other programs pass a params file as a parameter, that's the only
> reason.

This is simple in python, here is an example:
https://github.com/oVirt/vdsm/blob/511d7a4aea9beb6e375d05dfffa0c135a15bb3b4/lib/vdsm/storage/managedvolume.py#L186
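For comparison, reading the parameters from stdin would look something like
this (a minimal sketch, not the linked vdsm code; the OCaml side would write
the JSON document to the child's stdin and read the results from its stdout):

    import json
    import sys

    def main():
        # Parameters come from stdin instead of a file named on the
        # command line; results still go to stdout as a JSON document.
        params = json.load(sys.stdin)
        results = {"received_keys": sorted(params)}  # placeholder work
        json.dump(results, sys.stdout)

    if __name__ == "__main__":
        main()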
> > > +# Send the destination URL, transfer ID, and host flag back to OCaml code.
> > > +results = {
> > > +    "transfer_id": transfer.id,
> > > +    "destination_url": destination_url,
> > > +    "is_host": host is not None,
> >
> > is_host is not clear, maybe "is_local"?
>
> Yes, I agree.  Or "is_ovirt_host" perhaps.
>
> Thanks,
>
> Rich.
>
> --
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> Fedora Windows cross-compiler. Compile Windows programs, test, and
> build Windows installers. Over 100 libraries supported.
> http://fedoraproject.org/wiki/MinGW
Richard W.M. Jones
2021-Aug-03 19:25 UTC
[Libguestfs] [PATCH virt-v2v 1/2] v2v: -o rhv-upload: Split plugin functions from create/finalize
On Tue, Aug 03, 2021 at 09:18:50PM +0300, Nir Soffer wrote:
> On Tue, Aug 3, 2021 at 8:48 PM Richard W.M. Jones <rjones at redhat.com> wrote:
> > Since the pool is a global (there is only one plugin instance, but
> > open() is called 4 times), I believe this actually limits the pool to
> > 4, unless max_readers/writers overrides it.
>
> I see, so if I understand how it works correctly, we have:
>
> after fork:
> - called once
> - create pool of connections

Yes.  Note there's only one nbdkit process and one Python interpreter
running even if the client connects multiple times.

> open:
> - called once per nbd connection
> - does nothing since we already opened the connections

Yup.

> write/zero:
> - called once per nbd command
> - pick random connection and send the request

Yes.  But crucially because the thread model is parallel, and because
certain Python http requests will block and release the Python GIL, it
can be that nbdkit will call into the plugin in parallel even on the
same handle.  But when running Python code we are never preempted
unless we call some C extension that blocks and releases the GIL.

> flush:
> - called once per nbd connection
> - flush all connections

Yes.  However this is controlled by the client.  It happens that both
qemu-img convert and nbdcopy do a flush operation right at the end.
qemu-img convert doesn't support multi-conn, so there's only one NBD
connection and one flush.  nbdcopy happens to call flush on every NBD
connection (but that's just because we are being over-cautious, I think
multi-conn guarantees mean we don't have to do this).

https://gitlab.com/nbdkit/libnbd/-/blob/3d9d23b1b0d1df049782555ad602476c35aad6b0/copy/nbd-ops.c#L174

> close:
> - called once per nbd connection
> - close all connections on first call
> - does nothing on later calls

I think the patch I posted had a mistake in it where I closed the whole
thread pool in close().  In my own copy the thread pool is destroyed on
cleanup() (when nbdkit exits).  However this requires a small change to
nbdkit
(https://gitlab.com/nbdkit/nbdkit/-/commit/f2fe99e4b0f54467ab8028eaf2d039cf918b2961)

> So we have some issues:
> - calling flush 16 times instead of 4 times

Yes I think so, but probably the later flushes don't do very much work?

> Can we disable the parallel threading model with multi-conn, or must we
> use it to have concurrent calls?

The client decides how many connections to make, and that happens after
we've made the decision to choose a thread model.

> > > > -def create_http_pool(url, host, options):
> > > > +# Connection pool.
> > > > +def create_http_pool(url, options):
> > > >      pool = queue.Queue()
> > > >
> > > >      count = min(options["max_readers"],
> > > >                  options["max_writers"],
> > > >                  MAX_CONNECTIONS)
> > >
> > > Note that this logic must move to virt-v2v now.  If imageio does
> > > not support multiple writers, you should not use multi-conn.
> > > Since we create the transfer with format="raw", we should always
> > > support 8 writers, but for correctness this must be enforced
> > > using the max_writers option.
> >
> > I don't understand this comment.
>
> In create_transfer we create the image transfer with:
>
>     format="raw"
>
> This enables the nbd backend in imageio.  The system starts qemu-nbd
> and imageio connects to qemu-nbd instead of using the file backend
> opening the actual disk.  The nbd backend supports multiple writers,
> but the file backend does not.

I see.  However I'm not sure I understand why the logic for choosing
"count" has to move to virt-v2v.

Thanks,

Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW
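To keep the lifecycle discussed above in one place, here is a very reduced
skeleton of how a pool-based nbdkit Python plugin can be laid out
(illustrative only: it omits the actual imageio requests, option parsing and
error handling, and assumes nbdkit's Python API version 2 and the cleanup()
hook referenced above):

    import queue
    import nbdkit

    API_VERSION = 2

    pool = None        # one pool per nbdkit process
    disk_size = 0

    def thread_model():
        return nbdkit.THREAD_MODEL_PARALLEL

    def after_fork():
        # Called once after nbdkit forks: build the HTTP pool here.
        global pool
        pool = queue.Queue()
        # ... fill the pool with HTTP connections to imageio ...

    def open(readonly):
        # Called once per NBD connection; the pool already exists, so
        # there is nothing per-connection to set up.
        return 1

    def get_size(h):
        return disk_size

    def pread(h, buf, offset, flags):
        conn = pool.get()          # borrow any pooled connection
        try:
            pass                   # ... GET this byte range and fill buf ...
        finally:
            pool.put(conn)

    def pwrite(h, buf, offset, flags):
        conn = pool.get()
        try:
            pass                   # ... PUT this byte range ...
        finally:
            pool.put(conn)

    def flush(h, flags):
        # Called for each NBD connection that flushes; see the thread
        # for whether one or all pooled connections must be flushed.
        pass

    def cleanup():
        # Called once when nbdkit exits (needs the nbdkit change above):
        # close all pooled HTTP connections here.
        pass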
Eric Blake
2021-Aug-03 19:39 UTC
[Libguestfs] [PATCH virt-v2v 1/2] v2v: -o rhv-upload: Split plugin functions from create/finalize
On Tue, Aug 03, 2021 at 09:18:50PM +0300, Nir Soffer wrote:
> I see, so if I understand how it works correctly, we have:
>
> after fork:
> - called once
> - create pool of connections
>
> open:
> - called once per nbd connection
> - does nothing since we already opened the connections
>
> write/zero:
> - called once per nbd command
> - pick random connection and send the request
>
> flush:
> - called once per nbd connection
> - flush all connections

If the other end of the http connections is multi-conn consistent
(right now, qemu-nbd doesn't yet advertise multi-conn for writable
exports, but everything I've seen in the code says that it IS
consistent because it serializes flush requests at the block layer
backend regardless of which client front-end sent them - it's still on
my todo list to get qemu 6.2 to advertise multi-conn for writable
qemu-nbd), then you only have to flush on one random connection.

If the other end of the http connection is not multi-conn consistent,
then you do indeed have to flush ALL of the http connections; at which
point, you are somewhat recreating the effects of the nbdkit multi-conn
filter.  However, the multi-conn filter only knows how to propagate
flush across openings into the plugin, and not into the pool of http
connections embedded within the plugin, so the duplication is probably
necessary rather than relying on the filter.

> close:
> - called once per nbd connection
> - close all connections on first call
> - does nothing on later calls

Or do you want it to be more reference-counted, where it leaves all
connections open until the LAST nbd connection is closed?

> So we have some issues:
> - calling flush 16 times instead of 4 times

Are the later flushes expensive, or is the server smart enough to
optimize that if nothing has changed since the first flush, the later
flushes return immediately?  And whether it is called 1, 4, or 16 times
(assuming 4 NBD clients each talking to a pool of 4 http connections)
depends on which parts of the chain advertise (or assume) multi-conn
semantics.  If you have multi-conn all the way through, a single flush
on any one http connection is immediately visible to all other
connections, and you don't need the remaining flushes.  If you do not
have multi-conn consistency natively, then yes, replicating the flush
across all connections is necessary, although the multi-conn filter may
let you tweak whether that replication has to be done by the client or
can be emulated by nbdkit.  Also, the multi-conn filter can turn a
single flush from the NBD client into multiple flushes to the plugin;
but it also has an operational mode where it recognizes multiple
flushes from NBD clients and turns the latter flushes into no-ops if
nothing has changed in the meantime from the plugin.

> Can we disable the parallel threading model with multi-conn, or must we
> use it to have concurrent calls?

Parallel threading is necessary if you want concurrent calls (more than
one outstanding read or write at a time).  That's independent of
whether you have multi-conn.
Note that you can have a semi-serialized connection that advertises
multi-conn but does not allow parallel requests on a given connection
(you have to use parallel NBD clients to get any parallelism); and
conversely you can have a parallel connection that does not advertise
multi-conn (parallel requests are supported on any connection, but
flush on one connection does not necessarily affect the other
connections, and you may see stale data from a per-connection cache if
you don't flush on all connections).

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
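To illustrate the two flush strategies discussed in this thread, a small
sketch (pool, pool_size and send_flush are hypothetical stand-ins for the
plugin's connection pool and its per-connection flush request):

    def flush_pool(pool, pool_size, multi_conn_consistent, send_flush):
        # Take every connection out of the pool so no in-flight request
        # races with the flush, then return them all.
        conns = [pool.get() for _ in range(pool_size)]
        try:
            if multi_conn_consistent:
                # The server serializes flushes across connections, so
                # flushing any one pooled connection is enough.
                send_flush(conns[0])
            else:
                # No multi-conn guarantee: flush every connection.
                for conn in conns:
                    send_flush(conn)
        finally:
            for conn in conns:
                pool.put(conn)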