Nir Soffer
2019-Nov-28 19:34 UTC
[Libguestfs] [PATCH] rhv-upload: Fix waiting for transfer
We were not considering failures while initializing the transfer. In this case the transfer phase can change to PAUSED_SYSTEM or FINISHED_FAILURE, and transfer_url will be None, which failed the upload with a misleading error: RuntimeError: direct upload to host not supported, requires ovirt-engine >= 4.2 and only works when virt-v2v is run within the oVirt/RHV environment, eg. on an oVirt node Change the wait loop to consider all cases: - Transfer failed and was removed - Transfer failed and will be removed soon - Transfer paused by the system (cancel required) - Unexpected transfer phase (cancel required) - Timeout waiting for TRANSFERRING state (cancel required) Reported-by: Xiaodai Wang --- I could easy simulate the case when the system paused the transfer by injecting an error in vdsm, failing transfer initialization. The import fail with: nbdkit: python[1]: error: /home/nsoffer/src/virt-v2v/tmp/rhvupload.1DgXyh/rhv-upload-plugin.py: open: error: Traceback (most recent call last): File "/home/nsoffer/src/virt-v2v/tmp/rhvupload.1DgXyh/rhv-upload-plugin.py", line 109, in open transfer = create_transfer(connection, disk, host) File "/home/nsoffer/src/virt-v2v/tmp/rhvupload.1DgXyh/rhv-upload-plugin.py", line 539, in create_transfer "transfer %s was paused by system" % transfer.id) RuntimeError: transfer 32b97384-ac8b-40d5-b423-26d31faabe32 was paused by system I could not simulate the other cases. This probaly requires injecting errors in engine. v2v/rhv-upload-plugin.py | 40 +++++++++++++++++++++++++++++++--------- 1 file changed, 31 insertions(+), 9 deletions(-) diff --git a/v2v/rhv-upload-plugin.py b/v2v/rhv-upload-plugin.py index 75e4f404..3942ec72 100644 --- a/v2v/rhv-upload-plugin.py +++ b/v2v/rhv-upload-plugin.py @@ -513,21 +513,43 @@ def create_transfer(connection, disk, host): # Get a reference to the created transfer service. transfer_service = transfers_service.image_transfer_service(transfer.id) - # After adding a new transfer for the disk, the transfer's status - # will be INITIALIZING. Wait until the init phase is over. The - # actual transfer can start when its status is "Transferring". + # Wait until transfer's phase change from INITIALIZING to TRANSFERRING. On + # errors transfer's phase can change to PAUSED_SYSTEM or FINISHED_FAILURE. + # If the transfer was paused, we need to cancel it to remove the disk, + # otherwise the system will remove the disk and transfer shortly after. + endt = time.time() + timeout while True: - transfer = transfer_service.get() - if transfer.phase != types.ImageTransferPhase.INITIALIZING: + time.sleep(1) + try: + transfer = transfer_service.get() + except sdk.NotFoundError: + # The system has removed the disk and the transfer. + raise RuntimeError("transfer %s was removed" % transfer.id) + + if transfer.phase == types.ImageTransferPhase.FINISHED_FAILURE: + # The system will remove the disk and the transfer soon. + raise RuntimeError( + "transfer %s has failed" % transfer.id) + + if transfer.phase == types.ImageTransferPhase.PAUSED_SYSTEM: + transfer_service.cancel() + raise RuntimeError( + "transfer %s was paused by system" % transfer.id) + + if transfer.phase == types.ImageTransferPhase.TRANSFERRING: break - if time.time() > endt: + + if transfer.phase != types.ImageTransferPhase.INITIALIZING: transfer_service.cancel() raise RuntimeError( - "timed out waiting for transfer %s status != INITIALIZING" - % transfer.id) + "unexpected transfer %s phase %s" + % (transfer.id, transfer.phase)) - time.sleep(1) + if time.time() > endt: + transfer_service.cancel() + raise RuntimeError( + "timed out waiting for transfer %s" % transfer.id) return transfer -- 2.21.0
Richard W.M. Jones
2019-Nov-28 20:58 UTC
Re: [Libguestfs] [PATCH] rhv-upload: Fix waiting for transfer
On Thu, Nov 28, 2019 at 09:34:18PM +0200, Nir Soffer wrote:> We were not considering failures while initializing the transfer. In > this case the transfer phase can change to PAUSED_SYSTEM or > FINISHED_FAILURE, and transfer_url will be None, which failed the > upload with a misleading error: > > RuntimeError: direct upload to host not supported, requires > ovirt-engine >= 4.2 and only works when virt-v2v is run within the > oVirt/RHV environment, eg. on an oVirt node > > Change the wait loop to consider all cases: > - Transfer failed and was removed > - Transfer failed and will be removed soon > - Transfer paused by the system (cancel required) > - Unexpected transfer phase (cancel required) > - Timeout waiting for TRANSFERRING state (cancel required) > > Reported-by: Xiaodai Wang > --- > > I could easy simulate the case when the system paused the transfer by > injecting an error in vdsm, failing transfer initialization. > > The import fail with: > > nbdkit: python[1]: error: /home/nsoffer/src/virt-v2v/tmp/rhvupload.1DgXyh/rhv-upload-plugin.py: open: error: Traceback (most recent call last): > File "/home/nsoffer/src/virt-v2v/tmp/rhvupload.1DgXyh/rhv-upload-plugin.py", line 109, in open > transfer = create_transfer(connection, disk, host) > File "/home/nsoffer/src/virt-v2v/tmp/rhvupload.1DgXyh/rhv-upload-plugin.py", line 539, in create_transfer > "transfer %s was paused by system" % transfer.id) > RuntimeError: transfer 32b97384-ac8b-40d5-b423-26d31faabe32 was paused by system > > I could not simulate the other cases. This probaly requires injecting > errors in engine.You might be able to inject errors more easily than that by modifying the test harness (tests/test-v2v-o-rhv-upload-module/ovirtsdk4/). Anyway patch looks reasonable, although I didn't test it, so: ACK Rich.> v2v/rhv-upload-plugin.py | 40 +++++++++++++++++++++++++++++++--------- > 1 file changed, 31 insertions(+), 9 deletions(-) > > diff --git a/v2v/rhv-upload-plugin.py b/v2v/rhv-upload-plugin.py > index 75e4f404..3942ec72 100644 > --- a/v2v/rhv-upload-plugin.py > +++ b/v2v/rhv-upload-plugin.py > @@ -513,21 +513,43 @@ def create_transfer(connection, disk, host): > # Get a reference to the created transfer service. > transfer_service = transfers_service.image_transfer_service(transfer.id) > > - # After adding a new transfer for the disk, the transfer's status > - # will be INITIALIZING. Wait until the init phase is over. The > - # actual transfer can start when its status is "Transferring". > + # Wait until transfer's phase change from INITIALIZING to TRANSFERRING. On > + # errors transfer's phase can change to PAUSED_SYSTEM or FINISHED_FAILURE. > + # If the transfer was paused, we need to cancel it to remove the disk, > + # otherwise the system will remove the disk and transfer shortly after. > + > endt = time.time() + timeout > while True: > - transfer = transfer_service.get() > - if transfer.phase != types.ImageTransferPhase.INITIALIZING: > + time.sleep(1) > + try: > + transfer = transfer_service.get() > + except sdk.NotFoundError: > + # The system has removed the disk and the transfer. > + raise RuntimeError("transfer %s was removed" % transfer.id) > + > + if transfer.phase == types.ImageTransferPhase.FINISHED_FAILURE: > + # The system will remove the disk and the transfer soon. > + raise RuntimeError( > + "transfer %s has failed" % transfer.id) > + > + if transfer.phase == types.ImageTransferPhase.PAUSED_SYSTEM: > + transfer_service.cancel() > + raise RuntimeError( > + "transfer %s was paused by system" % transfer.id) > + > + if transfer.phase == types.ImageTransferPhase.TRANSFERRING: > break > - if time.time() > endt: > + > + if transfer.phase != types.ImageTransferPhase.INITIALIZING: > transfer_service.cancel() > raise RuntimeError( > - "timed out waiting for transfer %s status != INITIALIZING" > - % transfer.id) > + "unexpected transfer %s phase %s" > + % (transfer.id, transfer.phase)) > > - time.sleep(1) > + if time.time() > endt: > + transfer_service.cancel() > + raise RuntimeError( > + "timed out waiting for transfer %s" % transfer.id) > > return transfer > > -- > 2.21.0-- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html
Nir Soffer
2019-Nov-28 21:04 UTC
Re: [Libguestfs] [PATCH] rhv-upload: Fix waiting for transfer
On Thu, Nov 28, 2019 at 10:58 PM Richard W.M. Jones <rjones@redhat.com> wrote:> > On Thu, Nov 28, 2019 at 09:34:18PM +0200, Nir Soffer wrote: > > We were not considering failures while initializing the transfer. In > > this case the transfer phase can change to PAUSED_SYSTEM or > > FINISHED_FAILURE, and transfer_url will be None, which failed the > > upload with a misleading error: > > > > RuntimeError: direct upload to host not supported, requires > > ovirt-engine >= 4.2 and only works when virt-v2v is run within the > > oVirt/RHV environment, eg. on an oVirt node > > > > Change the wait loop to consider all cases: > > - Transfer failed and was removed > > - Transfer failed and will be removed soon > > - Transfer paused by the system (cancel required) > > - Unexpected transfer phase (cancel required) > > - Timeout waiting for TRANSFERRING state (cancel required) > > > > Reported-by: Xiaodai Wang > > --- > > > > I could easy simulate the case when the system paused the transfer by > > injecting an error in vdsm, failing transfer initialization. > > > > The import fail with: > > > > nbdkit: python[1]: error: /home/nsoffer/src/virt-v2v/tmp/rhvupload.1DgXyh/rhv-upload-plugin.py: open: error: Traceback (most recent call last): > > File "/home/nsoffer/src/virt-v2v/tmp/rhvupload.1DgXyh/rhv-upload-plugin.py", line 109, in open > > transfer = create_transfer(connection, disk, host) > > File "/home/nsoffer/src/virt-v2v/tmp/rhvupload.1DgXyh/rhv-upload-plugin.py", line 539, in create_transfer > > "transfer %s was paused by system" % transfer.id) > > RuntimeError: transfer 32b97384-ac8b-40d5-b423-26d31faabe32 was paused by system > > > > I could not simulate the other cases. This probaly requires injecting > > errors in engine. > > You might be able to inject errors more easily than that by modifying > the test harness (tests/test-v2v-o-rhv-upload-module/ovirtsdk4/).Good idea. I'm testing currently with real ovirt setup which is pretty easy when you have one, but we should improve the automated tests. I also cannot run the tests, "make check" fail very early with: SRCDIR=. LAYOUT=partitions ../../run --test ./make-fedora-img.pl Can't locate Sys/Guestfs.pm in @INC (you may need to install the Sys::Guestfs module) (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5) at /home/nsoffer/src/virt-v2v/test-data/phony-guests/make-fedora-img.pl line 27. BEGIN failed--compilation aborted at /home/nsoffer/src/virt-v2v/test-data/phony-guests/make-fedora-img.pl line 27. /home/nsoffer/src/virt-v2v/run: command failed with exit code 2> Anyway patch looks reasonable, although I didn't test it, so: > > ACK > > Rich. > > > v2v/rhv-upload-plugin.py | 40 +++++++++++++++++++++++++++++++--------- > > 1 file changed, 31 insertions(+), 9 deletions(-) > > > > diff --git a/v2v/rhv-upload-plugin.py b/v2v/rhv-upload-plugin.py > > index 75e4f404..3942ec72 100644 > > --- a/v2v/rhv-upload-plugin.py > > +++ b/v2v/rhv-upload-plugin.py > > @@ -513,21 +513,43 @@ def create_transfer(connection, disk, host): > > # Get a reference to the created transfer service. > > transfer_service = transfers_service.image_transfer_service(transfer.id) > > > > - # After adding a new transfer for the disk, the transfer's status > > - # will be INITIALIZING. Wait until the init phase is over. The > > - # actual transfer can start when its status is "Transferring". > > + # Wait until transfer's phase change from INITIALIZING to TRANSFERRING. On > > + # errors transfer's phase can change to PAUSED_SYSTEM or FINISHED_FAILURE. > > + # If the transfer was paused, we need to cancel it to remove the disk, > > + # otherwise the system will remove the disk and transfer shortly after. > > + > > endt = time.time() + timeout > > while True: > > - transfer = transfer_service.get() > > - if transfer.phase != types.ImageTransferPhase.INITIALIZING: > > + time.sleep(1) > > + try: > > + transfer = transfer_service.get() > > + except sdk.NotFoundError: > > + # The system has removed the disk and the transfer. > > + raise RuntimeError("transfer %s was removed" % transfer.id) > > + > > + if transfer.phase == types.ImageTransferPhase.FINISHED_FAILURE: > > + # The system will remove the disk and the transfer soon. > > + raise RuntimeError( > > + "transfer %s has failed" % transfer.id) > > + > > + if transfer.phase == types.ImageTransferPhase.PAUSED_SYSTEM: > > + transfer_service.cancel() > > + raise RuntimeError( > > + "transfer %s was paused by system" % transfer.id) > > + > > + if transfer.phase == types.ImageTransferPhase.TRANSFERRING: > > break > > - if time.time() > endt: > > + > > + if transfer.phase != types.ImageTransferPhase.INITIALIZING: > > transfer_service.cancel() > > raise RuntimeError( > > - "timed out waiting for transfer %s status != INITIALIZING" > > - % transfer.id) > > + "unexpected transfer %s phase %s" > > + % (transfer.id, transfer.phase)) > > > > - time.sleep(1) > > + if time.time() > endt: > > + transfer_service.cancel() > > + raise RuntimeError( > > + "timed out waiting for transfer %s" % transfer.id) > > > > return transfer > > > > -- > > 2.21.0 > > -- > Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones > Read my programming and virtualization blog: http://rwmj.wordpress.com > virt-builder quickly builds VMs from scratch > http://libguestfs.org/virt-builder.1.html >
Nir Soffer
2019-Dec-10 18:34 UTC
Re: [Libguestfs] [PATCH] rhv-upload: Fix waiting for transfer
On Thu, Nov 28, 2019 at 10:58 PM Richard W.M. Jones <rjones@redhat.com> wrote:> > On Thu, Nov 28, 2019 at 09:34:18PM +0200, Nir Soffer wrote: > > We were not considering failures while initializing the transfer. In > > this case the transfer phase can change to PAUSED_SYSTEM or > > FINISHED_FAILURE, and transfer_url will be None, which failed the > > upload with a misleading error: > > > > RuntimeError: direct upload to host not supported, requires > > ovirt-engine >= 4.2 and only works when virt-v2v is run within the > > oVirt/RHV environment, eg. on an oVirt node > > > > Change the wait loop to consider all cases: > > - Transfer failed and was removed > > - Transfer failed and will be removed soon > > - Transfer paused by the system (cancel required) > > - Unexpected transfer phase (cancel required) > > - Timeout waiting for TRANSFERRING state (cancel required) > > > > Reported-by: Xiaodai Wang > > --- > > > > I could easy simulate the case when the system paused the transfer by > > injecting an error in vdsm, failing transfer initialization. > > > > The import fail with: > > > > nbdkit: python[1]: error: /home/nsoffer/src/virt-v2v/tmp/rhvupload.1DgXyh/rhv-upload-plugin.py: open: error: Traceback (most recent call last): > > File "/home/nsoffer/src/virt-v2v/tmp/rhvupload.1DgXyh/rhv-upload-plugin.py", line 109, in open > > transfer = create_transfer(connection, disk, host) > > File "/home/nsoffer/src/virt-v2v/tmp/rhvupload.1DgXyh/rhv-upload-plugin.py", line 539, in create_transfer > > "transfer %s was paused by system" % transfer.id) > > RuntimeError: transfer 32b97384-ac8b-40d5-b423-26d31faabe32 was paused by system > > > > I could not simulate the other cases. This probaly requires injecting > > errors in engine. > > You might be able to inject errors more easily than that by modifying > the test harness (tests/test-v2v-o-rhv-upload-module/ovirtsdk4/). > > Anyway patch looks reasonable, although I didn't test it, so: > > ACKSeems that Daniel is too busy now to review this, so I think we should push this.