Richard W.M. Jones
2019-Aug-08 14:05 UTC
Re: [Libguestfs] [PATCH v3] v2v: rhv-upload-plugin - improve wait logic after finalize (RHBZ#1680361)
On Mon, Mar 18, 2019 at 06:51:26PM +0200, Daniel Erez wrote:> After invoking transfer_service.finalize, check operation > status by examining DiskStatus. > This is done instead of failing after a predefined timeout > regardless the status. > > * not verified * > > Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=1680361 > --- > v2v/rhv-upload-plugin.py | 19 +++++++++++++------ > 1 file changed, 13 insertions(+), 6 deletions(-) > > diff --git a/v2v/rhv-upload-plugin.py b/v2v/rhv-upload-plugin.py > index 2a950c5ed..2cefe94b0 100644 > --- a/v2v/rhv-upload-plugin.py > +++ b/v2v/rhv-upload-plugin.py > @@ -523,16 +523,23 @@ def close(h): > # waiting for the transfer object to cease to exist, which > # falls through to the exception case and then we can > # continue. > - endt = time.time() + timeout > + disk_id = disk.id > + start = time.time() > try: > while True: > time.sleep(1) > - tmp = transfer_service.get() > - if time.time() > endt: > - raise RuntimeError("timed out waiting for transfer " > - "to finalize") > + disk_service = h['disk_service'] > + disk = disk_service.get() > + if disk.status == types.DiskStatus.LOCKED: > + if time.time() > start + timeout: > + raise RuntimeError("timed out waiting for transfer " > + "to finalize") > + continue > + if disk.status == types.DiskStatus.OK: > + debug("finalized after %s seconds", time.time() - start) > + break > except sdk.NotFoundError: > - pass > + raise RuntimeError("transfer failed: disk %s not found" % disk_id) > > # Write the disk ID file. Only do this on successful completion. > with builtins.open(params['diskid_file'], 'w') as fp:Ilanit tested this patch for us. Unfortunately it fails for reasons I don't really understand but seem to be deep inside the RHV API: nbdkit: python[1]: error: /var/tmp/v2v.sZm0my/rhv-upload-plugin.py: close: error: ['Traceback (most recent call last): ', ' File "/var/tmp/v2v.sZm0my/rhv-upload-plugin.py", line 577, in close ', ' File "/var/tmp/v2v.sZm0my/rhv-upload-plugin.py", line 516, in delete_disk_on_failure ', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", line 37334, in remove self._internal_remove(headers, query, wait) ', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 271, in _internal_remove return future.wait() if wait else future ', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 55, in wait return self._code(response) ', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 268, in callback self._check_fault(response) ', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 132, in _check_fault self._raise_error(response, body) ', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 118, in _raise_error raise error ', 'Error: Fault reason is "Operation Failed". Fault detail is "[Cannot remove Virtual Disk. Related operation is currently in progress. Please try again later.]". HTTP response code is 409. '] Can you help us to understand what we need to do to fix this? Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top
Nir Soffer
2019-Aug-08 23:58 UTC
Re: [Libguestfs] [PATCH v3] v2v: rhv-upload-plugin - improve wait logic after finalize (RHBZ#1680361)
On Thu, Aug 8, 2019 at 5:05 PM Richard W.M. Jones <rjones@redhat.com> wrote:> On Mon, Mar 18, 2019 at 06:51:26PM +0200, Daniel Erez wrote: > > After invoking transfer_service.finalize, check operation > > status by examining DiskStatus. > > This is done instead of failing after a predefined timeout > > regardless the status. > > > > * not verified * > > > > Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=1680361 > > --- > > v2v/rhv-upload-plugin.py | 19 +++++++++++++------ > > 1 file changed, 13 insertions(+), 6 deletions(-) > > > > diff --git a/v2v/rhv-upload-plugin.py b/v2v/rhv-upload-plugin.py > > index 2a950c5ed..2cefe94b0 100644 > > --- a/v2v/rhv-upload-plugin.py > > +++ b/v2v/rhv-upload-plugin.py > > @@ -523,16 +523,23 @@ def close(h): > > # waiting for the transfer object to cease to exist, which > > # falls through to the exception case and then we can > > # continue. > > - endt = time.time() + timeout > > + disk_id = disk.id > > + start = time.time() > > try: > > while True: > > time.sleep(1) > > - tmp = transfer_service.get() > > - if time.time() > endt: > > - raise RuntimeError("timed out waiting for transfer " > > - "to finalize") > > + disk_service = h['disk_service'] > > + disk = disk_service.get() > > + if disk.status == types.DiskStatus.LOCKED: > > + if time.time() > start + timeout: > > + raise RuntimeError("timed out waiting for > transfer " > > + "to finalize") > > + continue > > + if disk.status == types.DiskStatus.OK: > > + debug("finalized after %s seconds", time.time() - > start) >Maybe this was the real error? (using , instead of %)> > + break > > except sdk.NotFoundError: > > - pass > > + raise RuntimeError("transfer failed: disk %s not found" % > disk_id) > > > > # Write the disk ID file. Only do this on successful > completion. > > with builtins.open(params['diskid_file'], 'w') as fp: > > Ilanit tested this patch for us. Unfortunately it fails for reasons I > don't really understand but seem to be deep inside the RHV API: > > nbdkit: python[1]: error: /var/tmp/v2v.sZm0my/rhv-upload-plugin.py: close: > error: ['Traceback (most recent call last): > ', ' File "/var/tmp/v2v.sZm0my/rhv-upload-plugin.py", line 577, in close > ', ' File "/var/tmp/v2v.sZm0my/rhv-upload-plugin.py", line 516, in > delete_disk_on_failure >This looks like failure flow - but there is no info here on the real failure. We need to understand what was that failure.> ', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", > line 37334, in remove > self._internal_remove(headers, query, wait) > ', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > 271, in _internal_remove > return future.wait() if wait else future > ', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > 55, in wait > return self._code(response) > ', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > 268, in callback > self._check_fault(response) > ', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > 132, in _check_fault > self._raise_error(response, body) > ', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > 118, in _raise_error > raise error > ', 'Error: Fault reason is "Operation Failed". Fault detail is "[Cannot > remove Virtual Disk. Related operation is currently in progress. Please try > again later.]". HTTP response code is 409. > '] >I think this error means that the disk is currently busy, probably because of the real error. Ilanit, can you share complete virt-v2v and daemon logs? There is a surprising behavior in the image transfer API - you create a disk, and start a transfer session with that disk. However once you started a the transfer, it owns the disk, and it will delete it in case of failures, so you don't need to delete it yourself. I guess you need something like h["owns_disk"] = True After creating the disk, and once the transfer was started, set it to False. In your cleanup flow, delete the disk only if you still own it. The other issue here is that the error during close is hiding the real error. This is a common problem in python 2, and the solution is typically to catch all exceptions in the cleanup flow and log them, and make sure the original error is raised and logged by the top error handler. Nir
Richard W.M. Jones
2019-Aug-09 08:20 UTC
Re: [Libguestfs] [PATCH v3] v2v: rhv-upload-plugin - improve wait logic after finalize (RHBZ#1680361)
On Fri, Aug 09, 2019 at 02:58:18AM +0300, Nir Soffer wrote:> On Thu, Aug 8, 2019 at 5:05 PM Richard W.M. Jones <rjones@redhat.com> wrote: > > > > + if disk.status == types.DiskStatus.OK: > > > + debug("finalized after %s seconds", time.time() - > > start) > > > > Maybe this was the real error? (using , instead of %)I fixed this in a later build. However the error we're talking about here seems to be unrelated to it. ...> > ', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", > > line 37334, in remove > > self._internal_remove(headers, query, wait) > > ', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > > 271, in _internal_remove > > return future.wait() if wait else future > > ', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > > 55, in wait > > return self._code(response) > > ', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > > 268, in callback > > self._check_fault(response) > > ', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > > 132, in _check_fault > > self._raise_error(response, body) > > ', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > > 118, in _raise_error > > raise error > > ', 'Error: Fault reason is "Operation Failed". Fault detail is "[Cannot > > remove Virtual Disk. Related operation is currently in progress. Please try > > again later.]". HTTP response code is 409. > > '] > > > > I think this error means that the disk is currently busy, probably because > of the real error. > > Ilanit, can you share complete virt-v2v and daemon logs?These are quite large so I uploaded them here: http://oirase.annexia.org/tmp/v2v-import-20190808T125134-439694.log http://oirase.annexia.org/tmp/v2v-import-20190808T125134-439694-wrapper.log Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html
Maybe Matching Threads
- Re: [PATCH v3] v2v: rhv-upload-plugin - improve wait logic after finalize (RHBZ#1680361)
- Re: [PATCH v2] v2v: rhv-upload-plugin - improve wait logic after finalize (RHBZ#1680361)
- Re: [PATCH v3] v2v: rhv-upload-plugin - improve wait logic after finalize (RHBZ#1680361)
- Re: [PATCH] v2v: rhv-upload-plugin - improve wait logic after finalize (RHBZ#1680361)
- [PATCH] v2v: rhv-upload-plugin - improve wait logic after finalize