Daniel Erez
2019-Mar-17 12:07 UTC
[Libguestfs] [PATCH] v2v: rhv-upload-plugin - improve wait logic after finalize (RHBZ#1680361)
After invoking transfer_service.finalize, check operation status by examining ImageTransferPhase and DiskStatus. This is done instead of failing after a predefined timeout regardless the status. * not verified * Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=1680361 --- v2v/rhv-upload-plugin.py | 26 +++++++++++++++++++++----- 1 file changed, 21 insertions(+), 5 deletions(-) diff --git a/v2v/rhv-upload-plugin.py b/v2v/rhv-upload-plugin.py index 2a950c5ed..873c11ce1 100644 --- a/v2v/rhv-upload-plugin.py +++ b/v2v/rhv-upload-plugin.py @@ -523,14 +523,30 @@ def close(h): # waiting for the transfer object to cease to exist, which # falls through to the exception case and then we can # continue. - endt = time.time() + timeout + start = time.time() try: while True: time.sleep(1) - tmp = transfer_service.get() - if time.time() > endt: - raise RuntimeError("timed out waiting for transfer " - "to finalize") + transfer = transfer_service.get() + + if transfer is None: + disk_service = h['disk_service'] + disk = disk_service.get() + if disk.status == types.DiskStatus.OK: + continue + + if transfer.phase == types.ImageTransferPhase.FINISHED_SUCCESS: + debug("finalized after %s seconds", time.time() - start) + break + + if transfer.phase =types.ImageTransferPhase.FINALIZING_SUCCESS: + if time.time() > start + timeout: + raise RuntimeError("timed out waiting for transfer " + "to finalize") + continue + + raise RuntimeError("Unexpected transfer phase while finalizing " + "upload %r" % transfer.phase) except sdk.NotFoundError: pass --
Nir Soffer
2019-Mar-17 12:59 UTC
Re: [Libguestfs] [PATCH] v2v: rhv-upload-plugin - improve wait logic after finalize (RHBZ#1680361)
On Sun, Mar 17, 2019 at 2:07 PM Daniel Erez <derez@redhat.com> wrote:> After invoking transfer_service.finalize, check operation > status by examining ImageTransferPhase and DiskStatus. > This is done instead of failing after a predefined timeout > regardless the status. > > * not verified * > > Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=1680361 > --- > v2v/rhv-upload-plugin.py | 26 +++++++++++++++++++++----- > 1 file changed, 21 insertions(+), 5 deletions(-) > > diff --git a/v2v/rhv-upload-plugin.py b/v2v/rhv-upload-plugin.py > index 2a950c5ed..873c11ce1 100644 > --- a/v2v/rhv-upload-plugin.py > +++ b/v2v/rhv-upload-plugin.py > @@ -523,14 +523,30 @@ def close(h): > # waiting for the transfer object to cease to exist, which > # falls through to the exception case and then we can > # continue. > - endt = time.time() + timeout > + start = time.time() > try: > while True: > time.sleep(1) > - tmp = transfer_service.get() > - if time.time() > endt: > - raise RuntimeError("timed out waiting for transfer " > - "to finalize") > + transfer = transfer_service.get() > + > + if transfer is None: >Are you sure this is possible? the original code assumed that we fail with sdk.NotFoundError.> + disk_service = h['disk_service'] > + disk = disk_service.get() > + if disk.status == types.DiskStatus.OK: > + continue >If the disk status is OK the upload was finished, so we should break, no? If the disk status is LOCKED, we should continue. But in this case there is no point to check the transfer again. What if the finalize failed, and the disk was deleted? Do we get disk = None or some exception?> + > + if transfer.phase => types.ImageTransferPhase.FINISHED_SUCCESS: > + debug("finalized after %s seconds", time.time() - > start) > + break > + > + if transfer.phase => types.ImageTransferPhase.FINALIZING_SUCCESS: > + if time.time() > start + timeout: > + raise RuntimeError("timed out waiting for > transfer " > + "to finalize") > + continue > + > + raise RuntimeError("Unexpected transfer phase while > finalizing " > + "upload %r" % transfer.phase) > except sdk.NotFoundError: > pass >Is this error possible? We need to add debug log here. I think the logic should be: while True: wait get transfer if transfer does not exists (None or sdk.NotFoundErro) , break if transfer is finished, break (success) if transfer is finalizing: if timed out, fail continue fail with invalid transfer phase while True: wait get disk if disk does not exists (None or sdk.NotFoundErro), fail if disk is OK, break (success) if disk is locked: if timed out, fail continue fail with invalid disk status This is little bit crazy that all user have to do this. We need to change engine so transfer is kept in the database after it finished for enough time (1 hour?), so user can wait *only* for the transfer. Nir
Daniel Erez
2019-Mar-17 13:47 UTC
Re: [Libguestfs] [PATCH] v2v: rhv-upload-plugin - improve wait logic after finalize (RHBZ#1680361)
On Sun, Mar 17, 2019 at 2:59 PM Nir Soffer <nsoffer@redhat.com> wrote:> > On Sun, Mar 17, 2019 at 2:07 PM Daniel Erez <derez@redhat.com> wrote: >> >> After invoking transfer_service.finalize, check operation >> status by examining ImageTransferPhase and DiskStatus. >> This is done instead of failing after a predefined timeout >> regardless the status. >> >> * not verified * >> >> Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=1680361 >> --- >> v2v/rhv-upload-plugin.py | 26 +++++++++++++++++++++----- >> 1 file changed, 21 insertions(+), 5 deletions(-) >> >> diff --git a/v2v/rhv-upload-plugin.py b/v2v/rhv-upload-plugin.py >> index 2a950c5ed..873c11ce1 100644 >> --- a/v2v/rhv-upload-plugin.py >> +++ b/v2v/rhv-upload-plugin.py >> @@ -523,14 +523,30 @@ def close(h): >> # waiting for the transfer object to cease to exist, which >> # falls through to the exception case and then we can >> # continue. >> - endt = time.time() + timeout >> + start = time.time() >> try: >> while True: >> time.sleep(1) >> - tmp = transfer_service.get() >> - if time.time() > endt: >> - raise RuntimeError("timed out waiting for transfer " >> - "to finalize") >> + transfer = transfer_service.get() >> + >> + if transfer is None: > > > Are you sure this is possible? the original code assumed that we > fail with sdk.NotFoundError. > >> >> + disk_service = h['disk_service'] >> + disk = disk_service.get() >> + if disk.status == types.DiskStatus.OK: >> + continue > > > If the disk status is OK the upload was finished, so we should break, no?Yes, should finish. So 'break'.> > If the disk status is LOCKED, we should continue. But in this case there is > no point to check the transfer again.Could add: if disk.status == types.DiskStatus.LOCKED: continue> > What if the finalize failed, and the disk was deleted? Do we get disk = None > or some exception?sdk.NotFoundError exception should be thrown.> >> >> + >> + if transfer.phase == types.ImageTransferPhase.FINISHED_SUCCESS: >> + debug("finalized after %s seconds", time.time() - start) >> + break >> + >> + if transfer.phase =>> types.ImageTransferPhase.FINALIZING_SUCCESS: >> + if time.time() > start + timeout: >> + raise RuntimeError("timed out waiting for transfer " >> + "to finalize") >> + continue >> + >> + raise RuntimeError("Unexpected transfer phase while >> finalizing " >> + "upload %r" % transfer.phase) >> except sdk.NotFoundError: >> pass > > > Is this error possible? We need to add debug log here. > > I think the logic should be: > > while True: > wait > get transfer > if transfer does not exists (None or sdk.NotFoundErro) , break > if transfer is finished, break (success) > if transfer is finalizing: > if timed out, fail > continue > fail with invalid transfer phase > > while True: > wait > get disk > if disk does not exists (None or sdk.NotFoundErro), fail > if disk is OK, break (success) > if disk is locked: > if timed out, fail > continue > fail with invalid disk status > > This is little bit crazy that all user have to do this.Should make it simpler indeed. Maybe add it to the example at least or create some library for reusability. Fo now, I think it's good enough to check disk's status and catch NotFoundError exception as an indication for error in finalize (I'll send a new version for this). I.e. disk_id = disk.id start = time.time() try: while True: time.sleep(1) disk_service = h['disk_service'] disk = disk_service.get() if disk.status == types.DiskStatus.LOCKED: if time.time() > start + timeout: raise RuntimeError("timed out waiting for transfer " "to finalize") continue if disk.status == types.DiskStatus.OK: debug("finalized after %s seconds", time.time() - start) break except sdk.NotFoundError: debug("transfer finalize failed for disk: %s", disk_id) pass> > We need to change engine so transfer is kept in the database after it finished for enough > time (1 hour?), so user can wait *only* for the transfer. > > Nir > >
Reasonably Related Threads
- Re: [PATCH] v2v: rhv-upload-plugin - improve wait logic after finalize (RHBZ#1680361)
- Re: [PATCH v2] v2v: rhv-upload-plugin - improve wait logic after finalize (RHBZ#1680361)
- [PATCH] v2v: rhv-upload-plugin - improve wait logic after finalize
- Re: [PATCH] v2v: rhv-upload-plugin - improve wait logic after finalize (RHBZ#1680361)
- Re: [PATCH] v2v: rhv-upload-plugin - improve wait logic after finalize (RHBZ#1680361)