thr3ads.net - libvirt users - Re: [libvirt-users] [ovirt-users] Bug in Snapshot Removing [Jun 2015]

Soeren Malchow

2015-Jun-04 15:08 UTC

Re: [libvirt-users] [ovirt-users] Bug in Snapshot Removing

Hi,

I would send those, but unfortunately we did not think about the journals
getting deleted after a reboot.

I just made the journals persistent on the servers, and we are trying to
trigger the error again. Last time we only got halfway through the VMs
when removing the snapshots, so there is a good chance that it comes up
again.

Also, libvirt logs to the journal, not to libvirtd.log. I would send the
journal directly to you and Eric via our data exchange servers.
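
Since libvirtd logs to the systemd journal on these hosts, the relevant entries can be exported to a file for sharing. The following is a minimal sketch, not the procedure actually used in this thread: the unit name `libvirtd`, the time window, and the output path are assumptions. It relies on the journal being persistent (for example `Storage=persistent` in `/etc/systemd/journald.conf`), otherwise entries from before a reboot are lost.

```python
import subprocess


def journal_command(unit="libvirtd", since="1 hour ago"):
    """Build a journalctl command line for one systemd unit.

    The unit name and time window are illustrative defaults.
    """
    return [
        "journalctl",
        "--no-pager",        # write plain output instead of opening a pager
        "-u", unit,          # restrict to one unit
        "--since", since,    # only entries newer than this
        "-o", "short-iso",   # ISO 8601 timestamps, easier to correlate
    ]


def export_journal(path, unit="libvirtd", since="1 hour ago"):
    """Write the selected journal entries to `path` (hypothetical location)."""
    with open(path, "wb") as out:
        subprocess.check_call(journal_command(unit, since), stdout=out)
```

Calling `export_journal("/tmp/libvirtd-journal.log", since="2015-06-04")` on the affected host would produce a single file that can be uploaded alongside the engine and VDSM logs.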


Soeren 

On 04/06/15 16:17, "Adam Litke" <alitke@redhat.com> wrote:
>On 04/06/15 13:08 +0000, Soeren Malchow wrote:
>>Hi Adam, Hi Eric,
>>
>>We had this issue again a few minutes ago.
>>
>>One machine went down exactly the same way as described. The machine had
>>only one snapshot, and it was the only snapshot that was removed. Before
>>that, in the same script run, we deleted the snapshots of 15 other VMs,
>>some with none, some with 1 and some with several snapshots.
>>
>>Can I provide anything from the logs that helps?
>
>Let's start with the libvirtd.log on that host.  It might be rather
>large so we may need to find a creative place to host it.
>
>>Regards
>>Soeren
>>
>>On 03/06/15 18:07, "Soeren Malchow" <soeren.malchow@mcon.net> wrote:
>>
>>>Hi,
>>>
>>>This is not happening every time. The last time I had this, a script was
>>>running, and something like the 9th VM and the 23rd VM had a problem.
>>>It is not always the same VMs, and it is not about the OS (it happens for
>>>Windows and Linux alike).
>>>
>>>And as I said, it also happened when I tried to remove the snapshots
>>>sequentially. Here is the code (I know it is probably not the most
>>>elegant way, but I am not a developer); the code actually has correct
>>>indentation.
>>>
>>><― snip ―>
>>>
>>>import time
>>>
>>># Connect() and the global `api` object are presumably defined in the
>>># part of the script that is not shown here.
>>>print("Snapshot deletion")
>>>try:
>>>    time.sleep(300)
>>>    Connect()
>>>    vms = api.vms.list()
>>>    for vm in vms:
>>>        print("Deleting snapshots for %s" % vm.name)
>>>        snapshotlist = vm.snapshots.list()
>>>        for snapshot in snapshotlist:
>>>            # Skip the "Active VM" entry; only real snapshots are deleted.
>>>            if snapshot.description != "Active VM":
>>>                time.sleep(30)
>>>                snapshot.delete()
>>>                try:
>>>                    # Poll while the snapshot is locked. Once it is gone,
>>>                    # the lookup fails and the except branch is taken.
>>>                    while api.vms.get(name=vm.name).snapshots.get(id=snapshot.id).snapshot_status == "locked":
>>>                        print("Waiting for snapshot %s on %s deletion to finish" % (snapshot.description, vm.name))
>>>                        time.sleep(60)
>>>                except Exception as e:
>>>                    print("Snapshot %s does not exist anymore" % snapshot.description)
>>>        print("Snapshot deletion for %s done" % vm.name)
>>>    print("Deletion of snapshots done")
>>>    api.disconnect()
>>>except Exception as e:
>>>    print("Something went wrong when deleting the snapshots\n%s" % str(e))
>>>
>>>
>>>
>>><― snip ―>
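
The polling loop in the script above waits for as long as the snapshot reports "locked", so a removal that hangs would stall the whole run. As a hedged illustration only (not part of the original script), the wait could be factored into a helper with a timeout. The name `wait_while_locked` and the `get_status` callable are hypothetical; `get_status` is assumed to return the snapshot status string, or `None` once the snapshot no longer exists.

```python
import time


def wait_while_locked(get_status, timeout=1800, interval=60,
                      sleep=time.sleep, clock=time.time):
    """Poll get_status() until it stops returning "locked".

    Returns the first non-"locked" status (None if the snapshot is gone).
    Raises RuntimeError if the status is still "locked" after `timeout`
    seconds. `sleep` and `clock` are injectable so the helper can be
    exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status != "locked":
            return status
        if clock() >= deadline:
            raise RuntimeError(
                "snapshot still locked after %s seconds" % timeout)
        sleep(interval)
```

In the script, `get_status` could wrap the existing `api.vms.get(...).snapshots.get(...)` lookup and return `None` when the snapshot is not found. A timeout also yields a clear timestamp for when a removal got stuck, which helps when matching against the libvirt journal.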
>>>
>>>
>>>Cheers
>>>Soeren
>>>
>>>
>>>
>>>
>>>
>>>On 03/06/15 15:20, "Adam Litke" <alitke@redhat.com> wrote:
>>>
>>>>On 03/06/15 07:36 +0000, Soeren Malchow wrote:
>>>>>Dear Adam
>>>>>
>>>>>First we were using a Python script that was working on 4 threads and
>>>>>therefore removing 4 snapshots at a time throughout the cluster; that
>>>>>still caused problems.
>>>>>
>>>>>Now I took the snapshot removal out of the threaded part and I am just
>>>>>looping through each snapshot on each VM one after another, even with
>>>>>"sleeps" in between, but the problem remains.
>>>>>But I am getting the impression that it is a problem with the number of
>>>>>snapshots that are deleted in a certain time. If I delete manually and
>>>>>one after another (meaning every 10 min or so) I do not have problems. If
>>>>>I delete manually and do several at once, and on one VM the next one just
>>>>>after one finished, the risk seems to increase.
>>>>
>>>>Hmm.  In our lab we extensively tested removing a snapshot for a VM
>>>>with 4 disks.  This means 4 block jobs running simultaneously.  Less
>>>>than 10 minutes later (closer to 1 minute) we would remove a second
>>>>snapshot for the same VM (again involving 4 block jobs).  I guess we
>>>>should rerun this flow on a fully updated CentOS 7.1 host to see about
>>>>local reproduction.  Seems your case is much simpler than this though.
>>>>Is this happening every time or intermittently?
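
Because each disk in a snapshot removal gets its own block job, it can be useful to check on the host which disks still have a job running before starting the next removal. The sketch below is an illustration under assumptions, not something used in this thread: it expects a domain object from the libvirt Python bindings, whose `blockJobInfo(disk, flags)` call returns an empty dict when no job is active, and the disk target names (`vda`, `vdb`, ...) are placeholders.

```python
def active_block_jobs(domain, disk_targets):
    """Return {target: job_info} for every disk that still has a block job.

    `domain` is assumed to behave like a libvirt virDomain object;
    `disk_targets` is a list of disk target names (placeholders here).
    """
    jobs = {}
    for target in disk_targets:
        info = domain.blockJobInfo(target, 0)
        if info:  # an empty dict means no job is running on this disk
            jobs[target] = info
    return jobs
```

With the real bindings the domain would come from something like `libvirt.open("qemu:///system").lookupByName(vm_name)`. An empty result suggests that all block jobs for that VM have finished from libvirt's point of view.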
>>>>
>>>>>I do not think it is the number of VMs, because we had this on hosts
>>>>>with only 3 or 4 VMs running.
>>>>>
>>>>>I will try restarting libvirt and see what happens.
>>>>>
>>>>>We are not using RHEL 7.1, only CentOS 7.1.
>>>>>
>>>>>Is there anything else we can look at when this happens again?
>>>>
>>>>I'll defer to Eric Blake for the libvirt side of this.  Eric, would
>>>>enabling debug logging in libvirtd help to shine some light on the
>>>>problem?
>>>>
>>>>--
>>>>Adam Litke
>>>
>>>_______________________________________________
>>>Users mailing list
>>>Users@ovirt.org
>>>http://lists.ovirt.org/mailman/listinfo/users
>>
>
>-- 
>Adam Litke

Soeren Malchow

2015-Jun-11 11:00 UTC

Re: [libvirt-users] [ovirt-users] Bug in Snapshot Removing

We are still having this problem and we cannot figure out what to do. I
already sent the logs as a download; can I do anything else to help?





Adam Litke

2015-Jun-11 20:39 UTC

Re: [libvirt-users] [ovirt-users] Bug in Snapshot Removing

On 11/06/15 11:00 +0000, Soeren Malchow wrote:
>We are still having this problem and we cannot figure out what to do. I
>already sent the logs as a download; can I do anything else to help?

Hi.  I'm sorry but I don't have any new information for you yet.  One
thing you could do is create a new bug for this issue so we can track
it better.  Please try to include as much information as possible from
this discussion (including relevant log files) in your report.  So far
you are the only one reporting these issues so we'll want to work to
narrow down the specific scenario that is causing this problem and get
the right people working on the solution.
-- 
Adam Litke
