Adam Litke
2015-Jun-03 13:20 UTC
Re: [libvirt-users] [ovirt-users] Bug in Snapshot Removing
On 03/06/15 07:36 +0000, Soeren Malchow wrote:
>Dear Adam,
>
>First we were using a python script that was working on 4 threads and
>therefore removing 4 snapshots at a time throughout the cluster; that
>still caused problems.
>
>Now I took the snapshot removal out of the threaded part and I am just
>looping through each snapshot on each VM one after another, even with
>"sleeps" in between, but the problem remains.
>But I am getting the impression that it is a problem with the number of
>snapshots that are deleted in a certain time: if I delete manually, one
>after another (meaning every 10 min or so), I do not have problems; if I
>delete manually and do several at once, and on one VM start the next one
>just after one finishes, the risk seems to increase.

Hmm. In our lab we extensively tested removing a snapshot for a VM
with 4 disks. This means 4 block jobs running simultaneously. Less
than 10 minutes later (closer to 1 minute) we would remove a second
snapshot for the same VM (again involving 4 block jobs). I guess we
should rerun this flow on a fully updated CentOS 7.1 host to see about
local reproduction. Your case seems much simpler than that, though.
Is this happening every time or intermittently?

>I do not think it is the number of VMs, because we had this on hosts
>with only 3 or 4 VMs running.
>
>I will try restarting libvirt and see what happens.
>
>We are not using RHEL 7.1, only CentOS 7.1.
>
>Is there anything else we can look at when this happens again?

I'll defer to Eric Blake for the libvirt side of this. Eric, would
enabling debug logging in libvirtd help to shine some light on the
problem?

--
Adam Litke
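[For readers following along: libvirtd debug logging is normally enabled via /etc/libvirt/libvirtd.conf. The filter set below is only a suggested starting point, not something prescribed in this thread:]

```ini
# /etc/libvirt/libvirtd.conf -- a possible debug-logging setup.
# Levels: 1 = debug, 2 = info, 3 = warning, 4 = error.
# Capture debug output from the qemu driver and core libvirt code,
# but keep the noisier subsystems at warning level.
log_filters="1:qemu 1:libvirt 3:object 3:event 3:util"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
```

After editing the file, libvirtd has to be restarted (e.g. `systemctl restart libvirtd` on CentOS 7.1) for the settings to take effect; note that debug level can grow the log file quickly.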
Soeren Malchow
2015-Jun-03 16:07 UTC
Re: [libvirt-users] [ovirt-users] Bug in Snapshot Removing
Hi,

This is not happening every time. The last time I had this, a script was
running, and something like the 9th VM and the 23rd VM had a problem; it
is not always the same VMs, and it is not about the OS (it happens for
Windows and Linux alike).

And as I said, it also happened when I tried to remove the snapshots
sequentially. Here is the code (I know it is probably not the most
elegant way, but I am not a developer), and the code actually has correct
indentation.

<― snip ―>

print "Snapshot deletion"
try:
    time.sleep(300)
    Connect()
    vms = api.vms.list()
    for vm in vms:
        print ("Deleting snapshots for %s ") % vm.name
        snapshotlist = vm.snapshots.list()
        for snapshot in snapshotlist:
            if snapshot.description != "Active VM":
                time.sleep(30)
                snapshot.delete()
                try:
                    while api.vms.get(name=vm.name).snapshots.get(id=snapshot.id).snapshot_status == "locked":
                        print ("Waiting for snapshot %s on %s deletion to finish") % (snapshot.description, vm.name)
                        time.sleep(60)
                except Exception as e:
                    print ("Snapshot %s does not exist anymore") % snapshot.description
        print ("Snapshot deletion for %s done") % vm.name
    print ("Deletion of snapshots done")
    api.disconnect()
except Exception as e:
    print ("Something went wrong when deleting the snapshots\n%s") % str(e)

<― snip ―>

Cheers
Soeren

On 03/06/15 15:20, "Adam Litke" <alitke@redhat.com> wrote:

>Is this happening every time or intermittently?
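[A small generic polling helper with a timeout can make this kind of wait loop a bit more robust. The sketch below is not from the thread: the helper name and timeout values are invented, and the commented usage only mirrors the SDK calls already shown in the script above.]

```python
import time

def wait_until(check, timeout=1800, interval=60):
    """Poll check() until it returns True; give up after timeout seconds.

    Returns True if check() succeeded, False if the timeout expired.
    Exceptions from check() propagate to the caller, so the caller can
    treat a "snapshot no longer exists" error as completion, just like
    the try/except around the while loop in the script above.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False

# Hypothetical usage with the objects from the script (api, vm, snapshot):
# def unlocked():
#     status = api.vms.get(name=vm.name).snapshots.get(
#         id=snapshot.id).snapshot_status
#     return status != "locked"
#
# if not wait_until(unlocked, timeout=3600, interval=60):
#     print ("Timed out waiting for snapshot %s on %s") % (
#         snapshot.description, vm.name)
```

The timeout keeps the script from hanging forever if a block job gets stuck on the host, which is exactly the failure mode being discussed here.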
Soeren Malchow
2015-Jun-04 13:08 UTC
Re: [libvirt-users] [ovirt-users] Bug in Snapshot Removing
Hi Adam, Hi Eric,

We had this issue again a few minutes ago. One machine went down in
exactly the same way as described. The machine had only one snapshot, and
it was the only snapshot that was removed; before that, in the same
script run, we deleted the snapshots of 15 other VMs, some without
snapshots, some with 1, and some with several.

Can I provide anything from the logs that helps?

Regards
Soeren

On 03/06/15 18:07, "Soeren Malchow" <soeren.malchow@mcon.net> wrote:

>This is not happening every time, the last time i had this, it was a
>script runnning, and something like th 9. Vm and the 23. Vm had a
>problem, and it is not always the same VMS ...

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
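[As a pointer for the log question above: for failed snapshot merges the places usually worth checking on an oVirt host are the per-guest qemu log, the vdsm log, and the libvirtd journal. The paths below are the common defaults and the VM name is a placeholder, so adjust to your installation:]

```shell
# Per-guest qemu/libvirt log (replace VMNAME with the affected guest):
/var/log/libvirt/qemu/VMNAME.log

# vdsm log on the host that ran the snapshot merge (live merge /
# block job errors typically show up here):
/var/log/vdsm/vdsm.log

# libvirtd messages around the failure time:
journalctl -u libvirtd --since "1 hour ago"
```

Capturing these from the host where the VM went down, around the timestamp of the failed merge, is usually more useful than engine-side logs alone.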