thr3ads.net - libvirt users - Re: [libvirt-users] [ovirt-users] Bug in Snapshot Removing [Jun 2015]

Adam Litke

2015-Jun-02 16:53 UTC

Re: [libvirt-users] [ovirt-users] Bug in Snapshot Removing

Hello Soeren.

I've started to look at this issue and I'd agree that at first glance
it looks like a libvirt issue.  The 'cannot acquire state change lock'
messages suggest a locking bug or severe contention at least.  To help
me better understand the problem I have a few questions about your
setup.

From your earlier report it appears that you have 15 VMs running on
the failing host.  Are you attempting to remove snapshots from all VMs
at the same time?  Have you tried with fewer concurrent operations?
I'd be curious to understand if the problem is connected to the
number of VMs running or the number of active block jobs.
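
One way to tell the two possibilities apart (number of VMs versus number of active block jobs) is to list which disks currently have a block job. The following is only a minimal sketch, assuming the libvirt-python bindings and a connection object obtained elsewhere; the helper names `disk_targets` and `active_block_jobs` are made up for illustration. Since VDSM's own traceback comes from querying block job info, the `blockJobInfo` call here may itself fail with the same timeout while the problem is occurring.

```python
import xml.etree.ElementTree as ET


def disk_targets(domain_xml):
    """Return the target device names (e.g. 'vda') of all disks in a domain XML."""
    root = ET.fromstring(domain_xml)
    targets = root.findall("./devices/disk/target")
    return [t.get("dev") for t in targets if t.get("dev")]


def active_block_jobs(conn):
    """Map each running VM name to the disks that have an active block job.

    `conn` is assumed to behave like a libvirt.virConnect object.
    """
    jobs = {}
    for dom in conn.listAllDomains():
        if not dom.isActive():
            continue  # block jobs only exist on running domains
        busy = []
        for dev in disk_targets(dom.XMLDesc(0)):
            # blockJobInfo returns an empty dict when no job is running
            if dom.blockJobInfo(dev, 0):
                busy.append(dev)
        jobs[dom.name()] = busy
    return jobs
```

On a host this could be called as `active_block_jobs(libvirt.open("qemu:///system"))` after `import libvirt`; the URI, and whether the connection needs credentials on an oVirt host, are assumptions to check locally.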

Have you tried RHEL-7.1 as a hypervisor host?

Rather than rebooting the host, does restarting libvirtd cause the VMs
to become responsive again?  Note that this operation may cause the
host to move to Unresponsive state in the UI for a short period of
time.

Thanks for your report.

On 31/05/15 23:39 +0000, Soeren Malchow wrote:
>And sorry, another update: it does partly kill the VM. It was still pingable
>when I wrote the last mail, but ssh and the SPICE console no longer work.
>
>From: Soeren Malchow <soeren.malchow@mcon.net>
>Date: Monday 1 June 2015 01:35
>To: Soeren Malchow <soeren.malchow@mcon.net>,
> "libvirt-users@redhat.com" <libvirt-users@redhat.com>,
> users <users@ovirt.org>
>Subject: Re: [ovirt-users] Bug in Snapshot Removing
>
>Small addition again:
>
>This error shows up in the log while removing snapshots WITHOUT rendering
>the VMs unresponsive
>
>---
>Jun 01 01:33:45 mc-dc3ham-compute-02-live.mc.mcon.net libvirtd[1657]: Timed out during operation: cannot acquire state change lock
>Jun 01 01:33:45 mc-dc3ham-compute-02-live.mc.mcon.net vdsm[6839]: vdsm vm.Vm ERROR vmId=`56848f4a-cd73-4eda-bf79-7eb80ae569a9`::Error getting block job info
>Traceback (most recent call last):
>  File "/usr/share/vdsm/virt/vm.py", line 5759, in queryBlockJobs…
>
>---
>
>
>
>From: Soeren Malchow <soeren.malchow@mcon.net>
>Date: Monday 1 June 2015 00:56
>To: "libvirt-users@redhat.com" <libvirt-users@redhat.com>,
> users <users@ovirt.org>
>Subject: [ovirt-users] Bug in Snapshot Removing
>
>Dear all
>
>I am not sure whether my earlier mail simply got lost among all the other
>mails, so this time I am also sending it to the libvirt mailing list.
>
>I am experiencing a problem with VMs becoming unresponsive when removing
>snapshots (Live Merge), and I think there is a serious problem.
>
>Here are the previous mails,
>
>http://lists.ovirt.org/pipermail/users/2015-May/033083.html
>
>The problem is on a system with everything on the latest version: CentOS 7.1
>and oVirt 3.5.2.1, with all upgrades applied.
>
>This problem did NOT exist before upgrading to CentOS 7.1, with an
>environment running oVirt 3.5.0 and 3.5.1 and Fedora 20 with the
>libvirt-preview repo activated.
>
>I think this is a bug in libvirt, not oVirt itself, but I am not sure. The
>actual file throwing the exception is in VDSM (/usr/share/vdsm/virt/vm.py,
>line 697).
>
>We are very willing to help, test, and supply log files in any way we can.
>
>Regards
>Soeren
>
>_______________________________________________
>Users mailing list
>Users@ovirt.org
>http://lists.ovirt.org/mailman/listinfo/users

-- 
Adam Litke

Soeren Malchow

2015-Jun-03 07:36 UTC

Re: [libvirt-users] [ovirt-users] Bug in Snapshot Removing

Dear Adam

At first we were using a Python script that worked with 4 threads and
therefore removed 4 snapshots at a time throughout the cluster; that
still caused problems.

Now I have taken the snapshot removal out of the threaded part and I am just
looping through each snapshot on each VM one after another, even with
"sleeps" in between, but the problem remains.

But I am getting the impression that the problem is related to the number of
snapshots deleted within a certain time. If I delete manually, one
after another (meaning every 10 minutes or so), I do not have problems. If I
delete manually and do several at once, and on one VM start the next one just
after the previous one has finished, the risk seems to increase.
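
The serialized removal described above can be sketched roughly as follows. This is not the actual script from this thread: `remove` and `is_finished` are hypothetical callables that the caller would implement against the oVirt API, and the default poll interval, cool-down, and timeout are assumptions (the cool-down mirrors the "every 10 minutes or so" spacing that seemed safe).

```python
import time


def remove_snapshots_serially(snapshots, remove, is_finished,
                              poll_interval=10.0, cooldown=600.0,
                              timeout=3600.0,
                              sleep=time.sleep, clock=time.monotonic):
    """Remove snapshots one at a time, waiting for each merge to finish.

    `remove(snap)` starts a removal and `is_finished(snap)` reports whether
    it has completed; both are hypothetical, caller-supplied callables.
    Returns the snapshots whose removal did not finish within `timeout`.
    """
    snapshots = list(snapshots)
    timed_out = []
    for index, snap in enumerate(snapshots):
        remove(snap)
        deadline = clock() + timeout
        while not is_finished(snap):
            if clock() >= deadline:
                timed_out.append(snap)  # give up on this one, move on
                break
            sleep(poll_interval)
        if index < len(snapshots) - 1:
            sleep(cooldown)  # spacing between consecutive removals
    return timed_out
```

Passing `sleep` and `clock` as parameters only serves to make the loop testable without real waiting; the spacing itself is a workaround to try, not a fix for the underlying locking problem.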

I do not think it is the number of VMs, because we had this on hosts with
only 3 or 4 VMs running.

I will try restarting libvirtd and see what happens.

We are not using RHEL 7.1, only CentOS 7.1.

Is there anything else we can look at when this happens again?

Regards
Soeren 




Adam Litke

2015-Jun-03 13:20 UTC

Re: [libvirt-users] [ovirt-users] Bug in Snapshot Removing

On 03/06/15 07:36 +0000, Soeren Malchow wrote:
>Dear Adam
>
>First we were using a Python script that was working on 4 threads and
>therefore removing 4 snapshots at a time throughout the cluster; that
>still caused problems.
>
>Now I took the snapshot removal out of the threaded part and I am just
>looping through each snapshot on each VM one after another, even with
>"sleeps" in between, but the problem remains.
>But I am getting the impression that it is a problem with the number of
>snapshots that are deleted in a certain time. If I delete manually and one
>after another (meaning every 10 min or so) I do not have problems; if I
>delete manually and do several at once, and on one VM the next one just
>after one finished, the risk seems to increase.

Hmm.  In our lab we extensively tested removing a snapshot for a VM
with 4 disks.  This means 4 block jobs running simultaneously.  Less
than 10 minutes later (closer to 1 minute) we would remove a second
snapshot for the same VM (again involving 4 block jobs).  I guess we
should rerun this flow on a fully updated CentOS 7.1 host to see
whether we can reproduce it locally.  Your case seems much simpler
than this, though.  Is this happening every time or intermittently?

>I do not think it is the number of VMs because we had this on hosts with
>only 3 or 4 VMs running
>
>I will try restarting libvirtd and see what happens.
>
>We are not using RHEL 7.1, only CentOS 7.1
>
>Is there anything else we can look at when this happens again?

I'll defer to Eric Blake for the libvirt side of this.  Eric, would
enabling debug logging in libvirtd help to shine some light on the
problem?
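
Independently of debug logging, it may help to see how the lock timeouts cluster in time relative to the snapshot removals. The following is a rough sketch, assuming syslog-style lines like the ones in the excerpt quoted earlier in this thread (for example as printed by journalctl); the function name and the per-minute grouping are illustrative choices, not anything libvirt or VDSM provides.

```python
import re
from collections import Counter

LOCK_MSG = "cannot acquire state change lock"

# Matches e.g. "Jun 01 01:33:45 host.example libvirtd[1657]: ..."
LINE_RE = re.compile(
    r"^(?P<minute>\w{3} +\d+ \d{2}:\d{2}):\d{2} "
    r"(?P<host>\S+) (?P<proc>[\w.-]+)\[\d+\]:"
)


def lock_timeouts_per_minute(lines):
    """Count lock-timeout messages per (minute, process name)."""
    counts = Counter()
    for line in lines:
        if LOCK_MSG not in line:
            continue
        match = LINE_RE.match(line)
        if match:  # skip lines that do not follow the assumed format
            counts[(match.group("minute"), match.group("proc"))] += 1
    return counts
```

Feeding it the lines of a saved log, for example `lock_timeouts_per_minute(open("libvirtd-journal.txt"))` with a hypothetical file name, gives a count per minute that can be lined up against the times at which snapshots were removed.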

-- 
Adam Litke
