Serhii Kharchenko
2018-Jan-17 14:45 UTC
[libvirt-users] Could not destroy domain, current job is remoteDispatchConnectGetAllDomainStats
Hello libvirt-users list,

We've been hitting the same bug since version 3.4.0 (3.3.0 works OK). We have a process that is permanently connected to libvirtd via its socket; it collects stats, listens for events, and controls the VPSes.

When we try to 'shutdown' a number of VPSes, we often hit the bug: one of the VPSes gets stuck in the 'in shutdown' state, no corresponding 'qemu' process is present, and the following errors appear in the log:

Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.005+0000: 20438: warning : qemuGetProcessInfo:1460 : cannot parse process status data
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000: 20441: error : virFileReadAll:1420 : Failed to open file '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d36\x2dDOMAIN1.scope/cpuacct.usage': No such file or directory
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000: 20441: error : virCgroupGetValueStr:844 : Unable to read from '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d36\x2dDOMAIN1.scope/cpuacct.usage': No such file or directory
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000: 20441: error : virCgroupGetDomainTotalCpuStats:3319 : unable to get cpu account: Operation not permitted
Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000: 20522: warning : qemuDomainObjBeginJobInternal:4862 : Cannot start job (destroy, none) for domain DOMAIN1; current job is (query, none) owned by (20440 remoteDispatchConnectGetAllDomainStats, 0 <null>) for (30s, 0s)
Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000: 20522: error : qemuDomainObjBeginJobInternal:4874 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchConnectGetAllDomainStats)

I think only the last line matters. The bug is highly reproducible; we can easily trigger it even by running multiple 'virsh shutdown' commands in a shell, one after another.
When we shut down the process connected to the socket, everything returns to normal and the bug is gone.

The system is Gentoo Linux. We have tried all recent versions of libvirt (3.4.0, 3.7.0, 3.8.0, 3.9.0, 3.10.0, and 4.0.0-rc2, today's version from git) and all of them exhibit this bug; 3.3.0 works OK.

Thanks in advance for any help. We can send any additional info if needed.

~Serhii
Michal Privoznik
2018-Jan-17 15:32 UTC
Re: [libvirt-users] Could not destroy domain, current job is remoteDispatchConnectGetAllDomainStats
On 01/17/2018 03:45 PM, Serhii Kharchenko wrote:
> Hello libvirt-users list,
>
> We're catching the same bug since 3.4.0 version (3.3.0 works OK).
> [...]
> Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000:
> 20522: error : qemuDomainObjBeginJobInternal:4874 : Timed out during
> operation: cannot acquire state change lock (held by
> remoteDispatchConnectGetAllDomainStats)

Sounds like qemuConnectGetAllDomainStats() forgot to unset the job on a domain.
Can you please attach gdb to libvirtd, run 't a a bt' (thread apply all backtrace), and share the output?

Michal
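[Editor's note: for readers unfamiliar with libvirt's job machinery, the "cannot acquire state change lock" error comes from qemuDomainObjBeginJobInternal waiting up to 30 seconds on a per-domain condition variable for the current job to end. The sketch below is a toy Python model of that pattern, not libvirt's actual code; the class and method names are invented for illustration. It shows why a job that is begun but never ended makes every later job time out with exactly the message seen in the log.]

```python
import threading

class DomainJobLock:
    """Toy model of libvirt's per-domain job lock (illustrative only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._owner = None  # name of the API call currently holding the job

    def begin_job(self, name, timeout=30.0):
        # Like qemuDomainObjBeginJobInternal: wait until no job is active,
        # giving up after `timeout` seconds (libvirt's default is 30s).
        with self._cond:
            ok = self._cond.wait_for(lambda: self._owner is None, timeout)
            if not ok:
                raise TimeoutError(
                    f"cannot acquire state change lock (held by {self._owner})")
            self._owner = name

    def end_job(self):
        # Like qemuDomainObjEndJob: clear the job and wake up any waiters.
        with self._cond:
            self._owner = None
            self._cond.notify_all()

lock = DomainJobLock()
lock.begin_job("remoteDispatchConnectGetAllDomainStats")
# Bug scenario: the stats call never reaches end_job(), so a later
# 'destroy' job times out, just like in the log above.
try:
    lock.begin_job("destroy", timeout=0.1)  # short timeout for the demo
except TimeoutError as e:
    print(e)  # → cannot acquire state change lock (held by remoteDispatchConnectGetAllDomainStats)
```

If end_job() is called first, the second begin_job() succeeds immediately, which is why killing the stats-collecting client makes the symptom disappear.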
Serhii Kharchenko
2018-Jan-17 16:38 UTC
Re: [libvirt-users] Could not destroy domain, current job is remoteDispatchConnectGetAllDomainStats
2018-01-17 17:32 GMT+02:00 Michal Privoznik <mprivozn@redhat.com>:
> Sounds like qemuConnectGetAllDomainStats() forgot to unset the job on a
> domain. Can you please attach gdb to libvirtd and run 't a a bt' and
> share the output?
>
> Michal

This is the output of gdb 't a a bt', taken ~5 seconds after the error message mentioning remoteDispatchConnectGetAllDomainStats() appeared in the log.

Thread 18 (Thread 0x7fd395ffb700 (LWP 3803)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6

Thread 17 (Thread 0x7fd3967fc700 (LWP 2858)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3a6589a20 in ?? () from /usr/lib64/libvirt/connection-driver/libvirt_driver_nodedev.so
#3  0x00007fd3bc5b2de2 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6

Thread 16 (Thread 0x7fd3977fe700 (LWP 2854)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6

Thread 15 (Thread 0x7fd397fff700 (LWP 2852)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x7fd3a4bbd700 (LWP 2851)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7fd3a53be700 (LWP 2850)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7fd3a5bbf700 (LWP 2849)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7fd3ae7fc700 (LWP 2848)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b24 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7fd3aeffd700 (LWP 2847)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b24 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7fd3af7fe700 (LWP 2846)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b24 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7fd3affff700 (LWP 2845)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b24 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7fd3b4fef700 (LWP 2844)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b24 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7fd3a7fff700 (LWP 2843)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7fd3b57f0700 (LWP 2842)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7fd3b5ff1700 (LWP 2841)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7fd3b67f2700 (LWP 2840)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7fd3b6ff3700 (LWP 2839)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3a5c7776b in qemuMonitorSend () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#3  0x00007fd3a5c8b355 in ?? () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#4  0x00007fd3a5c8d85f in qemuMonitorJSONGetMemoryStats () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#5  0x00007fd3a5c9efc3 in ?? () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#6  0x00007fd3a5c9f18a in ?? () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#7  0x00007fd3a5c9f77f in ?? () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#8  0x00007fd3bc672abe in virDomainListGetStats () from /usr/lib64/libvirt.so.0
#9  0x000055f1c9fbb3b5 in ?? ()
#10 0x00007fd3bc6cd308 in virNetServerProgramDispatch () from /usr/lib64/libvirt.so.0
#11 0x000055f1c9feaef8 in ?? ()
#12 0x00007fd3bc5b3a2b in ?? () from /usr/lib64/libvirt.so.0
#13 0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#14 0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#15 0x00007fd3bb94c76f in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7fd3bd295800 (LWP 2838)):
#0  0x00007fd3bb9409cc in poll () from /lib64/libc.so.6
#1  0x00007fd3bc55cc84 in virEventPollRunOnce () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc55b971 in virEventRunDefaultImpl () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc6c864d in virNetDaemonRun () from /usr/lib64/libvirt.so.0
#4  0x000055f1c9fb1700 in ?? ()
#5  0x00007fd3bb869fe1 in __libc_start_main () from /lib64/libc.so.6
#6  0x000055f1c9fb1ffa in ?? ()
Ján Tomko
2018-Jan-18 07:25 UTC
Re: [libvirt-users] Could not destroy domain, current job is remoteDispatchConnectGetAllDomainStats
On Wed, Jan 17, 2018 at 04:45:38PM +0200, Serhii Kharchenko wrote:
>Hello libvirt-users list,
>
>We're catching the same bug since 3.4.0 version (3.3.0 works OK).
>[...]
>The system is used is Gentoo Linux, tried all modern versions of libvirt
>(3.4.0, 3.7.0, 3.8.0, 3.9.0, 3.10.0, 4.0.0-rc2 (today's version from git))
>and they have this bug. 3.3.0 works OK.

I don't see anything obviously stats-related in the diff between 3.3.0 and 3.4.0. We have added reporting of the shutdown reason, but that's just parsing one more JSON reply we previously ignored.

Can you try running 'git bisect' to pinpoint the exact commit that caused this issue?

Jan

>Thanks for any help in advance.
>We can send any additional info if needed.
>
>~Serhii
>_______________________________________________
>libvirt-users mailing list
>libvirt-users@redhat.com
>https://www.redhat.com/mailman/listinfo/libvirt-users
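[Editor's note: for readers who haven't used it, 'git bisect' is a binary search over a commit range. You mark one commit good and one bad, git checks out the midpoint, you test and report good/bad, and the range halves until the first bad commit is isolated. The search logic can be sketched in Python as follows; the commit list and predicate here are stand-ins, not libvirt's actual history.]

```python
def bisect(commits, is_bad):
    """Binary-search `commits` (oldest first) for the first bad one.

    Mirrors what 'git bisect' automates: commits[0] is known good,
    commits[-1] is known bad, and `is_bad` plays the role of the
    manual test run at each checked-out midpoint.
    """
    lo, hi = 0, len(commits) - 1  # lo: last known good, hi: first known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid   # bug already present: search the older half
        else:
            lo = mid   # still good: search the newer half
    return commits[hi]  # the first bad commit

# Hypothetical history between a good and a bad release tag:
history = [f"commit{i}" for i in range(20)]
found = bisect(history, lambda c: int(c[6:]) >= 13)  # pretend commit13 broke it
print(found)  # → commit13, found in ~log2(20) ≈ 5 test runs
```

Only about log2(N) builds need testing, which is why bisecting the ~hundreds of commits between 3.3.0 and 3.4.0 is feasible.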
Michal Privoznik
2018-Jan-18 15:49 UTC
Re: [libvirt-users] Could not destroy domain, current job is remoteDispatchConnectGetAllDomainStats
On 01/18/2018 08:25 AM, Ján Tomko wrote:
> On Wed, Jan 17, 2018 at 04:45:38PM +0200, Serhii Kharchenko wrote:
>> Hello libvirt-users list,
>>
>> We're catching the same bug since 3.4.0 version (3.3.0 works OK).
>> [...]
>
> I don't see anything obvious stats related in the diff between 3.3.0 and
> 3.4.0. We have added reporting of the shutdown reason, but that's just
> parsing one more JSON reply we previously ignored.
>
> Can you try running 'git bisect' to pinpoint the exact commit that
> caused this issue?

I am able to reproduce this issue. I ran bisect and found that the commit which broke it is aeda1b8c56dc58b0a413acc61bbea938b40499e1:

https://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=aeda1b8c56dc58b0a413acc61bbea938b40499e1;hp=ec337aee9b20091d6f9f60b78f210d55f812500b

But it's very unlikely that this commit itself is causing the error; if anything, it is just exposing whatever error we already have there. That said, if I revert the commit on top of current HEAD, I can no longer reproduce the issue.

Michal