On Tue, Jul 20, 2021 at 15:52:58 +0800, Jiatong Shen
wrote:> Hello community,
>
> I am seeing following log in production,
>
> 2021-07-20 07:43:49.417+0000: 3918294: error :
> qemuDomainObjBeginJobInternal:4945 : Timed out during operation: cannot
> acquire state change lock (held by qemuProcessReconnect)
> 2021-07-20 07:44:19.424+0000: 3918296: warning :
> qemuDomainObjBeginJobInternal:4933 : Cannot start job (modify, none) for
> domain instance-0000074e; current job is (modify, none) owned by (3919429
> qemuProcessReconnect, 0 <null>) for (2183193s, 0s)
> 2021-07-20 07:44:19.424+0000: 3918296: error :
> qemuDomainObjBeginJobInternal:4945 : Timed out during operation: cannot
> acquire state change lock (held by qemuProcessReconnect)
> 2021-07-20 07:44:49.428+0000: 3918296: warning :
> qemuDomainObjBeginJobInternal:4933 : Cannot start job (query, none) for
> domain instance-0000074e; current job is (modify, none) owned by (3919429
> qemuProcessReconnect, 0 <null>) for (2183223s, 0s)
> 2021-07-20 07:44:49.428+0000: 3918296: error :
> qemuDomainObjBeginJobInternal:4945 : Timed out during operation: cannot
> acquire state change lock (held by qemuProcessReconnect)
> 2021-07-20 07:45:19.429+0000: 3918298: warning :
> qemuDomainObjBeginJobInternal:4933 :
>
> I am confused about what is qemuProcessReconnect and why it acquires a
> domain state lock..
qemuProcessReconnect is an operation that is executed in a separate
thread which re-establishes connection to a qemu process once you
restart libvirtd.
This usually means that the reconnection process got stuck for some
reason. Unfortunately your log doesn't show why and unless you've got a
debug log prior to that happening it won't be possible.
Theoretically seeing which function the thread doing the reconnect is
stuck at could perhaps show why.
Usually the only fix for this is to destroy the VM which is stuck.