Hi,

I have several domains running on a 2-node HA cluster. Each night I create
snapshots of the domains; after copying the consistent raw file to a CIFS
server, I blockcommit the changes back into the raw files. That has been
running quite well, but recently the blockcommit did not work for one domain.
I create a logfile of the whole procedure:

==============================================================
...
Sat Jun 1 03:05:24 CEST 2019
 Target     Source
------------------------------------------------
 vdb        /mnt/snap/severin.sn
 hdc        -

/usr/bin/virsh blockcommit severin /mnt/snap/severin.sn --verbose --active --pivot
Block commit: [  0 %]Block commit: [ 15 %]Block commit: [ 28 %]Block commit: [ 35 %]Block commit: [ 43 %]Block commit: [ 53 %]Block commit: [ 63 %]Block commit: [ 73 %]Block commit: [ 82 %]Block commit: [ 89 %]Block commit: [ 98 %]Block commit: [100 %]
 Target     Source
------------------------------------------------
 vdb        /mnt/snap/severin.sn
...
=============================================================

The libvirtd log says (it's UTC, IIRC):

============================================================
...
2019-05-31 20:31:34.481+0000: 4170: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor
2019-06-01 01:05:32.233+0000: 4170: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor
2019-06-01 01:05:43.804+0000: 22605: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:05:43.848+0000: 22596: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:06:11.438+0000: 26112: warning : qemuDomainObjBeginJobInternal:4865 : Cannot start job (destroy, none) for domain severin; current job is (modify, none) owned by (5372 remoteDispatchDomainBlockJobAbort, 0 <null>) for (39s, 0s)
2019-06-01 01:06:11.438+0000: 26112: error : qemuDomainObjBeginJobInternal:4877 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainBlockJobAbort)
2019-06-01 01:06:13.976+0000: 5369: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:06:14.028+0000: 22596: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:06:44.165+0000: 5371: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:06:44.218+0000: 22605: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:07:14.343+0000: 5369: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:07:14.387+0000: 22598: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:07:44.495+0000: 22605: warning : qemuGetProcessInfo:1461 : cannot parse process status data
...
==========================================================

and "cannot parse process status data" repeats continuously until the end of the logfile.

The syslog from the domain itself didn't reveal anything; it just continues to run.
The libvirt log of the domain just says:

qemu-system-x86_64: block/mirror.c:864: mirror_run: Assertion `((&bs->tracked_requests)->lh_first == ((void *)0))' failed.
Hosts are SLES 12 SP4 with libvirt-daemon-4.0.0-8.9.1.x86_64.

Bernd

--
Bernd Lentes
Systemadministration
Institut für Entwicklungsgenetik
Gebäude 35.34 - Raum 208
HelmholtzZentrum münchen
bernd.lentes@helmholtz-muenchen.de
phone: +49 89 3187 1241
phone: +49 89 3187 3827
fax: +49 89 3187 2294
http://www.helmholtz-muenchen.de/idg

he who makes mistakes can learn something
he who does nothing can learn nothing

Helmholtz Zentrum Muenchen
Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
Ingolstaedter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Stellv. Aufsichtsratsvorsitzender: MinDirig. Dr. Manfred Wolter
Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Heinrich Bassler, Kerstin Guenther
Registergericht: Amtsgericht Muenchen HRB 6466
USt-IdNr: DE 129521671
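For reference, the nightly cycle described above can be sketched roughly as follows. This is a minimal sketch, not the poster's actual script: the disk target "vdb" and the blockcommit invocation are taken from the log above, but the base image path, the destination path, and the function names are illustrative assumptions.

```shell
#!/bin/bash
# Sketch of a snapshot/copy/blockcommit backup cycle. Paths and the base
# image location are assumptions; only the blockcommit invocation matches
# the procedure log above.

# Build the exact blockcommit command line shown in the log.
commit_cmd() {
    local dom=$1 snap=$2
    printf '/usr/bin/virsh blockcommit %s %s --verbose --active --pivot' \
        "$dom" "$snap"
}

backup_domain() {
    local dom=$1 snap=$2 dest=$3
    # 1. External disk-only snapshot: new writes go to the overlay $snap,
    #    leaving the raw base image consistent for copying.
    virsh snapshot-create-as "$dom" --disk-only --atomic --no-metadata \
        --diskspec "vdb,file=$snap"
    # 2. Copy the now-quiescent raw base image to the CIFS server.
    cp "/var/lib/libvirt/images/${dom}.raw" "$dest"
    # 3. Merge the overlay back into the base and pivot the domain onto it.
    eval "$(commit_cmd "$dom" "$snap")"
    rm -f "$snap"
}

# Example invocation (destination path is hypothetical):
# backup_domain severin /mnt/snap/severin.sn /mnt/cifs-backup/severin.raw
```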
Peter Krempa
2019-Jun-05 07:27 UTC
Re: [libvirt-users] blockcommit of domain not successfull
On Tue, Jun 04, 2019 at 14:44:29 +0200, Lentes, Bernd wrote:
> Hi,

Hi,

> i have several domains running on a 2-node HA-cluster.
> Each night i create snapshots of the domains, after copying the consistent raw file to a CIFS server i blockcommit the changes into the raw files.
> That's running quite well.
> But recent the blockcommit didn't work for one domain:
> I create a logfile from the whole procedure:
> ==============================================================
> ...
> Sat Jun 1 03:05:24 CEST 2019
>  Target     Source
> ------------------------------------------------
>  vdb        /mnt/snap/severin.sn
>  hdc        -
>
> /usr/bin/virsh blockcommit severin /mnt/snap/severin.sn --verbose --active --pivot
> Block commit: [ 0 %]Block commit: [ 15 %]Block commit: [ 28 %]Block commit: [ 35 %]Block commit: [ 43 %]Block commit: [ 53 %]Block commit: [ 63 %]Block commit: [ 73 %]Block commit: [ 82 %]Block commit: [ 89 %]Block commit: [ 98 %]Block commit: [100 %]
>  Target     Source
> ------------------------------------------------
>  vdb        /mnt/snap/severin.sn
> ...
> =============================================================
>
> The libvirtd-log says (it's UTC IIRC):
> ============================================================
> ...
> 2019-05-31 20:31:34.481+0000: 4170: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor
> 2019-06-01 01:05:32.233+0000: 4170: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor

This message is printed if qemu crashes for some reason and then closes
the monitor socket unexpectedly.

> 2019-06-01 01:05:43.804+0000: 22605: warning : qemuGetProcessInfo:1461 : cannot parse process status data
> 2019-06-01 01:05:43.848+0000: 22596: warning : qemuGetProcessInfo:1461 : cannot parse process status data
> 2019-06-01 01:06:11.438+0000: 26112: warning : qemuDomainObjBeginJobInternal:4865 : Cannot start job (destroy, none) for domain severin; current job is (modify, none) owned by (5372 remoteDispatchDomainBlockJobAbort, 0 <null>) for (39s, 0s)
> 2019-06-01 01:06:11.438+0000: 26112: error : qemuDomainObjBeginJobInternal:4877 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainBlockJobAbort)

So this means that the virDomainBlockJobAbort API, which is also used for
--pivot, got stuck for some time.

This is kind of strange if the VM crashed; there might also be a bug in
the synchronous block job handling, but it's hard to tell from this log.

> 2019-06-01 01:06:13.976+0000: 5369: warning : qemuGetProcessInfo:1461 : cannot parse process status data
> 2019-06-01 01:06:14.028+0000: 22596: warning : qemuGetProcessInfo:1461 : cannot parse process status data
> 2019-06-01 01:06:44.165+0000: 5371: warning : qemuGetProcessInfo:1461 : cannot parse process status data
> 2019-06-01 01:06:44.218+0000: 22605: warning : qemuGetProcessInfo:1461 : cannot parse process status data
> 2019-06-01 01:07:14.343+0000: 5369: warning : qemuGetProcessInfo:1461 : cannot parse process status data
> 2019-06-01 01:07:14.387+0000: 22598: warning : qemuGetProcessInfo:1461 : cannot parse process status data
> 2019-06-01 01:07:44.495+0000: 22605: warning : qemuGetProcessInfo:1461 : cannot parse process status data
> ...
> ==========================================================
> and "cannot parse process status data" continuously until the end of the logfile.
>
> The syslog from the domain itself didn't reveal anything, it just continues to run.
> The libvirt log from the domains just says:
> qemu-system-x86_64: block/mirror.c:864: mirror_run: Assertion `((&bs->tracked_requests)->lh_first == ((void *)0))' failed.

So that's interesting. Usually an assertion failure in qemu leads to a
call to abort(), and thus the VM would have crashed. Didn't your HA
solution restart it?

At any rate, it would be really beneficial if you could collect debug
logs for libvirtd which also contain the monitor interactions with qemu:

https://wiki.libvirt.org/page/DebugLogs

The qemu assertion failure above should ideally be reported to qemu, but
if you are able to reproduce the problem with libvirtd debug logs
enabled, I can extract more useful information from them, which the qemu
project would ask you for anyway.
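[Editor's note: for libvirt of this vintage, the DebugLogs wiki page amounts to setting log filters and outputs in /etc/libvirt/libvirtd.conf and restarting libvirtd; the filter string below is the one the wiki suggested at the time, quoted from memory, so double-check it against the page.]

```
# /etc/libvirt/libvirtd.conf
log_filters="1:qemu 1:libvirt 4:object 4:json 4:event 1:util"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
```

Then restart the daemon (e.g. `systemctl restart libvirtd`) so the settings take effect.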
Lentes, Bernd
2019-Jun-05 11:33 UTC
Re: [libvirt-users] blockcommit of domain not successfull
Hi Peter,

thanks for your help.

----- On Jun 5, 2019, at 9:27 AM, Peter Krempa pkrempa@redhat.com wrote:

>> ============================================================
>> ...
>> 2019-05-31 20:31:34.481+0000: 4170: error : qemuMonitorIO:719 : internal error:
>> End of file from qemu monitor
>> 2019-06-01 01:05:32.233+0000: 4170: error : qemuMonitorIO:719 : internal error:
>> End of file from qemu monitor
>
> This message is printed if qemu crashes for some reason and then closes
> the monitor socket unexpectedly.
>
>> 2019-06-01 01:05:43.804+0000: 22605: warning : qemuGetProcessInfo:1461 : cannot
>> parse process status data
>> 2019-06-01 01:05:43.848+0000: 22596: warning : qemuGetProcessInfo:1461 : cannot
>> parse process status data
>> 2019-06-01 01:06:11.438+0000: 26112: warning :
>> qemuDomainObjBeginJobInternal:4865 : Cannot start job (destroy, none) for doma
>> in severin; current job is (modify, none) owned by (5372
>> remoteDispatchDomainBlockJobAbort, 0 <null>) for (39s, 0s)
>> 2019-06-01 01:06:11.438+0000: 26112: error : qemuDomainObjBeginJobInternal:4877
>> : Timed out during operation: cannot acquire
>> state change lock (held by remoteDispatchDomainBlockJobAbort)
>
> So this means that the virDomainBlockJobAbort API which is also used for
> --pivot got stuck for some time.
>
> This is kind of strange if the VM crashed, there might also be a bug in
> the synchronous block job handling, but it's hard to tell from this log.

The VM didn't crash. It kept running. See "last":

root     pts/49       ha-idg-2.scidom. Tue Jun  4 14:02 - 13:18  (23:16)
root     pts/47       pc60337.scidom.d Mon Jun  3 15:13   still logged in
reboot   system boot  2.6.4-52-smp     Wed May 15 20:19          (20+17:02)
reboot   system boot  2.6.4-52-smp     Fri Mar 15 17:38          (81+18:44)
reboot   system boot  2.6.4-52-smp     Wed Feb 27 20:29          (15+21:04)

>> The syslog from the domain itself didn't reveal anything, it just continues to
>> run.
>> The libvirt log from the domains just says:
>> qemu-system-x86_64: block/mirror.c:864: mirror_run: Assertion
>> `((&bs->tracked_requests)->lh_first == ((void *)0))' failed.
>
> So that's interresting. Usually assertion failure in qemu leads to
> calling abort() and thus the vm would have crashed. Didn't you HA
> solution restart it?

No. As said, the VM didn't crash. It kept running.

> At any rate it would be really beneficial if you could collect debug
> logs for libvirtd which also contain the monitor interactions with qemu:
>
> https://wiki.libvirt.org/page/DebugLogs
>
> The qemu assertion failure above should ideally be reported to qemu, but
> if you are able to reproduce the problem with libvirtd debug logs
> enabled I can extract more useful info from there which the qemu project
> would ask you anyways.

I can't reproduce it; it seems to happen sporadically. But I can collect the logs. Do they get very large? I can contact you the next time it happens. Is that OK for you?

Bernd
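[Editor's note: the "did it crash or not" question can also be settled on the host rather than via `last` inside the guest, by checking whether the qemu process for the domain has a start time older than the assertion message. A small sketch; the pidfile path is the usual libvirt default and an assumption for this setup.]

```shell
# Print the start time of a process; prints nothing if the PID is not running.
qemu_start_time() {
    ps -o lstart= -p "$1" 2>/dev/null
}

# For the domain "severin", libvirt normally records the qemu PID here
# (the path is an assumption; it can differ per distribution):
#   pid=$(cat /var/run/libvirt/qemu/severin.pid)
#   qemu_start_time "$pid"
# A start time earlier than the assertion timestamp means the same qemu
# process survived, i.e. the VM was not restarted.
```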