Lentes, Bernd
2019-Jun-11 17:19 UTC
Re: [libvirt-users] blockcommit of domain not successfull
----- On Jun 5, 2019, at 4:49 PM, Peter Krempa pkrempa@redhat.com wrote:> On Wed, Jun 05, 2019 at 13:33:49 +0200, Lentes, Bernd wrote: >> Hi Peter, >> >> thanks for your help. >> >> ----- On Jun 5, 2019, at 9:27 AM, Peter Krempa pkrempa@redhat.com wrote: > > [...] > >> >> > >> > So that's interresting. Usually assertion failure in qemu leads to >> > calling abort() and thus the vm would have crashed. Didn't you HA >> > solution restart it? >> >> No. As said the VM didn't crash. It kept running. > > That's interresting. I hope you manage to reproduce it then. > >> >> > At any rate it would be really beneficial if you could collect debug >> > logs for libvirtd which also contain the monitor interactions with qemu: >> > >> > https://wiki.libvirt.org/page/DebugLogs >> > >> > The qemu assertion failure above should ideally be reported to qemu, but >> > if you are able to reproduce the problem with libvirtd debug logs >> > enabled I can extract more useful info from there which the qemu project >> > would ask you anyways. >> >> I can't reproduce it. It seems to happen accidentally. But i can collect the >> logs. Do they get very large ? >> I can contact you the next time it happen. Is that ok for you ? > > Unfortunately they do get very large if there's some monitoring > gathering stats through libvirt, but it's okay to nuke them prior > to attempting the block commit, or daily or so. > > Please do contact me if you gather anything interresting.Hi, it happened again. Following the log of my script it started on 8th of june at 5:59:09 (UTC+2) to blockcommit the domain. These are the related lines in libvirtd.log: ==================================================2019-06-07 20:30:57.170+0000: 30299: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor 2019-06-08 03:59:17.690+0000: 30299: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor 2019-06-08 03:59:26.145+0000: 30300: warning : qemuGetProcessInfo:1461 : cannot parse process status data 2019-06-08 03:59:26.191+0000: 30303: warning : qemuGetProcessInfo:1461 : cannot parse process status data 2019-06-08 03:59:56.095+0000: 27956: warning : qemuDomainObjBeginJobInternal:4865 : Cannot start job (destroy, none) for domain severin; current job is (modify, none) owned by (13061 remoteDispatchDomainBlockJobAbort, 0 <null>) for (38s, 0s) 2019-06-08 03:59:56.095+0000: 27956: error : qemuDomainObjBeginJobInternal:4877 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainBlockJobAbort) 2019-06-08 03:59:56.325+0000: 13060: warning : qemuGetProcessInfo:1461 : cannot parse process status data 2019-06-08 03:59:56.372+0000: 30304: warning : qemuGetProcessInfo:1461 : cannot parse process status data 2019-06-08 04:00:26.503+0000: 13060: warning : qemuGetProcessInfo:1461 : cannot parse process status data =================================================== Since then the script is stuck. Thanks for your help. Bernd Helmholtz Zentrum Muenchen Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH) Ingolstaedter Landstr. 1 85764 Neuherberg www.helmholtz-muenchen.de Aufsichtsratsvorsitzende: MinDir'in Prof. Dr. Veronika von Messling Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Heinrich Bassler, Kerstin Guenther Registergericht: Amtsgericht Muenchen HRB 6466 USt-IdNr: DE 129521671
Peter Krempa
2019-Jun-13 07:56 UTC
Re: [libvirt-users] blockcommit of domain not successfull
On Tue, Jun 11, 2019 at 19:19:05 +0200, Lentes, Bernd wrote: [...]> > Hi,Hi,> > it happened again. > Following the log of my script it started on 8th of june at 5:59:09 (UTC+2) to blockcommit the domain. > These are the related lines in libvirtd.log: > ==================================================Thanks for comming back to me with the information. Unfortunately this is not a full debug log but I can try to tell you what I see here:> 2019-06-07 20:30:57.170+0000: 30299: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor > 2019-06-08 03:59:17.690+0000: 30299: error : qemuMonitorIO:719 : internal error: End of file from qemu monitorSo this looks like qemu crashed. Or at least it's the usual symptom we get. Is there anything in /var/log/libvirt/qemu/$VMNAME.log?> 2019-06-08 03:59:26.145+0000: 30300: warning : qemuGetProcessInfo:1461 : cannot parse process status data > 2019-06-08 03:59:26.191+0000: 30303: warning : qemuGetProcessInfo:1461 : cannot parse process status data > 2019-06-08 03:59:56.095+0000: 27956: warning : qemuDomainObjBeginJobInternal:4865 : Cannot start job (destroy, none) for domain severin; current job is (modify, none) owned by (13061 remoteDispatchDomainBlockJobAbort, 0 <null>) for (38s, > 0s)And this looks to me as if the Abort job can't be interrupted properly while waiting synchronously for the job to finish. This seems to be the problem. If the VM indeed crashed there's a problem in job waiting apparently. I'd still really like to have debug logs in this case to really see what happened.
Lentes, Bernd
2019-Jun-13 11:08 UTC
Re: [libvirt-users] blockcommit of domain not successfull
----- On Jun 13, 2019, at 9:56 AM, Peter Krempa pkrempa@redhat.com wrote:> > Thanks for comming back to me with the information. > > Unfortunately this is not a full debug log but I can try to tell you > what I see here:I configured libvirtd that way: ha-idg-1:~ # grep -Ev '^$|#' /etc/libvirt/libvirtd.conf log_level = 1 log_filters="1:qemu 3:remote 4:event 3:util.json 3:rpc" log_outputs="1:file:/var/log/libvirt/libvirtd.log" keepalive_interval = -1 That's what i found on https://wiki.libvirt.org/page/DebugLogs . Isn't that correct ? That should create informative logfiles. The other host has excat the same configuration but produce much bigger logfiles !?! I have libvirt-daemon-4.0.0-8.12.1.x86_64.> >> 2019-06-07 20:30:57.170+0000: 30299: error : qemuMonitorIO:719 : internal error: >> End of file from qemu monitor >> 2019-06-08 03:59:17.690+0000: 30299: error : qemuMonitorIO:719 : internal error: >> End of file from qemu monitor > > So this looks like qemu crashed. Or at least it's the usual symptom we > get. Is there anything in /var/log/libvirt/qemu/$VMNAME.log?That's all: qemu-system-x86_64: block/mirror.c:864: mirror_run: Assertion `((&bs->tracked_requests)->lh_first == ((void *)0))' failed.> >> 2019-06-08 03:59:26.145+0000: 30300: warning : qemuGetProcessInfo:1461 : cannot >> parse process status data >> 2019-06-08 03:59:26.191+0000: 30303: warning : qemuGetProcessInfo:1461 : cannot >> parse process status data >> 2019-06-08 03:59:56.095+0000: 27956: warning : >> qemuDomainObjBeginJobInternal:4865 : Cannot start job (destroy, none) for >> domain severin; current job is (modify, none) owned by (13061 >> remoteDispatchDomainBlockJobAbort, 0 <null>) for (38s, >> 0s) > > And this looks to me as if the Abort job can't be interrupted properly > while waiting synchronously for the job to finish. This seems to be the > problem. If the VM indeed crashed there's a problem in job waiting > apparently. > > I'd still really like to have debug logs in this case to really see what > happened.I configured logging as i found on https://wiki.libvirt.org/page/DebugLogs. What else can i do ? Bernd Bernd Helmholtz Zentrum Muenchen Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH) Ingolstaedter Landstr. 1 85764 Neuherberg www.helmholtz-muenchen.de Aufsichtsratsvorsitzende: MinDir'in Prof. Dr. Veronika von Messling Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Heinrich Bassler, Kerstin Guenther Registergericht: Amtsgericht Muenchen HRB 6466 USt-IdNr: DE 129521671