Lentes, Bernd
2019-May-13 13:34 UTC
[libvirt-users] domains paused without any obvious reason
Hi,

I have a two-node HA cluster with several domains as resources. It is currently running in test mode.

Some domains (all on the same host) stopped running; virsh list shows them as "paused". All of them stopped at the same time (11 May, 7:00 am), and my monitoring system began to yell. I don't have any clue why this happened.

virsh domblkerror reports "no space" for all five domains. In the days before, the domains were running fine, and I know that all disks inside the domains should have enough space. The host is not running out of space either.

The logs don't say anything useful. Unfortunately I didn't have a log for the libvirtd daemon; I have only just configured that.

The domains are stopped each day by cron at 10:30 pm for a short moment, a snapshot is taken, and the domains are started again. The backing file is then copied to a CIFS server, and once that has finished the snapshot is blockcommitted into the backing file. That has been working fine for several days. The cron job writes a log, and it looks fine.

The domains reside in naked Logical Volumes, and the respective Volume Group has enough space.

Bernd

--
Bernd Lentes
System Administration
Institut für Entwicklungsgenetik
Gebäude 35.34 - Raum 208
HelmholtzZentrum münchen
bernd.lentes@helmholtz-muenchen.de
phone: +49 89 3187 1241
phone: +49 89 3187 3827
fax: +49 89 3187 2294
http://www.helmholtz-muenchen.de/idg

Those who make mistakes can learn something;
those who do nothing can learn nothing either.

Helmholtz Zentrum Muenchen
Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
Ingolstaedter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Deputy Chairman of the Supervisory Board: MinDirig. Dr. Manfred Wolter
Management: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Heinrich Bassler, Kerstin Guenther
Register court: Amtsgericht Muenchen HRB 6466
VAT ID: DE 129521671
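To see at a glance which disk of which paused domain reports an error, the two virsh commands mentioned in this message can be combined. The following is only a minimal sketch, assuming virsh is on the PATH and talks to the local hypervisor; the function name list_block_errors is made up for illustration.

```shell
#!/bin/sh
# Sketch: print the block errors of every paused domain, one disk per line.
# Assumption: virsh is installed and connects to the local hypervisor.

# list_block_errors: prints "<domain> <output line of virsh domblkerror>".
# (Hypothetical helper name, not part of libvirt.)
list_block_errors() {
    virsh list --state-paused --name </dev/null | while read -r dom; do
        # The name list ends with an empty line; skip it.
        [ -n "$dom" ] || continue
        virsh domblkerror "$dom" </dev/null | while read -r line; do
            [ -n "$line" ] || continue
            printf '%s %s\n' "$dom" "$line"
        done
    done
}
```

Calling list_block_errors on the affected host would be expected to print one line per disk, prefixed with the domain name, so identical errors across several guests stand out immediately.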
Lentes, Bernd
2019-May-13 16:19 UTC
Re: [libvirt-users] domains paused without any obvious reason
I resumed one of the guests and it continued without any problem. The log doesn't indicate any problem, and df -h shows enough space on all partitions.

Bernd
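Resuming a guest by hand, as described in this message, can be wrapped in a small helper that also reports the state afterwards. This is a sketch under the assumption that virsh is on the PATH; resume_and_check is a hypothetical name chosen for illustration.

```shell
#!/bin/sh
# resume_and_check DOMAIN: resume a paused domain, then print its state.
# Returns non-zero if "virsh resume" itself fails.
# (Hypothetical helper name, not part of libvirt.)
resume_and_check() {
    virsh resume "$1" >/dev/null || return 1
    # domstate prints the state followed by a blank line; keep line 1 only.
    virsh domstate "$1" | sed -n '1p'
}
```

If the guest pauses again shortly after being resumed, the underlying condition is probably still present, so it is worth re-running virsh domblkerror before retrying.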
Daniel P. Berrangé
2019-May-14 09:08 UTC
Re: [libvirt-users] domains paused without any obvious reason
On Mon, May 13, 2019 at 06:19:05PM +0200, Lentes, Bernd wrote:
> I resumed one of the guests and it continued without any problem.
> The log doesn't indicate any problem, and df -h shows enough space on
> all partitions.

'virsh domstate --reason $GUEST' will tell you what event caused the guest to pause in the first place.

If you can resume successfully, this indicates the event was a transient problem. Given the domblkerror message 'no space', it looks like you temporarily ran out of disk space, and the problem then resolved itself.

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
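The pause reason from 'virsh domstate --reason' can be collected for all paused guests in one pass. The following is a minimal sketch, not a definitive tool: it assumes virsh is on the PATH and that the first output line has the form "state (reason)", for example "paused (I/O error)", in the C locale. The function names paused_reason and report_paused_reasons are hypothetical.

```shell
#!/bin/sh
# paused_reason DOMAIN: print only the reason part of the domstate output.
# Assumption: the first output line looks like "paused (I/O error)".
# LC_ALL=C is set because virsh output may be translated in other locales.
paused_reason() {
    LC_ALL=C virsh domstate --reason "$1" </dev/null |
        sed -n 's/^[^(]*(\(.*\))[[:space:]]*$/\1/p'
}

# report_paused_reasons: print "<domain>: <reason>" for every paused domain.
report_paused_reasons() {
    virsh list --state-paused --name </dev/null | while read -r dom; do
        # The name list ends with an empty line; skip it.
        [ -n "$dom" ] || continue
        printf '%s: %s\n' "$dom" "$(paused_reason "$dom")"
    done
}
```

Running report_paused_reasons before resuming anything keeps a record of why each guest stopped, which is lost once the guest is running again.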