Hey guys,

I have a two-node cluster with around 20 domains. The cluster software is Pacemaker and Corosync, the OS is SLES 12 SP5.
The scripts for starting/stopping the domains use virsh. Is there a way to reliably shut down the domains via virsh?
I'm testing around, but sometimes the domains stop, sometimes they don't, and sometimes it takes so long that the cluster times out and fences the respective node.
I'm using "virsh shutdown domain --mode acpi,agent" for the Windows domains (not reliable) and "virsh shutdown domain" for the Linux domains, which is also not reliable.

What can I do?

Bernd

--
Bernd Lentes
System Administrator
Institute for Metabolism and Cell Death (MCD)
Building 25 - office 122
HelmholtzZentrum München
bernd.lentes at helmholtz-muenchen.de
phone: +49 89 3187 1241
fax: +49 89 3187 2294
http://www.helmholtz-muenchen.de/mcd
On 3/8/22 20:25, Lentes, Bernd wrote:
> Hey guys,
>
> i have a two-node cluster with around 20 domains. Cluster-Software is pacemaker and corosync, OS is SLES 12 SP5.
> The scripts for starting/stopping the domains use virsh. Is there a way to reliably shutdown the domains via virsh ?
> I'm testing around, but sometimes the domains stop, sometimes they don't, sometimes it takes very long so that the cluster times out
> and fence the respective node.
> I'm using "virsh shutdown domain --mode acpi,agent" for the windows domains (not reliable) and for the linux domains
> "virsh shutdown domain", also not reliable.

In general, if the agent is available it is more likely to succeed than ACPI, because the shutdown is initiated from inside the guest (/sbin/shutdown is invoked on UNIX-like systems, or the Windows equivalent), while with ACPI a guest can simply choose to ignore the request. In fact, that's what libvirt does: whenever the agent method is available (i.e. --mode contains agent, or no specific --mode was requested, so libvirt is free to choose), it is preferred.

I assume you don't see any errors reported by virsh, and thus "sometimes it takes very long" [to shut down a guest] could mean that the guests are under heavy load, e.g. they are syncing disks before shutdown, stopping services, etc.

I don't think I have a good answer for you until the root cause is found.

Michal
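
To help find out which shutdown path is the unreliable one, the two modes can be requested one at a time instead of together, recording which of them libvirt accepted. The following is only a minimal sketch under a few assumptions: the function name request_shutdown is made up for illustration, and it relies on "virsh shutdown --mode agent" returning a non-zero exit status when the guest agent is not reachable.

```shell
#!/bin/bash
# Sketch only: request_shutdown is a hypothetical helper, not part of virsh.
# It tries the guest agent first and falls back to ACPI, reporting which
# request libvirt accepted. Acceptance does not mean the guest has stopped.

request_shutdown() {
    local domain="$1"

    if virsh shutdown "$domain" --mode agent >/dev/null 2>&1; then
        echo "$domain: shutdown requested via agent"
    elif virsh shutdown "$domain" --mode acpi >/dev/null 2>&1; then
        echo "$domain: agent request failed, shutdown requested via acpi"
    else
        echo "$domain: both agent and acpi requests failed" >&2
        return 1
    fi
}
```

Logging this for every stop operation should show whether the slow or failed shutdowns correlate with the agent being unavailable.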
"Lentes, Bernd" <bernd.lentes at helmholtz-muenchen.de> writes:> i have a two-node cluster with around 20 domains. Cluster-Software is > pacemaker and corosync, OS is SLES 12 SP5. The scripts for starting/ > stopping the domains use virsh. Is there a way to reliably shutdown > the domains via virsh ?I issue virsh shutdown commands periodically until the Pacemaker stop operation timeout - 10 s, then switch over to virsh destroy. This way the domain is eventually stopped whether it cooperates or not. Except when libvirt itself is blocked, which happened with some old versions, but fencing is a reasonable solution in such cases. -- Feri