Wiebe Cazemier
2025-Oct-24 07:40 UTC
[Pkg-xen-devel] Bug#1118711: xen-utils-common: Xen domains shutdown fails (/run/xen/qmp-libxl-* missing); mdadm not stopped, needs resync on boot
Package: xen-utils-common
Version: 4.17.5+23-ga4e5191dc0-1+deb12u1
Severity: normal
Tags: upstream

Dear Maintainer,

On a server with mdadm+lvm and Xen domain storage on logical volumes, the mdadm volume was reported as dirty and needed a resync after I shut the server down and started it again.

I was debating whether this falls under the 'data loss' justification of 'serious', but I'll let you decide.

Also, I know it's Debian 12, but since the problem is so specific, I still wanted to report it.

The problem is that when xendomains is stopped, it has apparently lost control over two domains and can't shut them down. The following log shows the QMP socket errors and the skipped deactivation of the volume groups:

# journalctl --since '2025-10-22 18:00:00' | grep -E '(blkdeactivate|xendomains|-- Boot)'
Oct 22 19:48:45 brick systemd[1]: Stopping xendomains.service - LSB: Start/stop secondary xen domains...
Oct 22 19:48:45 brick blkdeactivate[312015]: Deactivating block devices:
Oct 22 19:48:46 brick blkdeactivate[312015]: [SKIP]: unmount of md1 (md1) mounted on [SWAP]
Oct 22 19:48:47 brick xendomains[312070]: libxl: error: libxl_qmp.c:1334:qmp_ev_lock_aquired: Domain 5:Failed to connect to QMP socket /var/run/xen/qmp-libxl-5: No such file or directory
Oct 22 19:48:47 brick xendomains[312070]: libxl: error: libxl_qmp.c:1334:qmp_ev_lock_aquired: Domain 6:Failed to connect to QMP socket /var/run/xen/qmp-libxl-6: No such file or directory
Oct 22 19:48:47 brick xendomains[312067]: Shutting down Xen domain geborsteldstaal (1)...
Oct 22 19:48:47 brick xendomains[312103]: Shutting down domain 1
Oct 22 19:48:47 brick xendomains[312067]: done.
Oct 22 19:48:47 brick xendomains[312067]: Shutting down Xen domain gold (2)...
Oct 22 19:48:47 brick xendomains[312105]: Shutting down domain 2
Oct 22 19:48:47 brick xendomains[312067]: done.
Oct 22 19:48:47 brick xendomains[312067]: Shutting down Xen domain meel (3)...
Oct 22 19:48:47 brick xendomains[312107]: Shutting down domain 3
Oct 22 19:48:47 brick xendomains[312067]: done.
Oct 22 19:48:47 brick xendomains[312067]: Shutting down Xen domain wood (4)...
Oct 22 19:48:47 brick xendomains[312109]: Shutting down domain 4
Oct 22 19:48:47 brick xendomains[312067]: done.
Oct 22 19:48:47 brick blkdeactivate[312015]: [UMOUNT]: unmounting big-decrypted (dm-12) mounted on /mnt/big... done
Oct 22 19:48:48 brick xendomains[312113]: libxl: error: libxl_qmp.c:1334:qmp_ev_lock_aquired: Domain 5:Failed to connect to QMP socket /var/run/xen/qmp-libxl-5: No such file or directory
Oct 22 19:48:48 brick xendomains[312113]: libxl: error: libxl_qmp.c:1334:qmp_ev_lock_aquired: Domain 6:Failed to connect to QMP socket /var/run/xen/qmp-libxl-6: No such file or directory
Oct 22 19:48:49 brick blkdeactivate[312015]: [UMOUNT]: unmounting md0 (md0) mounted on /boot... done
Oct 22 19:48:49 brick blkdeactivate[312015]: [SKIP]: unmount of md2 (md2) mounted on /
Oct 22 19:48:49 brick blkdeactivate[312015]: [MD]: deactivating raid1 device md0... done
Oct 22 19:49:00 brick blkdeactivate[312015]: [DM]: deactivating crypt device big-decrypted (dm-12)... done
Oct 22 19:49:02 brick blkdeactivate[312015]: [LVM]: deactivating Volume Group universe2... skipping
Oct 22 19:49:03 brick blkdeactivate[312015]: [LVM]: deactivating Volume Group universe... skipping
Oct 22 19:50:20 brick xendomains[312112]: Waiting for Xen domain geborsteldstaal (1) to shut down.................................................................................................................................................................................................................................................................................done.
Oct 22 19:50:20 brick xendomains[312112]: Waiting for Xen domain gold (2) to shut down...done.
Oct 22 19:50:20 brick xendomains[312112]: Waiting for Xen domain meel (3) to shut down...done.
Oct 22 19:50:20 brick xendomains[312112]: Waiting for Xen domain wood (4) to shut down...done.
Oct 22 19:50:20 brick systemd[1]: xendomains.service: Deactivated successfully.
Oct 22 19:50:20 brick systemd[1]: xendomains.service: Unit process 1769 (xl) remains running after unit stopped.
Oct 22 19:50:20 brick systemd[1]: xendomains.service: Unit process 312720 (xl) remains running after unit stopped.
Oct 22 19:50:20 brick systemd[1]: Stopped xendomains.service - LSB: Start/stop secondary xen domains.
Oct 22 19:50:20 brick systemd[1]: xendomains.service: Consumed 4min 48.836s CPU time.
-- Boot 7cd2b4335f3d4f8aa735a24b9b57dae6 --

Note that the array in question was /dev/md3, which is not mentioned here before the reboot. Not sure why.

See the errors about /var/run/xen/qmp-libxl-*. I happen to know which domains had IDs 5 and 6, and these domains were indeed online and operating normally before the shutdown. Because they were still running, the md+lvm stack could not be stopped, and the shutdown proceeded anyway. Then on boot, the array was marked as dirty and started resyncing, as the boot log shows:

# journalctl --since '2025-10-22 18:00:00' | grep -E 'md3'
Oct 22 20:14:07 brick kernel: md/raid1:md3: not clean -- starting background reconstruction
Oct 22 20:14:07 brick kernel: md/raid1:md3: active with 2 out of 2 mirrors
Oct 22 20:14:07 brick kernel: md3: detected capacity change from 0 to 3800903680
Oct 22 20:14:12 brick lvm[655]: PV /dev/md3 online, VG universe is complete.
Oct 22 20:14:47 brick kernel: md: resync of RAID array md3
Oct 23 01:02:16 brick kernel: md: md3: resync done.

I have no idea how to reproduce it, and since this is a production server, I can't really experiment with it. Perhaps there is also a bug elsewhere, in blkdeactivate: should it force all volume groups to deactivate, so that mdadm can stop the array?
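The failure above starts with libxl being unable to reach a running guest's QMP socket. A minimal diagnostic sketch that could be run before a planned shutdown to spot the same condition follows; `check_qmp_sockets` and the `--live` switch are hypothetical, not something shipped by xen-utils-common. It assumes the `/var/run/xen/qmp-libxl-<domid>` naming seen in the errors above and the column layout of `xl list` (a header line, then the domain ID in the second column). As far as I understand, only domains with a QEMU device model attached have such a socket, so a missing socket is only suspicious for those guests.

```shell
#!/bin/sh
# Hypothetical diagnostic helper (not part of xen-utils-common).
# For each domain ID given after DIR, report whether DIR/qmp-libxl-<domid>
# exists as a Unix socket. Returns 1 if any socket is missing, else 0.
# Assumption: socket naming as in the libxl errors (/var/run/xen/qmp-libxl-N).
check_qmp_sockets() {
    dir=$1
    shift
    status=0
    for domid in "$@"; do
        sock="$dir/qmp-libxl-$domid"
        if [ -S "$sock" ]; then
            echo "domain $domid: QMP socket present ($sock)"
        else
            echo "domain $domid: QMP socket MISSING ($sock)"
            status=1
        fi
    done
    return "$status"
}

# Only touch the live system when explicitly asked to.
# 'xl list' prints a header line, then one line per domain with the ID in
# column 2; ID 0 (Domain-0) is skipped because it has no QMP socket of its own.
if [ "${1:-}" = "--live" ]; then
    check_qmp_sockets /var/run/xen $(xl list | awk 'NR > 1 && $2 != 0 { print $2 }')
fi
```

Saved under any name (for example `check-qmp.sh`) and run as root with `sh check-qmp.sh --live`, a non-zero exit status would mean at least one running guest had no QMP socket, which is the state domains 5 and 6 appear to have been in before the shutdown.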
-- System Information:
Debian Release: 12.12
  APT prefers oldstable-updates
  APT policy: (500, 'oldstable-updates'), (500, 'oldstable-security'), (500, 'oldstable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.1.0-40-amd64 (SMP w/8 CPU threads; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages xen-utils-common depends on:
ii  libc6                     2.36-9+deb12u13
ii  libxenhypfs1              4.17.5+23-ga4e5191dc0-1+deb12u1
ii  libxenstore4              4.17.5+23-ga4e5191dc0-1+deb12u1
ii  lsb-base                  11.6
ii  python3                   3.11.2-1+b1
ii  sysvinit-utils [lsb-base] 3.06-4
ii  ucf                       3.0043+nmu1+deb12u1
ii  udev                      252.39-1~deb12u1
ii  xenstore-utils            4.17.5+23-ga4e5191dc0-1+deb12u1

xen-utils-common recommends no packages.

Versions of packages xen-utils-common suggests:
pn  xen-doc  <none>

-- Configuration Files:
/etc/default/xendomains changed [not included]
/etc/xen/xend-config.sxp changed [not included]

-- no debconf information