Maximilian Engelhardt
2021-Feb-13 15:36 UTC
[Pkg-xen-devel] [BUG] Linux pvh vm not getting destroyed on shutdown
Hi,

after a recent upgrade of one of our test systems to Debian Bullseye we noticed an issue where on shutdown of a pvh vm the vm was not destroyed by xen automatically. It could still be destroyed by manually issuing a 'xl destroy $vm' command.

We can reproduce the hang reliably with the following vm configuration:

    type = 'pvh'
    memory = '512'
    kernel = '/usr/lib/grub-xen/grub-i386-xen_pvh.bin'
    [... disk/name/vif ]
    on_poweroff = 'destroy'
    on_reboot = 'restart'
    on_crash = 'restart'
    vcpus = '1'
    maxvcpus = '2'

and then issuing a shutdown command in the vm (e.g. by calling 'poweroff').

Here are some things I noticed while trying to debug this issue:

* It happens on a Debian buster dom0 as well as on a bullseye dom0.

* It seems to only affect pvh vms.

* Shutdown from the pvgrub menu ("c" -> "halt") does work.

* The vm seems to shut down normally; the last lines in the console are:

    [ 228.461167] systemd-shutdown[1]: All filesystems, swaps, loop devices, MD devices and DM devices detached.
    [ 228.476794] systemd-shutdown[1]: Syncing filesystems and block devices.
    [ 228.477878] systemd-shutdown[1]: Powering off.
    [ 233.709498] xenbus_probe_frontend: xenbus_frontend_dev_shutdown: device/vif/0 timeout closing device
    [ 233.745642] reboot: System halted

* Issuing a reboot instead of a shutdown does work fine.

* The issue started with Debian kernel 5.8.3+1~exp1 running in the vm; Debian kernel 5.7.17-1 does not show the issue.

* Setting vcpus equal to maxvcpus does *not* show the hang.
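The combination reported above (type 'pvh' with vcpus lower than maxvcpus) can be checked for mechanically. The following is only a minimal sketch, assuming the config uses simple key = 'value' lines like the excerpt above. The function names are hypothetical helpers, not part of xl, and the parser ignores list values such as disk or vif.

```python
import re

# Matches simple "key = 'value'" assignments like those in the config above.
# List values (disk, vif) are not handled by this sketch.
_ASSIGN = re.compile(r"^\s*(\w+)\s*=\s*['\"]?([^'\"#\s]+)['\"]?")


def parse_simple_cfg(text):
    """Return a dict of the simple scalar assignments found in the text."""
    settings = {}
    for line in text.splitlines():
        match = _ASSIGN.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def matches_reported_trigger(text):
    """True if the config is pvh and vcpus is lower than maxvcpus."""
    cfg = parse_simple_cfg(text)
    if cfg.get("type") != "pvh":
        return False
    if "vcpus" not in cfg or "maxvcpus" not in cfg:
        return False
    return int(cfg["vcpus"]) < int(cfg["maxvcpus"])


# Values taken from the configuration in the report.
EXAMPLE = """
type = 'pvh'
memory = '512'
vcpus = '1'
maxvcpus = '2'
"""

print(matches_reported_trigger(EXAMPLE))  # prints True
```

A config with vcpus equal to maxvcpus returns False here, which corresponds to the setup that did not show the hang.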
Below is the output of "xl debug-keys q; xl dmesg" for the affected vm in the 'hang' state as suggested by andyhhp on #xen to attach to this bug report:

(XEN) General information for domain 55:
(XEN) refcnt=3 dying=0 pause_count=0
(XEN) nr_pages=131088 xenheap_pages=4 shared_pages=0 paged_pages=0 dirty_cpus={} max_pages=131328
(XEN) handle=275e3a73-247f-4649-af86-6d5c0c72e8e4 vm_assist=00000020
(XEN) paging assistance: hap refcounts translate external
(XEN) Rangesets belonging to domain 55:
(XEN) Interrupts { }
(XEN) I/O Memory { }
(XEN) I/O Ports { }
(XEN) log-dirty { }
(XEN) Memory pages belonging to domain 55:
(XEN) DomPage list too long to display
(XEN) PoD entries=0 cachesize=0
(XEN) XenPage 0000000000080125: caf=c000000000000001, taf=e400000000000001
(XEN) XenPage 00000000001412c9: caf=c000000000000001, taf=e400000000000001
(XEN) XenPage 0000000000140da0: caf=c000000000000001, taf=e400000000000001
(XEN) XenPage 0000000000140d9a: caf=c000000000000001, taf=e400000000000001
(XEN) ExtraPage 00000000001412d3: caf=8040000000000002, taf=e400000000000001
(XEN) NODE affinity for domain 55: [0]
(XEN) VCPU information and callbacks for domain 55:
(XEN) UNIT0 affinities: hard={0-7} soft={0-3}
(XEN) VCPU0: CPU2 [has=F] poll=0 upcall_pend=01 upcall_mask=00
(XEN) pause_count=0 pause_flags=2
(XEN) paging assistance: hap, 4 levels
(XEN) No periodic timer
(XEN) UNIT1 affinities: hard={0-7} soft={0-3}
(XEN) VCPU1: CPU1 [has=F] poll=0 upcall_pend=00 upcall_mask=00
(XEN) pause_count=0 pause_flags=1
(XEN) paging assistance: hap, 4 levels
(XEN) No periodic timer

Please let me know if more information is necessary.

Thanks,
Maxi
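When comparing several such dumps, the per-vCPU pause state is the part that differs between the two vCPUs above. As a rough sketch, the values can be pulled out of the text with a regular expression. The function name is a hypothetical helper, and treating pause_flags as hexadecimal is an assumption; for the values shown above it makes no difference. The sketch only extracts the raw numbers and does not interpret them.

```python
import re

# One match per vCPU: the vCPU number, then the first pause_count/pause_flags
# pair that follows it. re.S lets the match span line breaks.
_VCPU_BLOCK = re.compile(
    r"VCPU(\d+): CPU\d+.*?pause_count=(\d+) pause_flags=([0-9a-fA-F]+)",
    re.S,
)


def vcpu_pause_state(dump):
    """Map vCPU number -> (pause_count, pause_flags) from a debug-key q dump.

    pause_flags is parsed as hexadecimal (an assumption about the format).
    """
    return {
        int(m.group(1)): (int(m.group(2)), int(m.group(3), 16))
        for m in _VCPU_BLOCK.finditer(dump)
    }


# Lines copied from the dump in the report.
SAMPLE = """\
(XEN) VCPU information and callbacks for domain 55:
(XEN) VCPU0: CPU2 [has=F] poll=0 upcall_pend=01 upcall_mask=00
(XEN) pause_count=0 pause_flags=2
(XEN) VCPU1: CPU1 [has=F] poll=0 upcall_pend=00 upcall_mask=00
(XEN) pause_count=0 pause_flags=1
"""

print(vcpu_pause_state(SAMPLE))  # prints {0: (0, 2), 1: (0, 1)}
```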
Elliott Mitchell
2021-Feb-13 18:21 UTC
[Pkg-xen-devel] [BUG] Linux pvh vm not getting destroyed on shutdown
On Sat, Feb 13, 2021 at 04:36:24PM +0100, Maximilian Engelhardt wrote:
> after a recent upgrade of one of our test systems to Debian Bullseye we
> noticed an issue where on shutdown of a pvh vm the vm was not destroyed by xen
> automatically. It could still be destroyed by manually issuing a 'xl destroy
> $vm' command.

Usually I would expect such an issue to show up in the Debian bug database before xen-devel. In particular, as this is a behavior change with security updates, there is a good chance this isn't attributable to the Xen Project. Additionally, the Xen Project's support window is rather narrow.

I've been observing the same (or a similar) issue for a bit too.

> Here are some things I noticed while trying to debug this issue:
>
> * It happens on a Debian buster dom0 as well as on a bullseye dom0

I stick with stable on non-development machines, so I can't comment on this.

> * It seems to only affect pvh vms.

I've observed it with pv and hvm VMs as well.

> * shutdown from the pvgrub menu ("c" -> "halt") does work

Woah! That is quite the observation. Since I had a handy opportunity I tried this, and it reproduces for me.

> * the vm seems to shut down normally, the last lines in the console are:

I agree with this. Everything appears typical until the last moment.

> * issuing a reboot instead of a shutdown does work fine.

I disagree with this. I'm seeing the issue occur with restart attempts too.

> * The issue started with Debian kernel 5.8.3+1~exp1 running in the vm, Debian
> kernel 5.7.17-1 does not show the issue.

I think the first kernel update during which I saw the issue was around linux-image-4.19.0-12-amd64 or linux-image-4.19.0-13-amd64. I think the last security update to the Xen packages was in a similar timeframe, though. Rate this portion as unreliable.

I can definitely state this occurs with Debian's linux-image-4.19.0-13-amd64 and kernels built from the corresponding source; it may have shown up earlier.

> * setting vcpus equal to maxvcpus does *not* show the hang.

I haven't tried things related to this, so I can't comment on this part.

Fresh observation. During a similar timeframe I started noticing VM creation leaving an `xl create` process behind. I had discovered this process could be freely killed without appearing to affect the VM and had thus been doing so (memory in a lean Dom0 is precious).

While typing this I realized there was another scenario I needed to try. It turns out that if I boot PV GRUB and get to its command line (press 'c'), then leave the VM console, kill the `xl create` process, return to the console and type "halt", the result is a hung VM.

Are you perhaps either killing the `xl create` process for affected VMs, or migrating the VM and thus splitting the `xl create` process from the affected VMs?

This seems more a Debian issue than a Xen Project issue right now.

-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg at m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
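To answer the question about whether an `xl create` process is still around for a VM, the process list in dom0 can be scanned. The following is a minimal sketch, assuming a Linux dom0 with /proc mounted and assuming the leftover process still shows `xl create` in its command line as described above. The function names are hypothetical helpers and the paths in the usage note are made up for illustration.

```python
import os


def is_xl_create(argv):
    """True if argv looks like an 'xl create' invocation.

    The first argument after 'xl' that does not start with '-' is taken
    to be the subcommand (an assumption about how xl is invoked).
    """
    if not argv or os.path.basename(argv[0]) != "xl":
        return False
    for arg in argv[1:]:
        if not arg.startswith("-"):
            return arg == "create"
    return False


def running_xl_create(proc_root="/proc"):
    """Return (pid, argv) for every process that looks like 'xl create'."""
    if not os.path.isdir(proc_root):
        return []
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as handle:
                raw = handle.read()
        except OSError:
            continue  # process exited meanwhile or is not readable
        # /proc/<pid>/cmdline separates arguments with NUL bytes.
        argv = [part.decode(errors="replace") for part in raw.split(b"\0") if part]
        if is_xl_create(argv):
            found.append((int(entry), argv))
    return found


if __name__ == "__main__":
    for pid, argv in running_xl_create():
        print(pid, " ".join(argv))
```

For example, `is_xl_create(["/usr/sbin/xl", "create", "/etc/xen/vm.cfg"])` is true, while `is_xl_create(["xl", "destroy", "vm"])` is not. If the list comes back empty for a running VM, that VM is in the situation described in the PV GRUB experiment above.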
Maximilian Engelhardt
2021-Feb-17 18:19 UTC
[Pkg-xen-devel] [BUG] Linux pvh vm not getting destroyed on shutdown
On Saturday, 13 February 2021 16:36:24 CET Maximilian Engelhardt wrote:
> Here are some things I noticed while trying to debug this issue:
>
> * It happens on a Debian buster dom0 as well as on a bullseye dom0
>
> * It seems to only affect pvh vms.
>
> * shutdown from the pvgrub menu ("c" -> "halt") does work
>
> * the vm seems to shut down normally, the last lines in the console are:
[...]
>
> * issuing a reboot instead of a shutdown does work fine.
>
> * The issue started with Debian kernel 5.8.3+1~exp1 running in the vm,
> Debian kernel 5.7.17-1 does not show the issue.
>
> * setting vcpus equal to maxvcpus does *not* show the hang.

One thing I just realized I completely forgot to mention in my initial report is that this issue is also present for us on a modern kernel. We tested with Debian kernel 5.10.13-1.