Hello,

After updating Sid this morning, which included an upgrade of the Linux kernel from 5.16.x to 5.17.x, the system will not boot into Dom0 on Xen.

Package versions:

Xen hypervisor: xen-hypervisor-4.16-amd64, Debian version 16.0+51-g0941d6cb-1+b1
Linux kernel in Dom0: linux-image-5.17.0-1-amd64, Debian version 5.17.3-1

The system boots into Linux 5.17.3-1 fine on bare metal, but not as Dom0 on the Xen 4.16 hypervisor (Debian version 16.0+51-g0941d6cb-1+b1).

The system never shows boot messages or a login prompt; the monitor stays powered on, but with no text or image at all. It seems to just hang indefinitely. Downgrading to the latest 5.16 kernel on the same Xen 4.16 package (Debian version 16.0+51-g0941d6cb-1+b1) works normally.

Has anybody else seen this problem?

Regards,
Chuck Zmudzinski
On Sat, Apr 23, 2022 at 12:42:36PM -0400, Chuck Zmudzinski wrote:
> [...] Anybody else see this
> problem?

We have a similar issue in Qubes OS: https://github.com/QubesOS/qubes-issues/issues/7462. Are you running this on AMD?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
On 4/23/22 6:03 PM, Marek Marczykowski-Górecki wrote:
> We have similar issue in Qubes OS:
> https://github.com/QubesOS/qubes-issues/issues/7462. Are you running
> this on AMD?

No, mine is an Intel Haswell Core i5 (4th gen), but Diederik reported that his Broadwell Intel Xeons (5th gen) work OK with 5.17 as Dom0 on Xen 4.16.

I was able to see the cause of the failure on my Haswell system by rebooting successfully and then reading the systemd journal entries from the failed boot:

Apr 23 08:44:59 debian kernel: [    3.876169] i915 0000:00:02.0: [drm:add_taint_for_CI [i915]] CI tainted:0x9 by intel_gt_init+0xb6/0x2e0 [i915]

After that, kern.log has entries for just under a second, and then it stops and the system hangs. It looks like the add_taint_for_CI function in the Intel i915 GPU driver halts the CPU. That explains the system hang, and, just as on your AMD system on Qubes, the backlight stays on. I recover by pressing the reset button and rebooting into either a Dom0 kernel of version 5.16 or earlier on Xen, or the 5.17 kernel on bare metal.
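A minimal Python sketch of pulling that notice out of a saved log of the failed boot. It assumes the kernel messages were first dumped to text, for example with `journalctl -k -b -1 --no-pager` (kernel messages from the previous boot, which needs a persistent journal). The regex and the names `find_ci_taints` and `CITaint` are hypothetical and only match the message format quoted above.

```python
import re
from typing import NamedTuple

# Hypothetical pattern for the i915 notice quoted above, e.g.
# "i915 0000:00:02.0: [drm:add_taint_for_CI [i915]] CI tainted:0x9 by intel_gt_init+0xb6/0x2e0 [i915]"
CI_TAINT_RE = re.compile(
    r"i915 (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"\[drm:add_taint_for_CI \[i915\]\] "
    r"CI tainted:(?P<taint>0x[0-9a-f]+) by (?P<caller>\S+)"
)


class CITaint(NamedTuple):
    pci_address: str  # PCI address of the GPU, e.g. "0000:00:02.0"
    taint: int        # raw value printed after "CI tainted:"
    caller: str       # symbol+offset of the function that reported it


def find_ci_taints(log_text: str) -> list[CITaint]:
    """Return every i915 'CI tainted' notice found in a kernel log dump."""
    hits = []
    for line in log_text.splitlines():
        m = CI_TAINT_RE.search(line)
        if m:
            hits.append(CITaint(m["pci"], int(m["taint"], 16), m["caller"]))
    return hits
```

The taint value is kept as the raw number the driver printed; the sketch does not try to interpret it. The caller field is what points at intel_gt_init as the place to look next.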
It looks like my older Intel hardware is doing something the Linux CI people don't like. Perhaps the same thing is happening on your AMD system.

I spent some time reading the Linux git logs for changes made during the development of Linux 5.17 that might be causing this, and I came up with the last two commits to arch/x86/xen/vga.c in the Linux source, which are in 5.17 but not in 5.16. Those commits involve the initialization of the Dom0 console. I tried a build of 5.17 without them, but it did not help.

By building and testing the kernel before and after it, I traced the problem to a big drm-next merge (committed to Linux on Jan 10, 2022). However, the merge contains over a thousand commits, so it is not easy to figure out which one is causing this on my system. The link to the merge:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8d0749b4f83bf4768ceae45ee6a79e6e7eddfc2a

Next I plan to test a version of 5.17 that prints some debugging information from the intel_gt_init function, where my system hangs with 5.17 as Dom0 on Xen. Are you seeing any add_taint_for_CI calls in the kernel logs on your AMD hardware?
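Narrowing a thousand commits down to one is what `git bisect` automates: each test build roughly halves the remaining range, so about ten builds (2^10 = 1024) should be enough for a linear range of that size. A minimal Python sketch of that binary search, assuming the commits form a linear oldest-to-newest list, that every commit builds, and that a hypothetical `is_bad` callback builds, boots and checks one commit. Real `git bisect` also walks the merge's commit graph, which this sketch ignores.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str],
                     is_bad: Callable[[str], bool]) -> tuple[str, int]:
    """Binary-search an oldest-first commit list for the first bad commit.

    Assumes every commit before the culprit is good, every commit from the
    culprit onward is bad, and the last commit is already known to be bad.
    Returns (first bad commit, number of test builds performed).
    """
    lo, hi = 0, len(commits) - 1  # invariant: the first bad commit is in [lo, hi]
    builds = 0
    while lo < hi:
        mid = (lo + hi) // 2
        builds += 1
        if is_bad(commits[mid]):  # e.g. build, boot as Dom0, check for the hang
            hi = mid              # culprit is mid or earlier
        else:
            lo = mid + 1          # culprit is after mid
    return commits[hi], builds
```

With git itself, the equivalent workflow is `git bisect start <bad> <good>` followed by marking each test boot with `git bisect good` or `git bisect bad` until git reports the first bad commit.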
On 4/23/22 6:03 PM, Marek Marczykowski-Górecki wrote:
> We have similar issue in Qubes OS:
> https://github.com/QubesOS/qubes-issues/issues/7462. Are you running
> this on AMD?

Hi Marek,

I found a fix for my Intel system. The offending commit, which is in Linux 5.17 but not 5.16:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bdd8b6c98239cad3a976d6f197afc2c794d3cef8

Jan Beulich posted a patch to fix this. I tested it and it fixed my system, but it is not yet committed to Linux:

https://lore.kernel.org/lkml/9385fa60-fa5d-f559-a137-6608408f88b0@suse.com/

Although the offending commit only affects Intel drivers, the fix is in the underlying pat_enabled() function, which returns a false negative when Linux is running on the Xen hypervisor, so device drivers that use it to test for the x86 PAT feature wrongly conclude that PAT is unavailable. If one of your AMD device drivers also relies on pat_enabled() to test for the x86 PAT feature, Jan's patch might fix the problem on your AMD system as well.

Best regards,
Chuck
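A toy model, in Python and not kernel code, of the false negative described above. As a simplification, it assumes that before Jan's patch pat_enabled() only reported PAT as enabled when Linux ran its own PAT initialization, which it skips as a PV Dom0 because Xen has already programmed the PAT MSR. It also assumes the 5.16 driver tested the CPU feature bit directly while the 5.17 driver calls pat_enabled(). The `Platform` class and all the check functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    cpu_has_pat: bool         # CPU advertises PAT (the feature bit)
    linux_ran_pat_init: bool  # Linux programmed the PAT MSR itself
    hypervisor_set_pat: bool  # a hypervisor (e.g. Xen) programmed it instead


def pat_enabled_before_fix(p: Platform) -> bool:
    # Simplified assumption: only Linux's own PAT setup counts.
    return p.cpu_has_pat and p.linux_ran_pat_init


def pat_enabled_after_fix(p: Platform) -> bool:
    # Simplified assumption: a PAT setup inherited from the hypervisor also counts.
    return p.cpu_has_pat and (p.linux_ran_pat_init or p.hypervisor_set_pat)


def driver_check_5_16(p: Platform) -> bool:
    # Hypothetical older driver: tests the CPU feature bit directly.
    return p.cpu_has_pat


def driver_check_5_17(p: Platform) -> bool:
    # Hypothetical newer driver: asks pat_enabled() instead.
    return pat_enabled_before_fix(p)


BARE_METAL = Platform(cpu_has_pat=True, linux_ran_pat_init=True, hypervisor_set_pat=False)
XEN_PV_DOM0 = Platform(cpu_has_pat=True, linux_ran_pat_init=False, hypervisor_set_pat=True)

if __name__ == "__main__":
    for name, p in [("bare metal", BARE_METAL), ("Xen PV Dom0", XEN_PV_DOM0)]:
        print(f"{name}: 5.16 check={driver_check_5_16(p)}, "
              f"5.17 check={driver_check_5_17(p)}, "
              f"pat_enabled after fix={pat_enabled_after_fix(p)}")
```

In this model only the 5.17-style check fails, and only on the Dom0 platform, which mirrors the report that 5.16 boots as Dom0 while 5.17 hangs there but boots on bare metal; the after-fix variant passes on both.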