Chuck Zmudzinski
2022-Mar-07 14:59 UTC
[Pkg-xen-devel] Bug#988333: libxenmisc4.16: libxl fails to grant necessary I/O memory access for gfx_passthru of Intel IGD
I have renamed the bug's title to focus it on a single problem that needs to be fixed. The bug is marked as affecting the Linux kernel because it causes the i915.ko module to crash in some configurations.

The bug, defined as the failure to grant the necessary I/O memory access to a Linux HVM domU for gfx_passthru of the Intel IGD, is fixed by the patch at the end of this message, so I added the tag patch. With that patch applied, passthrough of a Haswell Intel IGD to a bullseye HVM domU works as expected when the qemu-xen-traditional device model is used. However, this alone does not provide passthrough of the Intel IGD to Linux on Debian, because Debian does not provide the traditional Qemu device model. Its source code is available from the upstream Xen project, and it can be built for Debian as it was when Wheezy was released.

So please note: this patch can only be verified by testing with the Qemu traditional device model, which is available from the upstream Xen project but not provided by the Debian project. That makes the patch somewhat difficult to reproduce and verify as a fix for this bug on Debian until the Qemu traditional device model has been custom-built for Debian.

I added the tag moreinfo because a complete solution to this problem requires further investigation into why fixing this bug does not prevent a Call Trace and failure to boot when the Qemu upstream device model is used instead of the Qemu traditional device model. Most likely there is another distinct bug yet to be clearly identified and defined.

Summary of my recent tests with a Haswell Intel IGD:

Working configurations:

1. Sid/Xen-4.16 with the patch at the end of this message/Qemu-xen-traditional as dom0 and a bullseye HVM.
2. Bullseye/Xen-4.14 with the patch at the end of this message adapted for Xen-4.14/Qemu-xen-traditional as dom0 and a bullseye HVM.

In both these cases, the only binary package that needs to be installed is libxenmisc4.14 if Bullseye is the dom0 and libxenmisc4.16 if Sid is the dom0.

Broken configuration (presumably caused by a yet to be identified bug):

1. Sid/Xen-4.16 with the patch at the end of this message/Debian's qemu-6.2 as dom0 and a bullseye HVM.

It behaves similarly to the original bug report: the boot process is very slow and never completes, after a while a message on the dom0 console states that IRQ #16 is being disabled, and there is a Call Trace in the dmesg of the dom0:

[  842.446490] Call Trace:
[  842.446496]  <IRQ>
[  842.446503]  dump_stack_lvl+0x48/0x5e
[  842.446517]  __report_bad_irq+0x35/0xa7
[  842.446530]  note_interrupt.cold+0xb/0x61
[  842.446540]  handle_irq_event+0xa3/0xb0
[  842.446551]  handle_fasteoi_irq+0x90/0x1e0
[  842.446562]  handle_irq_desc+0x36/0x40
[  842.446569]  __evtchn_fifo_handle_events+0x195/0x1b0
[  842.446582]  __xen_evtchn_do_upcall+0x72/0xc0
[  842.446595]  __xen_pv_evtchn_do_upcall+0x39/0x60
[  842.446606]  xen_pv_evtchn_do_upcall+0xd7/0x100
[  842.446619]  </IRQ>
[  842.446622]  <TASK>
[  842.446625]  exc_xen_hypervisor_callback+0x8/0x10
[  842.446638] RIP: e030:xen_hypercall_sched_op+0xa/0x20
[  842.446651] Code: 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[  842.446657] RSP: e02b:ffffffff82803d58 EFLAGS: 00000246
[  842.446665] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff8193a3aa
[  842.446669] RDX: ffffffff82819940 RSI: 0000000000000000 RDI: 0000000000000001
[  842.446673] RBP: ffffffff82819940 R08: 00000066a1713428 R09: 0000000000000000
[  842.446677] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
[  842.446681] R13: 0000000000000000 R14: ffffffff82819110 R15: 0000000000000000
[  842.446687]  ? xen_hypercall_sched_op+0xa/0x20
[  842.446701]  ? xen_safe_halt+0xc/0x20
[  842.446710]  ? default_idle+0xa/0x10
[  842.446717]  ? default_idle_call+0x33/0xe0
[  842.446724]  ? do_idle+0x215/0x2a0
[  842.446732]  ? cpu_startup_entry+0x19/0x20
[  842.446738]  ? start_kernel+0x6b7/0x6dc
[  842.446750]  ? xen_start_kernel+0x6a4/0x6b1
[  842.446762]  ? startup_xen+0x3e/0x3e
[  842.446773]  </TASK>
[  842.446776] handlers:
[  842.446784] [<0000000074c02061>] usb_hcd_irq [usbcore]
[  842.446843] [<00000000c81c8287>] ath_isr [ath9k]
[  842.446870] Disabling IRQ #16

Not tested:

Bullseye/Xen-4.14 with the patch at the end of this message adapted for Xen 4.14/Debian's Qemu 5.2. There is no need to test this until Sid is working as the dom0 with Debian's Qemu 6.2 for Sid and Intel IGD passthrough to a Bullseye HVM domU.

More information is needed to determine the exact nature of the bug behind the Call Trace listed above, which occurs with Qemu 6.2 and Xen 4.16 on Sid but not with the traditional Qemu device model on Sid. Most likely it will be a bug related to this bug.

I will try to investigate the cause of this Call Trace by comparing the code in Qemu 6.2 with the code in Qemu xen-traditional provided by the Xen project, and I may also look at or try the qemu-xen build provided by the upstream Xen project. I will also try again to contact the Xen users/developers on the Xen mailing lists to see if they can provide some insight.

In a subsequent message I will provide a detailed description of how I developed the patch that fixes the bug and enables passthrough of the Intel IGD to a Bullseye HVM when using the Qemu traditional device model.

Here is the patch for the current Xen 4.16 package (version 4.16.0+51-g0941d6cb-1) for Sid:

--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -2502,6 +2502,7 @@

     for (i = 0 ; i < d_config->num_pcidevs ; i++) {
         uint64_t vga_iomem_start = 0xa0000 >> XC_PAGE_SHIFT;
+        uint64_t vga_iomem2_start = 0xcc490; /* Probably IRQ data, nr = 0x2 */
         uint32_t stubdom_domid;
         libxl_device_pci *pci = &d_config->pcidevs[i];
         unsigned long pci_device_class;
@@ -2531,6 +2532,25 @@
                   domid, vga_iomem_start, (vga_iomem_start + 0x20 - 1));
             return ret;
         }
+        ret = xc_domain_iomem_permission(CTX->xch, stubdom_domid,
+                                         vga_iomem2_start, 0x2, 1);
+        if (ret < 0) {
+            LOGED(ERROR, domid,
+                  "failed to give stubdom%d access to iomem range "
+                  "%"PRIx64"-%"PRIx64" for VGA passthru",
+                  stubdom_domid,
+                  vga_iomem2_start, (vga_iomem2_start + 0x2 - 1));
+            return ret;
+        }
+        ret = xc_domain_iomem_permission(CTX->xch, domid,
+                                         vga_iomem2_start, 0x2, 1);
+        if (ret < 0) {
+            LOGED(ERROR, domid,
+                  "failed to give dom%d access to iomem range "
+                  "%"PRIx64"-%"PRIx64" for VGA passthru",
+                  domid, vga_iomem2_start, (vga_iomem2_start + 0x2 - 1));
+            return ret;
+        }
         break;
     }
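
For readers who want to relate the numbers in the patch to physical addresses, here is a small stand-alone sketch of the arithmetic. It is not part of the patch and does not call libxl or libxc. It assumes that the start and count arguments passed to xc_domain_iomem_permission() are in units of 4 KiB page frames (XC_PAGE_SHIFT equal to 12), which is consistent with the existing code shifting 0xa0000 right by XC_PAGE_SHIFT. The names PAGE_SHIFT_ASSUMED, range_first_byte, range_last_byte and print_range are made up for this illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 4 KiB pages, i.e. the same value as XC_PAGE_SHIFT on x86. */
#define PAGE_SHIFT_ASSUMED 12

/* First byte address covered by a range that starts at page frame `pfn`. */
static uint64_t range_first_byte(uint64_t pfn)
{
    return pfn << PAGE_SHIFT_ASSUMED;
}

/* Last byte address covered by `nr` pages starting at page frame `pfn`. */
static uint64_t range_last_byte(uint64_t pfn, uint64_t nr)
{
    assert(nr > 0); /* an empty range has no last byte */
    return ((pfn + nr) << PAGE_SHIFT_ASSUMED) - 1;
}

/* Print one range both as page frames and as byte addresses. */
static void print_range(const char *label, uint64_t pfn, uint64_t nr)
{
    printf("%s: frames 0x%" PRIx64 "-0x%" PRIx64
           ", bytes 0x%" PRIx64 "-0x%" PRIx64 "\n",
           label, pfn, pfn + nr - 1,
           range_first_byte(pfn), range_last_byte(pfn, nr));
}
```

Calling print_range("legacy VGA", 0xa0000 >> PAGE_SHIFT_ASSUMED, 0x20) describes the range that libxl already grants, and print_range("extra range", 0xcc490, 0x2) describes the two pages added by the patch. Under the 4 KiB assumption the latter corresponds to bytes 0xcc490000 through 0xcc491fff. The frame number 0xcc490 is taken as is from the patch.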