Craig Ringer
2021-Feb-18 03:06 UTC
[Nouveau] Optimus HDMI hotplug fails with "DRM: Dropped ACPI reprobe event due to RPM error: -22"
Hi all

I'm trying to get HDMI hotplug working on my Lenovo T15g laptop with Optimus graphics. HDMI works when plugged in at boot, but does not work when hotplugged after boot, or when hot-unplugged then re-plugged. The external display is not detected, its status remains 'disconnected' in sysfs, and the display stays in what looks like DPMS-off state.

NOTE: This is a PRELIMINARY problem report and request for advice or comment. I'm on a recent Fedora kernel but still need to try latest mainline + nouveau. I still need to capture detailed debug logs from nouveau, drm, kms, etc. And while writing the report I found an i915 config issue I need to retry without. So this is mostly google-help for others right now.

VERSIONS AND DEVICES
===

Kernel and nouveau version: 5.10.15-200.fc33.x86_64 with the bundled nouveau driver. (I'll try latest mainline soon).

Video hardware:

* GeForce RTX 2070 SUPER Mobile (PCI ID 10de:1e91)
* Intel CometLake-H GT2 (PCI ID 8086:9bc4)

Laptop: Lenovo T15g. DMI identifies it as:

LENOVO 20URCTO1WW/20URCTO1WW, BIOS N30ET33W (1.16 ) 12/17/2020

I believe this is a muxless design with the external outputs under control of the NVidia card, as the Intel card only has one output in /sys/bus/drm/card0/ and the external display doesn't work (even when attached at boot) if I blacklist the nouveau module.

BEHAVIOUR
===========

An external HDMI display is only detected and used if it's attached before boot. If hotplugged later instead it isn't detected and

DRM: Dropped ACPI reprobe event due to RPM error: -22

is printed to dmesg.

"RPM error -22" is -EINVAL. AFAICS this is probably coming from the rpm_resume() function [1] as called by __pm_runtime_resume() by pm_runtime_get() by nouveau_display_acpi_ntfy() [2]. I haven't tracked it down further yet - I'll do some perf probing and report back in a followup post.

IIRC (need to repeat and verify) once hot-unplugged, the display won't re-detect, even if it was connected at boot.
Connecting it while the machine is in S3 sleep doesn't help, it still doesn't get (re)detected on resume.

echo 'detect' > card1-HDMI-A-1/status

has no apparent effect - no message is printed to dmesg (default log level) and the monitor isn't detected.

TAINTED KERNEL
===========

While collecting info for this report, I noticed that I am still running with some non-default i915 options from my old (non-hybrid-graphics) laptop. I'll have to reboot without those to verify these i915 options aren't the cause:

[ 3.403694] Setting dangerous option enable_guc - tainting kernel
[ 3.404506] Setting dangerous option enable_fbc - tainting kernel
[ 3.405306] Setting dangerous option enable_dc - tainting kernel

I'll be sure to update once I disable these, but I'll post now. If nothing else, it might help someone else.

NOUVEAU TIMEOUTS IN DMESG
===========

I also noticed some nouveau related output in the kernel logs - I think from the first suspend, or possibly the first HDMI unplug. I'll need to verify this later. There are also some xhci_hcd messages that may or may not be relevant. I'll include longer excerpts at the end of the post but the basics are:

[25877.621114] nouveau 0000:01:00.0: timeout
[25877.621289] WARNING: CPU: 14 PID: 73556 at drivers/gpu/drm/nouveau/nvkm/falcon/v1.c:247 nvkm_falcon_v1_wait_for_halt+0x96/0xa0 [nouveau]
...
[25877.621631] RIP: 0010:nvkm_falcon_v1_wait_for_halt+0x96/0xa0 [nouveau]
...
[25877.621680] Call Trace:
[25877.621754] gm200_acr_hsfw_boot+0xc3/0x160 [nouveau]
[25877.621782] ? mutex_lock+0xe/0x30
[25877.621849] nvkm_acr_hsf_boot+0x85/0xe0 [nouveau]
[25877.621916] nvkm_acr_fini+0x25/0x30 [nouveau]
[25877.621984] nvkm_subdev_fini+0x59/0xb0 [nouveau]
[25877.622100] nvkm_device_fini+0x79/0x110 [nouveau]
[25877.622215] nvkm_udevice_fini+0x47/0x60 [nouveau]
[25877.622277] nvkm_object_fini+0xbc/0x150 [nouveau]
[25877.622343] nvkm_object_fini+0x73/0x150 [nouveau]
[25877.622464] nouveau_do_suspend+0x107/0x180 [nouveau]
[25877.622583] nouveau_pmops_runtime_suspend+0x3b/0xb0 [nouveau]
[25877.622597] pci_pm_runtime_suspend+0x5e/0x170
...

then

[25877.622741] nouveau 0000:01:00.0: acr: unload binary failed
[25877.946511] nouveau 0000:01:00.0: fifo: fault 00 [VIRT_READ] at 00000000000bd000 engine c0 [BAR2] client 07 [HUB/HOST_CPU] reason 0d [REGION_VIOLATION] on channel -1 [01ffedf000 unknown]
[25913.829849] nouveau 0000:01:00.0: fifo: fault 01 [VIRT_WRITE] at 00000000004df000 engine c0 [BAR2] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel -1 [01ffedf000 unknown]

then

[25913.930365] nouveau 0000:01:00.0: timeout
[25913.930426] WARNING: CPU: 5 PID: 2395 at drivers/gpu/drm/nouveau/nvkm/falcon/v1.c:247 nvkm_falcon_v1_wait_for_halt+0x96/0xa0 [nouveau]
...
[25913.930511] RIP: 0010:nvkm_falcon_v1_wait_for_halt+0x96/0xa0 [nouveau]
...
[25913.930523] Call Trace:
[25913.930540] gm200_acr_hsfw_boot+0xc3/0x160 [nouveau]
[25913.930543] ? mutex_lock+0xe/0x30
[25913.930558] nvkm_acr_hsf_boot+0x85/0xe0 [nouveau]
[25913.930573] tu102_acr_init+0x15/0x30 [nouveau]
[25913.930587] nvkm_acr_load+0x2b/0xd0 [nouveau]
[25913.930589] ? ktime_get+0x38/0xa0
[25913.930603] nvkm_subdev_init+0x92/0xd0 [nouveau]
[25913.930604] ? ktime_get+0x38/0xa0
[25913.930629] nvkm_device_init+0x10b/0x190 [nouveau]
[25913.930656] nvkm_udevice_init+0x41/0x60 [nouveau]
[25913.930676] nvkm_object_init+0x3e/0x100 [nouveau]
[25913.930690] nvkm_object_init+0x6f/0x100 [nouveau]
[25913.930703] nvkm_object_init+0x6f/0x100 [nouveau]
[25913.930729] nouveau_do_resume+0x2b/0xc0 [nouveau]
[25913.930755] nouveau_pmops_runtime_resume+0x7a/0x150 [nouveau]
[25913.930760] pci_pm_runtime_resume+0xaa/0xc0
[...]
[25913.930806] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[...]
[25913.930820] nouveau 0000:01:00.0: acr: AHESASC binary failed
[25913.930821] nouveau 0000:01:00.0: acr: init failed, -110
[25913.930958] nouveau 0000:01:00.0: init failed with -110
[25913.930959] nouveau: systemd-logind[1510]:00000000:00000080: init failed with -110
[25913.930960] nouveau: DRM-master:00000000:00000000: init failed with -110
[25913.930961] nouveau: DRM-master:00000000:00000000: init failed with -110
[25913.930963] nouveau 0000:01:00.0: DRM: Client resume failed with error: -110
[25913.930963] nouveau 0000:01:00.0: DRM: resume failed with: -110

I'll do some poking around with perf, capture some ACPI state and verbose nouveau + drm kernel logs for both attached-at-boot and detached-at-boot cases, etc, then post a big diagnostics bundle in a bit. But I thought I'd keep this initial report short-ish. I'll include some basic diag info below though.
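The negative numbers at the end of the log lines above (-22 and -110) are kernel errno values. As a minimal sketch, not part of the original report, the Python below pulls the trailing code out of such a dmesg line and maps it to its symbolic name. The table is a hand-copied subset of the generic Linux errno numbers, and the helper names `error_code_from_line` and `decode_errno` are illustrative assumptions, not anything nouveau or the kernel provides.

```python
import re

# Hand-copied subset of the generic Linux errno numbers
# (asm-generic/errno-base.h and asm-generic/errno.h). Not exhaustive.
LINUX_ERRNO = {
    11: "EAGAIN",
    13: "EACCES",
    16: "EBUSY",
    22: "EINVAL",
    110: "ETIMEDOUT",
    115: "EINPROGRESS",
}

# A signed integer at the very end of a log line, e.g. "... RPM error: -22".
_TRAILING_CODE = re.compile(r"(-?\d+)\s*$")


def error_code_from_line(line):
    """Return the integer that ends the log line, or None if there is none."""
    match = _TRAILING_CODE.search(line)
    return int(match.group(1)) if match else None


def decode_errno(code):
    """Map a (usually negative) kernel return code to its errno name."""
    name = LINUX_ERRNO.get(abs(code))
    return name if name else f"unknown errno {abs(code)}"


if __name__ == "__main__":
    samples = [
        "nouveau 0000:01:00.0: DRM: Dropped ACPI reprobe event due to RPM error: -22",
        "nouveau 0000:01:00.0: DRM: resume failed with: -110",
    ]
    for line in samples:
        code = error_code_from_line(line)
        print(code, decode_errno(code))
```

With this table, -22 decodes to EINVAL (as stated in the report) and -110 to ETIMEDOUT, which fits the "timeout" lines logged just before the failed resume.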
URL REFERENCES
===========

URLs referenced:

[1] https://github.com/torvalds/linux/blob/521b619acdc8f1f5acdac15b84f81fd9515b2aff/drivers/base/power/runtime.c#L702
[2] https://github.com/torvalds/linux/blob/93b694d096cc10994c817730d4d50288f9ae3d66/drivers/gpu/drm/nouveau/nouveau_display.c#L530

BASIC DIAGNOSTICS
===========

Basic diagnostics, when display physically connected (DVI-D -> HDMI) but not detected by nouveau:

$ ls /sys/class/drm
card0 card0-eDP-1 card1 card1-DP-1 card1-DP-2 card1-DP-3 card1-eDP-2 card1-HDMI-A-1 renderD128 renderD129 ttm version

$ for f in */status; do printf "%s: %s\n" "$f" "$(cat $f)"; done
card0-eDP-1/status: connected
card1-DP-1/status: disconnected
card1-DP-2/status: disconnected
card1-DP-3/status: disconnected
card1-eDP-2/status: disconnected
card1-HDMI-A-1/status: disconnected

$ dmesg | tail -n 2
[42147.075025] nouveau 0000:01:00.0: DRM: Dropped ACPI reprobe event due to RPM error: -22
[42151.153559] nouveau 0000:01:00.0: DRM: Dropped ACPI reprobe event due to RPM error: -22

# for p in /sys/module/nouveau/parameters/*; do printf "%s: %s\n" "$(basename $p)" "$(cat $p)"; done
[sudo] password for craig:
atomic: 0
config: (null)
debug: (null)
duallink: 1
fbcon_bpp: 0
hdmimhz: 0
ignorelid: 0
modeset: -1
mst: 1
noaccel: 0
nofbaccel: 0
runpm: -1
tv_disable: 0
tv_norm: (null)
vram_pushbuf: 0

$ cat /proc/cmdline
BOOT_IMAGE=(hd1,gpt2)/vmlinuz-5.10.15-200.fc33.x86_64 [SNIP root dev args] libata.allow_tpm=on systemd.unified_cgroup_hierarchy=0 rhgb

$ sudo lspci -vvnnqPP -d 10de:1e91
00:01.0/01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104M [GeForce RTX 2070 SUPER Mobile / Max-Q] [10de:1e91] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Lenovo Device [17aa:22c3]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        ....
        Kernel driver in use: nouveau
        Kernel modules: nouveau

# dmidecode
...
Processor Information
...
        Version: Intel(R) Core(TM) i9-10980HK CPU @ 2.40GHz
...
BIOS Information
        Vendor: LENOVO
        Version: N30ET33W (1.16 )
        Release Date: 12/17/2020
...
        BIOS Revision: 1.16
        Firmware Revision: 1.12
...
Port Connector Information
        Internal Reference Designator: Not Available
        Internal Connector Type: None
        External Reference Designator: Hdmi1
        External Connector Type: Other
        Port Type: Video Port

System Information
        Manufacturer: LENOVO
        Product Name: 20URCTO1WW
        Version: ThinkPad T15g Gen 1
        [snip serial number and uuid]
        SKU Number: LENOVO_MT_20UR_BU_Think_FM_ThinkPad T15g Gen 1
        Family: ThinkPad T15g Gen 1

I'll attach a detailed lspci, bigger excerpts from dmesg, etc in a followup to make sure I don't upset any mail filter.

--
Craig Ringer
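The connector check in the diagnostics above is a shell loop over the `status` files under /sys/class/drm. A rough Python equivalent is sketched below for anyone who wants to poll connector state from a script while plugging and unplugging the cable. The function names `connector_status` and `connected` are hypothetical, and the sketch assumes the usual sysfs layout in which each connector directory holds a one-line `status` file.

```python
from pathlib import Path


def connector_status(drm_root="/sys/class/drm"):
    """Return {connector_name: status} for each entry that has a status file.

    Entries without one (card0, renderD128, version, ...) are skipped.
    """
    result = {}
    for entry in sorted(Path(drm_root).iterdir()):
        status_file = entry / "status"
        if status_file.is_file():
            result[entry.name] = status_file.read_text().strip()
    return result


def connected(drm_root="/sys/class/drm"):
    """Return the names of connectors whose status reads 'connected'."""
    return [
        name
        for name, status in connector_status(drm_root).items()
        if status == "connected"
    ]


if __name__ == "__main__":
    for name, status in connector_status().items():
        print(f"{name}: {status}")
```

Run on the machine in the state shown above, this would be expected to list only card0-eDP-1 as connected, matching the shell output.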
Craig Ringer
2021-Feb-18 03:40 UTC
[Nouveau] Optimus HDMI hotplug fails with "DRM: Dropped ACPI reprobe event due to RPM error: -22"
On Thu, 18 Feb 2021 at 11:06, Craig Ringer <ringerc at ringerc.id.au> wrote:

> Hi all
>
> I'm trying to get HDMI hotplug working on my Lenovo T15g laptop with
> Optimus graphics. HDMI works when plugged in at boot, but does not work
> when hotplugged after boot, or when hot-unplugged then re-plugged. The
> external display is not detected, its status remains 'disconnected' in
> sysfs, and the display stays in what looks like DPMS-off state.
>
[snip]

> I'll attach a detailed lspci, bigger excerpts from dmesg, etc in a
> followup to make sure I don't upset any mail filter.
>

Detailed PCI info and trimmed dmesg uploaded to a GDrive since I don't really want to send all that to the whole list.

Files here:

https://drive.google.com/drive/folders/1oE3ow7d8N6npDNbL8vqHYjbAvvJCUcPN?usp=sharing

Contains:

$ sudo lspci -vvvvnnnnPPq | sed '/Device Serial Number/d' > pci.list

$ sudo dmesg | egrep -v 'e1000e|nvme|BTRFS|audit:|SELinux:|systemd\[1\]|thunderbolt|cfg80211|WiFi|Bluetooth|battery|iwlwifi|uvcvideo|usbcore|zram|iTCO_wdt|snd_hda_intel|squashfs|EXT4-fs|iSCSI|wlp0s20f3|IPv6|\<bridge\>|\<tun\>|virbr0|rfkill|nf_conntrack|psmouse' | sed 's/SerialNumber.*$/SerialNumber REDACTED/' > dmesg.out

$ sudo dmidecode | sed '/Serial Number:/d;/Asset Tag/d' > dmi.dump

$ sudo cat /sys/kernel/debug/vgaswitcheroo/switch
0:DIS: :DynOff:0000:01:00.0
1:IGD:+:Pwr:0000:00:02.0
2:DIS-Audio: :DynOff:0000:01:00.1

I can provide raw or decompiled ACPI DSDT and SSDTs on request, as well as kernel logs with higher log levels, a nouveau module debug string, info from /sys/kernel/debug, 'perf' runs, etc. There's also some nouveau info in /sys/kernel/debug/dri/1 .

Also, I note that

echo 'on' > /sys/kernel/debug/dri/1/HDMI-A-1/force

does not appear to have any effect when the display is plugged in and turned on, but not being detected by nouveau. Also true for other ports DP-1, DP-2, DP-3, eDP-2.

--
Craig Ringer
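The vgaswitcheroo output above packs several fields into each colon-separated line. As a hedged sketch, the parser below splits a line into index, client name, an "active" marker, power state and PCI address. The field names are my reading of the sample output, not an official schema, and `SwitchEntry` and `parse_switch` are hypothetical names.

```python
from typing import NamedTuple


class SwitchEntry(NamedTuple):
    index: int
    name: str         # e.g. "DIS", "IGD", "DIS-Audio"
    active: bool      # True when the third field is "+"
    power: str        # e.g. "Pwr", "DynOff"
    pci_address: str  # e.g. "0000:01:00.0"


def parse_switch(text):
    """Parse the text of /sys/kernel/debug/vgaswitcheroo/switch."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at most 4 times: the PCI address itself contains colons.
        index, name, active, power, pci = line.split(":", 4)
        entries.append(
            SwitchEntry(int(index), name, active.strip() == "+", power, pci)
        )
    return entries


if __name__ == "__main__":
    sample = (
        "0:DIS: :DynOff:0000:01:00.0\n"
        "1:IGD:+:Pwr:0000:00:02.0\n"
        "2:DIS-Audio: :DynOff:0000:01:00.1\n"
    )
    for entry in parse_switch(sample):
        print(entry)
```

On the sample, the Intel GPU (IGD) is the only entry marked active, while both NVIDIA functions report DynOff. Assuming DynOff means the device is under runtime power management and currently powered down, that matches the hotplug event arriving while the discrete GPU is suspended.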