Marc MERLIN
2023-Dec-02 17:13 UTC
[Nouveau] Thinkpad P17 gen 2 kernel 6.4 and 6.6 lack of support for nvidia GA104GLM [RTX A5000 Mobile] and missing module firmware
Howdy,

I'm trying a Thinkpad P17 gen2, the last Thinkpad that still comes in 17" 4K (newer ones are 16" only, so I'm looking for other worthwhile Linux laptops with a 17" or bigger LCD that also does 4K; the Alienware I saw was 18" but not 4K).

Unfortunately I seem to need the nouveau driver to turn off the nvidia chip I don't plan on using (Intel graphics is fine for me), and the BIOS only allows 'hybrid' or nvidia only.
On my P73, nouveau never really worked in the 3 years I've had it, but it could at least turn off the nvidia chip. On the P17 gen2 it does not seem to be able to do so.

Firmware is missing even from the latest firmware-linux-nonfree or from upstream git
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git

sauron:~# update-initramfs -v -c -k `uname -r` 2>&1 |grep W:
W: Possible missing firmware /lib/firmware/nvidia/ga107/acr/ucode_ahesasc.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga106/acr/ucode_ahesasc.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga104/acr/ucode_ahesasc.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga103/acr/ucode_ahesasc.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga107/acr/ucode_asb.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga106/acr/ucode_asb.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga104/acr/ucode_asb.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga103/acr/ucode_asb.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga107/acr/ucode_unload.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga106/acr/ucode_unload.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga104/acr/ucode_unload.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga103/acr/ucode_unload.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga107/nvdec/scrubber.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga106/nvdec/scrubber.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga104/nvdec/scrubber.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga103/nvdec/scrubber.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga107/sec2/hs_bl_sig.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga107/sec2/sig.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga107/sec2/image.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga107/sec2/desc.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga106/sec2/hs_bl_sig.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga106/sec2/sig.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga106/sec2/image.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga106/sec2/desc.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga104/sec2/hs_bl_sig.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga104/sec2/sig.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga104/sec2/image.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga104/sec2/desc.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga103/sec2/hs_bl_sig.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga103/sec2/sig.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga103/sec2/image.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/ga103/sec2/desc.bin for module nouveau

During boot, the nvidia module hangs for 2 minutes and fails to do any work, including being able to turn off the nvidia chip (which it could do on the P73 without otherwise ever being able to use that chip for proper display).

I want to turn off the nvidia chip so that I can get multi-hour runtime on batteries without some useless chip that is using power for no reason.

sauron:~# lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation Tiger Lake-H GT1 [UHD Graphics] (rev 01)
01:00.0 VGA compatible controller: NVIDIA Corporation GA104GLM [RTX A5000 Mobile] (rev a1)

Boot looks like this:
[    0.210932] Kernel command line: BOOT_IMAGE=/vmlinuz-6.6.3-amd64-preempt-sysrq-20220227 root=/dev/mapper/cryptroot ro rootflags=subvol=root cryptopts=source=/dev/nvme0n1p7,keyscript=/sbin/cryptgetpw usbcore.autosuspend=1 pcie_aspm=force resume=/dev/dm-1 thinkpad-acpi.brightness_enable=1 acpi_backlight=native nouveau.modset=0 systemd.unified_cgroup_hierarchy=0
[    3.184525] nouveau: unknown parameter 'modset' ignored
[    3.184800] nouveau: detected PR support, will not use DSM
[    3.184813] nouveau 0000:01:00.0: vgaarb: pci_notify
[    3.184816] nouveau 0000:01:00.0: runtime IRQ mapping not provided by arch
[    3.184821] nouveau 0000:01:00.0: enabling device (0000 -> 0003)
[    3.184959] nouveau 0000:01:00.0: NVIDIA GA104 (b74000a1)
[    3.295682] nouveau 0000:01:00.0: bios: version 94.04.51.00.34
[   64.941381] nouveau 0000:01:00.0: acr: firmware unavailable
[  126.381361] nouveau 0000:01:00.0: sec2: firmware unavailable
[  126.381484] nouveau 0000:01:00.0: enabling bus mastering
[  126.382073] nouveau 0000:01:00.0: fb: 16384 MiB GDDR6
[  183.695282] nouveau 0000:01:00.0: fb: VPR locked, but no scrubber binary!
[  183.702321] nouveau 0000:01:00.0: DRM: error initialising bo driver, -12
[  183.709419] nouveau: probe of 0000:01:00.0 failed with error -12
[  183.711498] nouveau 0000:01:00.0: vgaarb: pci_notify

What is the next recommended step?

Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Home page: http://marc.merlins.org/
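Editor's note: the update-initramfs warnings say "possible", so it can help to check directly which of the declared firmware files are absent. The following is a minimal sketch, not something from the thread. It assumes that `modinfo -F firmware nouveau` prints one firmware path per line, relative to the firmware directory; the function name `list_missing_firmware` is made up for illustration.

```shell
# Hypothetical helper: read firmware paths (relative to the firmware root)
# on stdin and print the ones that do not exist under that root.
# Assumption: firmware is stored uncompressed; distributions that ship
# compressed files (.xz, .zst) would need additional suffix checks.
list_missing_firmware() {
    fw_root="${1:-/lib/firmware}"
    while IFS= read -r fw; do
        # Skip empty lines, then report any path that is not present.
        [ -n "$fw" ] || continue
        [ -e "$fw_root/$fw" ] || printf '%s\n' "$fw"
    done
}
```

Possible usage, limited to the GA104 files relevant to this laptop:

    modinfo -F firmware nouveau | grep '^nvidia/ga104/' | list_missing_firmware /lib/firmware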
Timur Tabi
2023-Dec-02 18:08 UTC
On Sat, 2023-12-02 at 09:13 -0800, Marc MERLIN wrote:
> [    3.184525] nouveau: unknown parameter 'modset' ignored

For starters, you misspelled "modeset"
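Editor's note: a misspelled module parameter is easy to miss on a long kernel command line, but the kernel logs each one it ignores. A minimal sketch of a filter for those messages follows; the function name `list_ignored_params` is hypothetical, and the message format is taken from the log line quoted above.

```shell
# Hypothetical helper: read kernel log text on stdin and print
# "<module> <parameter>" for every "unknown parameter ... ignored" line.
# An optional leading timestamp such as "[    3.184525] " is skipped.
list_ignored_params() {
    sed -n "s/^\(.*[[:space:]]\)\{0,1\}\([^[:space:]:]*\): unknown parameter '\([^']*\)' ignored.*/\2 \3/p"
}
```

Possible usage: `dmesg | list_ignored_params`. The valid parameter names for comparison can be listed with `modinfo -p nouveau`.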
Marc MERLIN
2023-Dec-02 22:14 UTC
On Sat, Dec 02, 2023 at 06:08:01PM +0000, Timur Tabi wrote:
> On Sat, 2023-12-02 at 09:13 -0800, Marc MERLIN wrote:
> > [    3.184525] nouveau: unknown parameter 'modset' ignored
>
> For starters, you misspelled "modeset"

That was a previous boot in dmesg where I failed to turn off the module, but I was mostly interested in showing the errors of all the firmware missing and nouveau failing to start, which those logs do show.

Separately, both 6.4 and 6.6 are hanging after a few hours of runtime, with networking dying or other issues that require a reboot.
See below.

6.4:
> [55647.774842] vgaarb: client 0x00000000c24cb19e called 'target'
> [55647.774852] vgaarb: PCI:0000:00:02.0 ==> 0000:00:02.0 pdev 00000000bfa35d85
> [55647.774854] vgaarb: vgadev 000000008ea0fc7d
> [55825.318992] INFO: task NetworkManager:3372 blocked for more than 120 seconds.
> [55825.318999] Tainted: G U OE 6.4.9-amd64-preempt-sysrq-20220227 #2
> [55825.319000] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [55825.319002] task:NetworkManager state:D stack:0 pid:3372 ppid:1 flags:0x00000002
> [55825.319005] Call Trace:
> [55825.319006] <TASK>
> [55825.319009] __schedule+0xba5/0xc17
> [55825.319015] schedule+0x95/0xce
> [55825.319017] schedule_preempt_disabled+0x15/0x22
> [55825.319020] __mutex_lock.constprop.0+0x18b/0x291
> [55825.319025] nl80211_prepare_wdev_dump+0x8b/0x19f [cfg80211 d0c23c84d531afea8d4a2711c5c3e691cbb9587f]
> [55825.319065] nl80211_dump_station+0x49/0x1d0 [cfg80211 d0c23c84d531afea8d4a2711c5c3e691cbb9587f]
> [55825.319091] ? __mod_lruvec_page_state+0x4c/0x86
> [55825.319093] ? mod_lruvec_page_state.constprop.0+0x1c/0x2e
> [55825.319096] ? __kmalloc_large_node+0xd5/0xfb
> [55825.319099] ? __kmalloc_node_track_caller+0x5a/0xad
> [55825.319101] ? kmalloc_reserve+0xa7/0xe2
> [55825.319104] ? __alloc_skb+0xe9/0x148
> [55825.319106] netlink_dump+0x143/0x2b2
> [55825.319109] __netlink_dump_start+0x125/0x177
> [55825.319111] genl_family_rcv_msg_dumpit+0xf1/0x110
> [55825.319114] ? poll_freewait+0x72/0x91
> [55825.319117] ? __pfx_genl_start+0x40/0x40
> [55825.319119] ? __pfx_nl80211_dump_station+0x40/0x40 [cfg80211 d0c23c84d531afea8d4a2711c5c3e691cbb9587f]
> [55825.319143] ? __pfx_genl_parallel_done+0x40/0x40
> [55825.319146] genl_rcv_msg+0x189/0x1e2
> [55825.319148] ? __pfx_nl80211_dump_station+0x40/0x40 [cfg80211 d0c23c84d531afea8d4a2711c5c3e691cbb9587f]
> [55825.319172] ? __pfx_genl_rcv_msg+0x40/0x40
> [55825.319173] netlink_rcv_skb+0x89/0xe3
> [55825.319176] genl_rcv+0x24/0x31
> [55825.319178] netlink_unicast+0x10e/0x1ae
> [55825.319180] netlink_sendmsg+0x321/0x361
> [55825.319182] sock_sendmsg_nosec+0x35/0x64
> [55825.319186] ____sys_sendmsg+0x13e/0x1ef
> [55825.319188] ___sys_sendmsg+0x76/0xb3
> [55825.319190] ? __fget_light+0x41/0x50
> [55825.319193] ? do_epoll_wait+0x49b/0x4d4
> [55825.319196] ? __pfx_pollwake+0x40/0x40
> [55825.319198] ? __rseq_handle_notify_resume+0x2a0/0x4bd
> [55825.319200] ? __fget+0x38/0x47
> [55825.319202] __sys_sendmsg+0x60/0x97
> [55825.319204] do_syscall_64+0x7e/0xa7
> [55825.319208] ? syscall_exit_to_user_mode+0x18/0x27
> [55825.319210] ? __task_pid_nr_ns+0x5f/0x6d
> [55825.319213] ? syscall_exit_to_user_mode+0x18/0x27
> [55825.319214] ? do_syscall_64+0x9d/0xa7
> [55825.319216] ? do_syscall_64+0x9d/0xa7
> [55825.319218] ? do_syscall_64+0x9d/0xa7
> [55825.319220] ? do_syscall_64+0x9d/0xa7
> [55825.319222] entry_SYSCALL_64_after_hwframe+0x77/0xe1
> [55825.319224] RIP: 0033:0x7f1fdc79e9bd
> [55825.319226] RSP: 002b:00007ffeb6460900 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
> [55825.319228] RAX: ffffffffffffffda RBX: 000055e0a9ce1d90 RCX: 00007f1fdc79e9bd
> [55825.319229] RDX: 0000000000000000 RSI: 00007ffeb6460950 RDI: 000000000000000b
> [55825.319230] RBP: 00007ffeb6460950 R08: 0000000000000000 R09: 0000000000000300
> [55825.319231] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffeb6460a30
> [55825.319232] R13: 00007f1fd0038690 R14: 00007ffeb6460c60 R15: 000055e0aa210400
> [55825.319234] </TASK>

6.6.3:
[  443.613095] BTRFS info (device dm-2): scrub: started on devid 1
[  484.778344] INFO: task kworker/2:1:106 blocked for more than 120 seconds.
[  484.778352] Tainted: G U 6.6.3-amd64-preempt-sysrq-20220227 #4
[  484.778353] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  484.778354] task:kworker/2:1 state:D stack:0 pid:106 ppid:2 flags:0x00004000
[  484.778358] Workqueue: ipv6_addrconf addrconf_verify_work
[  484.778365] Call Trace:
[  484.778367] <TASK>
[  484.778369] __schedule+0xba0/0xc05
[  484.778373] schedule+0x95/0xce
[  484.778375] schedule_preempt_disabled+0x15/0x22
[  484.778376] __mutex_lock.constprop.0+0x18b/0x291
[  484.778379] addrconf_verify_work+0xe/0x20
[  484.778381] process_scheduled_works+0x1da/0x2e0
[  484.778385] worker_thread+0x1ca/0x224
[  484.778388] ? __pfx_worker_thread+0x40/0x40
[  484.778390] kthread+0xe9/0xf4
[  484.778393] ? __pfx_kthread+0x40/0x40
[  484.778395] ret_from_fork+0x21/0x38
[  484.778397] ? __pfx_kthread+0x40/0x40
[  484.778399] ret_from_fork_asm+0x1b/0x80
[  484.778403] </TASK>
[  484.778409] INFO: task kworker/4:2:388 blocked for more than 120 seconds.
[  484.778410] Tainted: G U 6.6.3-amd64-preempt-sysrq-20220227 #4
[  484.778411] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  484.778412] task:kworker/4:2 state:D stack:0 pid:388 ppid:2 flags:0x00004000
[  484.778415] Workqueue: ipv6_addrconf addrconf_verify_work
[  484.778417] Call Trace:
[  484.778418] <TASK>
[  484.778420] __schedule+0xba0/0xc05
[  484.778422] schedule+0x95/0xce
[  484.778423] schedule_preempt_disabled+0x15/0x22
[  484.778425] __mutex_lock.constprop.0+0x18b/0x291
[  484.778427] addrconf_verify_work+0xe/0x20
[  484.778429] process_scheduled_works+0x1da/0x2e0
[  484.778431] worker_thread+0x1ca/0x224
[  484.778433] ? __pfx_worker_thread+0x40/0x40
[  484.778436] kthread+0xe9/0xf4
[  484.778437] ? __pfx_kthread+0x40/0x40
[  484.778439] ret_from_fork+0x21/0x38
[  484.778440] ? __pfx_kthread+0x40/0x40
[  484.778442] ret_from_fork_asm+0x1b/0x80
[  484.778444] </TASK>
[  484.778468] INFO: task NetworkManager:3372 blocked for more than 120 seconds.
[  484.778469] Tainted: G U 6.6.3-amd64-preempt-sysrq-20220227 #4
[  484.778470] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  484.778471] task:NetworkManager state:D stack:0 pid:3372 ppid:1 flags:0x00000002
[  484.778473] Call Trace:
[  484.778474] <TASK>
[  484.778475] __schedule+0xba0/0xc05
[  484.778477] schedule+0x95/0xce
[  484.778478] schedule_preempt_disabled+0x15/0x22
[  484.778480] __mutex_lock.constprop.0+0x18b/0x291
[  484.778483] nl80211_prepare_wdev_dump+0x8b/0x19f [cfg80211 443448ea372df5c5c09782a3fb412c115f1aa45a]
[  484.778526] nl80211_dump_station+0x49/0x1d0 [cfg80211 443448ea372df5c5c09782a3fb412c115f1aa45a]
[  484.778555] ? __alloc_pages+0x131/0x1e8
[  484.778558] ? __mod_lruvec_page_state+0x4c/0x86
[  484.778561] ? mod_lruvec_page_state.constprop.0+0x1c/0x2e
[  484.778564] ? __kmalloc_large_node+0xd5/0xfb
[  484.778566] ? __alloc_skb+0xad/0x148
[  484.778569] ? __kmalloc_node_track_caller+0x5b/0xb2
[  484.778571] ? kmalloc_reserve+0xab/0xe6
[  484.778573] genl_dumpit+0x32/0x4d
[  484.778576] netlink_dump+0x143/0x2b2
[  484.778579] __netlink_dump_start+0x145/0x197
[  484.778583] genl_family_rcv_msg_dumpit+0xa3/0xd1
[  484.778585] ? __pfx_genl_start+0x40/0x40
[  484.778586] ? __pfx_genl_dumpit+0x40/0x40
[  484.778588] ? __pfx_genl_done+0x40/0x40
[  484.778589] genl_rcv_msg+0x1a0/0x1f2
[  484.778591] ? __pfx_nl80211_dump_station+0x40/0x40 [cfg80211 443448ea372df5c5c09782a3fb412c115f1aa45a]
[  484.778619] ? __pfx_genl_rcv_msg+0x40/0x40
[  484.778620] netlink_rcv_skb+0x89/0xe3
[  484.778623] genl_rcv+0x24/0x31
[  484.778625] netlink_unicast+0x114/0x1b4
[  484.778627] netlink_sendmsg+0x321/0x361
[  484.778630] sock_sendmsg_nosec+0x46/0x70
[  484.778633] ____sys_sendmsg+0x144/0x1f5
[  484.778635] ___sys_sendmsg+0x76/0xb3
[  484.778637] ? __fget+0x38/0x47
[  484.778640] ? __fget_light+0x41/0x50
[  484.778642] ? do_epoll_wait+0x49e/0x4d7
[  484.778645] ? __pfx_pollwake+0x40/0x40
[  484.778647] ? __fget+0x38/0x47
[  484.778649] __sys_sendmsg+0x60/0x97
[  484.778652] do_syscall_64+0x7e/0xa7
[  484.778655] ? do_syscall_64+0x9d/0xa7
[  484.778656] ? do_syscall_64+0x9d/0xa7
[  484.778657] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  484.778661] RIP: 0033:0x7fad667fe9bd
[  484.778663] RSP: 002b:00007ffd9d5a4f00 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
[  484.778665] RAX: ffffffffffffffda RBX: 000055d7ea4dad90 RCX: 00007fad667fe9bd
[  484.778666] RDX: 0000000000000000 RSI: 00007ffd9d5a4f50 RDI: 000000000000000b
[  484.778667] RBP: 00007ffd9d5a4f50 R08: 0000000000000000 R09: 0000000000000300
[  484.778669] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffd9d5a5030
[  484.778670] R13: 000055d7ea9f9050 R14: 00007ffd9d5a5260 R15: 000055d7eaa03050
[  484.778671] </TASK>
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08
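Editor's note: with traces this long, a one-line summary of which tasks the hung-task detector reported makes it easier to compare kernels. A minimal sketch follows, assuming the "INFO: task ... blocked for more than N seconds" wording seen in the logs above; the function name `list_hung_tasks` is hypothetical.

```shell
# Hypothetical helper: read kernel log text on stdin and print each
# distinct "name:pid" that the hung-task detector reported.
# LC_ALL=C keeps the sort order independent of the user's locale.
list_hung_tasks() {
    sed -n 's/.*INFO: task \(.*\) blocked for more than [0-9]* seconds.*/\1/p' |
        LC_ALL=C sort -u
}
```

Possible usage: `dmesg | list_hung_tasks`. On the 6.6.3 log quoted above this would name the two ipv6_addrconf kworkers and NetworkManager, all of which are waiting in __mutex_lock.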
Linux User #330250
2024-Jan-31 20:06 UTC
On 12/02/23 Marc MERLIN wrote:
> Howdy,

Howdy!

> I'm trying a Thinkpad P17 gen2, the last Thinkpad that still comes in 17"
> 4K (newer ones are 16" only, so I'm looking for other worthwhile Linux
> laptops with a 17" or bigger LCD that also does 4K; the Alienware I saw
> was 18" but not 4K).
>
> Unfortunately I seem to need the nouveau driver to turn off the nvidia
> chip I don't plan on using (Intel graphics is fine for me), and the BIOS
> only allows 'hybrid' or nvidia only.
> On my P73, nouveau never really worked in the 3 years I've had it, but
> it could at least turn off the nvidia chip. On the P17 gen2 it does not seem
> to be able to do so.

At the moment you'd have to use the proprietary Nvidia driver for graphics support. But there are, and have been for a long time, ways to disable the additional dedicated graphics device completely and save power, which is a nice thing on a laptop...

> sauron:~# lspci | grep VGA
> 00:02.0 VGA compatible controller: Intel Corporation Tiger Lake-H GT1 [UHD Graphics] (rev 01)
> 01:00.0 VGA compatible controller: NVIDIA Corporation GA104GLM [RTX A5000 Mobile] (rev a1)

Note the Nvidia card's PCI address is 01:00.0...

> What is the next recommended step?

STEP #1: disable nouveau by blacklisting the module. There's more than one way to do this:

* Add it to /etc/modprobe.d/<someconfigfilename>.conf
  E.g. for /etc/modprobe.d/blacklist-nouveau.conf, run in a root shell (if the file doesn't already exist!):
    echo "blacklist nouveau" > /etc/modprobe.d/blacklist-nouveau.conf
* Add a kernel command line parameter: modprobe.blacklist=nouveau
  How you do this depends on which Linux (distribution) you're running. E.g. GRUB's command line may be used, if GRUB /is/ used, or dracut and so on...

STEP #2: you could power down the PCI device (only after you've disabled the driver in step #1).
Try it out first by disabling the PCI device you noted above on a running system (as root!), e.g. like this:
    echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/remove

If that works, you'd do something like adding a new udev rule in e.g. /etc/udev/rules.d/00-remove-nvidia.rules with contents of the sort:
    ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x03[0-9]*", ATTR{power/control}="auto", ATTR{remove}="1"

Take a full example from the Arch Wiki:
https://wiki.archlinux.org/title/Hybrid_graphics

Other resources:
https://github.com/bayasdev/nvidia-gpu-off
https://unix.stackexchange.com/questions/702774/how-to-disable-pcie-device-at-boot

> Thanks,
> Marc

Welcome!
Hope this helps, and I also hope it's not too late. I just saw your posting and thought, better late than never...

Linux User #330250
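Editor's note: the manual test in step #2 can be wrapped so that it refuses to run while a driver is still bound to the device, which is the precondition step #1 is meant to establish. This is a minimal sketch under stated assumptions, not part of the original reply: the function name `remove_pci_device` and its optional second argument (an alternative sysfs root, handy for a dry run against a scratch directory) are made up for illustration. It relies only on the sysfs `remove` attribute used above and on the `driver` link that sysfs shows for a bound device.

```shell
# Hypothetical helper: remove a PCI device through sysfs, but only when
# no driver is bound to it.
#   $1: full PCI address, e.g. 0000:01:00.0
#   $2: sysfs root (defaults to /sys; another path allows a dry run)
remove_pci_device() {
    addr="$1"
    sysfs="${2:-/sys}"
    dev="$sysfs/bus/pci/devices/$addr"
    if [ ! -e "$dev/remove" ]; then
        echo "no such device or no remove attribute: $dev" >&2
        return 1
    fi
    # sysfs exposes a "driver" link while a driver is bound to the device.
    if [ -e "$dev/driver" ]; then
        echo "a driver is still bound to $addr; unload or blacklist it first" >&2
        return 1
    fi
    echo 1 > "$dev/remove"
}
```

Possible usage as root, with the address from the lspci output: `remove_pci_device 0000:01:00.0`. The device stays gone until the next reboot or until the bus is rescanned with `echo 1 > /sys/bus/pci/rescan`.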