Lyude Paul
2024-Apr-18 20:27 UTC
Issues with trying to boot falcons from sgt memory + Possible firmware SG_DEBUG fix?
So - first some context here for Ben and anyone else who hasn't been following. A little while ago I got a Slimbook Executive 16 with a Nvidia RTX 4060 in it, and I've unfortunately been running into a kind of annoying issue. Currently this laptop only has 16 gigs of ram, and as it turns out - this can easily lead the system to having pretty heavy memory fragmentation once it starts swapping pages out. Normally this wouldn't matter, but I unfortunately discovered that when we're runtime suspending the GPU in Nouveau - we actually appear to allocate some of the memory we use for migrating using dma_alloc_coherent. This starts to fail on my system once memory fragmentation goes up like so: kworker/18:0: page allocation failure: order:7, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0 CPU: 18 PID: 287012 Comm: kworker/18:0 Not tainted 6.8.4-200.ChopperV1.fc39.x86_64 #1 Hardware name: SLIMBOOK Executive/Executive, BIOS N.1.10GRU06 02/02/2024 Workqueue: pm pm_runtime_work Call Trace: <TASK> dump_stack_lvl+0x47/0x60 warn_alloc+0x165/0x1e0 ? __alloc_pages_direct_compact+0x1ad/0x2b0 __alloc_pages_slowpath.constprop.0+0xd7d/0xde0 __alloc_pages+0x32d/0x350 __dma_direct_alloc_pages.isra.0+0x16a/0x2b0 dma_direct_alloc+0x70/0x280 nvkm_gsp_radix3_sg+0x5e/0x130 [nouveau] r535_gsp_fini+0x1d4/0x350 [nouveau] nvkm_subdev_fini+0x67/0x150 [nouveau] nvkm_device_fini+0x95/0x1e0 [nouveau] nvkm_udevice_fini+0x53/0x70 [nouveau] nvkm_object_fini+0xb9/0x240 [nouveau] nvkm_object_fini+0x75/0x240 [nouveau] nouveau_do_suspend+0xf5/0x280 [nouveau] nouveau_pmops_runtime_suspend+0x3e/0xb0 [nouveau] pci_pm_runtime_suspend+0x67/0x1e0 ? __pfx_pci_pm_runtime_suspend+0x10/0x10 __rpm_callback+0x41/0x170 ? __pfx_pci_pm_runtime_suspend+0x10/0x10 rpm_callback+0x5d/0x70 ? __pfx_pci_pm_runtime_suspend+0x10/0x10 rpm_suspend+0x120/0x6a0 pm_runtime_work+0x98/0xb0 process_one_work+0x171/0x340 worker_thread+0x27b/0x3a0 ? __pfx_worker_thread+0x10/0x10 kthread+0xe5/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x31/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 nouveau 0000:01:00.0: gsp: suspend failed, -12 nouveau: DRM-master:00000000:00000080: suspend failed with -12 nouveau 0000:01:00.0: can't suspend (nouveau_pmops_runtime_suspend [nouveau] returned -12) Keep in mind, I don't dive into memory management related stuff like this very often! But I'd very much like to know how to help out anywhere around the driver, including outside of my usual domains, so I've been trying to write up a patch for this. The original suggestion for a fix that Dave Airlie had given me was (unless I misunderstood, which isn't unlikely) to try to see if we could get nvkm_gsp_mem_ctor() to start allocating memory with vmalloc() and map that onto the GPU using the SG helpers instead. So - I gave a shot at writing up a patch for doing that: https://gitlab.freedesktop.org/lyudess/linux/-/commit/b5a41ac2bd948979815d262d8d20b4f3333f9c26 As you can probably guess - the patch does not really seem to work, and I've been trying to figure out why. There's already a couple of issues I'm aware of: the most glaring one being that as Timur pointed out, a lot of GSP hardware expects contiguous memory allocations - but according to them the allocation that's specifically failing should be small enough that it'd be allocated in a contiguous page anyway: [ 9.429884] Lyude:r535_gsp_init:2186: (mbox1) == 0 [ 9.429898] Lyude:r535_gsp_init:2186: (mbox0) == dbdfe000 [ 9.491300] ------------[ cut here ]------------ [ 9.491308] WARNING: CPU: 5 PID: 921 at drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:1713 r535_gsp_init+0x75e/0x7c0 [nouveau] [ 9.491533] Modules linked in: nouveau(+) rfkill binfmt_misc vfat fat snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep wmi_bmof ppdev snd_hda_core drm_ttm_helper intel_rapl_msr snd_seq ttm snd_seq_device snd_pcm video gpu_sched snd_timer i2c_algo_bit drm_gpuvm drm_exec intel_rapl_common mxm_wmi rapl snd drm_display_helper acpi_cpufreq soundcore k10temp i2c_piix4 parport_pc wmi parport gpio_amdpt gpio_generic loop dm_multipath nfnetlink zram crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 r8169 realtek sha1_ssse3 ccp w83627hf_wdt scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables fuse [ 9.491670] CPU: 5 PID: 921 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #22 [ 9.491681] Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019 [ 9.491690] RIP: 0010:r535_gsp_init+0x75e/0x7c0 [nouveau] [ 9.491885] Code: 8b 83 10 0d 00 00 48 89 ef 41 bf e4 ff ff ff 48 8b 40 18 48 8b 80 48 0f 00 00 48 8b 40 28 e8 b9 5e 89 ee 0f 0b e9 73 f9 ff ff <0f> 0b 41 bf fb ff ff ff e9 5a f9 ff ff 41 89 ef 0f 0b e9 5c f9 ff [ 9.491905] RSP: 0018:ffffb271c175f748 EFLAGS: 00010246 [ 9.491914] RAX: 0000000000000000 RBX: ffffa098e192f000 RCX: ffffa098ca2768c8 [ 9.491922] RDX: ffffa098e191d400 RSI: ffffb271cc110080 RDI: ffffb271cc111388 [ 9.491930] RBP: 00000000dbdfe000 R08: 0000000000000003 R09: 0000000000000000 [ 9.491938] R10: 0000000000000000 R11: ffffa098ca276828 R12: ffffa098e192f008 [ 9.491946] R13: 000000022b906452 R14: ffffa098e192f008 R15: 0000000000000000 [ 9.491956] FS: 00007f4de98cc980(0000) GS:ffffa099c4a80000(0000) knlGS:0000000000000000 [ 9.491966] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9.491974] CR2: 00007f7bd8d18ea0 CR3: 0000000104e58000 CR4: 00000000003506f0 [ 9.491989] Call Trace: [ 9.491996] <TASK> [ 9.492002] ? __warn+0x80/0x120 [ 9.492012] ? r535_gsp_init+0x75e/0x7c0 [nouveau] [ 9.492200] ? report_bug+0x164/0x190 [ 9.492211] ? handle_bug+0x3c/0x80 [ 9.492218] ? exc_invalid_op+0x17/0x70 [ 9.492227] ? asm_exc_invalid_op+0x1a/0x20 [ 9.492241] ? r535_gsp_init+0x75e/0x7c0 [nouveau] [ 9.492429] ? r535_gsp_init+0x18e/0x7c0 [nouveau] [ 9.492616] ? srso_return_thunk+0x5/0x5f [ 9.492626] nvkm_subdev_init_+0x48/0x130 [nouveau] [ 9.492802] ? srso_return_thunk+0x5/0x5f [ 9.492810] nvkm_subdev_init+0x44/0x90 [nouveau] [ 9.492988] nvkm_device_init+0x166/0x2e0 [nouveau] [ 9.493189] nvkm_udevice_init+0x47/0x70 [nouveau] [ 9.493391] nvkm_object_init+0x41/0x1c0 [nouveau] [ 9.493567] nvkm_ioctl_new+0x16a/0x290 [nouveau] [ 9.493740] ? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau] [ 9.493912] ? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau] [ 9.494121] nvkm_ioctl+0x10e/0x250 [nouveau] [ 9.494288] nvif_object_ctor+0x112/0x190 [nouveau] [ 9.494456] nvif_device_ctor+0x23/0x60 [nouveau] [ 9.494625] nouveau_cli_init+0x164/0x5d0 [nouveau] [ 9.494820] nouveau_drm_device_init+0x97/0xe00 [nouveau] [ 9.495022] ? srso_return_thunk+0x5/0x5f [ 9.495030] ? pci_bus_read_config_word+0x4d/0x90 [ 9.495039] ? srso_return_thunk+0x5/0x5f [ 9.495047] ? pci_update_current_state+0x72/0xb0 [ 9.495059] nouveau_drm_probe+0x12c/0x280 [nouveau] [ 9.495245] ? srso_return_thunk+0x5/0x5f [ 9.495254] local_pci_probe+0x45/0xa0 [ 9.495263] pci_device_probe+0xc7/0x240 [ 9.495272] really_probe+0xd6/0x390 [ 9.495282] ? __pfx___driver_attach+0x10/0x10 [ 9.495290] __driver_probe_device+0x78/0x150 [ 9.495301] driver_probe_device+0x1f/0x90 [ 9.495308] __driver_attach+0xd2/0x1c0 [ 9.495316] bus_for_each_dev+0x88/0xd0 [ 9.495325] bus_add_driver+0x116/0x220 [ 9.495334] driver_register+0x59/0x100 [ 9.495342] ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau] [ 9.495512] do_one_initcall+0x5b/0x320 [ 9.495524] do_init_module+0x60/0x240 [ 9.495536] init_module_from_file+0x86/0xc0 [ 9.495550] idempotent_init_module+0x120/0x2b0 [ 9.495562] __x64_sys_finit_module+0x5e/0xb0 [ 9.495571] do_syscall_64+0x88/0x170 [ 9.495581] ? srso_return_thunk+0x5/0x5f [ 9.495589] ? syscall_exit_to_user_mode_prepare+0x15d/0x190 [ 9.495600] ? srso_return_thunk+0x5/0x5f [ 9.495607] ? syscall_exit_to_user_mode+0x60/0x210 [ 9.495615] ? srso_return_thunk+0x5/0x5f [ 9.495622] ? do_syscall_64+0x95/0x170 [ 9.495630] ? srso_return_thunk+0x5/0x5f [ 9.495636] ? do_syscall_64+0x95/0x170 [ 9.495644] ? srso_return_thunk+0x5/0x5f [ 9.495653] entry_SYSCALL_64_after_hwframe+0x71/0x79 [ 9.495663] RIP: 0033:0x7f4de9b2919d [ 9.495680] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4b cc 0c 00 f7 d8 64 89 01 48 [ 9.495697] RSP: 002b:00007ffc56bfe468 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 9.495707] RAX: ffffffffffffffda RBX: 00005644a0432350 RCX: 00007f4de9b2919d [ 9.495717] RDX: 0000000000000000 RSI: 00005644a042ef30 RDI: 0000000000000031 [ 9.495726] RBP: 00007ffc56bfe520 R08: 00007f4de9bf6b20 R09: 00007ffc56bfe4b0 [ 9.495734] R10: 00005644a04346a0 R11: 0000000000000246 R12: 00005644a042ef30 [ 9.495742] R13: 0000000000020000 R14: 00005644a0432d10 R15: 00005644a0434660 [ 9.495754] </TASK> [ 9.495759] ---[ end trace 0000000000000000 ]--- [ 9.495778] ------------[ cut here ]------------ [ 9.495784] WARNING: CPU: 5 PID: 921 at drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:2187 r535_gsp_init+0xc5/0x7c0 [nouveau] [ 9.495981] Modules linked in: nouveau(+) rfkill binfmt_misc vfat fat snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep wmi_bmof ppdev snd_hda_core drm_ttm_helper intel_rapl_msr snd_seq ttm snd_seq_device snd_pcm video gpu_sched snd_timer i2c_algo_bit drm_gpuvm drm_exec intel_rapl_common mxm_wmi rapl snd drm_display_helper acpi_cpufreq soundcore k10temp i2c_piix4 parport_pc wmi parport gpio_amdpt gpio_generic loop dm_multipath nfnetlink zram crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 r8169 realtek sha1_ssse3 ccp w83627hf_wdt scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables fuse [ 9.496112] CPU: 5 PID: 921 Comm: (udev-worker) Tainted: G W 6.9.0-rc3Lyude-Test+ #22 [ 9.496123] Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019 [ 9.496132] RIP: 0010:r535_gsp_init+0xc5/0x7c0 [nouveau] [ 9.496317] Code: 24 18 4c 8d 63 08 89 6c 24 14 4c 89 e6 6a 00 4c 8d 44 24 20 48 8d 4c 24 1c e8 b7 c3 fa ff 5f 41 89 c7 85 c0 0f 84 97 00 00 00 <0f> 0b 48 83 bb 20 0a 00 00 00 75 37 48 8b 44 24 20 65 48 2b 04 25 [ 9.496333] RSP: 0018:ffffb271c175f748 EFLAGS: 00010246 [ 9.496341] RAX: 0000000000000000 RBX: ffffa098e192f000 RCX: ffffa098ca2768c8 [ 9.496351] RDX: ffffa098e191d400 RSI: ffffb271cc110080 RDI: ffffb271cc111388 [ 9.496360] RBP: 00000000dbdfe000 R08: 0000000000000003 R09: 0000000000000000 [ 9.496368] R10: 0000000000000000 R11: ffffa098ca276828 R12: ffffa098e192f008 [ 9.496375] R13: 000000022b906452 R14: ffffa098e192f008 R15: 00000000fffffffb [ 9.496383] FS: 00007f4de98cc980(0000) GS:ffffa099c4a80000(0000) knlGS:0000000000000000 [ 9.496393] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9.496400] CR2: 00007f7bd8d18ea0 CR3: 0000000104e58000 CR4: 00000000003506f0 [ 9.496410] Call Trace: [ 9.496416] <TASK> [ 9.496422] ? __warn+0x80/0x120 [ 9.496429] ? r535_gsp_init+0xc5/0x7c0 [nouveau] [ 9.496622] ? report_bug+0x164/0x190 [ 9.496631] ? handle_bug+0x3c/0x80 [ 9.496638] ? exc_invalid_op+0x17/0x70 [ 9.496647] ? asm_exc_invalid_op+0x1a/0x20 [ 9.496660] ? r535_gsp_init+0xc5/0x7c0 [nouveau] [ 9.496851] ? r535_gsp_init+0x18e/0x7c0 [nouveau] [ 9.497044] ? srso_return_thunk+0x5/0x5f [ 9.497055] nvkm_subdev_init_+0x48/0x130 [nouveau] [ 9.497227] ? srso_return_thunk+0x5/0x5f [ 9.497236] nvkm_subdev_init+0x44/0x90 [nouveau] [ 9.497405] nvkm_device_init+0x166/0x2e0 [nouveau] [ 9.497608] nvkm_udevice_init+0x47/0x70 [nouveau] [ 9.497808] nvkm_object_init+0x41/0x1c0 [nouveau] [ 9.497983] nvkm_ioctl_new+0x16a/0x290 [nouveau] [ 9.498154] ? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau] [ 9.498326] ? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau] [ 9.498531] nvkm_ioctl+0x10e/0x250 [nouveau] [ 9.498702] nvif_object_ctor+0x112/0x190 [nouveau] [ 9.498873] nvif_device_ctor+0x23/0x60 [nouveau] [ 9.499049] nouveau_cli_init+0x164/0x5d0 [nouveau] [ 9.499244] nouveau_drm_device_init+0x97/0xe00 [nouveau] [ 9.499430] ? srso_return_thunk+0x5/0x5f [ 9.499437] ? pci_bus_read_config_word+0x4d/0x90 [ 9.499445] ? srso_return_thunk+0x5/0x5f [ 9.499452] ? pci_update_current_state+0x72/0xb0 [ 9.499461] nouveau_drm_probe+0x12c/0x280 [nouveau] [ 9.499657] ? srso_return_thunk+0x5/0x5f [ 9.499666] local_pci_probe+0x45/0xa0 [ 9.499674] pci_device_probe+0xc7/0x240 [ 9.499683] really_probe+0xd6/0x390 [ 9.499692] ? __pfx___driver_attach+0x10/0x10 [ 9.499699] __driver_probe_device+0x78/0x150 [ 9.499709] driver_probe_device+0x1f/0x90 [ 9.499718] __driver_attach+0xd2/0x1c0 [ 9.499726] bus_for_each_dev+0x88/0xd0 [ 9.499735] bus_add_driver+0x116/0x220 [ 9.499744] driver_register+0x59/0x100 [ 9.499751] ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau] [ 9.499915] do_one_initcall+0x5b/0x320 [ 9.499926] do_init_module+0x60/0x240 [ 9.499934] init_module_from_file+0x86/0xc0 [ 9.499948] idempotent_init_module+0x120/0x2b0 [ 9.499962] __x64_sys_finit_module+0x5e/0xb0 [ 9.499971] do_syscall_64+0x88/0x170 [ 9.499987] ? srso_return_thunk+0x5/0x5f [ 9.499996] ? syscall_exit_to_user_mode_prepare+0x15d/0x190 [ 9.500004] ? srso_return_thunk+0x5/0x5f [ 9.500011] ? syscall_exit_to_user_mode+0x60/0x210 [ 9.500019] ? srso_return_thunk+0x5/0x5f [ 9.500026] ? do_syscall_64+0x95/0x170 [ 9.500034] ? srso_return_thunk+0x5/0x5f [ 9.500041] ? do_syscall_64+0x95/0x170 [ 9.500050] ? srso_return_thunk+0x5/0x5f [ 9.500058] entry_SYSCALL_64_after_hwframe+0x71/0x79 [ 9.500067] RIP: 0033:0x7f4de9b2919d [ 9.500075] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4b cc 0c 00 f7 d8 64 89 01 48 [ 9.500091] RSP: 002b:00007ffc56bfe468 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 9.500103] RAX: ffffffffffffffda RBX: 00005644a0432350 RCX: 00007f4de9b2919d [ 9.500112] RDX: 0000000000000000 RSI: 00005644a042ef30 RDI: 0000000000000031 [ 9.500121] RBP: 00007ffc56bfe520 R08: 00007f4de9bf6b20 R09: 00007ffc56bfe4b0 [ 9.500128] R10: 00005644a04346a0 R11: 0000000000000246 R12: 00005644a042ef30 [ 9.500136] R13: 0000000000020000 R14: 00005644a0432d10 R15: 00005644a0434660 [ 9.500149] </TASK> [ 9.500154] ---[ end trace 0000000000000000 ]--- [ 9.500162] nouveau 0000:1f:00.0: gsp: init failed, -5 [ 9.500189] nouveau 0000:1f:00.0: init failed with -5 [ 9.500196] nouveau: DRM-master:00000000:00000080: init failed with -5 [ 9.500207] nouveau 0000:1f:00.0: DRM-master: Device allocation failed: -5 [ 9.502661] nouveau 0000:1f:00.0: probe with driver nouveau failed with error -5 Which brings me to the second part - TImur had me enable CONFIG_SG_DEBUG, which quickly hit a different issue: [ 8.992320] RIP: 0010:sg_init_one (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/scatterlist.h:187 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/lib/scatterlist.c:143 (discriminator 1)) [ 8.992331] Code: 71 93 37 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54 24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b 94 7d 00 <0f> 0b 0f 0b 0f 0b 48 8b 05 5e ae 9f 01 eb b2 66 66 2e 0f 1f 84 00 [ 8.992428] Call Trace: [ 8.992433] <TASK> [ 8.992439] ? die (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/dumpstack.c:421 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/dumpstack.c:434 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/dumpstack.c:447) [ 8.992448] ? do_trap (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/traps.c:114 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/traps.c:155) [ 8.992455] ? sg_init_one (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/scatterlist.h:187 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/lib/scatterlist.c:143 (discriminator 1)) [ 8.992464] ? do_error_trap (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./arch/x86/include/asm/traps.h:58 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/traps.c:176) [ 8.992472] ? sg_init_one (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/scatterlist.h:187 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/lib/scatterlist.c:143 (discriminator 1)) [ 8.992481] ? exc_invalid_op (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/traps.c:267) [ 8.992489] ? sg_init_one (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/scatterlist.h:187 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/lib/scatterlist.c:143 (discriminator 1)) [ 8.992496] ? asm_exc_invalid_op (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./arch/x86/include/asm/idtentry.h:621) [ 8.992509] ? sg_init_one (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/scatterlist.h:187 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/lib/scatterlist.c:143 (discriminator 1)) [ 8.992518] nvkm_firmware_ctor (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/firmware.c:249) nouveau [ 8.992722] nvkm_falcon_fw_ctor (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/falcon/fw.c:199) nouveau [ 8.992898] ga102_gsp_booter_ctor (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/ga102.c:62) nouveau [ 8.993095] r535_gsp_oneinit (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:2309) nouveau [ 8.993292] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224) [ 8.993302] ? kmem_cache_alloc_lru (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/mm/slub.c:3748 (discriminator 2) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/mm/slub.c:3827 (discriminator 2) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/mm/slub.c:3864 (discriminator 2)) [ 8.993311] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224) [ 8.993317] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224) [ 8.993324] ? ktime_get (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/time/timekeeping.c:292 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/time/timekeeping.c:388 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/time/timekeeping.c:848) [ 8.993334] nvkm_subdev_oneinit_ (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/subdev.c:113) nouveau [ 8.993510] nvkm_subdev_init_ (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/subdev.c:139) nouveau [ 8.993685] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224) [ 8.993693] nvkm_subdev_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/subdev.c:170) nouveau [ 8.993867] nvkm_device_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c:3023) nouveau [ 8.994079] nvkm_udevice_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/engine/device/user.c:295) nouveau [ 8.994281] nvkm_object_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/object.c:245) nouveau [ 8.994457] nvkm_ioctl_new (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/ioctl.c:149) nouveau [ 8.994630] ? __pfx_nvkm_client_child_new (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/client.c:125) nouveau [ 8.994803] ? __pfx_nvkm_udevice_new (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/engine/device/user.c:386) nouveau [ 8.995013] nvkm_ioctl (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/ioctl.c:354 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/ioctl.c:376) nouveau [ 8.995187] nvif_object_ctor (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvif/object.c:298 (discriminator 1)) nouveau [ 8.995356] nvif_device_ctor (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvif/device.c:56) nouveau [ 8.995524] nouveau_cli_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nouveau_drm.c:270) nouveau [ 8.995721] nouveau_drm_device_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nouveau_drm.c:602) nouveau [ 8.995915] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224) [ 8.995923] ? pci_bus_read_config_word (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/access.c:67 (discriminator 1)) [ 8.995932] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224) [ 8.995939] ? pci_update_current_state (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/pci.c:1195 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/pci.c:1187) [ 8.995949] nouveau_drm_probe (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nouveau_drm.c:841) nouveau [ 8.996145] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224) [ 8.996154] local_pci_probe (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/pci-driver.c:325) [ 8.996163] pci_device_probe (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/pci-driver.c:392 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/pci-driver.c:417 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/pci-driver.c:451 (discriminator 1)) [ 8.996174] really_probe (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/dd.c:578 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/dd.c:656) [ 8.996185] ? __pfx___driver_attach (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/dd.c:1155) [ 8.996192] __driver_probe_device (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/dd.c:798) [ 8.996201] driver_probe_device (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/dd.c:828) [ 8.996209] __driver_attach (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/dd.c:1215) [ 8.996217] bus_for_each_dev (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/bus.c:368) [ 8.996228] bus_add_driver (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/bus.c:673) [ 8.996238] driver_register (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/driver.c:246) [ 8.996246] ? __pfx_nouveau_drm_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvif/object.c:32) nouveau [ 8.996415] do_one_initcall (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/init/main.c:1238) [ 8.996428] do_init_module (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:2538) [ 8.996437] init_module_from_file (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:3168) [ 8.996450] idempotent_init_module (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/spinlock.h:351 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:3131 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:3185) [ 8.996462] __x64_sys_finit_module (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/file.h:47 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:3207 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:3189 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:3189) [ 8.996473] do_syscall_64 (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/entry/common.c:52 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/entry/common.c:83 (discriminator 1)) [ 8.996482] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224) [ 8.996490] entry_SYSCALL_64_after_hwframe (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/entry/entry_64.S:129) [ 8.996499] RIP: 0033:0x7fd12f52919d I think timur actually mentioned this bug to you previously, but in hopes of getting something more useful out of SG_DEBUG I dug into this problem a bit and ended up with what I believe is an actually correct patch: https://gitlab.freedesktop.org/lyudess/linux/-/commit/485f1fb62ddd4b42b60848eeb48206fef4376161 ...unfortunately, fixing that issue on my system did not get SG_DEBUG to give me any useful info. Anyway - that brings me to ask 1: do you have any idea what might be going on with the falcon boot issue I mentioned, or if I might just be doing something wrong/silly with how I'm setting up memory in nvkm_gsp_mem_ctor()? And 2: if you have the time does that patch look correct? I'm happy to submit it :) Also 3: welcome back again :)
David Airlie
2024-Apr-18 22:14 UTC
Issues with trying to boot falcons from sgt memory + Possible firmware SG_DEBUG fix?
On Fri, Apr 19, 2024 at 6:27?AM Lyude Paul <lyude at redhat.com> wrote:> > So - first some context here for Ben and anyone else who hasn't been > following. A little while ago I got a Slimbook Executive 16 with a > Nvidia RTX 4060 in it, and I've unfortunately been running into a kind > of annoying issue. Currently this laptop only has 16 gigs of ram, and > as it turns out - this can easily lead the system to having pretty > heavy memory fragmentation once it starts swapping pages out. > > Normally this wouldn't matter, but I unfortunately discovered that when > we're runtime suspending the GPU in Nouveau - we actually appear to > allocate some of the memory we use for migrating using > dma_alloc_coherent. This starts to fail on my system once memory > fragmentation goes up like so: > > kworker/18:0: page allocation failure: order:7, mode:0xcc0(GFP_KERNEL), > nodemask=(null),cpuset=/,mems_allowed=0 > CPU: 18 PID: 287012 Comm: kworker/18:0 Not tainted > 6.8.4-200.ChopperV1.fc39.x86_64 #1 > Hardware name: SLIMBOOK Executive/Executive, BIOS N.1.10GRU06 02/02/2024 > Workqueue: pm pm_runtime_work > Call Trace: > <TASK> > dump_stack_lvl+0x47/0x60 > warn_alloc+0x165/0x1e0 > ? __alloc_pages_direct_compact+0x1ad/0x2b0 > __alloc_pages_slowpath.constprop.0+0xd7d/0xde0 > __alloc_pages+0x32d/0x350 > __dma_direct_alloc_pages.isra.0+0x16a/0x2b0 > dma_direct_alloc+0x70/0x280 > nvkm_gsp_radix3_sg+0x5e/0x130 [nouveau] > r535_gsp_fini+0x1d4/0x350 [nouveau] > nvkm_subdev_fini+0x67/0x150 [nouveau] > nvkm_device_fini+0x95/0x1e0 [nouveau] > nvkm_udevice_fini+0x53/0x70 [nouveau] > nvkm_object_fini+0xb9/0x240 [nouveau] > nvkm_object_fini+0x75/0x240 [nouveau] > nouveau_do_suspend+0xf5/0x280 [nouveau] > nouveau_pmops_runtime_suspend+0x3e/0xb0 [nouveau] > pci_pm_runtime_suspend+0x67/0x1e0 > ? __pfx_pci_pm_runtime_suspend+0x10/0x10 > __rpm_callback+0x41/0x170 > ? __pfx_pci_pm_runtime_suspend+0x10/0x10 > rpm_callback+0x5d/0x70 > ? __pfx_pci_pm_runtime_suspend+0x10/0x10 > rpm_suspend+0x120/0x6a0 > pm_runtime_work+0x98/0xb0 > process_one_work+0x171/0x340 > worker_thread+0x27b/0x3a0 > ? __pfx_worker_thread+0x10/0x10 > kthread+0xe5/0x120 > ? __pfx_kthread+0x10/0x10 > ret_from_fork+0x31/0x50 > ? __pfx_kthread+0x10/0x10 > ret_from_fork_asm+0x1b/0x30 > > nouveau 0000:01:00.0: gsp: suspend failed, -12 > nouveau: DRM-master:00000000:00000080: suspend failed with -12 > nouveau 0000:01:00.0: can't suspend (nouveau_pmops_runtime_suspend > [nouveau] returned -12) > > Keep in mind, I don't dive into memory management related stuff like > this very often! But I'd very much like to know how to help out > anywhere around the driver, including outside of my usual domains, so > I've been trying to write up a patch for this. The original suggestion > for a fix that Dave Airlie had given me was (unless I misunderstood, > which isn't unlikely) to try to see if we could get nvkm_gsp_mem_ctor() > to start allocating memory with vmalloc() and map that onto the GPU > using the SG helpers instead. So - I gave a shot at writing up a patch > for doing that: > > https://gitlab.freedesktop.org/lyudess/linux/-/commit/b5a41ac2bd948979815d262d8d20b4f3333f9c26 > > As you can probably guess - the patch does not really seem to work, and > I've been trying to figure out why. There's already a couple of issues > I'm aware of: the most glaring one being that as Timur pointed out, a > lot of GSP hardware expects contiguous memory allocations - but > according to them the allocation that's specifically failing should be > small enough that it'd be allocated in a contiguous page anyway:nvkm_gsp_mem_ctor is used to do coherent allocations in a bunch of places in the gsp code, we can't use vmalloc for a lot of them. A lot of the allocations are small multi-page and hang around and the hardware expects allocations to be non-scattered. Now in this single case we have a large amount of data pointed to by a radix3 page table. The data is allocated with nvkm_gsp_sg, then we fail to allocate the first level of page tables with the coherent allocation. However I don't think the first level of the page table needs to be allocated with the coherent allocator, we should allocate it with nvkm_gsp_sg instead. Dave.
Ben Skeggs
2024-Apr-19 13:52 UTC
Issues with trying to boot falcons from sgt memory + Possible firmware SG_DEBUG fix?
On 19/4/24 06:27, Lyude Paul wrote:> So - first some context here for Ben and anyone else who hasn't been > following. A little while ago I got a Slimbook Executive 16 with a > Nvidia RTX 4060 in it, and I've unfortunately been running into a kind > of annoying issue. Currently this laptop only has 16 gigs of ram, and > as it turns out - this can easily lead the system to having pretty > heavy memory fragmentation once it starts swapping pages out. > > Normally this wouldn't matter, but I unfortunately discovered that when > we're runtime suspending the GPU in Nouveau - we actually appear to > allocate some of the memory we use for migrating using > dma_alloc_coherent. This starts to fail on my system once memory > fragmentation goes up like so: > > kworker/18:0: page allocation failure: order:7, mode:0xcc0(GFP_KERNEL), > nodemask=(null),cpuset=/,mems_allowed=0 > CPU: 18 PID: 287012 Comm: kworker/18:0 Not tainted > 6.8.4-200.ChopperV1.fc39.x86_64 #1 > Hardware name: SLIMBOOK Executive/Executive, BIOS N.1.10GRU06 02/02/2024 > Workqueue: pm pm_runtime_work > Call Trace: > <TASK> > dump_stack_lvl+0x47/0x60 > warn_alloc+0x165/0x1e0 > ? __alloc_pages_direct_compact+0x1ad/0x2b0 > __alloc_pages_slowpath.constprop.0+0xd7d/0xde0 > __alloc_pages+0x32d/0x350 > __dma_direct_alloc_pages.isra.0+0x16a/0x2b0 > dma_direct_alloc+0x70/0x280 > nvkm_gsp_radix3_sg+0x5e/0x130 [nouveau] > r535_gsp_fini+0x1d4/0x350 [nouveau] > nvkm_subdev_fini+0x67/0x150 [nouveau] > nvkm_device_fini+0x95/0x1e0 [nouveau] > nvkm_udevice_fini+0x53/0x70 [nouveau] > nvkm_object_fini+0xb9/0x240 [nouveau] > nvkm_object_fini+0x75/0x240 [nouveau] > nouveau_do_suspend+0xf5/0x280 [nouveau] > nouveau_pmops_runtime_suspend+0x3e/0xb0 [nouveau] > pci_pm_runtime_suspend+0x67/0x1e0 > ? __pfx_pci_pm_runtime_suspend+0x10/0x10 > __rpm_callback+0x41/0x170 > ? __pfx_pci_pm_runtime_suspend+0x10/0x10 > rpm_callback+0x5d/0x70 > ? __pfx_pci_pm_runtime_suspend+0x10/0x10 > rpm_suspend+0x120/0x6a0 > pm_runtime_work+0x98/0xb0 > process_one_work+0x171/0x340 > worker_thread+0x27b/0x3a0 > ? __pfx_worker_thread+0x10/0x10 > kthread+0xe5/0x120 > ? __pfx_kthread+0x10/0x10 > ret_from_fork+0x31/0x50 > ? __pfx_kthread+0x10/0x10 > ret_from_fork_asm+0x1b/0x30 > > nouveau 0000:01:00.0: gsp: suspend failed, -12 > nouveau: DRM-master:00000000:00000080: suspend failed with -12 > nouveau 0000:01:00.0: can't suspend (nouveau_pmops_runtime_suspend > [nouveau] returned -12) > > Keep in mind, I don't dive into memory management related stuff like > this very often! But I'd very much like to know how to help out > anywhere around the driver, including outside of my usual domains, so > I've been trying to write up a patch for this. The original suggestion > for a fix that Dave Airlie had given me was (unless I misunderstood, > which isn't unlikely) to try to see if we could get nvkm_gsp_mem_ctor() > to start allocating memory with vmalloc() and map that onto the GPU > using the SG helpers instead. So - I gave a shot at writing up a patch > for doing that: > > https://gitlab.freedesktop.org/lyudess/linux/-/commit/b5a41ac2bd948979815d262d8d20b4f3333f9c26 > > As you can probably guess - the patch does not really seem to work, and > I've been trying to figure out why. There's already a couple of issues > I'm aware of: the most glaring one being that as Timur pointed out, a > lot of GSP hardware expects contiguous memory allocations - but > according to them the allocation that's specifically failing should be > small enough that it'd be allocated in a contiguous page anyway: > > [ 9.429884] Lyude:r535_gsp_init:2186: (mbox1) == 0 > [ 9.429898] Lyude:r535_gsp_init:2186: (mbox0) == dbdfe000 > [ 9.491300] ------------[ cut here ]------------ > [ 9.491308] WARNING: CPU: 5 PID: 921 at drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:1713 r535_gsp_init+0x75e/0x7c0 [nouveau] > [ 9.491533] Modules linked in: nouveau(+) rfkill binfmt_misc vfat fat snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep wmi_bmof ppdev snd_hda_core drm_ttm_helper intel_rapl_msr snd_seq ttm snd_seq_device snd_pcm video gpu_sched snd_timer i2c_algo_bit drm_gpuvm drm_exec intel_rapl_common mxm_wmi rapl snd drm_display_helper acpi_cpufreq soundcore k10temp i2c_piix4 parport_pc wmi parport gpio_amdpt gpio_generic loop dm_multipath nfnetlink zram crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 r8169 realtek sha1_ssse3 ccp w83627hf_wdt scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables fuse > [ 9.491670] CPU: 5 PID: 921 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #22 > [ 9.491681] Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019 > [ 9.491690] RIP: 0010:r535_gsp_init+0x75e/0x7c0 [nouveau] > [ 9.491885] Code: 8b 83 10 0d 00 00 48 89 ef 41 bf e4 ff ff ff 48 8b 40 18 48 8b 80 48 0f 00 00 48 8b 40 28 e8 b9 5e 89 ee 0f 0b e9 73 f9 ff ff <0f> 0b 41 bf fb ff ff ff e9 5a f9 ff ff 41 89 ef 0f 0b e9 5c f9 ff > [ 9.491905] RSP: 0018:ffffb271c175f748 EFLAGS: 00010246 > [ 9.491914] RAX: 0000000000000000 RBX: ffffa098e192f000 RCX: ffffa098ca2768c8 > [ 9.491922] RDX: ffffa098e191d400 RSI: ffffb271cc110080 RDI: ffffb271cc111388 > [ 9.491930] RBP: 00000000dbdfe000 R08: 0000000000000003 R09: 0000000000000000 > [ 9.491938] R10: 0000000000000000 R11: ffffa098ca276828 R12: ffffa098e192f008 > [ 9.491946] R13: 000000022b906452 R14: ffffa098e192f008 R15: 0000000000000000 > [ 9.491956] FS: 00007f4de98cc980(0000) GS:ffffa099c4a80000(0000) knlGS:0000000000000000 > [ 9.491966] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 9.491974] CR2: 00007f7bd8d18ea0 CR3: 0000000104e58000 CR4: 00000000003506f0 > [ 9.491989] Call Trace: > [ 9.491996] <TASK> > [ 9.492002] ? __warn+0x80/0x120 > [ 9.492012] ? r535_gsp_init+0x75e/0x7c0 [nouveau] > [ 9.492200] ? report_bug+0x164/0x190 > [ 9.492211] ? handle_bug+0x3c/0x80 > [ 9.492218] ? exc_invalid_op+0x17/0x70 > [ 9.492227] ? asm_exc_invalid_op+0x1a/0x20 > [ 9.492241] ? r535_gsp_init+0x75e/0x7c0 [nouveau] > [ 9.492429] ? r535_gsp_init+0x18e/0x7c0 [nouveau] > [ 9.492616] ? srso_return_thunk+0x5/0x5f > [ 9.492626] nvkm_subdev_init_+0x48/0x130 [nouveau] > [ 9.492802] ? srso_return_thunk+0x5/0x5f > [ 9.492810] nvkm_subdev_init+0x44/0x90 [nouveau] > [ 9.492988] nvkm_device_init+0x166/0x2e0 [nouveau] > [ 9.493189] nvkm_udevice_init+0x47/0x70 [nouveau] > [ 9.493391] nvkm_object_init+0x41/0x1c0 [nouveau] > [ 9.493567] nvkm_ioctl_new+0x16a/0x290 [nouveau] > [ 9.493740] ? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau] > [ 9.493912] ? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau] > [ 9.494121] nvkm_ioctl+0x10e/0x250 [nouveau] > [ 9.494288] nvif_object_ctor+0x112/0x190 [nouveau] > [ 9.494456] nvif_device_ctor+0x23/0x60 [nouveau] > [ 9.494625] nouveau_cli_init+0x164/0x5d0 [nouveau] > [ 9.494820] nouveau_drm_device_init+0x97/0xe00 [nouveau] > [ 9.495022] ? srso_return_thunk+0x5/0x5f > [ 9.495030] ? pci_bus_read_config_word+0x4d/0x90 > [ 9.495039] ? srso_return_thunk+0x5/0x5f > [ 9.495047] ? pci_update_current_state+0x72/0xb0 > [ 9.495059] nouveau_drm_probe+0x12c/0x280 [nouveau] > [ 9.495245] ? srso_return_thunk+0x5/0x5f > [ 9.495254] local_pci_probe+0x45/0xa0 > [ 9.495263] pci_device_probe+0xc7/0x240 > [ 9.495272] really_probe+0xd6/0x390 > [ 9.495282] ? __pfx___driver_attach+0x10/0x10 > [ 9.495290] __driver_probe_device+0x78/0x150 > [ 9.495301] driver_probe_device+0x1f/0x90 > [ 9.495308] __driver_attach+0xd2/0x1c0 > [ 9.495316] bus_for_each_dev+0x88/0xd0 > [ 9.495325] bus_add_driver+0x116/0x220 > [ 9.495334] driver_register+0x59/0x100 > [ 9.495342] ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau] > [ 9.495512] do_one_initcall+0x5b/0x320 > [ 9.495524] do_init_module+0x60/0x240 > [ 9.495536] init_module_from_file+0x86/0xc0 > [ 9.495550] idempotent_init_module+0x120/0x2b0 > [ 9.495562] __x64_sys_finit_module+0x5e/0xb0 > [ 9.495571] do_syscall_64+0x88/0x170 > [ 9.495581] ? srso_return_thunk+0x5/0x5f > [ 9.495589] ? syscall_exit_to_user_mode_prepare+0x15d/0x190 > [ 9.495600] ? srso_return_thunk+0x5/0x5f > [ 9.495607] ? syscall_exit_to_user_mode+0x60/0x210 > [ 9.495615] ? srso_return_thunk+0x5/0x5f > [ 9.495622] ? do_syscall_64+0x95/0x170 > [ 9.495630] ? srso_return_thunk+0x5/0x5f > [ 9.495636] ? do_syscall_64+0x95/0x170 > [ 9.495644] ? srso_return_thunk+0x5/0x5f > [ 9.495653] entry_SYSCALL_64_after_hwframe+0x71/0x79 > [ 9.495663] RIP: 0033:0x7f4de9b2919d > [ 9.495680] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4b cc 0c 00 f7 d8 64 89 01 48 > [ 9.495697] RSP: 002b:00007ffc56bfe468 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 > [ 9.495707] RAX: ffffffffffffffda RBX: 00005644a0432350 RCX: 00007f4de9b2919d > [ 9.495717] RDX: 0000000000000000 RSI: 00005644a042ef30 RDI: 0000000000000031 > [ 9.495726] RBP: 00007ffc56bfe520 R08: 00007f4de9bf6b20 R09: 00007ffc56bfe4b0 > [ 9.495734] R10: 00005644a04346a0 R11: 0000000000000246 R12: 00005644a042ef30 > [ 9.495742] R13: 0000000000020000 R14: 00005644a0432d10 R15: 00005644a0434660 > [ 9.495754] </TASK> > [ 9.495759] ---[ end trace 0000000000000000 ]--- > [ 9.495778] ------------[ cut here ]------------ > [ 9.495784] WARNING: CPU: 5 PID: 921 at drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:2187 r535_gsp_init+0xc5/0x7c0 [nouveau] > [ 9.495981] Modules linked in: nouveau(+) rfkill binfmt_misc vfat fat snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep wmi_bmof ppdev snd_hda_core drm_ttm_helper intel_rapl_msr snd_seq ttm snd_seq_device snd_pcm video gpu_sched snd_timer i2c_algo_bit drm_gpuvm drm_exec intel_rapl_common mxm_wmi rapl snd drm_display_helper acpi_cpufreq soundcore k10temp i2c_piix4 parport_pc wmi parport gpio_amdpt gpio_generic loop dm_multipath nfnetlink zram crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 r8169 realtek sha1_ssse3 ccp w83627hf_wdt scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables fuse > [ 9.496112] CPU: 5 PID: 921 Comm: (udev-worker) Tainted: G W 6.9.0-rc3Lyude-Test+ #22 > [ 9.496123] Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019 > [ 9.496132] RIP: 0010:r535_gsp_init+0xc5/0x7c0 [nouveau] > [ 9.496317] Code: 24 18 4c 8d 63 08 89 6c 24 14 4c 89 e6 6a 00 4c 8d 44 24 20 48 8d 4c 24 1c e8 b7 c3 fa ff 5f 41 89 c7 85 c0 0f 84 97 00 00 00 <0f> 0b 48 83 bb 20 0a 00 00 00 75 37 48 8b 44 24 20 65 48 2b 04 25 > [ 9.496333] RSP: 0018:ffffb271c175f748 EFLAGS: 00010246 > [ 9.496341] RAX: 0000000000000000 RBX: ffffa098e192f000 RCX: ffffa098ca2768c8 > [ 9.496351] RDX: ffffa098e191d400 RSI: ffffb271cc110080 RDI: ffffb271cc111388 > [ 9.496360] RBP: 00000000dbdfe000 R08: 0000000000000003 R09: 0000000000000000 > [ 9.496368] R10: 0000000000000000 R11: ffffa098ca276828 R12: ffffa098e192f008 > [ 9.496375] R13: 000000022b906452 R14: ffffa098e192f008 R15: 00000000fffffffb > [ 9.496383] FS: 00007f4de98cc980(0000) GS:ffffa099c4a80000(0000) knlGS:0000000000000000 > [ 9.496393] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 9.496400] CR2: 00007f7bd8d18ea0 CR3: 0000000104e58000 CR4: 00000000003506f0 > [ 9.496410] Call Trace: > [ 9.496416] <TASK> > [ 9.496422] ? __warn+0x80/0x120 > [ 9.496429] ? r535_gsp_init+0xc5/0x7c0 [nouveau] > [ 9.496622] ? report_bug+0x164/0x190 > [ 9.496631] ? handle_bug+0x3c/0x80 > [ 9.496638] ? exc_invalid_op+0x17/0x70 > [ 9.496647] ? asm_exc_invalid_op+0x1a/0x20 > [ 9.496660] ? r535_gsp_init+0xc5/0x7c0 [nouveau] > [ 9.496851] ? r535_gsp_init+0x18e/0x7c0 [nouveau] > [ 9.497044] ? srso_return_thunk+0x5/0x5f > [ 9.497055] nvkm_subdev_init_+0x48/0x130 [nouveau] > [ 9.497227] ? srso_return_thunk+0x5/0x5f > [ 9.497236] nvkm_subdev_init+0x44/0x90 [nouveau] > [ 9.497405] nvkm_device_init+0x166/0x2e0 [nouveau] > [ 9.497608] nvkm_udevice_init+0x47/0x70 [nouveau] > [ 9.497808] nvkm_object_init+0x41/0x1c0 [nouveau] > [ 9.497983] nvkm_ioctl_new+0x16a/0x290 [nouveau] > [ 9.498154] ? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau] > [ 9.498326] ? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau] > [ 9.498531] nvkm_ioctl+0x10e/0x250 [nouveau] > [ 9.498702] nvif_object_ctor+0x112/0x190 [nouveau] > [ 9.498873] nvif_device_ctor+0x23/0x60 [nouveau] > [ 9.499049] nouveau_cli_init+0x164/0x5d0 [nouveau] > [ 9.499244] nouveau_drm_device_init+0x97/0xe00 [nouveau] > [ 9.499430] ? srso_return_thunk+0x5/0x5f > [ 9.499437] ? pci_bus_read_config_word+0x4d/0x90 > [ 9.499445] ? srso_return_thunk+0x5/0x5f > [ 9.499452] ? pci_update_current_state+0x72/0xb0 > [ 9.499461] nouveau_drm_probe+0x12c/0x280 [nouveau] > [ 9.499657] ? srso_return_thunk+0x5/0x5f > [ 9.499666] local_pci_probe+0x45/0xa0 > [ 9.499674] pci_device_probe+0xc7/0x240 > [ 9.499683] really_probe+0xd6/0x390 > [ 9.499692] ? __pfx___driver_attach+0x10/0x10 > [ 9.499699] __driver_probe_device+0x78/0x150 > [ 9.499709] driver_probe_device+0x1f/0x90 > [ 9.499718] __driver_attach+0xd2/0x1c0 > [ 9.499726] bus_for_each_dev+0x88/0xd0 > [ 9.499735] bus_add_driver+0x116/0x220 > [ 9.499744] driver_register+0x59/0x100 > [ 9.499751] ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau] > [ 9.499915] do_one_initcall+0x5b/0x320 > [ 9.499926] do_init_module+0x60/0x240 > [ 9.499934] init_module_from_file+0x86/0xc0 > [ 9.499948] idempotent_init_module+0x120/0x2b0 > [ 9.499962] __x64_sys_finit_module+0x5e/0xb0 > [ 9.499971] do_syscall_64+0x88/0x170 > [ 9.499987] ? srso_return_thunk+0x5/0x5f > [ 9.499996] ? syscall_exit_to_user_mode_prepare+0x15d/0x190 > [ 9.500004] ? srso_return_thunk+0x5/0x5f > [ 9.500011] ? syscall_exit_to_user_mode+0x60/0x210 > [ 9.500019] ? srso_return_thunk+0x5/0x5f > [ 9.500026] ? do_syscall_64+0x95/0x170 > [ 9.500034] ? srso_return_thunk+0x5/0x5f > [ 9.500041] ? do_syscall_64+0x95/0x170 > [ 9.500050] ? srso_return_thunk+0x5/0x5f > [ 9.500058] entry_SYSCALL_64_after_hwframe+0x71/0x79 > [ 9.500067] RIP: 0033:0x7f4de9b2919d > [ 9.500075] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4b cc 0c 00 f7 d8 64 89 01 48 > [ 9.500091] RSP: 002b:00007ffc56bfe468 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 > [ 9.500103] RAX: ffffffffffffffda RBX: 00005644a0432350 RCX: 00007f4de9b2919d > [ 9.500112] RDX: 0000000000000000 RSI: 00005644a042ef30 RDI: 0000000000000031 > [ 9.500121] RBP: 00007ffc56bfe520 R08: 00007f4de9bf6b20 R09: 00007ffc56bfe4b0 > [ 9.500128] R10: 00005644a04346a0 R11: 0000000000000246 R12: 00005644a042ef30 > [ 9.500136] R13: 0000000000020000 R14: 00005644a0432d10 R15: 00005644a0434660 > [ 9.500149] </TASK> > [ 9.500154] ---[ end trace 0000000000000000 ]--- > [ 9.500162] nouveau 0000:1f:00.0: gsp: init failed, -5 > [ 9.500189] nouveau 0000:1f:00.0: init failed with -5 > [ 9.500196] nouveau: DRM-master:00000000:00000080: init failed with -5 > [ 9.500207] nouveau 0000:1f:00.0: DRM-master: Device allocation failed: -5 > [ 9.502661] nouveau 0000:1f:00.0: probe with driver nouveau failed with error -5 > > > Which brings me to the second part - TImur had me enable CONFIG_SG_DEBUG, which quickly hit a different issue: > > [ 8.992320] RIP: 0010:sg_init_one (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/scatterlist.h:187 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/lib/scatterlist.c:143 (discriminator 1)) > [ 8.992331] Code: 71 93 37 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54 24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b 94 7d 00 <0f> 0b 0f 0b 0f 0b 48 8b 05 5e ae 9f 01 eb b2 66 66 2e 0f 1f 84 00 > [ 8.992428] Call Trace: > [ 8.992433] <TASK> > [ 8.992439] ? die (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/dumpstack.c:421 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/dumpstack.c:434 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/dumpstack.c:447) > [ 8.992448] ? do_trap (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/traps.c:114 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/traps.c:155) > [ 8.992455] ? sg_init_one (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/scatterlist.h:187 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/lib/scatterlist.c:143 (discriminator 1)) > [ 8.992464] ? do_error_trap (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./arch/x86/include/asm/traps.h:58 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/traps.c:176) > [ 8.992472] ? sg_init_one (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/scatterlist.h:187 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/lib/scatterlist.c:143 (discriminator 1)) > [ 8.992481] ? exc_invalid_op (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/kernel/traps.c:267) > [ 8.992489] ? sg_init_one (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/scatterlist.h:187 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/lib/scatterlist.c:143 (discriminator 1)) > [ 8.992496] ? asm_exc_invalid_op (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./arch/x86/include/asm/idtentry.h:621) > [ 8.992509] ? sg_init_one (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/scatterlist.h:187 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/lib/scatterlist.c:143 (discriminator 1)) > [ 8.992518] nvkm_firmware_ctor (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/firmware.c:249) nouveau > [ 8.992722] nvkm_falcon_fw_ctor (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/falcon/fw.c:199) nouveau > [ 8.992898] ga102_gsp_booter_ctor (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/ga102.c:62) nouveau > [ 8.993095] r535_gsp_oneinit (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:2309) nouveau > [ 8.993292] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224) > [ 8.993302] ? kmem_cache_alloc_lru (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/mm/slub.c:3748 (discriminator 2) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/mm/slub.c:3827 (discriminator 2) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/mm/slub.c:3864 (discriminator 2)) > [ 8.993311] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224) > [ 8.993317] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224) > [ 8.993324] ? ktime_get (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/time/timekeeping.c:292 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/time/timekeeping.c:388 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/time/timekeeping.c:848) > [ 8.993334] nvkm_subdev_oneinit_ (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/subdev.c:113) nouveau > [ 8.993510] nvkm_subdev_init_ (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/subdev.c:139) nouveau > [ 8.993685] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224) > [ 8.993693] nvkm_subdev_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/subdev.c:170) nouveau > [ 8.993867] nvkm_device_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c:3023) nouveau > [ 8.994079] nvkm_udevice_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/engine/device/user.c:295) nouveau > [ 8.994281] nvkm_object_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/object.c:245) nouveau > [ 8.994457] nvkm_ioctl_new (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/ioctl.c:149) nouveau > [ 8.994630] ? __pfx_nvkm_client_child_new (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/client.c:125) nouveau > [ 8.994803] ? __pfx_nvkm_udevice_new (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/engine/device/user.c:386) nouveau > [ 8.995013] nvkm_ioctl (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/ioctl.c:354 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvkm/core/ioctl.c:376) nouveau > [ 8.995187] nvif_object_ctor (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvif/object.c:298 (discriminator 1)) nouveau > [ 8.995356] nvif_device_ctor (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvif/device.c:56) nouveau > [ 8.995524] nouveau_cli_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nouveau_drm.c:270) nouveau > [ 8.995721] nouveau_drm_device_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nouveau_drm.c:602) nouveau > [ 8.995915] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224) > [ 8.995923] ? pci_bus_read_config_word (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/access.c:67 (discriminator 1)) > [ 8.995932] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224) > [ 8.995939] ? pci_update_current_state (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/pci.c:1195 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/pci.c:1187) > [ 8.995949] nouveau_drm_probe (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nouveau_drm.c:841) nouveau > [ 8.996145] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224) > [ 8.996154] local_pci_probe (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/pci-driver.c:325) > [ 8.996163] pci_device_probe (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/pci-driver.c:392 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/pci-driver.c:417 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/pci/pci-driver.c:451 (discriminator 1)) > [ 8.996174] really_probe (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/dd.c:578 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/dd.c:656) > [ 8.996185] ? __pfx___driver_attach (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/dd.c:1155) > [ 8.996192] __driver_probe_device (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/dd.c:798) > [ 8.996201] driver_probe_device (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/dd.c:828) > [ 8.996209] __driver_attach (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/dd.c:1215) > [ 8.996217] bus_for_each_dev (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/bus.c:368) > [ 8.996228] bus_add_driver (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/bus.c:673) > [ 8.996238] driver_register (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/base/driver.c:246) > [ 8.996246] ? __pfx_nouveau_drm_init (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/drivers/gpu/drm/nouveau/nvif/object.c:32) nouveau > [ 8.996415] do_one_initcall (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/init/main.c:1238) > [ 8.996428] do_init_module (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:2538) > [ 8.996437] init_module_from_file (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:3168) > [ 8.996450] idempotent_init_module (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/spinlock.h:351 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:3131 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:3185) > [ 8.996462] __x64_sys_finit_module (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/./include/linux/file.h:47 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:3207 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:3189 /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/kernel/module/main.c:3189) > [ 8.996473] do_syscall_64 (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/entry/common.c:52 (discriminator 1) /home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/entry/common.c:83 (discriminator 1)) > [ 8.996482] ? srso_return_thunk (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/lib/retpoline.S:224) > [ 8.996490] entry_SYSCALL_64_after_hwframe (/home/lyudess/Projects/linux/worktrees/nouveau-aux-fixes/arch/x86/entry/entry_64.S:129) > [ 8.996499] RIP: 0033:0x7fd12f52919d > > I think timur actually mentioned this bug to you previously, but in > hopes of getting something more useful out of SG_DEBUG I dug into this > problem a bit and ended up with what I believe is an actually correct > patch: > > https://gitlab.freedesktop.org/lyudess/linux/-/commit/485f1fb62ddd4b42b60848eeb48206fef4376161I think this patch is fine, and does solve the issue for me here if I enable SG_DEBUG. Ben.> > ...unfortunately, fixing that issue on my system did not get SG_DEBUG > to give me any useful info. > > Anyway - that brings me to ask 1: do you have any idea what might be > going on with the falcon boot issue I mentioned, or if I might just be > doing something wrong/silly with how I'm setting up memory in > nvkm_gsp_mem_ctor()? > > And 2: if you have the time does that patch look correct? I'm happy to > submit it :) > > Also 3: welcome back again :) >