Ben Skeggs
2023-Jan-18 01:28 UTC
[Nouveau] [REGRESSION] GM20B probe fails after commit 2541626cfb79
On Mon, 16 Jan 2023 at 22:27, Diogo Ivo <diogo.ivo at tecnico.ulisboa.pt> wrote:
>
> On Mon, Jan 16, 2023 at 07:45:05AM +1000, David Airlie wrote:
> > On Thu, Dec 29, 2022 at 12:58 AM Diogo Ivo <diogo.ivo at tecnico.ulisboa.pt> wrote:
> > As a quick check can you try changing
> >
> > drivers/gpu/drm/nouveau/nvkm/core/firmware.c:nvkm_firmware_mem_target
> > from NVKM_MEM_TARGET_HOST to NVKM_MEM_TARGET_NCOH ?
>
> Hello!
>
> Applying this change breaks probing in a different way, with a
> bad PC=0x0. From a quick look at nvkm_falcon_load_dmem it looks like this
> could happen due to the .load_dmem() callback not being properly
> initialized. This is the kernel log I got:

In addition to Dave's change, can you try changing the
nvkm_falcon_load_dmem() call in gm20b_pmu_init() to:

nvkm_falcon_pio_wr(falcon, (u8 *)&args, 0, 0, DMEM, addr_args,
                   sizeof(args), 0, false);

Ben.

>
> [ 2.010601] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> [ 2.019436] Mem abort info:
> [ 2.022273] ESR = 0x0000000086000005
> [ 2.026066] EC = 0x21: IABT (current EL), IL = 32 bits
> [ 2.031429] SET = 0, FnV = 0
> [ 2.034528] EA = 0, S1PTW = 0
> [ 2.037694] FSC = 0x05: level 1 translation fault
> [ 2.042572] [0000000000000000] user address but active_mm is swapper
> [ 2.048961] Internal error: Oops: 0000000086000005 [#1] SMP
> [ 2.054529] Modules linked in:
> [ 2.057582] CPU: 0 PID: 36 Comm: kworker/u8:1 Not tainted 6.2.0-rc3+ #2
> [ 2.064190] Hardware name: Google Pixel C (DT)
> [ 2.068628] Workqueue: events_unbound deferred_probe_work_func
> [ 2.074463] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 2.081417] pc : 0x0
> [ 2.083600] lr : nvkm_falcon_load_dmem+0x58/0x80
> [ 2.088218] sp : ffffffc009ddb6f0
> [ 2.091526] x29: ffffffc009ddb6f0 x28: ffffff808028a008 x27: ffffff8081e43c38
> [ 2.098658] x26: 00000000000000ff x25: ffffff808028a0a0 x24: 0000000000000000
> [ 2.105788] x23: ffffff8080c328f8 x22: 000000000000002c x21: 0000000000005fd4
> [ 2.112917] x20: ffffffc009ddb76c x19: ffffff8080c328b8 x18: 0000000000000000
> [ 2.120047] x17: 2e74696e695f646f x16: 6874656d5f77732f x15: 0000000000000000
> [ 2.127176] x14: 0000000002f546c2 x13: 0000000000000000 x12: 00000000000001ce
> [ 2.134306] x11: 0000000000000001 x10: 0000000000000a90 x9 : ffffffc009ddb600
> [ 2.141436] x8 : ffffff80803d19f0 x7 : ffffff80bf971180 x6 : 00000000000001b9
> [ 2.148565] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 000000000000002c
> [ 2.155693] x2 : 0000000000005fd4 x1 : ffffffc009ddb76c x0 : ffffff8080c328b8
> [ 2.162822] Call trace:
> [ 2.165264] 0x0
> [ 2.167099] gm20b_pmu_init+0x78/0xb4
> [ 2.170762] nvkm_pmu_init+0x20/0x34
> [ 2.174334] nvkm_subdev_init_+0x60/0x12c
> [ 2.178339] nvkm_subdev_init+0x60/0xa0
> [ 2.182171] nvkm_device_init+0x14c/0x2a0
> [ 2.186178] nvkm_udevice_init+0x60/0x9c
> [ 2.190097] nvkm_object_init+0x48/0x1b0
> [ 2.194013] nvkm_ioctl_new+0x168/0x254
> [ 2.197843] nvkm_ioctl+0xd0/0x220
> [ 2.201239] nvkm_client_ioctl+0x10/0x1c
> [ 2.205160] nvif_object_ctor+0xf4/0x22c
> [ 2.209079] nvif_device_ctor+0x28/0x70
> [ 2.212910] nouveau_cli_init+0x150/0x590
> [ 2.216916] nouveau_drm_device_init+0x60/0x2a0
> [ 2.221442] nouveau_platform_device_create+0x90/0xd0
> [ 2.226489] nouveau_platform_probe+0x3c/0x9c
> [ 2.230841] platform_probe+0x68/0xc0
> [ 2.234500] really_probe+0xbc/0x2dc
> [ 2.238070] __driver_probe_device+0x78/0xe0
> [ 2.242334] driver_probe_device+0xd8/0x160
> [ 2.246511] __device_attach_driver+0xb8/0x134
> [ 2.250948] bus_for_each_drv+0x78/0xd0
> [ 2.254782] __device_attach+0x9c/0x1a0
> [ 2.258612] device_initial_probe+0x14/0x20
> [ 2.262789] bus_probe_device+0x98/0xa0
> [ 2.266619] deferred_probe_work_func+0x88/0xc0
> [ 2.271142] process_one_work+0x204/0x40c
> [ 2.275150] worker_thread+0x230/0x450
> [ 2.278894] kthread+0xc8/0xcc
> [ 2.281946] ret_from_fork+0x10/0x20
> [ 2.285525] Code: bad PC value
> [ 2.288576] ---[ end trace 0000000000000000 ]---
>
> Diogo
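
Editorial note: the oops above reports `pc : 0x0` with `lr` inside nvkm_falcon_load_dmem, which fits Diogo's guess of a call through an unset `.load_dmem()` hook. The following is only a minimal user-space sketch of that failure mode and of a guarded call. Every `demo_*` name is hypothetical and none of this is the actual nouveau code; returning -ENODEV for a missing hook is likewise just an assumption.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real nouveau structures. */
struct demo_falcon;

struct demo_falcon_func {
	/* Optional hook; stays NULL when a chip does not provide it. */
	int (*load_dmem)(struct demo_falcon *falcon, const void *data,
			 unsigned int start, unsigned int size);
};

struct demo_falcon {
	const struct demo_falcon_func *func;
	unsigned int loaded;	/* bytes accepted by the last load */
};

/*
 * Guarded wrapper: without the NULL check, calling the hook would
 * branch to address 0 whenever it was never initialized.
 */
static int demo_falcon_load_dmem(struct demo_falcon *falcon, const void *data,
				 unsigned int start, unsigned int size)
{
	if (!falcon || !falcon->func || !falcon->func->load_dmem)
		return -ENODEV;
	return falcon->func->load_dmem(falcon, data, start, size);
}

/* Trivial hook used to show the working path; it only records the size. */
static int demo_load_dmem_impl(struct demo_falcon *falcon, const void *data,
			       unsigned int start, unsigned int size)
{
	(void)data;
	(void)start;
	falcon->loaded = size;
	return 0;
}
```

With a `demo_falcon_func` whose `load_dmem` is NULL the wrapper returns -ENODEV instead of faulting; with `demo_load_dmem_impl` installed it returns 0.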
Nicolas Chauvet
2023-Jan-18 08:42 UTC
[Nouveau] [REGRESSION] GM20B probe fails after commit 2541626cfb79
On Wed, 18 Jan 2023 at 02:29, Ben Skeggs <skeggsb at gmail.com> wrote:
>
> On Mon, 16 Jan 2023 at 22:27, Diogo Ivo <diogo.ivo at tecnico.ulisboa.pt> wrote:
> >
> > On Mon, Jan 16, 2023 at 07:45:05AM +1000, David Airlie wrote:
> > > On Thu, Dec 29, 2022 at 12:58 AM Diogo Ivo <diogo.ivo at tecnico.ulisboa.pt> wrote:
> > > As a quick check can you try changing
> > >
> > > drivers/gpu/drm/nouveau/nvkm/core/firmware.c:nvkm_firmware_mem_target
> > > from NVKM_MEM_TARGET_HOST to NVKM_MEM_TARGET_NCOH ?
> >
> > Hello!
> >
> > Applying this change breaks probing in a different way, with a
> > bad PC=0x0. From a quick look at nvkm_falcon_load_dmem it looks like this
> > could happen due to the .load_dmem() callback not being properly
> > initialized. This is the kernel log I got:
> In addition to Dave's change, can you try changing the
> nvkm_falcon_load_dmem() call in gm20b_pmu_init() to:
>
> nvkm_falcon_pio_wr(falcon, (u8 *)&args, 0, 0, DMEM, addr_args,
>                    sizeof(args), 0, false);

Here is the new stack trace:

[ 1112.488211] nouveau: loading out-of-tree module taints kernel.
[ 1112.494763] nouveau: module verification failed: signature and/or required key missing - tainting kernel
[ 1112.534035] Failed to set up IOMMU for device 57000000.gpu; retaining platform DMA ops
[ 1112.537536] nouveau 57000000.gpu: NVIDIA GM20B (12b000a1)
[ 1112.537587] nouveau 57000000.gpu: imem: using IOMMU
[ 1112.616677] ------------[ cut here ]------------
[ 1112.616820] nouveau 57000000.gpu: DRM: VRAM: 0 MiB
[ 1112.616830] nouveau 57000000.gpu: DRM: GART: 1048576 MiB
[ 1112.616688] WARNING: CPU: 0 PID: 388 at /var/tmp/linux/drivers/gpu/drm/nouveau/nvkm/falcon/base.c:135 nvkm_falcon_pio_rd+0x150/0x2bc [nouveau]
[ 1112.617272] Modules linked in: nouveau(OE+) drm_ttm_helper ttm snd_seq_dummy snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink qrtr snd_soc_tegra_audio_graph_card snd_soc_audio_graph_card snd_soc_simple_card_utils snd_soc_core snd_hda_codec_hdmi snd_hda_tegra snd_compress snd_hda_codec ac97_bus snd_hda_core snd_pcm_dmaengine snd_hwdep snd_seq snd_seq_device sunrpc snd_pcm usb_conn_gpio snd_timer snd max77620_thermal tegra_xudc tegra_soctherm udc_core soundcore cpufreq_dt at24 vfat fat zram r8152 mii panel_simple mmc_block tegra_drm drm_dp_aux_bus drm_display_helper rtc_max77686 lp855x_bl crct10dif_ce cec polyval_ce polyval_generic ghash_ce gpio_keys sdhci_tegra xhci_tegra sdhci_pltfm sdhci phy_tegra_xusb rtc_tegra cqhci ahci_tegra host1x tegra210_emc i2c_tegra ip6_tables
[ 1112.617430] ip_tables fuse
[ 1112.617440] CPU: 0 PID: 388 Comm: kworker/0:4 Tainted: G OE ------- --- 6.2.0-0.rc4.31.fc38.aarch64 #1
[ 1112.617446] Hardware name: nvidia,p2371-2180 NVIDIA P2371-2180/NVIDIA P2371-2180, BIOS 2022.10 10/01/2022
[ 1112.617452] Workqueue: events nvkm_pmu_recv [nouveau]
[ 1112.617934] pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 1112.617940] pc : nvkm_falcon_pio_rd+0x150/0x2bc [nouveau]
[ 1112.618418] nouveau 57000000.gpu: DRM: MM: using COPY for buffer copies
[ 1112.618525] lr : nvkm_falcon_pio_rd+0x50/0x2bc [nouveau]
[ 1112.619057] sp : ffff80000bf13c40
[ 1112.619060] x29: ffff80000bf13c50 x28: 0000000000000000 x27: 0000000000000000
[ 1112.619070] x26: ffff8000553f3d70 x25: ffff0000b04704b8 x24: 0000000000000000
[ 1112.619079] x23: ffff8000554800a0 x22: 0000000000000000 x21: ffff80000bf13d56
[ 1112.619086] x20: 000000000000002a x19: 0000000000000001 x18: 0000000000000000
[ 1112.619093] x17: 000000040044ffff x16: ffff8000091f53b0 x15: 0000000000000000
[ 1112.619100] x14: 0000000000000000 x13: 0000000000000030 x12: 0101010101010101
[ 1112.619108] x11: 7f7f7f7f7f7f7f7f x10: fefefefefefefeff x9 : ffff8000552cc224
[ 1112.619115] x8 : ffff0000b0470420 x7 : 0000000000000000 x6 : 000000000000002a
[ 1112.619123] x5 : 0000000000000000 x4 : ffff80005540b7c8 x3 : ffff0000b0470408
[ 1112.619130] x2 : ffff0000b0470420 x1 : ffff0000b0470408 x0 : 0000000000000003
[ 1112.619138] Call trace:
[ 1112.619141] nvkm_falcon_pio_rd+0x150/0x2bc [nouveau]
[ 1112.619756] nvkm_falcon_msgq_pop+0x90/0x1c0 [nouveau]
[ 1112.620313] nvkm_falcon_msgq_recv_initmsg+0xd4/0x1f4 [nouveau]
[ 1112.620877] gm20b_pmu_initmsg+0x3c/0xd4 [nouveau]
[ 1112.621418] gm20b_pmu_recv+0x30/0x80 [nouveau]
[ 1112.622004] nvkm_pmu_recv+0x24/0x30 [nouveau]
[ 1112.622547] process_one_work+0x1e8/0x480
[ 1112.622559] worker_thread+0x74/0x410
[ 1112.622564] kthread+0xe8/0xf4
[ 1112.622568] ret_from_fork+0x10/0x20
[ 1112.622577] ---[ end trace 0000000000000000 ]---
[ 1112.622696] nouveau 57000000.gpu: pmu: unexpected init message size 0 vs 42
[ 1112.622708] nouveau 57000000.gpu: pmu: error parsing init message: -22
[ 1112.623365] [drm] Initialized nouveau 1.3.1 20120801 for 57000000.gpu on minor 1
[ 1113.688183] nouveau 57000000.gpu: pmu:hpq: timeout waiting for queue ready
[ 1113.688246] nouveau 57000000.gpu: gr: init failed, -110
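
Editorial note: the last lines of the log above end in bare numeric codes (`-22` and `-110`). On Linux these are negated errno values: 22 is EINVAL and 110 is ETIMEDOUT, which agrees with the "error parsing init message" and "timeout waiting for queue ready" lines. Below is a small sketch for decoding them. `demo_errname` is a hypothetical helper, handles only the two codes seen here, and relies on the Linux numbering from `<errno.h>`.

```c
#include <errno.h>

/*
 * Map a negative kernel-style return code to its symbolic name.
 * Only the two codes that appear in the log are handled.
 */
static const char *demo_errname(int ret)
{
	switch (-ret) {
	case EINVAL:
		return "EINVAL";
	case ETIMEDOUT:
		return "ETIMEDOUT";
	default:
		return "unknown";
	}
}
```

On a Linux host, `demo_errname(-22)` gives "EINVAL" and `demo_errname(-110)` gives "ETIMEDOUT"; other platforms may number ETIMEDOUT differently.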
Diogo Ivo
2023-Jan-20 11:37 UTC
[Nouveau] [REGRESSION] GM20B probe fails after commit 2541626cfb79
On Wed, Jan 18, 2023 at 11:28:49AM +1000, Ben Skeggs wrote:
> On Mon, 16 Jan 2023 at 22:27, Diogo Ivo <diogo.ivo at tecnico.ulisboa.pt> wrote:
> > On Mon, Jan 16, 2023 at 07:45:05AM +1000, David Airlie wrote:
> > > As a quick check can you try changing
> > >
> > > drivers/gpu/drm/nouveau/nvkm/core/firmware.c:nvkm_firmware_mem_target
> > > from NVKM_MEM_TARGET_HOST to NVKM_MEM_TARGET_NCOH ?

> In addition to Dave's change, can you try changing the
> nvkm_falcon_load_dmem() call in gm20b_pmu_init() to:
>
> nvkm_falcon_pio_wr(falcon, (u8 *)&args, 0, 0, DMEM, addr_args,
>                    sizeof(args), 0, false);

Hello!

Chiming in just to say that with this change I see the same as Nicolas
except that the init message size is 255 instead of 0:

[ 2.196934] nouveau 57000000.gpu: pmu: unexpected init message size 255 vs 42
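
Editorial note: both reports fail the same size comparison, with a received size of 0 on Nicolas's board and 255 on Diogo's against an expected 42. The sketch below only illustrates that kind of check. `demo_check_initmsg_size` is a hypothetical function, not the driver's own code, and returning -EINVAL is inferred from the `-22` in Nicolas's log.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Reject an init message whose reported size differs from the size the
 * host expects. Returns 0 on a match and -EINVAL otherwise.
 */
static int demo_check_initmsg_size(unsigned int got, unsigned int expected)
{
	if (got != expected) {
		fprintf(stderr, "unexpected init message size %u vs %u\n",
			got, expected);
		return -EINVAL;
	}
	return 0;
}
```

Calling it with (0, 42) or (255, 42) returns -EINVAL; only (42, 42) returns 0.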
Ben Skeggs
2023-Jan-27 06:00 UTC
[Nouveau] [REGRESSION] GM20B probe fails after commit 2541626cfb79
On Fri, 20 Jan 2023 at 21:37, Diogo Ivo <diogo.ivo at tecnico.ulisboa.pt> wrote:
>
> On Wed, Jan 18, 2023 at 11:28:49AM +1000, Ben Skeggs wrote:
> > On Mon, 16 Jan 2023 at 22:27, Diogo Ivo <diogo.ivo at tecnico.ulisboa.pt> wrote:
> > > On Mon, Jan 16, 2023 at 07:45:05AM +1000, David Airlie wrote:
> > > > As a quick check can you try changing
> > > >
> > > > drivers/gpu/drm/nouveau/nvkm/core/firmware.c:nvkm_firmware_mem_target
> > > > from NVKM_MEM_TARGET_HOST to NVKM_MEM_TARGET_NCOH ?
>
> > In addition to Dave's change, can you try changing the
> > nvkm_falcon_load_dmem() call in gm20b_pmu_init() to:
> >
> > nvkm_falcon_pio_wr(falcon, (u8 *)&args, 0, 0, DMEM, addr_args,
> >                    sizeof(args), 0, false);
>
> Hello!
>
> Chiming in just to say that with this change I see the same as Nicolas
> except that the init message size is 255 instead of 0:
>
> [ 2.196934] nouveau 57000000.gpu: pmu: unexpected init message size 255 vs 42

I've attached an entirely untested patch (to go on top of the other
hacks/fixes so far), that will hopefully get us a little further.

Would be great if you guys could test it out for me.

Thanks,
Ben.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gm20b.diff
Type: text/x-patch
Size: 1030 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/nouveau/attachments/20230127/1d2002f0/attachment.bin>