thr3ads.net - Nouveau - [Nouveau] [REGRESSION] GM20B probe fails after commit 2541626cfb79 [Jan 2023]

Ben Skeggs

2023-Jan-18 01:28 UTC

[Nouveau] [REGRESSION] GM20B probe fails after commit 2541626cfb79

On Mon, 16 Jan 2023 at 22:27, Diogo Ivo <diogo.ivo at tecnico.ulisboa.pt> wrote:
>
> On Mon, Jan 16, 2023 at 07:45:05AM +1000, David Airlie wrote:
> > On Thu, Dec 29, 2022 at 12:58 AM Diogo Ivo <diogo.ivo at tecnico.ulisboa.pt> wrote:
> > As a quick check can you try changing
> >
> > drivers/gpu/drm/nouveau/nvkm/core/firmware.c:nvkm_firmware_mem_target
> > from NVKM_MEM_TARGET_HOST to NVKM_MEM_TARGET_NCOH ?
>
> Hello!
>
> Applying this change breaks probing in a different way, with a
> bad PC=0x0. From a quick look at nvkm_falcon_load_dmem it looks like this
> could happen due to the .load_dmem() callback not being properly
> initialized. This is the kernel log I got:

In addition to Dave's change, can you try changing the
nvkm_falcon_load_dmem() call in gm20b_pmu_init() to:

nvkm_falcon_pio_wr(falcon, (u8 *)&args, 0, 0, DMEM, addr_args,
sizeof(args), 0, false);

Ben.
>
> [    2.010601] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> [    2.019436] Mem abort info:
> [    2.022273]   ESR = 0x0000000086000005
> [    2.026066]   EC = 0x21: IABT (current EL), IL = 32 bits
> [    2.031429]   SET = 0, FnV = 0
> [    2.034528]   EA = 0, S1PTW = 0
> [    2.037694]   FSC = 0x05: level 1 translation fault
> [    2.042572] [0000000000000000] user address but active_mm is swapper
> [    2.048961] Internal error: Oops: 0000000086000005 [#1] SMP
> [    2.054529] Modules linked in:
> [    2.057582] CPU: 0 PID: 36 Comm: kworker/u8:1 Not tainted 6.2.0-rc3+ #2
> [    2.064190] Hardware name: Google Pixel C (DT)
> [    2.068628] Workqueue: events_unbound deferred_probe_work_func
> [    2.074463] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [    2.081417] pc : 0x0
> [    2.083600] lr : nvkm_falcon_load_dmem+0x58/0x80
> [    2.088218] sp : ffffffc009ddb6f0
> [    2.091526] x29: ffffffc009ddb6f0 x28: ffffff808028a008 x27: ffffff8081e43c38
> [    2.098658] x26: 00000000000000ff x25: ffffff808028a0a0 x24: 0000000000000000
> [    2.105788] x23: ffffff8080c328f8 x22: 000000000000002c x21: 0000000000005fd4
> [    2.112917] x20: ffffffc009ddb76c x19: ffffff8080c328b8 x18: 0000000000000000
> [    2.120047] x17: 2e74696e695f646f x16: 6874656d5f77732f x15: 0000000000000000
> [    2.127176] x14: 0000000002f546c2 x13: 0000000000000000 x12: 00000000000001ce
> [    2.134306] x11: 0000000000000001 x10: 0000000000000a90 x9 : ffffffc009ddb600
> [    2.141436] x8 : ffffff80803d19f0 x7 : ffffff80bf971180 x6 : 00000000000001b9
> [    2.148565] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 000000000000002c
> [    2.155693] x2 : 0000000000005fd4 x1 : ffffffc009ddb76c x0 : ffffff8080c328b8
> [    2.162822] Call trace:
> [    2.165264]  0x0
> [    2.167099]  gm20b_pmu_init+0x78/0xb4
> [    2.170762]  nvkm_pmu_init+0x20/0x34
> [    2.174334]  nvkm_subdev_init_+0x60/0x12c
> [    2.178339]  nvkm_subdev_init+0x60/0xa0
> [    2.182171]  nvkm_device_init+0x14c/0x2a0
> [    2.186178]  nvkm_udevice_init+0x60/0x9c
> [    2.190097]  nvkm_object_init+0x48/0x1b0
> [    2.194013]  nvkm_ioctl_new+0x168/0x254
> [    2.197843]  nvkm_ioctl+0xd0/0x220
> [    2.201239]  nvkm_client_ioctl+0x10/0x1c
> [    2.205160]  nvif_object_ctor+0xf4/0x22c
> [    2.209079]  nvif_device_ctor+0x28/0x70
> [    2.212910]  nouveau_cli_init+0x150/0x590
> [    2.216916]  nouveau_drm_device_init+0x60/0x2a0
> [    2.221442]  nouveau_platform_device_create+0x90/0xd0
> [    2.226489]  nouveau_platform_probe+0x3c/0x9c
> [    2.230841]  platform_probe+0x68/0xc0
> [    2.234500]  really_probe+0xbc/0x2dc
> [    2.238070]  __driver_probe_device+0x78/0xe0
> [    2.242334]  driver_probe_device+0xd8/0x160
> [    2.246511]  __device_attach_driver+0xb8/0x134
> [    2.250948]  bus_for_each_drv+0x78/0xd0
> [    2.254782]  __device_attach+0x9c/0x1a0
> [    2.258612]  device_initial_probe+0x14/0x20
> [    2.262789]  bus_probe_device+0x98/0xa0
> [    2.266619]  deferred_probe_work_func+0x88/0xc0
> [    2.271142]  process_one_work+0x204/0x40c
> [    2.275150]  worker_thread+0x230/0x450
> [    2.278894]  kthread+0xc8/0xcc
> [    2.281946]  ret_from_fork+0x10/0x20
> [    2.285525] Code: bad PC value
> [    2.288576] ---[ end trace 0000000000000000 ]---
>
> Diogo
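
Note on reading the oops above: the "Mem abort info" lines are a decoding of the ESR value. The following is a minimal sketch of that decoding, using the architectural AArch64 ESR_ELx bit positions (EC in bits [31:26], IL in bit 25, and the fault status code in bits [5:0] of the ISS for aborts). The names esr_fields, esr_decode and esr_print are made up for this illustration and are not kernel functions.

```c
#include <stdint.h>
#include <stdio.h>

/* Fields of an AArch64 ESR_ELx value, as shown in the oops above.
 * Hypothetical helper type for illustration, not a kernel structure. */
struct esr_fields {
    unsigned ec;   /* exception class, bits [31:26] */
    unsigned il;   /* instruction length bit, bit 25 (1 = 32-bit instruction) */
    unsigned fsc;  /* fault status code, bits [5:0] of the ISS for aborts */
};

/* Split a raw ESR value into the three fields the log prints. */
struct esr_fields esr_decode(uint64_t esr)
{
    struct esr_fields f;

    f.ec  = (unsigned)((esr >> 26) & 0x3f);
    f.il  = (unsigned)((esr >> 25) & 0x1);
    f.fsc = (unsigned)(esr & 0x3f);
    return f;
}

/* Print the decoded fields in roughly the layout the kernel log uses. */
void esr_print(uint64_t esr)
{
    struct esr_fields f = esr_decode(esr);

    printf("EC = 0x%02x, IL = %u bits, FSC = 0x%02x\n",
           f.ec, f.il ? 32u : 16u, f.fsc);
}
```

Calling esr_decode(0x86000005) yields EC 0x21 (instruction abort taken from the current exception level) and FSC 0x05 (level 1 translation fault), matching the log. An instruction abort at address 0 is consistent with Diogo's reading that a function pointer was NULL when it was called.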

Nicolas Chauvet

2023-Jan-18 08:42 UTC

On Wed, 18 Jan 2023 at 02:29, Ben Skeggs <skeggsb at gmail.com> wrote:
>
> On Mon, 16 Jan 2023 at 22:27, Diogo Ivo <diogo.ivo at tecnico.ulisboa.pt> wrote:
> >
> > On Mon, Jan 16, 2023 at 07:45:05AM +1000, David Airlie wrote:
> > > On Thu, Dec 29, 2022 at 12:58 AM Diogo Ivo <diogo.ivo at tecnico.ulisboa.pt> wrote:
> > > As a quick check can you try changing
> > >
> > > drivers/gpu/drm/nouveau/nvkm/core/firmware.c:nvkm_firmware_mem_target
> > > from NVKM_MEM_TARGET_HOST to NVKM_MEM_TARGET_NCOH ?
> >
> > Hello!
> >
> > Applying this change breaks probing in a different way, with a
> > bad PC=0x0. From a quick look at nvkm_falcon_load_dmem it looks like this
> > could happen due to the .load_dmem() callback not being properly
> > initialized. This is the kernel log I got:
>
> In addition to Dave's change, can you try changing the
> nvkm_falcon_load_dmem() call in gm20b_pmu_init() to:
>
> nvkm_falcon_pio_wr(falcon, (u8 *)&args, 0, 0, DMEM, addr_args,
> sizeof(args), 0, false);

Here is the new stack trace:

[ 1112.488211] nouveau: loading out-of-tree module taints kernel.
[ 1112.494763] nouveau: module verification failed: signature and/or required key missing - tainting kernel
[ 1112.534035] Failed to set up IOMMU for device 57000000.gpu; retaining platform DMA ops
[ 1112.537536] nouveau 57000000.gpu: NVIDIA GM20B (12b000a1)
[ 1112.537587] nouveau 57000000.gpu: imem: using IOMMU
[ 1112.616677] ------------[ cut here ]------------
[ 1112.616820] nouveau 57000000.gpu: DRM: VRAM: 0 MiB
[ 1112.616830] nouveau 57000000.gpu: DRM: GART: 1048576 MiB
[ 1112.616688] WARNING: CPU: 0 PID: 388 at /var/tmp/linux/drivers/gpu/drm/nouveau/nvkm/falcon/base.c:135 nvkm_falcon_pio_rd+0x150/0x2bc [nouveau]
[ 1112.617272] Modules linked in: nouveau(OE+) drm_ttm_helper ttm
snd_seq_dummy snd_hrtimer nf_conntrack_netbios_ns
nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill
ip_set nf_tables nfnetlink qrtr snd_soc_tegra_audio_graph_card
snd_soc_audio_graph_card snd_soc_simple_card_utils snd_soc_core
snd_hda_codec_hdmi snd_hda_tegra snd_compress snd_hda_codec ac97_bus
snd_hda_core snd_pcm_dmaengine snd_hwdep snd_seq snd_seq_device sunrpc
snd_pcm usb_conn_gpio snd_timer snd max77620_thermal tegra_xudc
tegra_soctherm udc_core soundcore cpufreq_dt at24 vfat fat zram r8152
mii panel_simple mmc_block tegra_drm drm_dp_aux_bus drm_display_helper
rtc_max77686 lp855x_bl crct10dif_ce cec polyval_ce polyval_generic
ghash_ce gpio_keys sdhci_tegra xhci_tegra sdhci_pltfm sdhci
phy_tegra_xusb rtc_tegra cqhci ahci_tegra host1x tegra210_emc
i2c_tegra ip6_tables
[ 1112.617430]  ip_tables fuse
[ 1112.617440] CPU: 0 PID: 388 Comm: kworker/0:4 Tainted: G OE     -------  ---  6.2.0-0.rc4.31.fc38.aarch64 #1
[ 1112.617446] Hardware name: nvidia,p2371-2180 NVIDIA P2371-2180/NVIDIA P2371-2180, BIOS 2022.10 10/01/2022
[ 1112.617452] Workqueue: events nvkm_pmu_recv [nouveau]
[ 1112.617934] pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 1112.617940] pc : nvkm_falcon_pio_rd+0x150/0x2bc [nouveau]
[ 1112.618418] nouveau 57000000.gpu: DRM: MM: using COPY for buffer copies
[ 1112.618525] lr : nvkm_falcon_pio_rd+0x50/0x2bc [nouveau]
[ 1112.619057] sp : ffff80000bf13c40
[ 1112.619060] x29: ffff80000bf13c50 x28: 0000000000000000 x27: 0000000000000000
[ 1112.619070] x26: ffff8000553f3d70 x25: ffff0000b04704b8 x24: 0000000000000000
[ 1112.619079] x23: ffff8000554800a0 x22: 0000000000000000 x21: ffff80000bf13d56
[ 1112.619086] x20: 000000000000002a x19: 0000000000000001 x18: 0000000000000000
[ 1112.619093] x17: 000000040044ffff x16: ffff8000091f53b0 x15: 0000000000000000
[ 1112.619100] x14: 0000000000000000 x13: 0000000000000030 x12: 0101010101010101
[ 1112.619108] x11: 7f7f7f7f7f7f7f7f x10: fefefefefefefeff x9 : ffff8000552cc224
[ 1112.619115] x8 : ffff0000b0470420 x7 : 0000000000000000 x6 : 000000000000002a
[ 1112.619123] x5 : 0000000000000000 x4 : ffff80005540b7c8 x3 : ffff0000b0470408
[ 1112.619130] x2 : ffff0000b0470420 x1 : ffff0000b0470408 x0 : 0000000000000003
[ 1112.619138] Call trace:
[ 1112.619141]  nvkm_falcon_pio_rd+0x150/0x2bc [nouveau]
[ 1112.619756]  nvkm_falcon_msgq_pop+0x90/0x1c0 [nouveau]
[ 1112.620313]  nvkm_falcon_msgq_recv_initmsg+0xd4/0x1f4 [nouveau]
[ 1112.620877]  gm20b_pmu_initmsg+0x3c/0xd4 [nouveau]
[ 1112.621418]  gm20b_pmu_recv+0x30/0x80 [nouveau]
[ 1112.622004]  nvkm_pmu_recv+0x24/0x30 [nouveau]
[ 1112.622547]  process_one_work+0x1e8/0x480
[ 1112.622559]  worker_thread+0x74/0x410
[ 1112.622564]  kthread+0xe8/0xf4
[ 1112.622568]  ret_from_fork+0x10/0x20
[ 1112.622577] ---[ end trace 0000000000000000 ]---
[ 1112.622696] nouveau 57000000.gpu: pmu: unexpected init message size 0 vs 42
[ 1112.622708] nouveau 57000000.gpu: pmu: error parsing init message: -22
[ 1112.623365] [drm] Initialized nouveau 1.3.1 20120801 for 57000000.gpu on minor 1
[ 1113.688183] nouveau 57000000.gpu: pmu:hpq: timeout waiting for queue ready
[ 1113.688246] nouveau 57000000.gpu: gr: init failed, -110
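
Note on the return codes in the log above: the driver reports failures as negative errno values. The sketch below maps the two codes that appear here to their symbolic names, assuming the asm-generic Linux errno numbering used on arm64 (22 is EINVAL, 110 is ETIMEDOUT). The constants and the function kernel_err_name are defined locally for illustration and are not part of nouveau.

```c
#include <stdio.h>

/* Linux asm-generic errno values (assumption: arm64/x86 numbering).
 * Defined locally so the sketch does not depend on the host's <errno.h>. */
enum {
    SKETCH_EINVAL    = 22,   /* invalid argument */
    SKETCH_ETIMEDOUT = 110   /* connection timed out */
};

/* Map a negative kernel-style return code to a short symbolic name.
 * Only the two codes seen in the log are handled. */
const char *kernel_err_name(int ret)
{
    switch (-ret) {
    case SKETCH_EINVAL:
        return "EINVAL";
    case SKETCH_ETIMEDOUT:
        return "ETIMEDOUT";
    default:
        return "unknown";
    }
}
```

With this mapping, "error parsing init message: -22" reads as EINVAL and "gr: init failed, -110" reads as ETIMEDOUT, the latter following the "timeout waiting for queue ready" line.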

Diogo Ivo

2023-Jan-20 11:37 UTC

On Wed, Jan 18, 2023 at 11:28:49AM +1000, Ben Skeggs wrote:
> On Mon, 16 Jan 2023 at 22:27, Diogo Ivo <diogo.ivo at tecnico.ulisboa.pt> wrote:
> > On Mon, Jan 16, 2023 at 07:45:05AM +1000, David Airlie wrote:
> > > As a quick check can you try changing
> > >
> > > drivers/gpu/drm/nouveau/nvkm/core/firmware.c:nvkm_firmware_mem_target
> > > from NVKM_MEM_TARGET_HOST to NVKM_MEM_TARGET_NCOH ?
>
> In addition to Dave's change, can you try changing the
> nvkm_falcon_load_dmem() call in gm20b_pmu_init() to:
>
> nvkm_falcon_pio_wr(falcon, (u8 *)&args, 0, 0, DMEM, addr_args,
> sizeof(args), 0, false);

Hello!

Chiming in just to say that with this change I see the same as Nicolas
except that the init message size is 255 instead of 0:

[    2.196934] nouveau 57000000.gpu: pmu: unexpected init message size 255 vs 42
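
Note on the "size N vs 42" message: it reports a mismatch between the size announced in the received message header and the size the driver expects. The sketch below shows a check of that kind under stated assumptions: the header layout, the one-byte size field, and the names sketch_msg_hdr and sketch_check_init_size are hypothetical and only illustrate the comparison, they are not the nouveau implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-byte message header. Only the one-byte size field
 * matters for this illustration. */
struct sketch_msg_hdr {
    uint8_t unit_id;
    uint8_t size;
    uint8_t ctrl_flags;
    uint8_t seq_id;
};

/* Return 0 if the header announces the expected size, otherwise
 * -22 (the value of -EINVAL in the asm-generic Linux numbering). */
int sketch_check_init_size(const struct sketch_msg_hdr *hdr, size_t expected)
{
    if (hdr->size != expected)
        return -22;
    return 0;
}
```

In this sketch a header whose bytes are all 0x00 announces size 0 and one whose bytes are all 0xff announces size 255, the two values reported by Nicolas and Diogo. Whether that is what actually happened on the hardware is not established in the thread.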

Ben Skeggs

2023-Jan-27 06:00 UTC

On Fri, 20 Jan 2023 at 21:37, Diogo Ivo <diogo.ivo at tecnico.ulisboa.pt> wrote:
>
> On Wed, Jan 18, 2023 at 11:28:49AM +1000, Ben Skeggs wrote:
> > On Mon, 16 Jan 2023 at 22:27, Diogo Ivo <diogo.ivo at tecnico.ulisboa.pt> wrote:
> > > On Mon, Jan 16, 2023 at 07:45:05AM +1000, David Airlie wrote:
> > > > As a quick check can you try changing
> > > >
> > > > drivers/gpu/drm/nouveau/nvkm/core/firmware.c:nvkm_firmware_mem_target
> > > > from NVKM_MEM_TARGET_HOST to NVKM_MEM_TARGET_NCOH ?
> >
> > In addition to Dave's change, can you try changing the
> > nvkm_falcon_load_dmem() call in gm20b_pmu_init() to:
> >
> > nvkm_falcon_pio_wr(falcon, (u8 *)&args, 0, 0, DMEM, addr_args,
> > sizeof(args), 0, false);
>
> Hello!
>
> Chiming in just to say that with this change I see the same as Nicolas
> except that the init message size is 255 instead of 0:
>
> [    2.196934] nouveau 57000000.gpu: pmu: unexpected init message size 255 vs 42

I've attached an entirely untested patch (to go on top of the other
hacks/fixes so far), that will hopefully get us a little further.

Would be great if you guys could test it out for me.

Thanks,
Ben.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gm20b.diff
Type: text/x-patch
Size: 1030 bytes
Desc: not available
URL:
<https://lists.freedesktop.org/archives/nouveau/attachments/20230127/1d2002f0/attachment.bin>
