David Woodhouse
2025-Apr-07 17:01 UTC
[6.13.6 stable regression?] Nouveau reboot failure in r535_gsp_msg_recv()
On Mon, 2025-04-07 at 16:28 +0000, Timur Tabi wrote:
> On Mon, 2025-04-07 at 16:28 +0100, David Woodhouse wrote:
> > [608593.572874] CPU: 15 UID: 626640614 PID: 17529 Comm: WebKitWebProces
> > Taint
> > [608593.576062] Tainted: [W]=WARN
>
> What does this mean?

That the kernel is 'tainted' because it's already emitted a warning?

> > [608593.579235] Hardware name: LENOVO 21FVS16V08/21FVS16V80, BIOS N3ZET45W
> > (1.
> > [608593.582441] RIP: 0010:r535_gsp_msgq_wait+0x1c4/0x1f0 [nouveau]
>
> Can you add a bunch of printks to r535_gsp_msgq_wait() to help narrow down
> which specific WARN this is? Or maybe use addr2line?

Not exactly the same build (I'm on 6.14 now) but:

(gdb) list *(r535_gsp_msgq_wait+0x1c4)
0xd24 is in r535_gsp_msgq_wait (drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:117).
112                     break;
113
114             usleep_range(1, 2);
115     } while (--(*ptime));
116
117     if (WARN_ON(!*ptime))
118             return ERR_PTR(-ETIMEDOUT);
119
120     mqe = (void *)((u8 *)gsp->shm.msgq.ptr + 0x1000 + rptr * 0x1000);
121

It doesn't seem to happen with 6.14, but maybe it's only when the HDMI
has stopped working, and that hasn't happened yet with 6.14.

I booted back into 6.13.6 to test that again, and it also managed to
reboot if I did so *before* the HDMI output was unhappy.

Any clues on how to debug the USB-C output, and where to report that?
Timur Tabi
2025-Apr-07 17:14 UTC
[6.13.6 stable regression?] Nouveau reboot failure in r535_gsp_msg_recv()
On Mon, 2025-04-07 at 18:01 +0100, David Woodhouse wrote:
> Not exactly the same build (I'm on 6.14 now) but:
>
> (gdb) list *(r535_gsp_msgq_wait+0x1c4)
> 0xd24 is in r535_gsp_msgq_wait
> (drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:117).
> 112                     break;
> 113
> 114             usleep_range(1, 2);
> 115     } while (--(*ptime));
> 116
> 117     if (WARN_ON(!*ptime))
> 118             return ERR_PTR(-ETIMEDOUT);

This tells me that GSP-RM has crashed, which explains a lot of the behavior
you're seeing.

What I need now are the GSP-RM logs. In your /etc/modprobe.d, see if there
is a file with "options nouveau". If there isn't, create one, and then add
the "keep-gsp-logging=1" parameter, so it looks something like this:

options nouveau keep-gsp-logging=1

Reboot and then tell me if you see anything like this:

# ls -lR /sys/kernel/debug/nouveau/
/sys/kernel/debug/nouveau/:
total 0
drwxr-xr-x 2 root root 0 Apr  7 12:06 0000:65:00.0

'/sys/kernel/debug/nouveau/0000:65:00.0':
total 0
-r--r--r-- 1 root root 65536 Apr  7 12:06 loginit
-r--r--r-- 1 root root 65536 Apr  7 12:06 logintr
-r--r--r-- 1 root root  4096 Apr  7 12:06 logpmu
-r--r--r-- 1 root root 65536 Apr  7 12:06 logrm

If you do, I need the contents of these files. So e.g.:

cp /sys/kernel/debug/nouveau/0000:65:00.0/loginit loginit
cp /sys/kernel/debug/nouveau/0000:65:00.0/logrm logrm
cp /sys/kernel/debug/nouveau/0000:65:00.0/logpmu logpmu
cp /sys/kernel/debug/nouveau/0000:65:00.0/logintr logintr

You may only see some of these files, that's okay. Zip them up and email
them to me.

> Any clues on how to debug the USB-C output, and where to report that?

No, I can't help with that.
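
For anyone else following these steps: the cp commands above hard-code the PCI address 0000:65:00.0, which will differ between machines. The following is only a rough sketch of how the copy could be scripted for whatever GPU directories are present. The function name collect_gsp_logs and the default output directory ./gsp-logs are made up for illustration and are not part of nouveau or any existing tool; the four file names are the ones listed in the message above.

```shell
#!/bin/sh
# Hypothetical helper (not part of nouveau): copy whichever GSP-RM log
# files exist for each GPU directory under the nouveau debugfs tree.
# Usage: collect_gsp_logs [debugfs_dir] [dest_dir]
collect_gsp_logs() {
    src=${1:-/sys/kernel/debug/nouveau}
    dest=${2:-./gsp-logs}
    copied=0

    for gpu in "$src"/*/; do
        # If the glob matched nothing, the pattern is left as-is; skip it.
        [ -d "$gpu" ] || continue
        name=$(basename "$gpu")
        for log in loginit logintr logpmu logrm; do
            # Only some of the files may be present; skip missing ones.
            [ -r "$gpu$log" ] || continue
            mkdir -p "$dest/$name" || return 1
            cp "$gpu$log" "$dest/$name/$log" || return 1
            copied=$((copied + 1))
        done
    done

    echo "copied $copied log file(s) to $dest"
    # Report failure if nothing was found, e.g. the parameter is not set.
    [ "$copied" -gt 0 ]
}
```

debugfs is normally readable only by root, so this would be run as root, for example "collect_gsp_logs && tar czf gsp-logs.tar.gz gsp-logs". Each GPU gets its own subdirectory so logs from multi-GPU systems do not overwrite each other.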