Owen T. Heisler
2022-Sep-12 05:20 UTC
[Nouveau] Regression: Kernel/GPU crash with "Asynchronous wait on fence"
Hi,

I am experiencing a kernel/GPU crash once every few days on an Nvidia Optimus system with a secondary display connected to an Nvidia output. The secondary display suddenly turns off, X freezes, and in most cases the kernel hangs. The module parameter `nouveau.config=NvClkMode=15` is in use, but I get the same behavior without it.

I have captured a variety of log data, and I consistently find these errors:

- Asynchronous wait on fence nouveau:systemd-logind
- nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff
- Fixing recursive fault but reboot is needed!

This is a regression; nouveau was stable with Debian 10 buster (Linux v4.19). The crashes started after upgrading to Debian 11 bullseye. I have tested Linux v4.14.290, v4.19.255, and the latest nouveau-next commit 9622bcb7c72b230d64b7f7d2f9505e17214f3597; all exhibit the same behavior (with some variation in log output).

Since even v4.19, which was stable under buster, now crashes, could a userspace change be causing the kernel crash? Do I need to try different versions of the libdrm and xf86-video-nouveau userspace components?

I posted more information and log data at: <https://gitlab.freedesktop.org/drm/nouveau/-/issues/180>

Thanks,
Owen