Vincent Vanackere
2017-May-07 20:03 UTC
[Nouveau] GT 730 freeze: how to diagnose / debug?
Hi,

I own an Asus GT730-SL-2GD3-BRK and am trying to drive two monitors at 2560x1440. Using gnome-shell with either Xorg or Wayland, I get screen freezes very frequently. These freezes usually require a reboot to get working graphics back (a sample trace I got yesterday is below).

I am running Ubuntu 17.04 with the latest available kernels. I have also tested various more recent kernels, including the latest drm-next tree at https://cgit.freedesktop.org/~airlied/linux/log/?h=drm-next, but the problem always occurs.

When a freeze occurs, the computer is still reachable through ssh, but the only way I have found so far to get graphics back is to restart the computer.

I am willing to run diagnostic programs or test any patch if it would help. I'm also not ruling out faulty hardware, so any advice on hardware health tests would be welcome...

Regards,

Vincent Vanackère

[ 1.199135] nouveau 0000:01:00.0: NVIDIA GK208B (b06070b1)
[ 1.319930] nouveau 0000:01:00.0: bios: version 80.28.92.00.10
[ 1.322095] nouveau 0000:01:00.0: fb: 2048 MiB DDR3
[ 2.620362] nouveau 0000:01:00.0: DRM: VRAM: 2048 MiB
[ 2.620362] nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
[ 2.620364] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
[ 2.620378] nouveau 0000:01:00.0: DRM: DCB version 4.0
[ 2.620379] nouveau 0000:01:00.0: DRM: DCB outp 00: 01000f02 00020030
[ 2.620380] nouveau 0000:01:00.0: DRM: DCB outp 01: 02011f62 00020010
[ 2.620380] nouveau 0000:01:00.0: DRM: DCB outp 02: 02022f10 00000000
[ 2.620381] nouveau 0000:01:00.0: DRM: DCB conn 00: 00001031
[ 2.620381] nouveau 0000:01:00.0: DRM: DCB conn 01: 00002161
[ 2.620382] nouveau 0000:01:00.0: DRM: DCB conn 02: 00000200
[ 2.666199] nouveau 0000:01:00.0: hwmon_device_register() is deprecated. Please convert the driver to use hwmon_device_register_with_info().
[ 2.717519] nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
[ 2.992994] nouveau 0000:01:00.0: DRM: allocated 2560x1440 fb: 0x60000, bo ffff8cd1499f8000
[ 3.025200] fbcon: nouveaufb (fb0) is primary device
[ 3.253561] nouveau 0000:01:00.0: fb0: nouveaufb frame buffer device
[ 3.268163] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0
[ 2150.225651] nouveau 0000:01:00.0: fifo: read fault at 0006710000 engine 00 [GR] client 02 [GPC0/PE_0] reason 02 [PTE] on channel 31 [007e8cb000 Xwayland[3019]]
[ 2150.225662] nouveau 0000:01:00.0: fifo: channel 31: killed
[ 2150.225663] nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
[ 2150.225666] nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
[ 2150.225669] nouveau 0000:01:00.0: Xwayland[3019]: channel 31 killed!
[ 2296.863975] Workqueue: events_unbound nv50_disp_atomic_commit_work [nouveau]
[ 2296.863990] ? nvkm_ioctl_ntfy_get+0x69/0xb0 [nouveau]
[ 2296.864032] nv50_disp_atomic_commit_tail+0x55/0x3a00 [nouveau]
[ 2296.864047] nv50_disp_atomic_commit_work+0x12/0x20 [nouveau]
[ 2296.864118] Workqueue: events_unbound nv50_disp_atomic_commit_work [nouveau]
[ 2296.864138] ? nouveau_bo_rd32+0x2a/0x30 [nouveau]
[ 2296.864153] ? nv84_fence_read+0x2e/0x30 [nouveau]
[ 2296.864175] nv50_disp_atomic_commit_tail+0x55/0x3a00 [nouveau]
[ 2296.864189] nv50_disp_atomic_commit_work+0x12/0x20 [nouveau]
[ 2417.699641] Workqueue: events_unbound nv50_disp_atomic_commit_work [nouveau]
[ 2417.699656] ? nvkm_ioctl_ntfy_get+0x69/0xb0 [nouveau]
[ 2417.699688] nv50_disp_atomic_commit_tail+0x55/0x3a00 [nouveau]
[ 2417.699705] nv50_disp_atomic_commit_work+0x12/0x20 [nouveau]
[ 2417.699785] Workqueue: events_unbound nv50_disp_atomic_commit_work [nouveau]
[ 2417.699808] ? nouveau_bo_rd32+0x2a/0x30 [nouveau]
[ 2417.699825] ? nv84_fence_read+0x2e/0x30 [nouveau]
[ 2417.699851] nv50_disp_atomic_commit_tail+0x55/0x3a00 [nouveau]
[ 2417.699867] nv50_disp_atomic_commit_work+0x12/0x20 [nouveau]
[ 2538.535424] Workqueue: events_unbound nv50_disp_atomic_commit_work [nouveau]
[ 2538.535439] ? nvkm_ioctl_ntfy_get+0x69/0xb0 [nouveau]
[ 2538.535469] nv50_disp_atomic_commit_tail+0x55/0x3a00 [nouveau]
[ 2538.535485] nv50_disp_atomic_commit_work+0x12/0x20 [nouveau]
[ 2538.535555] Workqueue: events_unbound nv50_disp_atomic_commit_work [nouveau]
[ 2538.535576] ? nouveau_bo_rd32+0x2a/0x30 [nouveau]
[ 2538.535591] ? nv84_fence_read+0x2e/0x30 [nouveau]
[ 2538.535614] nv50_disp_atomic_commit_tail+0x55/0x3a00 [nouveau]
[ 2538.535628] nv50_disp_atomic_commit_work+0x12/0x20 [nouveau]
You have two issues:

(a) nouveau's GL driver messed something up, causing a read fault error.
(b) nouveau's kernel driver tried to recover. It failed.

Solution to #1: None, really. You can try updating Mesa and hope it helps. I'm not sure which version you're on.

Solution to #2: Ben Skeggs will hopefully have something clever to say. The recovery logic was recently beefed up considerably, so the fact that you even got that far is already a good start.

If you're looking for a stable experience with Xorg, I recommend using xf86-video-nouveau: it has been extensively battle-tested and its logic is quite simple. I also recommend against anything that uses GL on an ongoing basis (which, sadly, everyone thinks is the coolest thing to do these days). If you're looking for a stable experience with a GL-based Wayland compositor, you'll have to wait until either the nouveau GL driver is perfect or the nouveau kernel module can properly recover from any screwups the GL driver makes.

You can also remove nouveau_dri.so entirely, which is a big hammer against these types of issues (it removes all GL-based acceleration), or you can run certain key pieces of software with LIBGL_ALWAYS_SOFTWARE=1, which forces a CPU-based GL implementation.

Cheers,

-ilia
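Point (a) can be read straight off the "fifo: read fault" line in Vincent's log: the fault was raised by the GR (graphics) engine, client GPC0/PE_0, with reason PTE, on channel 31, which belonged to Xwayland[3019]; the following lines show that channel being killed and recovery being scheduled. When collecting several freezes, it may help to pull those fields out of a saved dmesg dump automatically. Below is a minimal Python sketch of that, assuming the message format matches the single line quoted in this thread (other kernel versions may word it differently). The script name `nouveau_faults.py` and the helpers `FifoFault`, `parse_fifo_fault` and `find_faults` are hypothetical, not part of any nouveau tooling.

```python
import re
import sys
from typing import List, NamedTuple, Optional

# Hypothetical pattern, modelled only on the "fifo: read fault" line quoted in
# this thread; other kernel versions may format the message differently.
FAULT_RE = re.compile(
    r"fifo: (?P<kind>read|write) fault at (?P<addr>[0-9a-f]+) "
    r"engine (?P<engine_id>[0-9a-f]+) \[(?P<engine>[^\]]+)\] "
    r"client (?P<client_id>[0-9a-f]+) \[(?P<client>[^\]]+)\] "
    r"reason (?P<reason_id>[0-9a-f]+) \[(?P<reason>[^\]]+)\] "
    r"on channel (?P<channel>\d+) "
    r"\[(?P<inst>[0-9a-f]+) (?P<proc>[^\[\]]+)\[(?P<pid>\d+)\]\]"
)


class FifoFault(NamedTuple):
    kind: str      # "read" or "write"
    address: int   # faulting address as printed (hex) in the log
    engine: str    # e.g. "GR" (graphics engine)
    client: str    # e.g. "GPC0/PE_0"
    reason: str    # e.g. "PTE"
    channel: int   # FIFO channel the fault occurred on
    process: str   # process owning the channel, e.g. "Xwayland"
    pid: int       # PID of that process


def parse_fifo_fault(line: str) -> Optional[FifoFault]:
    """Return the decoded fault if `line` contains a nouveau fifo fault message."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    return FifoFault(
        kind=m["kind"],
        address=int(m["addr"], 16),
        engine=m["engine"],
        client=m["client"],
        reason=m["reason"],
        channel=int(m["channel"]),
        process=m["proc"],
        pid=int(m["pid"]),
    )


def find_faults(text: str) -> List[FifoFault]:
    """Collect every fifo fault found in a block of dmesg output."""
    faults = []
    for line in text.splitlines():
        fault = parse_fifo_fault(line)
        if fault is not None:
            faults.append(fault)
    return faults


if __name__ == "__main__":
    # Usage: dmesg > dmesg.txt, then: python3 nouveau_faults.py dmesg.txt
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for f in find_faults(fh.read()):
                print(f"{f.process}[{f.pid}] channel {f.channel}: {f.kind} fault "
                      f"at {f.address:#x}, engine {f.engine}, "
                      f"client {f.client}, reason {f.reason}")
```

Since the machine stays reachable over ssh after a freeze, the dmesg dump can be saved remotely and fed to the script; matching the process name across several freezes would show whether the faults always come from the same GL client.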
Vincent Vanackere
2017-May-08 11:50 UTC
[Nouveau] GT 730 freeze: how to diagnose / debug?
On 07/05/2017 23:50, Ilia Mirkin wrote:
> You have two issues:
>
> (a) nouveau's GL driver messed something up, causing a read fault error
> (b) nouveau's kernel driver tried to recover. It failed.
>
> Solution to #1: None, really. You can try updating mesa, and hope it
> helps. Not sure what version you're on.

Here are my package versions:

ii libegl1-mesa:amd64 17.0.3-1ubuntu1 amd64 free implementation of the EGL API -- runtime
ii libegl1-mesa-dev:amd64 17.0.3-1ubuntu1 amd64 free implementation of the EGL API -- development files
ii libgl1-mesa-dev:amd64 17.0.3-1ubuntu1 amd64 free implementation of the OpenGL API -- GLX development files
ii libgl1-mesa-dri:amd64 17.0.3-1ubuntu1 amd64 free implementation of the OpenGL API -- DRI modules
ii libgl1-mesa-glx:amd64 17.0.3-1ubuntu1 amd64 free implementation of the OpenGL API -- GLX runtime
ii libglapi-mesa:amd64 17.0.3-1ubuntu1 amd64 free implementation of the GL API -- shared library
ii libgles2-mesa:amd64 17.0.3-1ubuntu1 amd64 free implementation of the OpenGL|ES 2.x API -- runtime
ii libglu1-mesa:amd64 9.0.0-2.1build1 amd64 Mesa OpenGL utility library (GLU)
ii libglu1-mesa-dev:amd64 9.0.0-2.1build1 amd64 Mesa OpenGL utility library -- development files
ii libwayland-egl1-mesa:amd64 17.0.3-1ubuntu1 amd64 implementation of the Wayland EGL platform -- runtime
ii mesa-common-dev:amd64 17.0.3-1ubuntu1 amd64 Developer documentation for Mesa
ii mesa-utils 8.3.0-4 amd64 Miscellaneous Mesa GL utilities
ii mesa-vdpau-drivers:amd64 17.0.3-1ubuntu1 amd64 Mesa VDPAU video acceleration drivers

I'll try compiling a newer version from git to see if it helps...

> Solution to #2: Ben Skeggs will hopefully have something clever to
> say. The recovery logic was recently beefed up considerably, so the
> fact that you even got that far is already a good start.
>
> If you're looking for a stable experience with Xorg, I recommend using
> xf86-video-nouveau -- it's been extensively battle-tested, and is
> quite simple logic; I also recommend against anything that uses GL on
> an ongoing basis (which, sadly, everyone thinks is the coolest thing
> to do these days). If you're looking for a stable experience with a
> GL-based Wayland compositor, you'll have to wait until either the
> nouveau GL driver is perfect or nouveau kernel module can properly
> recover from any screwups the GL driver makes.

I'm not expecting the GL driver to be perfect ;-) However, it would be nice if the kernel module could recover at least a bit better from bad commands sent by the GL driver (indeed, I've also had some hard lockups where I could not even connect over ssh).

> You can also remove nouveau_dri.so entirely, which is a big hammer
> against these types of issues (removes all GL-based acceleration), or
> you can run certain key pieces of software with
> LIBGL_ALWAYS_SOFTWARE=1, which will force a CPU-based GL
> implementation.

Thanks for the hint, I'll try this workaround too!

Please let me know if I can do anything to improve the driver's stability (like dumping the card's registers or enabling some traces?).

Alternatively, if you know of a fanless graphics card model that can drive two monitors at 2560x1440 with proper Linux support, I'm interested ;-)

Regards
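The LIBGL_ALWAYS_SOFTWARE workaround discussed above is an environment variable read by Mesa, so it can be applied per program from a shell (`LIBGL_ALWAYS_SOFTWARE=1 some-program`) or from a small wrapper. Below is a minimal Python sketch of such a wrapper; the file name `softgl.py` and the helpers `software_gl_env` and `run_with_software_gl` are hypothetical. It only affects the process it launches (and that process's children, which inherit the environment), not an already-running compositor.

```python
import os
import subprocess
import sys
from typing import Dict, List, Mapping, Optional


def software_gl_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: the current environment) with
    LIBGL_ALWAYS_SOFTWARE=1 set; the original mapping is left untouched."""
    env = dict(os.environ if base is None else base)
    env["LIBGL_ALWAYS_SOFTWARE"] = "1"
    return env


def run_with_software_gl(argv: List[str]) -> int:
    """Run argv as a child process with Mesa's CPU-based GL forced on,
    wait for it to finish, and return its exit code."""
    return subprocess.run(argv, env=software_gl_env()).returncode


if __name__ == "__main__":
    # Usage: python3 softgl.py PROGRAM [ARGS...]
    if len(sys.argv) > 1:
        sys.exit(run_with_software_gl(sys.argv[1:]))
    print(f"usage: {sys.argv[0]} PROGRAM [ARGS...]", file=sys.stderr)
```

For example, `python3 softgl.py glxinfo` (glxinfo ships in the mesa-utils package listed above) should then report a software renderer such as llvmpipe in its "OpenGL renderer string" line rather than the nouveau one, assuming Mesa was built with a software rasterizer.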