Marcin Zajączkowski
2019-Dec-16 16:35 UTC
[Nouveau] Tracking down severe regression in 5.3-rc4/5.4 for TU116 - assistance needed
Hi,

I've encountered a severe regression on TU116 (probably also TU117), introduced in 5.3-rc4 and still present in the recent 5.4.2 [1]. The system usually hangs on the next graphics-mode-related operation (calling xrandr after login is enough) with the following errors:

> kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
...
> kernel: nouveau 0000:01:00.0: DRM: failed to idle channel 0 [DRM]
> kernel: nouveau 0000:01:00.0: i2c: aux 0007: begin idle timeout ffffffff
> kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff
> kernel: ------------[ cut here ]------------
> kernel: nouveau 0000:01:00.0: timeout
> kernel: WARNING: CPU: 10 PID: 384 at drivers/gpu/drm/nouveau/nvkm/subdev/bar/g84.c:35 g84_bar_flush+0xcf/0xe0 [nouveau]

(The detailed log is in the corresponding issue [1].)

With earlier kernels there was no hardware acceleration for the NVidia GTX 1660 Ti, but at least I could use nouveau to disable the card (to save battery and trees, and to lower the temperature) or even drive an external output (with Wayland). Now the system is unusable with nouveau :(.

I spent some time trying to narrow the scope using the existing kernel builds for Fedora. I was able to determine that the problem was introduced between 5.3.0-0.rc3.git1.1 (commit 33920f1ec5bf, works fine) and 5.3.0-0.rc4.git0.1 (tag v5.3-rc4, fails with errors).

That is just a few days (7-11 Aug) and "only" around 250 commits. I went through them but, judging by the commit titles, I haven't seen any nouveau-related changes, and in general no particularly suspicious drm-related changes.

> git log 33920f1ec5bf..v5.3-rc4 --stat

Could some of the more nouveau/drm-experienced developers take a look and suggest which commit could have broken it (to make it easier to find out what needs to be fixed)?

[1] - https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/516

Thanks in advance
Marcin
Ilia Mirkin
2019-Dec-16 17:08 UTC
[Nouveau] Tracking down severe regression in 5.3-rc4/5.4 for TU116 - assistance needed
Hi Marcin,

You should do a git bisect rather than guessing about commits. I suspect that searching for "kernel git bisect fedora" should prove instructive if you're not sure how to do this.

Cheers,

  -ilia
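For readers unfamiliar with the workflow, here is a minimal sketch of what a bisect does, run against a throwaway repository rather than the kernel tree. Everything in it (the temporary repository, state.txt, the ten commits, the "broken" marker) is hypothetical and exists only for illustration. On the real tree the equivalent start would be `git bisect start v5.3-rc4 33920f1ec5bf`, followed by a manual build, boot and xrandr check at each step, with the result marked by `git bisect good` or `git bisect bad`, since a test that needs a reboot cannot simply be handed to `git bisect run`.

```shell
# Toy demonstration of git bisect in a scratch repository (all names hypothetical).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Bisect Demo"
git config user.email "demo@example.invalid"

# Create ten commits; from commit 7 onwards state.txt carries a "broken" marker,
# standing in for the regression.
i=1
while [ "$i" -le 10 ]; do
    if [ "$i" -ge 7 ]; then
        echo "broken $i" > state.txt
    else
        echo "ok $i" > state.txt
    fi
    git add state.txt
    git commit -q -m "commit $i"
    i=$((i + 1))
done

# Known-good point: the root commit. Known-bad point: HEAD.
good=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$good" >/dev/null

# Let git drive the search: exit status 0 marks a commit good, 1 marks it bad.
git bisect run sh -c '! grep -q broken state.txt' >/dev/null

# After a completed bisect, refs/bisect/bad points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "$first_bad"
```

In the manual kernel case, `git bisect log` can be saved between reboots so the session can be replayed or shared in the bug report.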
Marcin Zajączkowski
2019-Dec-16 17:42 UTC
[Nouveau] Tracking down severe regression in 5.3-rc4/5.4 for TU116 - assistance needed
On 2019-12-16 18:08, Ilia Mirkin wrote:
> Hi Marcin,
>
> You should do a git bisect rather than guessing about commits. I
> suspect that searching for "kernel git bisect fedora" should prove
> instructive if you're not sure how to do this.

Thanks for your suggestion. I realize that I can do it at the Git level and that it is the ultimate way to go. However, building the kernel from source takes some time (on top of the usual time needed to install, restart and verify, which I already experienced while narrowing things down to "just" ~250 commits).

Therefore, I would be really thankful for a suggestion as to which commits would be good to check first: testing 2 or 4 candidates is better than 8-10 (assuming someone is right :) ).

Marcin
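The "8-10" figure follows from halving the candidate range at every step. A rough sketch of that arithmetic is below; the function name `bisect_steps` is made up for illustration, and the result is only an estimate, since git's own count (it prints "roughly N steps" while bisecting) can differ when the range contains merges.

```shell
# Estimate the number of bisect steps for n candidate commits by
# repeatedly halving the range (rounding up) until one commit remains.
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))
        steps=$((steps + 1))
    done
    echo "$steps"
}

bisect_steps 250   # prints 8
```

So a full bisect over roughly 250 commits costs about eight build-and-boot cycles, which is the number a well-chosen shortlist of suspects would have to beat.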