Mike Galbraith
2017-Jul-11 18:08 UTC
[Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
On Tue, 2017-07-11 at 13:51 -0400, Ilia Mirkin wrote:
> Some details that may be useful in analysis of the bug:
>
> 1. lspci -nn -d 10de:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 980] [10de:13c0] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1

> 2. What displays, if any, you have plugged into the NVIDIA board when
> this happens?

A Philips 273V, via DVI.

> 3. Any boot parameters, esp relating to ACPI, PM, or related?

None for those, what's there that will be unfamiliar to you are for
patches that aren't applied.

nortsched hpc_cpusets skew_tick=1 ftrace_dump_on_oops audit=0
nodelayacct cgroup_disable=memory rtkthreads=1 rtworkqueues=2 panic=60
ignore_loglevel crashkernel=256M,high

-Mike
Ilia Mirkin
2017-Jul-11 18:22 UTC
[Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
On Tue, Jul 11, 2017 at 2:08 PM, Mike Galbraith <efault at gmx.de> wrote:
> On Tue, 2017-07-11 at 13:51 -0400, Ilia Mirkin wrote:
>> Some details that may be useful in analysis of the bug:
>>
>> 1. lspci -nn -d 10de:
>
> 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 980] [10de:13c0] (rev a1)
> 01:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1
>
>> 2. What displays, if any, you have plugged into the NVIDIA board when
>> this happens?
>
> A Philips 273V, via DVI.
>
>> 3. Any boot parameters, esp relating to ACPI, PM, or related?
>
> None for those, what's there that will be unfamiliar to you are for
> patches that aren't applied.
>
> nortsched hpc_cpusets skew_tick=1 ftrace_dump_on_oops audit=0
> nodelayacct cgroup_disable=memory rtkthreads=1 rtworkqueues=2 panic=60
> ignore_loglevel crashkernel=256M,high

OK, thanks. So in other words, a fairly standard desktop with a PCIe
board plugged in. No funny business. (Laptops can create a ton of
additional weirdness, which I assumed you had since you were talking
about STR.)

My best guess is that gf119_head_vblank_put either has a bogus head id
(should be in the 0..3 range) which causes it to do an out-of-bounds
read on MMIO space, or that the MMIO mapping has already been removed
by the time nouveau_display_suspend runs. Adding Ben Skeggs for
additional insight.

Some display stuff did change for 4.13 for GM20x+ boards. If it's not
too much trouble, a bisect would be pretty useful.

Cheers,

-ilia
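[Editor's illustration, not part of the original message.] The two failure modes guessed at above (a head id outside 0..3, or an MMIO mapping that is already gone) can be pictured with a small user-space sketch. It is not nouveau code: the struct, the function name, and every offset and size below are hypothetical, chosen only to show what a guarded per-head register read might look like.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All constants are hypothetical, for illustration only (not from nouveau). */
#define MAX_HEADS      4u        /* valid head ids are 0..3, as stated in the thread */
#define HEAD_REG_BASE  0x1000u   /* assumed byte offset of head 0's register */
#define HEAD_STRIDE    0x800u    /* assumed byte spacing between per-head registers */
#define MMIO_SIZE      0x4000u   /* assumed size of the mapped window, in bytes */

/* Stand-in for a device; mmio == NULL models "mapping already removed". */
struct fake_dev {
    volatile uint32_t *mmio;
};

/*
 * Read a per-head register only when the mapping exists and the head id
 * is in range. Returns true and stores the value in *val on success;
 * returns false without touching memory otherwise.
 */
static bool head_reg_read(const struct fake_dev *dev, unsigned int head,
                          uint32_t *val)
{
    if (dev == NULL || dev->mmio == NULL)
        return false;                       /* mapping is gone */
    if (head >= MAX_HEADS)
        return false;                       /* bogus head id */

    size_t off = HEAD_REG_BASE + (size_t)head * HEAD_STRIDE;
    if (off + sizeof(uint32_t) > MMIO_SIZE)
        return false;                       /* would read past the window */

    *val = dev->mmio[off / sizeof(uint32_t)];
    return true;
}
```

Without the two early checks, either condition would turn the final array access into a read from an invalid address, which is the kind of fault the hypothesis describes.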
Mike Galbraith
2017-Jul-11 18:53 UTC
[Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
On Tue, 2017-07-11 at 14:22 -0400, Ilia Mirkin wrote:
>
> OK, thanks. So in other words, a fairly standard desktop with a PCIe
> board plugged in. No funny business. (Laptops can create a ton of
> additional weirdness, which I assumed you had since you were talking
> about STR.)

Yup, garden variety deskside box.

> My best guess is that gf119_head_vblank_put either has a bogus head id
> (should be in the 0..3 range) which causes it to do an out-of-bounds
> read on MMIO space, or that the MMIO mapping has already been removed
> by the time nouveau_display_suspend runs. Adding Ben Skeggs for
> additional insight.
>
> Some display stuff did change for 4.13 for GM20x+ boards. If it's not
> too much trouble, a bisect would be pretty useful.

Vacation -> back to work happens in the very early AM, so bisection
will have to wait a bit.

-Mike
Tobias Klausmann
2017-Jul-11 22:30 UTC
[Nouveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
> On Tue, Jul 11, 2017 at 2:08 PM, Mike Galbraith wrote:
>> On Tue, 2017-07-11 at 13:51 -0400, Ilia Mirkin wrote:
>>> Some details that may be useful in analysis of the bug:
>>>
>>> 1. lspci -nn -d 10de:
>>
>> 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 980] [10de:13c0] (rev a1)
>> 01:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1
>>
>>> 2. What displays, if any, you have plugged into the NVIDIA board when
>>> this happens?
>>
>> A Philips 273V, via DVI.
>>
>>> 3. Any boot parameters, esp relating to ACPI, PM, or related?
>>
>> None for those, what's there that will be unfamiliar to you are for
>> patches that aren't applied.
>>
>> nortsched hpc_cpusets skew_tick=1 ftrace_dump_on_oops audit=0
>> nodelayacct cgroup_disable=memory rtkthreads=1 rtworkqueues=2 panic=60
>> ignore_loglevel crashkernel=256M,high
>
> OK, thanks. So in other words, a fairly standard desktop with a PCIe
> board plugged in. No funny business. (Laptops can create a ton of
> additional weirdness, which I assumed you had since you were talking
> about STR.)
>
> My best guess is that gf119_head_vblank_put either has a bogus head id
> (should be in the 0..3 range) which causes it to do an out-of-bounds
> read on MMIO space, or that the MMIO mapping has already been removed
> by the time nouveau_display_suspend runs. Adding Ben Skeggs for
> additional insight.
>
> Some display stuff did change for 4.13 for GM20x+ boards. If it's not
> too much trouble, a bisect would be pretty useful.

Hey Mike,

just to inform you: i have a quite similar bug with no monitor attached
while putting my nouveau card to sleep (laptop/optimus system) within
nouveau_display_suspend(). I'm going to bisect this, hopefully on the
long run this will aid in resolving your issue as well!

Greeting,

Tobias
Mike Galbraith
2017-Jul-12 09:55 UTC
[Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
On Tue, 2017-07-11 at 14:22 -0400, Ilia Mirkin wrote:
>
> Some display stuff did change for 4.13 for GM20x+ boards. If it's not
> too much trouble, a bisect would be pretty useful.

Bisection seemingly went fine, but the result is odd.

e98c58e55f68f8785aebfab1f8c9a03d8de0afe1 is the first bad commit

-Mike