Mike Galbraith
2017-Jul-14 13:36 UTC
[Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
On Wed, 2017-07-12 at 07:37 -0400, Ilia Mirkin wrote:
> On Wed, Jul 12, 2017 at 7:25 AM, Mike Galbraith <efault at gmx.de> wrote:
> > On Wed, 2017-07-12 at 11:55 +0200, Mike Galbraith wrote:
> >> On Tue, 2017-07-11 at 14:22 -0400, Ilia Mirkin wrote:
> >> >
> >> > Some display stuff did change for 4.13 for GM20x+ boards. If it's not
> >> > too much trouble, a bisect would be pretty useful.
> >>
> >> Bisection seemingly went fine, but the result is odd.
> >>
> >> e98c58e55f68f8785aebfab1f8c9a03d8de0afe1 is the first bad commit
> >
> > But it really really is bad. Looking at gitk fork in the road leading
> > to it...
> >
> > 52d9d38c183b drm/sti:fix spelling mistake: "compoment" -> "component" - good
> > e4e818cc2d7c drm: make drm_panel.h self-contained - good
> > 9cf8f5802f39 drm: add missing declaration to drm_blend.h - good
> >
> > Before the git highway splits, all is well. The lane with commits
> > works fine at both ends, but e98c58e55f68 is busted. Merge artifact?
>
> Hmmm... that tree does not appear to have gotten a v4.12 backmerge at
> any point. The last backmerge from Linus as far as I can tell was
> v4.11-rc7. Could be an interaction with some out-of-tree change.

Ok, a network outage gave me time to go hunting. Indeed it is a bad
interaction with the tree DRM merged into. All DRM did was to slip a
WARN_ON_ONCE() that nouveau triggers into a kernel module where such
things no longer warn, they blow the box out of the water. I made a
dinky testcase module (attached), and bisected to the real root....

19d436268dde95389c616bb3819da73f0a8b28a8 is the first bad commit
commit 19d436268dde95389c616bb3819da73f0a8b28a8
Author: Peter Zijlstra <peterz at infradead.org>
Date:   Sat Feb 25 08:56:53 2017 +0100

    debug: Add _ONCE() logic to report_bug()

    Josh suggested moving the _ONCE logic inside the trap handler, using a
    bit in the bug_entry::flags field, avoiding the need for the extra
    variable.

    Sadly this only works for WARN_ON_ONCE(), since the others have
    printk() statements prior to triggering the trap.

    Still, this saves a fair amount of text and some data:

          text     data  filename
      10682460  4530992  defconfig-build/vmlinux.orig
      10665111  4530096  defconfig-build/vmlinux.patched

    Suggested-by: Josh Poimboeuf <jpoimboe at redhat.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz at infradead.org>
    Cc: Andy Lutomirski <luto at kernel.org>
    Cc: Arnd Bergmann <arnd at arndb.de>
    Cc: Borislav Petkov <bp at alien8.de>
    Cc: Brian Gerst <brgerst at gmail.com>
    Cc: Denys Vlasenko <dvlasenk at redhat.com>
    Cc: H. Peter Anvin <hpa at zytor.com>
    Cc: Linus Torvalds <torvalds at linux-foundation.org>
    Cc: Peter Zijlstra <peterz at infradead.org>
    Cc: Thomas Gleixner <tglx at linutronix.de>
    Signed-off-by: Ingo Molnar <mingo at kernel.org>

:040000 040000 9f47f66ec4c234f6ee8e2a09e991c95fe47cf2c1 3e92aa9e77b39ed075ae2c3bdf041d92ef898f62 M	arch
:040000 040000 34f70b73d40c82533dd7df9b289106be69e2fa8d dd5d7248694a36b3e170f2dca5d9c4121535a990 M	include
:040000 040000 f6e627b0d378f0a00d2987fdd0c7b215306e6e3c b360d4ee2579744cce530184d7dab13493f73ee0 M	lib

-------------- next part --------------
A non-text attachment was scrubbed...
Name: warn_on_once.patch
Type: text/x-patch
Size: 675 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/nouveau/attachments/20170714/a0402ff4/attachment-0001.bin>
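[Editor's sketch] The commit message above describes keeping the "already warned" state in a bit of bug_entry::flags instead of a separate variable. A minimal user-space sketch of that idea follows. The flag names mirror the kernel's, but the struct, the enum and report_bug_sketch() are hypothetical stand-ins for illustration, not the kernel's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Flag names follow the kernel's; everything else here is a sketch. */
#define BUGFLAG_WARNING (1u << 0)
#define BUGFLAG_ONCE    (1u << 1)
#define BUGFLAG_DONE    (1u << 2)

/* Hypothetical, simplified stand-in for one __bug_table entry. */
struct bug_entry_sketch {
	const char *file;
	unsigned short line;
	unsigned short flags;
};

enum trap_result { TRAP_BUG, TRAP_WARN_REPORTED, TRAP_WARN_SUPPRESSED };

/* Decide what a trap at this entry means, recording "done" in the entry. */
enum trap_result report_bug_sketch(struct bug_entry_sketch *bug)
{
	assert(bug != NULL);

	if (!(bug->flags & BUGFLAG_WARNING))
		return TRAP_BUG;

	if (bug->flags & BUGFLAG_ONCE) {
		if (bug->flags & BUGFLAG_DONE)
			return TRAP_WARN_SUPPRESSED;
		/* The crux: this store needs the table entry to be writable. */
		bug->flags |= BUGFLAG_DONE;
	}

	fprintf(stderr, "WARNING at %s:%u\n", bug->file, (unsigned)bug->line);
	return TRAP_WARN_REPORTED;
}
```

The first trap on a WARNING|ONCE entry reports and sets the DONE bit, and later traps are suppressed. The point for this thread is the store inside the trap path: if the entry cannot be written, the warning path itself faults.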
Mike Galbraith
2017-Jul-14 13:41 UTC
[Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
On Fri, 2017-07-14 at 15:36 +0200, Mike Galbraith wrote:
> All DRM did was to slip a
> WARN_ON_ONCE() that nouveau triggers into a kernel module where such
> things no longer warn, they blow the box out of the water.

BTW, turn that irksome WARN_ON_ONCE() in drivers/gpu/drm/drm_vblank.c
into a WARN_ONCE(), and all is peachy, you get the warning, box lives.

---
 drivers/gpu/drm/drm_vblank.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/gpu/drm/drm_vblank.c
+++ b/drivers/gpu/drm/drm_vblank.c
@@ -605,7 +605,8 @@ bool drm_calc_vbltimestamp_from_scanoutp
 	 */
 	if (mode->crtc_clock == 0) {
 		DRM_DEBUG("crtc %u: Noop due to uninitialized mode.\n", pipe);
-		WARN_ON_ONCE(drm_drv_uses_atomic_modeset(dev));
+		WARN_ONCE(drm_drv_uses_atomic_modeset(dev), "%s: report me.\n",
+			  dev->driver->name);
 
 		return false;
 	}
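[Editor's sketch] Going by the commit message quoted earlier in the thread, the variants that print a message keep their once-state in an extra variable rather than in the bug table, which would explain why switching to WARN_ONCE() sidesteps the crash. The user-space sketch below shows that per-call-site pattern. WARN_ONCE_SKETCH, warn_once_hits and check_mode_sketch() are hypothetical names invented for illustration, and the real kernel macro differs in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Counts messages actually emitted, so the behaviour can be observed. */
int warn_once_hits;

/*
 * Once-only warning with its state in a static variable at the call site.
 * That variable lives in ordinary writable data, not in a bug table.
 */
#define WARN_ONCE_SKETCH(cond, ...)                     \
	do {                                            \
		static bool warned_;                    \
		if ((cond) && !warned_) {               \
			warned_ = true;                 \
			warn_once_hits++;               \
			fprintf(stderr, __VA_ARGS__);   \
		}                                       \
	} while (0)

/* Mirrors the shape of the patched check: warn once, then bail out. */
int check_mode_sketch(int crtc_clock, bool uses_atomic, const char *driver_name)
{
	assert(driver_name != NULL);

	if (crtc_clock == 0) {
		WARN_ONCE_SKETCH(uses_atomic, "%s: report me.\n", driver_name);
		return 0;
	}
	return 1;
}
```

Calling check_mode_sketch(0, true, "nouveau") repeatedly emits the message only on the first call, and nothing in that path writes to a table entry.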
Tobias Klausmann
2017-Jul-14 15:05 UTC
[Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
On 7/14/17 3:41 PM, Mike Galbraith wrote:
> On Fri, 2017-07-14 at 15:36 +0200, Mike Galbraith wrote:
>> All DRM did was to slip a
>> WARN_ON_ONCE() that nouveau triggers into a kernel module where such
>> things no longer warn, they blow the box out of the water.
> BTW, turn that irksome WARN_ON_ONCE() in drivers/gpu/drm/drm_vblank.c
> into a WARN_ONCE(), and all is peachy, you get the warning, box lives.
>
> ---
>  drivers/gpu/drm/drm_vblank.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> --- a/drivers/gpu/drm/drm_vblank.c
> +++ b/drivers/gpu/drm/drm_vblank.c
> @@ -605,7 +605,8 @@ bool drm_calc_vbltimestamp_from_scanoutp
>  	 */
>  	if (mode->crtc_clock == 0) {
>  		DRM_DEBUG("crtc %u: Noop due to uninitialized mode.\n", pipe);
> -		WARN_ON_ONCE(drm_drv_uses_atomic_modeset(dev));
> +		WARN_ONCE(drm_drv_uses_atomic_modeset(dev), "%s: report me.\n",
> +			  dev->driver->name);
>
>  		return false;
>  	}

Hey,

confirmed, this helps save the box, but we still have to find the root
cause! Backtrace with the above fix applied (and the one which came in
with the latest drm-fixes merge): [1]

[1] https://hastebin.com/uyoqifijed.http

Thanks,
Tobias
Peter Zijlstra
2017-Jul-14 15:50 UTC
[Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
On Fri, Jul 14, 2017 at 03:36:08PM +0200, Mike Galbraith wrote:
> Ok, a network outage gave me time to go hunting. Indeed it is a bad
> interaction with the tree DRM merged into. All DRM did was to slip a
> WARN_ON_ONCE() that nouveau triggers into a kernel module where such
> things no longer warn, they blow the box out of the water. I made a
> dinky testcase module (attached), and bisected to the real root....
>
> 19d436268dde95389c616bb3819da73f0a8b28a8 is the first bad commit
> commit 19d436268dde95389c616bb3819da73f0a8b28a8
> Author: Peter Zijlstra <peterz at infradead.org>
> Date:   Sat Feb 25 08:56:53 2017 +0100
>
>     debug: Add _ONCE() logic to report_bug()

Urgh, is for some mysterious reason the __bug_table section of modules
ending up in RO memory?

I forever get lost in that link magic :/
Mike Galbraith
2017-Jul-14 15:58 UTC
[Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
On Fri, 2017-07-14 at 17:50 +0200, Peter Zijlstra wrote:
> On Fri, Jul 14, 2017 at 03:36:08PM +0200, Mike Galbraith wrote:
> > Ok, a network outage gave me time to go hunting. Indeed it is a bad
> > interaction with the tree DRM merged into. All DRM did was to slip a
> > WARN_ON_ONCE() that nouveau triggers into a kernel module where such
> > things no longer warn, they blow the box out of the water. I made a
> > dinky testcase module (attached), and bisected to the real root....
> >
> > 19d436268dde95389c616bb3819da73f0a8b28a8 is the first bad commit
> > commit 19d436268dde95389c616bb3819da73f0a8b28a8
> > Author: Peter Zijlstra <peterz at infradead.org>
> > Date:   Sat Feb 25 08:56:53 2017 +0100
> >
> >     debug: Add _ONCE() logic to report_bug()
>
> Urgh, is for some mysterious reason the __bug_table section of modules
> ending up in RO memory?
>
> I forever get lost in that link magic :/

+1

drm.ko
 20 __bug_table   00000630  0000000000000000  0000000000000000  0004bff3  2**0
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA

vmlinux
 15 __bug_table   0000ba84  ffffffff81af26c0  0000000001af26c0  00cf26c0  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA

Danged if I know... um um RELOC business mucks things up?

	-Mike
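[Editor's sketch] Peter's question is whether the module's __bug_table lands in read-only memory. Whether it does at run time is still open at this point in the thread, but if it did, the flag store done on the first WARN_ON_ONCE() hit would fault instead of warn. The user-space sketch below illustrates only that mechanism with an ordinary page: it marks the page read-only with mprotect() and lets a forked child attempt the store. The function name and the flag values are hypothetical.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if storing to a read-only page killed the child with a
 * memory-access signal, 0 if the store went through, -1 on setup errors.
 */
int write_to_readonly_page_faults(void)
{
	long page = sysconf(_SC_PAGESIZE);
	assert(page > 0);

	unsigned short *flags = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
				     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (flags == MAP_FAILED)
		return -1;

	*flags = 0x3; /* pretend WARNING | ONCE */

	/* Stand-in for a table that ended up mapped read-only. */
	if (mprotect(flags, (size_t)page, PROT_READ) != 0) {
		munmap(flags, (size_t)page);
		return -1;
	}

	pid_t pid = fork();
	if (pid < 0) {
		munmap(flags, (size_t)page);
		return -1;
	}
	if (pid == 0) {
		/* The "set the done bit" store; expected to fault here. */
		*(volatile unsigned short *)flags |= 0x4;
		_exit(0);
	}

	int status = 0;
	pid_t waited = waitpid(pid, &status, 0);
	munmap(flags, (size_t)page);
	if (waited != pid)
		return -1;

	return WIFSIGNALED(status) &&
	       (WTERMSIG(status) == SIGSEGV || WTERMSIG(status) == SIGBUS);
}
```

In user space the price is one dead child process. In the kernel the equivalent store happens inside the trap handler, which would fit the "box blown out of the water" symptom if the hypothesis holds.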