Pierre Moreau
2018-Feb-14 17:41 UTC
[Nouveau] 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini
On 2018-02-14 at 09:36, Ilia Mirkin wrote:
> On Wed, Feb 14, 2018 at 9:35 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> > On Wed, Feb 14, 2018 at 9:29 AM, Meelis Roos <mroos at linux.ee> wrote:
> >>> This is 4.16-rc1+today's git on a lowly P4 with NV5, worked fine in 4.15:
> >>
> >> NV5 in another PC (secondary card in x86-64) made the system crash on
> >> boot, in nvkm_therm_clkgate_fini.
> >
> > Mind booting with nouveau.debug=trace? That should hopefully tell us
> > more exactly which thing is dying. If you have a cross-compile/distcc
> > setup handy, a bisect may be even more useful.
>
> Erm, sorry, nevermind. You even said it -- nvkm_therm_clkgate_fini is
> somehow mis-hooked up for NV5 now. A bisect result would still make
> the culprit a lot more obvious.

CC’ing Lyude Paul as she hooked up the clockgating support.

Looking at the code, only NV40+ have a therm engine. Therefore, shouldn’t
nvkm_therm_clkgate_enable(), nvkm_therm_clkgate_fini() and
nvkm_therm_clkgate_oneinit() all check that therm is not NULL, on top of
their check for the clkgate_* hooks being there? Or instead, maybe have the
check in nvkm_device_init()?

Pierre
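The NULL check suggested above can be sketched in isolation as follows. This is a minimal, userspace illustration only: the struct layouts, the mock_* hooks and the int return values are hypothetical stand-ins, not the actual nouveau definitions, and it is not the patch that was eventually applied. It only shows the guard order being proposed: test the therm pointer first, then the clkgate_* hook.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the nouveau therm structures. */
struct therm;

struct therm_func {
	void (*clkgate_enable)(struct therm *);
	void (*clkgate_fini)(struct therm *, int suspend);
};

struct therm {
	const struct therm_func *func;
	int clkgate_enabled;
	int calls;	/* counts hook invocations, for illustration only */
};

/* Mock hooks standing in for chipset-specific implementations. */
static void mock_clkgate_enable(struct therm *therm)
{
	therm->calls++;
}

static void mock_clkgate_fini(struct therm *therm, int suspend)
{
	(void)suspend;
	therm->calls++;
}

/*
 * Returns 1 if the hook ran, 0 if it was skipped. The return value is an
 * assumption made for this sketch so the behaviour can be observed.
 */
int therm_clkgate_enable(struct therm *therm)
{
	/* Chipsets without a therm subdev pass NULL: bail out first. */
	if (!therm || !therm->func->clkgate_enable || !therm->clkgate_enabled)
		return 0;

	therm->func->clkgate_enable(therm);
	return 1;
}

int therm_clkgate_fini(struct therm *therm, int suspend)
{
	if (!therm || !therm->func->clkgate_fini || !therm->clkgate_enabled)
		return 0;

	therm->func->clkgate_fini(therm, suspend);
	return 1;
}

/* Usage example: a device without therm, then one with it. */
void example_usage(void)
{
	const struct therm_func func = {
		.clkgate_enable = mock_clkgate_enable,
		.clkgate_fini = mock_clkgate_fini,
	};
	struct therm therm = { .func = &func, .clkgate_enabled = 1, .calls = 0 };

	assert(therm_clkgate_enable(NULL) == 0);	/* no therm: skipped */
	assert(therm_clkgate_enable(&therm) == 1);	/* therm present: runs */
	assert(therm_clkgate_fini(&therm, 0) == 1);
	assert(therm.calls == 2);
}
```

The alternative raised in the same message, checking once in the caller before invoking any of the clkgate functions, would keep the helpers unchanged but relies on every call site remembering the check.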
Lyude Paul
2018-Feb-14 19:11 UTC
[Nouveau] 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini
Actually, this was brought up to me already. There's a fix from NVIDIA on the
mailing list, which I reviewed a little while ago, that we should pull in:

https://patchwork.freedesktop.org/patch/203205/

Would you guys mind confirming that this patch fixes your issues?

On Wed, 2018-02-14 at 18:41 +0100, Pierre Moreau wrote:
> On 2018-02-14 at 09:36, Ilia Mirkin wrote:
> > On Wed, Feb 14, 2018 at 9:35 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> > > On Wed, Feb 14, 2018 at 9:29 AM, Meelis Roos <mroos at linux.ee> wrote:
> > > > > This is 4.16-rc1+today's git on a lowly P4 with NV5, worked fine in
> > > > > 4.15:
> > > >
> > > > NV5 in another PC (secondary card in x86-64) made the system crash on
> > > > boot, in nvkm_therm_clkgate_fini.
> > >
> > > Mind booting with nouveau.debug=trace? That should hopefully tell us
> > > more exactly which thing is dying. If you have a cross-compile/distcc
> > > setup handy, a bisect may be even more useful.
> >
> > Erm, sorry, nevermind. You even said it -- nvkm_therm_clkgate_fini is
> > somehow mis-hooked up for NV5 now. A bisect result would still make
> > the culprit a lot more obvious.
>
> CC’ing Lyude Paul as she hooked up the clockgating support.
>
> Looking at the code, only NV40+ have a therm engine. Therefore, shouldn’t
> nvkm_therm_clkgate_enable(), nvkm_therm_clkgate_fini() and
> nvkm_therm_clkgate_oneinit() all check that therm is not NULL, on top of
> their check for the clkgate_* hooks being there? Or instead, maybe have the
> check in nvkm_device_init()?
>
> Pierre

--
Cheers,
Lyude Paul
Meelis Roos
2018-Feb-14 20:59 UTC
[Nouveau] 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini
> Actually this was brought up to me already, there's a fix on the mailing list
> for this I reviewed a little while ago from nvidia that we should pull in:
>
> https://patchwork.freedesktop.org/patch/203205/
>
> Would you guys mind confirming that this patch fixes your issues?

It works on my amd64, P4 is still compiling.

[ 1.124987] nouveau 0000:04:05.0: NVIDIA NV05 (20154000)
[ 1.161464] nouveau 0000:04:05.0: bios: version 03.05.00.10.00
[ 1.161475] nouveau 0000:04:05.0: bios: DCB table not found
[ 1.161535] nouveau 0000:04:05.0: bios: DCB table not found
[ 1.161577] nouveau 0000:04:05.0: bios: DCB table not found
[ 1.161586] nouveau 0000:04:05.0: bios: DCB table not found
[ 1.344008] tsc: Refined TSC clocksource calibration: 2200.078 MHz
[ 1.344024] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1fb67c69f81, max_idle_ns: 440795210317 ns
[ 1.344037] clocksource: Switched to clocksource tsc
[ 1.408102] nouveau 0000:04:05.0: tmr: unknown input clock freq
[ 1.409471] nouveau 0000:04:05.0: fb: 32 MiB SDRAM
[ 1.414459] nouveau 0000:04:05.0: DRM: VRAM: 31 MiB
[ 1.414467] nouveau 0000:04:05.0: DRM: GART: 128 MiB
[ 1.414476] nouveau 0000:04:05.0: DRM: BMP version 5.17
[ 1.414484] nouveau 0000:04:05.0: DRM: No DCB data found in VBIOS
[ 1.415629] nouveau 0000:04:05.0: DRM: Adaptor not initialised, running VBIOS init tables.
[ 1.415829] nouveau 0000:04:05.0: bios: DCB table not found
[ 1.416125] nouveau 0000:04:05.0: DRM: Saving VGA fonts
[ 1.477526] nouveau 0000:04:05.0: DRM: No DCB data found in VBIOS
[ 1.478428] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 1.478438] [drm] Driver supports precise vblank timestamp query.
[ 1.479618] nouveau 0000:04:05.0: DRM: MM: using M2MF for buffer copies
[ 1.517930] nouveau 0000:04:05.0: DRM: allocated 1024x768 fb: 0x4000, bo 00000000a09f4d1f
[ 1.519294] nouveau 0000:04:05.0: fb1: nouveaufb frame buffer device
[ 1.519313] [drm] Initialized nouveau 1.3.1 20120801 for 0000:04:05.0 on minor 1

--
Meelis Roos (mroos at linux.ee)