Hi, any pointers on how to debug this:
[ 19.741005] nouveau 0000:01:00.0: enabling device (0541 -> 0543)
[ 19.741095] nouveau 0000:01:00.0: Using 32-bit DMA via iommu
[ 19.741165] nouveau 0000:01:00.0: NVIDIA GP108 (138000a1)
[ 19.752562] tg3 0004:01:00.0 enP4p1s0f0: renamed from eth0
[ 19.832879] [drm] Initialized ast 0.1.0 20120228 for 0005:02:00.0 on minor 0
[ 19.856391] nouveau 0000:01:00.0: bios: version 86.08.13.00.12
[ 19.857574] nouveau 0000:01:00.0: Using 32-bit DMA via iommu
[ 19.857812] nouveau 0000:01:00.0: fb: 2048 MiB GDDR5
[ 22.401204] random: fast init done
[ 23.064311] nouveau 0000:01:00.0: DRM: VRAM: 2048 MiB
[ 23.064326] nouveau 0000:01:00.0: DRM: GART: 536870912 MiB
[ 23.064341] nouveau 0000:01:00.0: DRM: BIT table 'A' not found
[ 23.064356] nouveau 0000:01:00.0: DRM: BIT table 'L' not found
[ 23.064371] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
[ 23.064386] nouveau 0000:01:00.0: DRM: DCB version 4.1
[ 23.064399] nouveau 0000:01:00.0: DRM: DCB outp 00: 01800346 04600010
[ 23.064416] nouveau 0000:01:00.0: DRM: DCB outp 01: 01000342 00020010
[ 23.064432] nouveau 0000:01:00.0: DRM: DCB outp 02: 01011352 00020020
[ 23.064448] nouveau 0000:01:00.0: DRM: DCB conn 00: 00001046
[ 23.064463] nouveau 0000:01:00.0: DRM: DCB conn 01: 00002161
[ 23.065303] nouveau 0000:01:00.0: disp: chid 0 mthd 0000 data 00000000
00001000 00000001
[ 23.065323] nouveau 0000:01:00.0: disp: chid 1 mthd 0000 data 00000000
00001000 00000001
[ 23.086649] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 23.087500] [drm] Driver supports precise vblank timestamp query.
[ 23.088876] nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
[ 23.354442] nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 409800
[ TIMEOUT ]
[ 25.354017] ------------[ cut here ]------------
[ 25.355515] nouveau 0000:01:00.0: timeout
[ 25.357105] WARNING: CPU: 0 PID: 586 at drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.c:1524 gf100_gr_init_ctxctl_ext+0x798/0xa50 [nouveau]
[ 25.358654] Modules linked in: nouveau(+) ast i2c_algo_bit drm_kms_helper
syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm aacraid tg3 vmx_crypto
drm_panel_orientation_quirks i2c_core crc32c_vpmsum
[ 25.360265] CPU: 0 PID: 586 Comm: kworker/0:3 Not tainted 4.20.0+ #4
[ 25.361865] Workqueue: events work_for_cpu_fn
[ 25.363471] NIP: c00800000dbfae40 LR: c00800000dbfae3c CTR: c000000000c4f870
[ 25.365096] REGS: c00000000a416fa0 TRAP: 0700 Not tainted (4.20.0+)
[ 25.366737] MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR:
48008482 XER: 00000000
[ 25.368402] CFAR: c000000000119834 IRQMASK: 0
GPR00: c00800000dbfae3c c00000000a417230 c00800000dd38c00
000000000000001d
GPR04: 0000000000000001 0000000000000000 0000000000000293
0000000000000000
GPR08: 0000000000000007 0000000000000007 0000000000000001
c00800001c64d0a0
GPR12: 0000000000008000 c0000000017c3000 c00000000014a9d8
c00000000abd3b00
GPR16: 0000000000000000 0000000000000000 0000000000000000
0000000000000000
GPR20: 0000000000000000 0000000000000000 fffffffffffffef7
0000000000000000
GPR24: c00000798484fab8 00000000000000ff 0000000000000020
0000000000000000
GPR28: c000000009f5d000 0000000000000000 c000000005020000
c000000009f5b000
[ 25.383286] NIP [c00800000dbfae40] gf100_gr_init_ctxctl_ext+0x798/0xa50
[nouveau]
[ 25.384940] LR [c00800000dbfae3c] gf100_gr_init_ctxctl_ext+0x794/0xa50
[nouveau]
[ 25.386539] Call Trace:
[ 25.388174] [c00000000a417230] [c00800000dbfae3c]
gf100_gr_init_ctxctl_ext+0x794/0xa50 [nouveau] (unreliable)
[ 25.389831] [c00000000a4172e0] [c00800000dbfc404]
gf100_gr_init_ctxctl+0x3c/0x3e0 [nouveau]
[ 25.391476] [c00000000a417390] [c00800000dbf9f94] gf100_gr_init_+0xac/0xd0
[nouveau]
[ 25.393117] [c00000000a4173c0] [c00800000dbe703c] nvkm_gr_init+0x34/0x50
[nouveau]
[ 25.394740] [c00000000a4173e0] [c00800000db0f4d8]
nvkm_engine_init+0x190/0x2e0 [nouveau]
[ 25.396354] [c00000000a417470] [c00800000db16ed4]
nvkm_subdev_init+0x11c/0x320 [nouveau]
[ 25.397965] [c00000000a4174f0] [c00800000db0f6ac]
nvkm_engine_ref.part.0+0x84/0xd0 [nouveau]
[ 25.399578] [c00000000a417530] [c00800000db11ef4] nvkm_ioctl_new+0x1cc/0x3c0
[nouveau]
[ 25.401173] [c00000000a417660] [c00800000db123b4] nvkm_ioctl+0x10c/0x370
[nouveau]
[ 25.402769] [c00000000a417700] [c00800000dc26818] nvkm_client_ioctl+0x20/0x40
[nouveau]
[ 25.404360] [c00000000a417720] [c00800000db0b07c] nvif_object_ioctl+0x74/0xa0
[nouveau]
[ 25.405945] [c00000000a417740] [c00800000db0bb90] nvif_object_init+0xe8/0x1a0
[nouveau]
[ 25.407539] [c00000000a4177b0] [c00800000dc3f888]
nvc0_fbcon_accel_init+0x70/0xaa0 [nouveau]
[ 25.409119] [c00000000a417810] [c00800000dc3b4f4]
nouveau_fbcon_create+0x58c/0x5b0 [nouveau]
[ 25.410653] [c00000000a417950] [c00800001c317eb0]
__drm_fb_helper_initial_config_and_unlock+0x2d8/0x5d0 [drm_kms_helper]
[ 25.412209] [c00000000a417a00] [c00800000dc3c1a8]
nouveau_fbcon_init+0x210/0x280 [nouveau]
[ 25.413708] [c00000000a417a50] [c00800000dc23e88]
nouveau_drm_device_init+0x5d0/0x9c0 [nouveau]
[ 25.415162] [c00000000a417b60] [c00800000dc24624]
nouveau_drm_probe+0x2bc/0x340 [nouveau]
[ 25.416494] [c00000000a417bb0] [c0000000006ed36c] local_pci_probe+0x6c/0x140
[ 25.417766] [c00000000a417c40] [c00000000013c748] work_for_cpu_fn+0x38/0x60
[ 25.418984] [c00000000a417c70] [c0000000001417d0]
process_one_work+0x250/0x500
[ 25.420210] [c00000000a417d10] [c000000000141cf0] worker_thread+0x270/0x5b0
[ 25.421434] [c00000000a417db0] [c00000000014ab7c] kthread+0x1ac/0x1c0
[ 25.422655] [c00000000a417e20] [c00000000000bdd4]
ret_from_kernel_thread+0x5c/0x68
[ 25.423866] Instruction dump:
[ 25.425068] e8410018 e9210060 7c641b78 e9290010 e9290010 e8a90050 2fa50000
419e0114
[ 25.426303] 3c620000 e863c7a0 4807a2f9 e8410018 <0fe00000> 3ba0fff0
4bfffc14 60000000
[ 25.427555] ---[ end trace 11a5d40b65319c36 ]---
[ 25.428806] nouveau 0000:01:00.0: gr: init failed, -16
[ 25.480400] nouveau 0000:01:00.0: DRM: allocated 3840x2160 fb: 0x200000, bo
(____ptrval____)
[ 25.746336] nouveau 0000:01:00.0: i2c: aux 0004: begin idle timeout bad00100
[ 25.791511] nouveau 0000:01:00.0: fb1: nouveaufb frame buffer device
[ 25.871488] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on
minor 1
[ 26.367628] nouveau 0000:01:00.0: i2c: aux 0004: begin idle timeout bad00100
This is ppc64 running 5.0-rc1 with 4k pages. Maybe it is an IOMMU issue,
such as something not being mapped properly for the GPU.
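To make the failure sequence easier to see, here is a minimal sketch in Python that pulls the nouveau fault and timeout lines out of a saved dmesg. The function name and the keyword list are my own assumptions, picked from the messages in the log above; they are not an official list of nouveau error strings.

```python
import re

# Hypothetical filter: keywords are taken from the log in this mail
# (FAULT, timeout, failed), not from any nouveau documentation.
PATTERN = re.compile(
    r"nouveau [0-9a-f:.]+: .*(FAULT|timeout|failed)", re.IGNORECASE
)


def nouveau_errors(lines):
    """Return the lines that look like nouveau faults, timeouts or failures."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]
```

It could be fed a saved log with something like nouveau_errors(open("dmesg.txt")), where dmesg.txt is a placeholder file name. On the log above it should keep the MMIO FAULT at 409800, the gr timeout and init failure, and the i2c aux idle timeouts, in that order.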
lspci:
0000:01:00.0 VGA compatible controller: NVIDIA Corporation GP108 [GeForce GT
1030] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 8c98
Device tree node: /sys/firmware/devicetree/base/pciex@600c3c0000000/pci@0/vga@0
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping-
SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 21
NUMA node: 0
Region 0: Memory at 600c000000000 (32-bit, non-prefetchable) [size=16M]
Region 1: [virtual] Memory at 6000000000000 (64-bit, prefetchable) [size=256M]
Region 3: Memory at 6000010000000 (64-bit, prefetchable) [size=32M]
Region 5: I/O ports at 0000
[virtual] Expansion ROM at 600c001000000 [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 1000000000000000 Data: 0000
Capabilities: [78] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s
<512ns, L1 <4us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s (downgraded), Width x4 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Via message
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis+, LTR-, OBFF Disabled
AtomicOpsCtl: ReqEn-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance-
ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+,
EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
Capabilities: [250 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Capabilities: [128 v1] Power Budgeting <?>
Capabilities: [420 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC-
UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC-
UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+
ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn+ ECRCChkCap- ECRCChkEn+
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024
<?>
Capabilities: [900 v1] Secondary PCI Express <?>
Kernel driver in use: nouveau
Kernel modules: nouveau
0000:01:00.1 Audio device: NVIDIA Corporation GP108 High Definition Audio
Controller (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 8c98
Device tree node: /sys/firmware/devicetree/base/pciex@600c3c0000000/pci@0/multimedia-device@0,1
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping-
SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin B routed to IRQ 511
NUMA node: 0
Region 0: Memory at 600c001080000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [60] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [78] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s
<512ns, L1 <4us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s (downgraded), Width x4 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Via message
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis+, LTR-, OBFF Disabled
AtomicOpsCtl: ReqEn-
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-,
EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC-
UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC-
UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+
ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn+ ECRCChkCap- ECRCChkEn+
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
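As a sanity check on the "enabling device (0541 -> 0543)" line, here is a small Python sketch that decodes the two values as PCI command register contents. It assumes those numbers are the old and new command register values, and it uses the flag names from the lspci "Control:" line; the helper name decode_command is mine.

```python
# Standard PCI command register bit positions, labelled with the
# names lspci prints on its "Control:" line.
PCI_COMMAND_BITS = {
    0: "I/O",
    1: "Mem",
    2: "BusMaster",
    3: "SpecCycle",
    4: "MemWINV",
    5: "VGASnoop",
    6: "ParErr",
    7: "Stepping",
    8: "SERR",
    9: "FastB2B",
    10: "DisINTx",
}


def decode_command(value):
    """Return the set of flag names whose bit is set in a command register value."""
    return {name for bit, name in PCI_COMMAND_BITS.items() if value & (1 << bit)}


before = decode_command(0x0541)   # value before nouveau enabled the device
after = decode_command(0x0543)    # value after
newly_set = after - before        # flags turned on by the enable
```

Under that assumption the only bit that changes is Mem (memory space decoding), and the remaining bits match the I/O+, ParErr+, SERR+ and DisINTx+ flags in the lspci output for 0000:01:00.0. BusMaster+ in lspci would then have been set later, after the enable.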
I only have intermittent access to that system :(
Cheers,
Jérôme