Karol Herbst
2019-May-21 17:35 UTC
[Nouveau] [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini
was able to get the lspci prints via ssh. Machine rebooted automatically each time though. relevant dmesg: kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3 kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3 kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3 kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff (last one is a 64 bit mmio read to get the on GPU timer value) # lspci -vvxxx -s 0:01.00 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 16 Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 I/O behind bridge: 0000e000-0000efff [size=4K] Memory behind bridge: ec000000-ed0fffff [size=17M] Prefetchable memory behind bridge: 00000000c0000000-00000000d1ffffff [size=288M] Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR- BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: [88] Subsystem: Dell Device 07be Capabilities: [80] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit- Address: 00000000 Data: 0000 Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00 DevCap: MaxPayload 256 bytes, PhantFunc 0 ExtTag- RBE+ DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- MaxPayload 256 bytes, MaxReadReq 128 bytes DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend- LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <256ns, L1 <8us ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+ LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok) TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+ SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise- Slot #1, PowerLimit 75.000W; Interlock- NoCompl+ SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg- Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock- SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock- Changed: MRL- PresDet+ LinkState- RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible- RootCap: CRSVisible- RootSta: PME ReqID 0000, PMEStatus- PMEPending- DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd- AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+ DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd- AtomicOpsCtl: ReqEn- EgressBlck- LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- Capabilities: [100 v1] Virtual Channel Caps: LPEVC=0 RefClk=100ns PATEntryBits=1 Arb: Fixed- WRR32- WRR64- WRR128- Ctrl: ArbSelect=Fixed Status: InProgress- VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans- Arb: Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256- Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff Status: NegoPending+ InProgress- Capabilities: [140 v1] Root Complex Link Desc: PortNumber=02 ComponentID=01 EltType=Config Link0: Desc: TargetPort=00 TargetComponent=01 AssocRCRB- LinkType=MemMapped LinkValid+ Addr: 00000000fed19000 Capabilities: [d94 v1] Secondary PCI Express <?> Kernel driver in use: pcieport 00: 86 80 01 19 07 00 10 00 05 00 04 06 00 00 81 00 10: 00 00 00 00 00 00 00 00 00 01 01 00 e0 e0 00 20 20: 00 ec 00 ed 01 c0 f1 d1 00 00 00 00 00 00 00 00 30: 00 00 00 00 88 00 00 00 00 00 00 00 ff 01 10 00 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 70: 00 00 00 00 00 00 00 00 00 62 17 00 00 00 00 0a 80: 01 90 03 c8 08 00 00 00 0d 80 00 00 28 10 be 07 90: 05 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a0: 10 00 42 01 01 80 00 00 20 00 00 00 03 ad 61 02 b0: 40 00 01 d1 80 25 0c 00 00 00 08 00 00 00 00 00 c0: 00 00 00 00 80 0b 08 00 00 64 00 00 0e 00 00 00 d0: 43 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0: 00 40 01 00 4e 01 01 22 00 00 00 00 e0 00 10 00 lspci -vvxxx -s 1:00.00 01:00.0 3D controller: NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] (rev ff) (prog-if ff) !!! Unknown header type 7f Kernel driver in use: nouveau Kernel modules: nouveau 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 40: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 50: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 60: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 70: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 90: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff On Tue, May 21, 2019 at 4:30 PM Karol Herbst <kherbst at redhat.com> wrote:> > On Tue, May 21, 2019 at 4:13 PM Bjorn Helgaas <helgaas at kernel.org> wrote: > > > > On Tue, May 21, 2019 at 03:28:48PM +0200, Karol Herbst wrote: > > > On Tue, May 21, 2019 at 3:11 PM Bjorn Helgaas <helgaas at kernel.org> wrote: > > > > On Tue, May 21, 2019 at 12:30:38AM +0200, Karol Herbst wrote: > > > > > On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas <helgaas at kernel.org> wrote: > > > > > > On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote: > > > > > > > Apperantly things go south if we suspend the device with a different PCIE > > > > > > > link speed set than it got booted with. Fixes runtime suspend on my gp107. > > > > > > > > > > > > > > This all looks like some bug inside the pci subsystem and I would prefer a > > > > > > > fix there instead of nouveau, but maybe there is no real nice way of doing > > > > > > > that outside of drivers? > > > > > > > > > > > > I agree it would be nice to fix this in the PCI core if that's > > > > > > feasible. > > > > > > > > > > > > It looks like this driver changes the PCIe link speed using some > > > > > > device-specific mechanism. When we suspend, we put the device in > > > > > > D3cold, so it loses all its state. When we resume, the link probably > > > > > > comes up at the boot speed because nothing did that device-specific > > > > > > magic to change it, so you probably end up with the link being slow > > > > > > but the driver thinking it's configured to be fast, and maybe that > > > > > > combination doesn't work. > > > > > > > > > > > > If it requires something device-specific to change that link speed, I > > > > > > don't know how to put that in the PCI core. But maybe I'm missing > > > > > > something? > > > > > > > > > > > > Per the PCIe spec (r4.0, sec 1.2): > > > > > > > > > > > > Initialization – During hardware initialization, each PCI Express > > > > > > Link is set up following a negotiation of Lane widths and frequency > > > > > > of operation by the two agents at each end of the Link. No firmware > > > > > > or operating system software is involved. > > > > > > > > > > > > I have been assuming that this means device-specific link speed > > > > > > management is out of spec, but it seems pretty common that devices > > > > > > don't come up by themselves at the fastest possible link speed. So > > > > > > maybe the spec just intends that devices can operate at *some* valid > > > > > > speed. > > > > > > > > > > I would expect that devices kind of have to figure out what they can > > > > > operate on and the operating system kind of just checks what the > > > > > current state is and doesn't try to "restore" the old state or > > > > > something? > > > > > > > > The devices at each end of the link negotiate the width and speed of > > > > the link. This is done directly by the hardware without any help from > > > > the OS. > > > > > > > > The OS can read the current link state (Current Link Speed and > > > > Negotiated Link Width, both in the Link Status register). The OS has > > > > very little control over that state. It can't directly restore the > > > > state because the hardware has to negotiate a width & speed that > > > > result in reliable operation. > > > > > > > > > We don't do anything in the driver after the device was suspended. And > > > > > the 0x88000 is a mirror of the PCI config space, but we also got some > > > > > PCIe stuff at 0x8c000 which is used by newer GPUs for gen3 stuff > > > > > essentially. I have no idea how much of this is part of the actual pci > > > > > standard and how much is driver specific. But the driver also wants to > > > > > have some control over the link speed as it's tight to performance > > > > > states on GPU. > > > > > > > > As far as I'm aware, there is no generic PCIe way for the OS to > > > > influence the link width or speed. If the GPU driver needs to do > > > > that, it would be via some device-specific mechanism. > > > > > > > > > The big issue here is just, that the GPU boots with 8.0, some on-gpu > > > > > init mechanism decreases it to 2.5. If we suspend, the GPU or at least > > > > > the communication with the controller is broken. But if we set it to > > > > > the boot speed, resuming the GPU just works. So my assumption was, > > > > > that _something_ (might it be the controller or the pci subsystem) > > > > > tries to force to operate on an invalid link speed and because the > > > > > bridge controller is actually powered down as well (as all children > > > > > are in D3cold) I could imagine that something in the pci subsystem > > > > > actually restores the state which lets the controller fail to > > > > > establish communication again? > > > > > > > > 1) At boot-time, the Port and the GPU hardware negotiate 8.0 GT/s > > > > without OS/driver intervention. > > > > > > > > 2) Some mechanism reduces link speed to 2.5 GT/s. This probably > > > > requires driver intervention or at least some ACPI method. > > > > > > there is no driver intervention and Nouveau doesn't care at all. It's > > > all done on the GPU. We just upload a script and some firmware on to > > > the GPU. The script runs then on the PMU inside the GPU and this > > > script also causes changing the PCIe link settings. But from a Nouveau > > > point of view we don't care about the link before or after that script > > > was invoked. Also there is no ACPI method involved. > > > > > > But if there is something we should notify pci core about, maybe > > > that's something we have to do then? > > > > I don't think there's anything the PCI core could do with that > > information anyway. The PCI core doesn't care at all about the link > > speed, and it really can't influence it directly. > > > > > > 3) Suspend puts GPU into D3cold (powered off). > > > > > > > > 4) Resume restores GPU to D0, and the Port and GPU hardware again > > > > negotiate 8.0 GT/s without OS/driver intervention, just like at > > > > initial boot. > > > > > > No, that negotiation fails apparently as any attempt to read anything > > > from the device just fails inside pci core. Or something goes wrong > > > when resuming the bridge controller. > > > > I'm surprised the negotiation would fail even after a power cycle of > > the device. But if you can avoid the issue by running another script > > on the PMU before suspend, that's probably what you'll have to do. > > > > there is none as far as we know. Or at least nothing inside the vbios. > We still have to get signed PMU firmware images from Nvidia for full > support, but this still would be a hacky issue as we would depend on > those then (and without having those in redistributable form, there > isn't much we can do about except fixing it on the kernel side). > > > > > 5) Now the driver thinks the GPU is at 2.5 GT/s but it's actually at > > > > 8.0 GT/s. > > > > > > what is actually meant by "driver" here? The pci subsystem or Nouveau? > > > > I was thinking Nouveau because the PCI core doesn't care about the > > link speed. > > > > > > Without knowing more about the transition to 2.5 GT/s, I can't guess > > > > why the GPU wouldn't work after resume. From a PCIe point of view, > > > > the link is supposed to work and the device should be reachable > > > > independent of the link speed. But maybe there's some weird > > > > dependency between the GPU and the driver here. > > > > > > but the device isn't reachable at all, not even from the pci > > > subsystem. All reads fail/return a default error value (0xffffffff). > > > > Are these PCI config reads that return 0xffffffff? Or MMIO reads? > > "lspci -vvxxxx" of the bridge and the GPU might have a clue about > > whether a PCI error occurred. > > > > that's kind of problematic as it might just lock up my machine... but > let me try that. > > > > > It sounds like things work if you return to 8.0 GT/s before suspend, > > > > things work. That would make sense to me because then the driver's > > > > idea of the link state after resume would match the actual state. > > > > > > depends on what is meant by the driver here. Inside Nouveau we don't > > > care one bit about the current link speed, so I assume you mean > > > something inside the pci core code? > > > > > > > But I don't see a way to deal with this in the PCI core. The PCI core > > > > does save and restore most of the architected config space around > > > > suspend/resume, but since this appears to be a device-specific thing, > > > > the PCI core would have no idea how to save/restore it. > > > > > > if we assume that the negotiation on a device level works as intended, > > > then I would expect this to be a pci core issue, which might actually > > > be not fixable there. But if it's not, then we would have to put > > > something like that inside the runpm documentation to tell drivers > > > they have to do something about it. > > >lspci -vvxxxx > > > But again, for me it just sounds like the negotiation on the device > > > level fails or something inside pci core messes it up. > > > > To me it sounds like the PMU script messed something up, and the PCI > > core has no way to know what that was or how to fix it. > > > > sure, I am mainly wondering why it doesn't work after we power cycled > the GPU and the host bridge controller, because no matter what the > state was before, we have to reprobe instead of relying on a known > state, no? > > > > > > > > Signed-off-by: Karol Herbst <kherbst at redhat.com> > > > > > > > Reviewed-by: Lyude Paul <lyude at redhat.com> > > > > > > > --- > > > > > > > drm/nouveau/include/nvkm/subdev/pci.h | 5 +++-- > > > > > > > drm/nouveau/nvkm/subdev/pci/base.c | 9 +++++++-- > > > > > > > drm/nouveau/nvkm/subdev/pci/pcie.c | 24 ++++++++++++++++++++---- > > > > > > > drm/nouveau/nvkm/subdev/pci/priv.h | 2 ++ > > > > > > > 4 files changed, 32 insertions(+), 8 deletions(-) > > > > > > > > > > > > > > diff --git a/drm/nouveau/include/nvkm/subdev/pci.h b/drm/nouveau/include/nvkm/subdev/pci.h > > > > > > > index 1fdf3098..b23793a2 100644 > > > > > > > --- a/drm/nouveau/include/nvkm/subdev/pci.h > > > > > > > +++ b/drm/nouveau/include/nvkm/subdev/pci.h > > > > > > > @@ -26,8 +26,9 @@ struct nvkm_pci { > > > > > > > } agp; > > > > > > > > > > > > > > struct { > > > > > > > - enum nvkm_pcie_speed speed; > > > > > > > - u8 width; > > > > > > > + enum nvkm_pcie_speed cur_speed; > > > > > > > + enum nvkm_pcie_speed def_speed; > > > > > > > + u8 cur_width; > > > > > > > } pcie; > > > > > > > > > > > > > > bool msi; > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/base.c b/drm/nouveau/nvkm/subdev/pci/base.c > > > > > > > index ee2431a7..d9fb5a83 100644 > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/base.c > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/base.c > > > > > > > @@ -90,6 +90,8 @@ nvkm_pci_fini(struct nvkm_subdev *subdev, bool suspend) > > > > > > > > > > > > > > if (pci->agp.bridge) > > > > > > > nvkm_agp_fini(pci); > > > > > > > + else if (pci_is_pcie(pci->pdev)) > > > > > > > + nvkm_pcie_fini(pci); > > > > > > > > > > > > > > return 0; > > > > > > > } > > > > > > > @@ -100,6 +102,8 @@ nvkm_pci_preinit(struct nvkm_subdev *subdev) > > > > > > > struct nvkm_pci *pci = nvkm_pci(subdev); > > > > > > > if (pci->agp.bridge) > > > > > > > nvkm_agp_preinit(pci); > > > > > > > + else if (pci_is_pcie(pci->pdev)) > > > > > > > + nvkm_pcie_preinit(pci); > > > > > > > return 0; > > > > > > > } > > > > > > > > > > > > > > @@ -193,8 +197,9 @@ nvkm_pci_new_(const struct nvkm_pci_func *func, struct nvkm_device *device, > > > > > > > pci->func = func; > > > > > > > pci->pdev = device->func->pci(device)->pdev; > > > > > > > pci->irq = -1; > > > > > > > - pci->pcie.speed = -1; > > > > > > > - pci->pcie.width = -1; > > > > > > > + pci->pcie.cur_speed = -1; > > > > > > > + pci->pcie.def_speed = -1; > > > > > > > + pci->pcie.cur_width = -1; > > > > > > > > > > > > > > if (device->type == NVKM_DEVICE_AGP) > > > > > > > nvkm_agp_ctor(pci); > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/pcie.c b/drm/nouveau/nvkm/subdev/pci/pcie.c > > > > > > > index 70ccbe0d..731dd30e 100644 > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/pcie.c > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/pcie.c > > > > > > > @@ -85,6 +85,13 @@ nvkm_pcie_oneinit(struct nvkm_pci *pci) > > > > > > > return 0; > > > > > > > } > > > > > > > > > > > > > > +int > > > > > > > +nvkm_pcie_preinit(struct nvkm_pci *pci) > > > > > > > +{ > > > > > > > + pci->pcie.def_speed = nvkm_pcie_get_speed(pci); > > > > > > > + return 0; > > > > > > > +} > > > > > > > + > > > > > > > int > > > > > > > nvkm_pcie_init(struct nvkm_pci *pci) > > > > > > > { > > > > > > > @@ -105,12 +112,21 @@ nvkm_pcie_init(struct nvkm_pci *pci) > > > > > > > if (pci->func->pcie.init) > > > > > > > pci->func->pcie.init(pci); > > > > > > > > > > > > > > - if (pci->pcie.speed != -1) > > > > > > > - nvkm_pcie_set_link(pci, pci->pcie.speed, pci->pcie.width); > > > > > > > + if (pci->pcie.cur_speed != -1) > > > > > > > + nvkm_pcie_set_link(pci, pci->pcie.cur_speed, > > > > > > > + pci->pcie.cur_width); > > > > > > > > > > > > > > return 0; > > > > > > > } > > > > > > > > > > > > > > +int > > > > > > > +nvkm_pcie_fini(struct nvkm_pci *pci) > > > > > > > +{ > > > > > > > + if (!IS_ERR_VALUE(pci->pcie.def_speed)) > > > > > > > + return nvkm_pcie_set_link(pci, pci->pcie.def_speed, 16); > > > > > > > + return 0; > > > > > > > +} > > > > > > > + > > > > > > > int > > > > > > > nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width) > > > > > > > { > > > > > > > @@ -146,8 +162,8 @@ nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width) > > > > > > > speed = max_speed; > > > > > > > } > > > > > > > > > > > > > > - pci->pcie.speed = speed; > > > > > > > - pci->pcie.width = width; > > > > > > > + pci->pcie.cur_speed = speed; > > > > > > > + pci->pcie.cur_width = width; > > > > > > > > > > > > > > if (speed == cur_speed) { > > > > > > > nvkm_debug(subdev, "requested matches current speed\n"); > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/priv.h b/drm/nouveau/nvkm/subdev/pci/priv.h > > > > > > > index a0d4c007..e7744671 100644 > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/priv.h > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/priv.h > > > > > > > @@ -60,5 +60,7 @@ enum nvkm_pcie_speed gk104_pcie_max_speed(struct nvkm_pci *); > > > > > > > int gk104_pcie_version_supported(struct nvkm_pci *); > > > > > > > > > > > > > > int nvkm_pcie_oneinit(struct nvkm_pci *); > > > > > > > +int nvkm_pcie_preinit(struct nvkm_pci *); > > > > > > > int nvkm_pcie_init(struct nvkm_pci *); > > > > > > > +int nvkm_pcie_fini(struct nvkm_pci *); > > > > > > > #endif > > > > > > > -- > > > > > > > 2.21.0 > > > > > > >
Karol Herbst
2019-May-21 17:48 UTC
[Nouveau] [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini
doing the same on the bridge controller with my workarounds applied: please note some differences: LnkSta: Speed 8GT/s (ok) vs Speed 2.5GT/s (downgraded) SltSta: PresDet+ vs PresDet- LnkSta2: Equalization stuff Virtual channel: NegoPending- vs NegoPending+ both times I executed lspci while the GPU was still suspended. 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 16 Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 I/O behind bridge: 0000e000-0000efff [size=4K] Memory behind bridge: ec000000-ed0fffff [size=17M] Prefetchable memory behind bridge: 00000000c0000000-00000000d1ffffff [size=288M] Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR- BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: [88] Subsystem: Dell Device 07be Capabilities: [80] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit- Address: 00000000 Data: 0000 Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00 DevCap: MaxPayload 256 bytes, PhantFunc 0 ExtTag- RBE+ DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- MaxPayload 256 bytes, MaxReadReq 128 bytes DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend- LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <256ns, L1 <8us ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+ LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 8GT/s (ok), Width x16 (ok) TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+ SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise- Slot #1, PowerLimit 75.000W; Interlock- NoCompl+ SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg- Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock- SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock- Changed: MRL- PresDet+ LinkState- RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible- RootCap: CRSVisible- RootSta: PME ReqID 0000, PMEStatus- PMEPending- DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd- AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+ DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd- AtomicOpsCtl: ReqEn- EgressBlck- LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+ EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest- Capabilities: [100 v1] Virtual Channel Caps: LPEVC=0 RefClk=100ns PATEntryBits=1 Arb: Fixed- WRR32- WRR64- WRR128- Ctrl: ArbSelect=Fixed Status: InProgress- VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans- Arb: Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256- Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff Status: NegoPending- InProgress- Capabilities: [140 v1] Root Complex Link Desc: PortNumber=02 ComponentID=01 EltType=Config Link0: Desc: TargetPort=00 TargetComponent=01 AssocRCRB- LinkType=MemMapped LinkValid+ Addr: 00000000fed19000 Capabilities: [d94 v1] Secondary PCI Express <?> Kernel driver in use: pcieport 00: 86 80 01 19 07 00 10 00 05 00 04 06 00 00 81 00 10: 00 00 00 00 00 00 00 00 00 01 01 00 e0 e0 00 20 20: 00 ec 00 ed 01 c0 f1 d1 00 00 00 00 00 00 00 00 30: 00 00 00 00 88 00 00 00 00 00 00 00 ff 01 10 00 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 70: 00 00 00 00 00 00 00 00 00 62 17 00 00 00 00 0a 80: 01 90 03 c8 08 00 00 00 0d 80 00 00 28 10 be 07 90: 05 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a0: 10 00 42 01 01 80 00 00 20 00 00 00 03 ad 61 02 b0: 40 00 03 d1 80 25 0c 00 00 00 48 00 00 00 00 00 c0: 00 00 00 00 80 0b 08 00 00 64 00 00 0e 00 00 00 d0: 43 00 1e 00 00 00 00 00 00 00 00 00 00 00 00 00 e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0: 00 00 00 84 4e 01 01 20 00 00 00 00 e0 00 10 00 On Tue, May 21, 2019 at 7:35 PM Karol Herbst <kherbst at redhat.com> wrote:> > was able to get the lspci prints via ssh. Machine rebooted > automatically each time though. > > relevant dmesg: > kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3 > kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3 > kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3 > kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff > > (last one is a 64 bit mmio read to get the on GPU timer value) > > # lspci -vvxxx -s 0:01.00 > 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th > Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal > decode]) > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- > ParErr- Stepping- SERR- FastB2B- DisINTx- > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- > <TAbort- <MAbort- >SERR- <PERR- INTx- > Latency: 0 > Interrupt: pin A routed to IRQ 16 > Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 > I/O behind bridge: 0000e000-0000efff [size=4K] > Memory behind bridge: ec000000-ed0fffff [size=17M] > Prefetchable memory behind bridge: > 00000000c0000000-00000000d1ffffff [size=288M] > Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- > <TAbort- <MAbort+ <SERR- <PERR- > BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B- > PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- > Capabilities: [88] Subsystem: Dell Device 07be > Capabilities: [80] Power Management version 3 > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA > PME(D0+,D1-,D2-,D3hot+,D3cold+) > Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- > Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit- > Address: 00000000 Data: 0000 > Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00 > DevCap: MaxPayload 256 bytes, PhantFunc 0 > ExtTag- RBE+ > DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- > RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- > MaxPayload 256 bytes, MaxReadReq 128 bytes > DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- > AuxPwr- TransPend- > LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1, > Exit Latency L0s <256ns, L1 <8us > ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+ > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+ > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- > LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok) > TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+ > SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- > HotPlug- Surprise- > Slot #1, PowerLimit 75.000W; Interlock- NoCompl+ > SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- > HPIrq- LinkChg- > Control: AttnInd Unknown, PwrInd Unknown, > Power- Interlock- > SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- > PresDet- Interlock- > Changed: MRL- PresDet+ LinkState- > RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- > PMEIntEna- CRSVisible- > RootCap: CRSVisible- > RootSta: PME ReqID 0000, PMEStatus- PMEPending- > DevCap2: Completion Timeout: Not Supported, > TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd- > AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+ > DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, > LTR+, OBFF Via WAKE# ARIFwd- > AtomicOpsCtl: ReqEn- EgressBlck- > LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis- > Transmit Margin: Normal Operating Range, > EnterModifiedCompliance- ComplianceSOS- > Compliance De-emphasis: -6dB > LnkSta2: Current De-emphasis Level: -6dB, > EqualizationComplete-, EqualizationPhase1- > EqualizationPhase2-, EqualizationPhase3-, > LinkEqualizationRequest- > Capabilities: [100 v1] Virtual Channel > Caps: LPEVC=0 RefClk=100ns PATEntryBits=1 > Arb: Fixed- WRR32- WRR64- WRR128- > Ctrl: ArbSelect=Fixed > Status: InProgress- > VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans- > Arb: Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256- > Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff > Status: NegoPending+ InProgress- > Capabilities: [140 v1] Root Complex Link > Desc: PortNumber=02 ComponentID=01 EltType=Config > Link0: Desc: TargetPort=00 TargetComponent=01 > AssocRCRB- LinkType=MemMapped LinkValid+ > Addr: 00000000fed19000 > Capabilities: [d94 v1] Secondary PCI Express <?> > Kernel driver in use: pcieport > 00: 86 80 01 19 07 00 10 00 05 00 04 06 00 00 81 00 > 10: 00 00 00 00 00 00 00 00 00 01 01 00 e0 e0 00 20 > 20: 00 ec 00 ed 01 c0 f1 d1 00 00 00 00 00 00 00 00 > 30: 00 00 00 00 88 00 00 00 00 00 00 00 ff 01 10 00 > 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 70: 00 00 00 00 00 00 00 00 00 62 17 00 00 00 00 0a > 80: 01 90 03 c8 08 00 00 00 0d 80 00 00 28 10 be 07 > 90: 05 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > a0: 10 00 42 01 01 80 00 00 20 00 00 00 03 ad 61 02 > b0: 40 00 01 d1 80 25 0c 00 00 00 08 00 00 00 00 00 > c0: 00 00 00 00 80 0b 08 00 00 64 00 00 0e 00 00 00 > d0: 43 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > f0: 00 40 01 00 4e 01 01 22 00 00 00 00 e0 00 10 00 > > lspci -vvxxx -s 1:00.00 > 01:00.0 3D controller: NVIDIA Corporation GP107M [GeForce GTX 1050 > Mobile] (rev ff) (prog-if ff) > !!! Unknown header type 7f > Kernel driver in use: nouveau > Kernel modules: nouveau > 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > 40: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > 50: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > 60: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > 70: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > 80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > 90: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > On Tue, May 21, 2019 at 4:30 PM Karol Herbst <kherbst at redhat.com> wrote: > > > > On Tue, May 21, 2019 at 4:13 PM Bjorn Helgaas <helgaas at kernel.org> wrote: > > > > > > On Tue, May 21, 2019 at 03:28:48PM +0200, Karol Herbst wrote: > > > > On Tue, May 21, 2019 at 3:11 PM Bjorn Helgaas <helgaas at kernel.org> wrote: > > > > > On Tue, May 21, 2019 at 12:30:38AM +0200, Karol Herbst wrote: > > > > > > On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas <helgaas at kernel.org> wrote: > > > > > > > On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote: > > > > > > > > Apperantly things go south if we suspend the device with a different PCIE > > > > > > > > link speed set than it got booted with. Fixes runtime suspend on my gp107. > > > > > > > > > > > > > > > > This all looks like some bug inside the pci subsystem and I would prefer a > > > > > > > > fix there instead of nouveau, but maybe there is no real nice way of doing > > > > > > > > that outside of drivers? > > > > > > > > > > > > > > I agree it would be nice to fix this in the PCI core if that's > > > > > > > feasible. > > > > > > > > > > > > > > It looks like this driver changes the PCIe link speed using some > > > > > > > device-specific mechanism. When we suspend, we put the device in > > > > > > > D3cold, so it loses all its state. When we resume, the link probably > > > > > > > comes up at the boot speed because nothing did that device-specific > > > > > > > magic to change it, so you probably end up with the link being slow > > > > > > > but the driver thinking it's configured to be fast, and maybe that > > > > > > > combination doesn't work. > > > > > > > > > > > > > > If it requires something device-specific to change that link speed, I > > > > > > > don't know how to put that in the PCI core. But maybe I'm missing > > > > > > > something? > > > > > > > > > > > > > > Per the PCIe spec (r4.0, sec 1.2): > > > > > > > > > > > > > > Initialization – During hardware initialization, each PCI Express > > > > > > > Link is set up following a negotiation of Lane widths and frequency > > > > > > > of operation by the two agents at each end of the Link. No firmware > > > > > > > or operating system software is involved. > > > > > > > > > > > > > > I have been assuming that this means device-specific link speed > > > > > > > management is out of spec, but it seems pretty common that devices > > > > > > > don't come up by themselves at the fastest possible link speed. So > > > > > > > maybe the spec just intends that devices can operate at *some* valid > > > > > > > speed. > > > > > > > > > > > > I would expect that devices kind of have to figure out what they can > > > > > > operate on and the operating system kind of just checks what the > > > > > > current state is and doesn't try to "restore" the old state or > > > > > > something? > > > > > > > > > > The devices at each end of the link negotiate the width and speed of > > > > > the link. This is done directly by the hardware without any help from > > > > > the OS. > > > > > > > > > > The OS can read the current link state (Current Link Speed and > > > > > Negotiated Link Width, both in the Link Status register). The OS has > > > > > very little control over that state. It can't directly restore the > > > > > state because the hardware has to negotiate a width & speed that > > > > > result in reliable operation. > > > > > > > > > > > We don't do anything in the driver after the device was suspended. And > > > > > > the 0x88000 is a mirror of the PCI config space, but we also got some > > > > > > PCIe stuff at 0x8c000 which is used by newer GPUs for gen3 stuff > > > > > > essentially. I have no idea how much of this is part of the actual pci > > > > > > standard and how much is driver specific. But the driver also wants to > > > > > > have some control over the link speed as it's tight to performance > > > > > > states on GPU. > > > > > > > > > > As far as I'm aware, there is no generic PCIe way for the OS to > > > > > influence the link width or speed. If the GPU driver needs to do > > > > > that, it would be via some device-specific mechanism. > > > > > > > > > > > The big issue here is just, that the GPU boots with 8.0, some on-gpu > > > > > > init mechanism decreases it to 2.5. If we suspend, the GPU or at least > > > > > > the communication with the controller is broken. But if we set it to > > > > > > the boot speed, resuming the GPU just works. So my assumption was, > > > > > > that _something_ (might it be the controller or the pci subsystem) > > > > > > tries to force to operate on an invalid link speed and because the > > > > > > bridge controller is actually powered down as well (as all children > > > > > > are in D3cold) I could imagine that something in the pci subsystem > > > > > > actually restores the state which lets the controller fail to > > > > > > establish communication again? > > > > > > > > > > 1) At boot-time, the Port and the GPU hardware negotiate 8.0 GT/s > > > > > without OS/driver intervention. > > > > > > > > > > 2) Some mechanism reduces link speed to 2.5 GT/s. This probably > > > > > requires driver intervention or at least some ACPI method. > > > > > > > > there is no driver intervention and Nouveau doesn't care at all. It's > > > > all done on the GPU. We just upload a script and some firmware on to > > > > the GPU. The script runs then on the PMU inside the GPU and this > > > > script also causes changing the PCIe link settings. But from a Nouveau > > > > point of view we don't care about the link before or after that script > > > > was invoked. Also there is no ACPI method involved. > > > > > > > > But if there is something we should notify pci core about, maybe > > > > that's something we have to do then? > > > > > > I don't think there's anything the PCI core could do with that > > > information anyway. The PCI core doesn't care at all about the link > > > speed, and it really can't influence it directly. > > > > > > > > 3) Suspend puts GPU into D3cold (powered off). > > > > > > > > > > 4) Resume restores GPU to D0, and the Port and GPU hardware again > > > > > negotiate 8.0 GT/s without OS/driver intervention, just like at > > > > > initial boot. > > > > > > > > No, that negotiation fails apparently as any attempt to read anything > > > > from the device just fails inside pci core. Or something goes wrong > > > > when resuming the bridge controller. > > > > > > I'm surprised the negotiation would fail even after a power cycle of > > > the device. But if you can avoid the issue by running another script > > > on the PMU before suspend, that's probably what you'll have to do. > > > > > > > there is none as far as we know. Or at least nothing inside the vbios. > > We still have to get signed PMU firmware images from Nvidia for full > > support, but this still would be a hacky issue as we would depend on > > those then (and without having those in redistributable form, there > > isn't much we can do about except fixing it on the kernel side). > > > > > > > 5) Now the driver thinks the GPU is at 2.5 GT/s but it's actually at > > > > > 8.0 GT/s. > > > > > > > > what is actually meant by "driver" here? The pci subsystem or Nouveau? > > > > > > I was thinking Nouveau because the PCI core doesn't care about the > > > link speed. > > > > > > > > Without knowing more about the transition to 2.5 GT/s, I can't guess > > > > > why the GPU wouldn't work after resume. From a PCIe point of view, > > > > > the link is supposed to work and the device should be reachable > > > > > independent of the link speed. But maybe there's some weird > > > > > dependency between the GPU and the driver here. > > > > > > > > but the device isn't reachable at all, not even from the pci > > > > subsystem. All reads fail/return a default error value (0xffffffff). > > > > > > Are these PCI config reads that return 0xffffffff? Or MMIO reads? > > > "lspci -vvxxxx" of the bridge and the GPU might have a clue about > > > whether a PCI error occurred. > > > > > > > that's kind of problematic as it might just lock up my machine... but > > let me try that. > > > > > > > It sounds like things work if you return to 8.0 GT/s before suspend, > > > > > things work. That would make sense to me because then the driver's > > > > > idea of the link state after resume would match the actual state. > > > > > > > > depends on what is meant by the driver here. Inside Nouveau we don't > > > > care one bit about the current link speed, so I assume you mean > > > > something inside the pci core code? > > > > > > > > > But I don't see a way to deal with this in the PCI core. The PCI core > > > > > does save and restore most of the architected config space around > > > > > suspend/resume, but since this appears to be a device-specific thing, > > > > > the PCI core would have no idea how to save/restore it. > > > > > > > > if we assume that the negotiation on a device level works as intended, > > > > then I would expect this to be a pci core issue, which might actually > > > > be not fixable there. But if it's not, then we would have to put > > > > something like that inside the runpm documentation to tell drivers > > > > they have to do something about it. > > > >lspci -vvxxxx > > > > But again, for me it just sounds like the negotiation on the device > > > > level fails or something inside pci core messes it up. > > > > > > To me it sounds like the PMU script messed something up, and the PCI > > > core has no way to know what that was or how to fix it. > > > > > > > sure, I am mainly wondering why it doesn't work after we power cycled > > the GPU and the host bridge controller, because no matter what the > > state was before, we have to reprobe instead of relying on a known > > state, no? > > > > > > > > > > Signed-off-by: Karol Herbst <kherbst at redhat.com> > > > > > > > > Reviewed-by: Lyude Paul <lyude at redhat.com> > > > > > > > > --- > > > > > > > > drm/nouveau/include/nvkm/subdev/pci.h | 5 +++-- > > > > > > > > drm/nouveau/nvkm/subdev/pci/base.c | 9 +++++++-- > > > > > > > > drm/nouveau/nvkm/subdev/pci/pcie.c | 24 ++++++++++++++++++++---- > > > > > > > > drm/nouveau/nvkm/subdev/pci/priv.h | 2 ++ > > > > > > > > 4 files changed, 32 insertions(+), 8 deletions(-) > > > > > > > > > > > > > > > > diff --git a/drm/nouveau/include/nvkm/subdev/pci.h b/drm/nouveau/include/nvkm/subdev/pci.h > > > > > > > > index 1fdf3098..b23793a2 100644 > > > > > > > > --- a/drm/nouveau/include/nvkm/subdev/pci.h > > > > > > > > +++ b/drm/nouveau/include/nvkm/subdev/pci.h > > > > > > > > @@ -26,8 +26,9 @@ struct nvkm_pci { > > > > > > > > } agp; > > > > > > > > > > > > > > > > struct { > > > > > > > > - enum nvkm_pcie_speed speed; > > > > > > > > - u8 width; > > > > > > > > + enum nvkm_pcie_speed cur_speed; > > > > > > > > + enum nvkm_pcie_speed def_speed; > > > > > > > > + u8 cur_width; > > > > > > > > } pcie; > > > > > > > > > > > > > > > > bool msi; > > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/base.c b/drm/nouveau/nvkm/subdev/pci/base.c > > > > > > > > index ee2431a7..d9fb5a83 100644 > > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/base.c > > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/base.c > > > > > > > > @@ -90,6 +90,8 @@ nvkm_pci_fini(struct nvkm_subdev *subdev, bool suspend) > > > > > > > > > > > > > > > > if (pci->agp.bridge) > > > > > > > > nvkm_agp_fini(pci); > > > > > > > > + else if (pci_is_pcie(pci->pdev)) > > > > > > > > + nvkm_pcie_fini(pci); > > > > > > > > > > > > > > > > return 0; > > > > > > > > } > > > > > > > > @@ -100,6 +102,8 @@ nvkm_pci_preinit(struct nvkm_subdev *subdev) > > > > > > > > struct nvkm_pci *pci = nvkm_pci(subdev); > > > > > > > > if (pci->agp.bridge) > > > > > > > > nvkm_agp_preinit(pci); > > > > > > > > + else if (pci_is_pcie(pci->pdev)) > > > > > > > > + nvkm_pcie_preinit(pci); > > > > > > > > return 0; > > > > > > > > } > > > > > > > > > > > > > > > > @@ -193,8 +197,9 @@ nvkm_pci_new_(const struct nvkm_pci_func *func, struct nvkm_device *device, > > > > > > > > pci->func = func; > > > > > > > > pci->pdev = device->func->pci(device)->pdev; > > > > > > > > pci->irq = -1; > > > > > > > > - pci->pcie.speed = -1; > > > > > > > > - pci->pcie.width = -1; > > > > > > > > + pci->pcie.cur_speed = -1; > > > > > > > > + pci->pcie.def_speed = -1; > > > > > > > > + pci->pcie.cur_width = -1; > > > > > > > > > > > > > > > > if (device->type == NVKM_DEVICE_AGP) > > > > > > > > nvkm_agp_ctor(pci); > > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/pcie.c b/drm/nouveau/nvkm/subdev/pci/pcie.c > > > > > > > > index 70ccbe0d..731dd30e 100644 > > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/pcie.c > > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/pcie.c > > > > > > > > @@ -85,6 +85,13 @@ nvkm_pcie_oneinit(struct nvkm_pci *pci) > > > > > > > > return 0; > > > > > > > > } > > > > > > > > > > > > > > > > +int > > > > > > > > +nvkm_pcie_preinit(struct nvkm_pci *pci) > > > > > > > > +{ > > > > > > > > + pci->pcie.def_speed = nvkm_pcie_get_speed(pci); > > > > > > > > + return 0; > > > > > > > > +} > > > > > > > > + > > > > > > > > int > > > > > > > > nvkm_pcie_init(struct nvkm_pci *pci) > > > > > > > > { > > > > > > > > @@ -105,12 +112,21 @@ nvkm_pcie_init(struct nvkm_pci *pci) > > > > > > > > if (pci->func->pcie.init) > > > > > > > > pci->func->pcie.init(pci); > > > > > > > > > > > > > > > > - if (pci->pcie.speed != -1) > > > > > > > > - nvkm_pcie_set_link(pci, pci->pcie.speed, pci->pcie.width); > > > > > > > > + if (pci->pcie.cur_speed != -1) > > > > > > > > + nvkm_pcie_set_link(pci, pci->pcie.cur_speed, > > > > > > > > + pci->pcie.cur_width); > > > > > > > > > > > > > > > > return 0; > > > > > > > > } > > > > > > > > > > > > > > > > +int > > > > > > > > +nvkm_pcie_fini(struct nvkm_pci *pci) > > > > > > > > +{ > > > > > > > > + if (!IS_ERR_VALUE(pci->pcie.def_speed)) > > > > > > > > + return nvkm_pcie_set_link(pci, pci->pcie.def_speed, 16); > > > > > > > > + return 0; > > > > > > > > +} > > > > > > > > + > > > > > > > > int > > > > > > > > nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width) > > > > > > > > { > > > > > > > > @@ -146,8 +162,8 @@ nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width) > > > > > > > > speed = max_speed; > > > > > > > > } > > > > > > > > > > > > > > > > - pci->pcie.speed = speed; > > > > > > > > - pci->pcie.width = width; > > > > > > > > + pci->pcie.cur_speed = speed; > > > > > > > > + pci->pcie.cur_width = width; > > > > > > > > > > > > > > > > if (speed == cur_speed) { > > > > > > > > nvkm_debug(subdev, "requested matches current speed\n"); > > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/priv.h b/drm/nouveau/nvkm/subdev/pci/priv.h > > > > > > > > index a0d4c007..e7744671 100644 > > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/priv.h > > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/priv.h > > > > > > > > @@ -60,5 +60,7 @@ enum nvkm_pcie_speed gk104_pcie_max_speed(struct nvkm_pci *); > > > > > > > > int gk104_pcie_version_supported(struct nvkm_pci *); > > > > > > > > > > > > > > > > int nvkm_pcie_oneinit(struct nvkm_pci *); > > > > > > > > +int nvkm_pcie_preinit(struct nvkm_pci *); > > > > > > > > int nvkm_pcie_init(struct nvkm_pci *); > > > > > > > > +int nvkm_pcie_fini(struct nvkm_pci *); > > > > > > > > #endif > > > > > > > > -- > > > > > > > > 2.21.0 > > > > > > > >
Karol Herbst
2019-Jun-03 13:18 UTC
[Nouveau] [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini
@bjorn: any further ideas? Otherwise I'd like to just go ahead and fix this issue inside Nouveau and leave it there until we have a better understanding or non Nouveau cases of this issue. On Tue, May 21, 2019 at 7:48 PM Karol Herbst <kherbst at redhat.com> wrote:> > doing the same on the bridge controller with my workarounds applied: > > please note some differences: > LnkSta: Speed 8GT/s (ok) vs Speed 2.5GT/s (downgraded) > SltSta: PresDet+ vs PresDet- > LnkSta2: Equalization stuff > Virtual channel: NegoPending- vs NegoPending+ > > both times I executed lspci while the GPU was still suspended. > > 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th > Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal > decode]) > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- > ParErr- Stepping- SERR- FastB2B- DisINTx- > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- > <TAbort- <MAbort- >SERR- <PERR- INTx- > Latency: 0 > Interrupt: pin A routed to IRQ 16 > Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 > I/O behind bridge: 0000e000-0000efff [size=4K] > Memory behind bridge: ec000000-ed0fffff [size=17M] > Prefetchable memory behind bridge: > 00000000c0000000-00000000d1ffffff [size=288M] > Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- > <TAbort- <MAbort+ <SERR- <PERR- > BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B- > PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- > Capabilities: [88] Subsystem: Dell Device 07be > Capabilities: [80] Power Management version 3 > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA > PME(D0+,D1-,D2-,D3hot+,D3cold+) > Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- > Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit- > Address: 00000000 Data: 0000 > Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00 > DevCap: MaxPayload 256 bytes, PhantFunc 0 > ExtTag- RBE+ > DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- > RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- > MaxPayload 256 bytes, MaxReadReq 128 bytes > DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- > AuxPwr- TransPend- > LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1, > Exit Latency L0s <256ns, L1 <8us > ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+ > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+ > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- > LnkSta: Speed 8GT/s (ok), Width x16 (ok) > TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+ > SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- > HotPlug- Surprise- > Slot #1, PowerLimit 75.000W; Interlock- NoCompl+ > SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- > CmdCplt- HPIrq- LinkChg- > Control: AttnInd Unknown, PwrInd Unknown, > Power- Interlock- > SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- > PresDet+ Interlock- > Changed: MRL- PresDet+ LinkState- > RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- > PMEIntEna- CRSVisible- > RootCap: CRSVisible- > RootSta: PME ReqID 0000, PMEStatus- PMEPending- > DevCap2: Completion Timeout: Not Supported, > TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd- > AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+ > DevCtl2: Completion Timeout: 50us to 50ms, > TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd- > AtomicOpsCtl: ReqEn- EgressBlck- > LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis- > Transmit Margin: Normal Operating Range, > EnterModifiedCompliance- ComplianceSOS- > Compliance De-emphasis: -6dB > LnkSta2: Current De-emphasis Level: -6dB, > EqualizationComplete+, EqualizationPhase1+ > EqualizationPhase2+, EqualizationPhase3+, > LinkEqualizationRequest- > Capabilities: [100 v1] Virtual Channel > Caps: LPEVC=0 RefClk=100ns PATEntryBits=1 > Arb: Fixed- WRR32- WRR64- WRR128- > Ctrl: ArbSelect=Fixed > Status: InProgress- > VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans- > Arb: Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256- > Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff > Status: NegoPending- InProgress- > Capabilities: [140 v1] Root Complex Link > Desc: PortNumber=02 ComponentID=01 EltType=Config > Link0: Desc: TargetPort=00 TargetComponent=01 > AssocRCRB- LinkType=MemMapped LinkValid+ > Addr: 00000000fed19000 > Capabilities: [d94 v1] Secondary PCI Express <?> > Kernel driver in use: pcieport > 00: 86 80 01 19 07 00 10 00 05 00 04 06 00 00 81 00 > 10: 00 00 00 00 00 00 00 00 00 01 01 00 e0 e0 00 20 > 20: 00 ec 00 ed 01 c0 f1 d1 00 00 00 00 00 00 00 00 > 30: 00 00 00 00 88 00 00 00 00 00 00 00 ff 01 10 00 > 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 70: 00 00 00 00 00 00 00 00 00 62 17 00 00 00 00 0a > 80: 01 90 03 c8 08 00 00 00 0d 80 00 00 28 10 be 07 > 90: 05 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > a0: 10 00 42 01 01 80 00 00 20 00 00 00 03 ad 61 02 > b0: 40 00 03 d1 80 25 0c 00 00 00 48 00 00 00 00 00 > c0: 00 00 00 00 80 0b 08 00 00 64 00 00 0e 00 00 00 > d0: 43 00 1e 00 00 00 00 00 00 00 00 00 00 00 00 00 > e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > f0: 00 00 00 84 4e 01 01 20 00 00 00 00 e0 00 10 00 > > On Tue, May 21, 2019 at 7:35 PM Karol Herbst <kherbst at redhat.com> wrote: > > > > was able to get the lspci prints via ssh. Machine rebooted > > automatically each time though. > > > > relevant dmesg: > > kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3 > > kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3 > > kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3 > > kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff > > > > (last one is a 64 bit mmio read to get the on GPU timer value) > > > > # lspci -vvxxx -s 0:01.00 > > 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th > > Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal > > decode]) > > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- > > ParErr- Stepping- SERR- FastB2B- DisINTx- > > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- > > <TAbort- <MAbort- >SERR- <PERR- INTx- > > Latency: 0 > > Interrupt: pin A routed to IRQ 16 > > Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 > > I/O behind bridge: 0000e000-0000efff [size=4K] > > Memory behind bridge: ec000000-ed0fffff [size=17M] > > Prefetchable memory behind bridge: > > 00000000c0000000-00000000d1ffffff [size=288M] > > Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- > > <TAbort- <MAbort+ <SERR- <PERR- > > BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B- > > PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- > > Capabilities: [88] Subsystem: Dell Device 07be > > Capabilities: [80] Power Management version 3 > > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA > > PME(D0+,D1-,D2-,D3hot+,D3cold+) > > Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- > > Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit- > > Address: 00000000 Data: 0000 > > Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00 > > DevCap: MaxPayload 256 bytes, PhantFunc 0 > > ExtTag- RBE+ > > DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- > > RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- > > MaxPayload 256 bytes, MaxReadReq 128 bytes > > DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- > > AuxPwr- TransPend- > > LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1, > > Exit Latency L0s <256ns, L1 <8us > > ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+ > > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+ > > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- > > LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok) > > TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+ > > SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- > > HotPlug- Surprise- > > Slot #1, PowerLimit 75.000W; Interlock- NoCompl+ > > SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- > > HPIrq- LinkChg- > > Control: AttnInd Unknown, PwrInd Unknown, > > Power- Interlock- > > SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- > > PresDet- Interlock- > > Changed: MRL- PresDet+ LinkState- > > RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- > > PMEIntEna- CRSVisible- > > RootCap: CRSVisible- > > RootSta: PME ReqID 0000, PMEStatus- PMEPending- > > DevCap2: Completion Timeout: Not Supported, > > TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd- > > AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+ > > DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, > > LTR+, OBFF Via WAKE# ARIFwd- > > AtomicOpsCtl: ReqEn- EgressBlck- > > LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis- > > Transmit Margin: Normal Operating Range, > > EnterModifiedCompliance- ComplianceSOS- > > Compliance De-emphasis: -6dB > > LnkSta2: Current De-emphasis Level: -6dB, > > EqualizationComplete-, EqualizationPhase1- > > EqualizationPhase2-, EqualizationPhase3-, > > LinkEqualizationRequest- > > Capabilities: [100 v1] Virtual Channel > > Caps: LPEVC=0 RefClk=100ns PATEntryBits=1 > > Arb: Fixed- WRR32- WRR64- WRR128- > > Ctrl: ArbSelect=Fixed > > Status: InProgress- > > VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans- > > Arb: Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256- > > Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff > > Status: NegoPending+ InProgress- > > Capabilities: [140 v1] Root Complex Link > > Desc: PortNumber=02 ComponentID=01 EltType=Config > > Link0: Desc: TargetPort=00 TargetComponent=01 > > AssocRCRB- LinkType=MemMapped LinkValid+ > > Addr: 00000000fed19000 > > Capabilities: [d94 v1] Secondary PCI Express <?> > > Kernel driver in use: pcieport > > 00: 86 80 01 19 07 00 10 00 05 00 04 06 00 00 81 00 > > 10: 00 00 00 00 00 00 00 00 00 01 01 00 e0 e0 00 20 > > 20: 00 ec 00 ed 01 c0 f1 d1 00 00 00 00 00 00 00 00 > > 30: 00 00 00 00 88 00 00 00 00 00 00 00 ff 01 10 00 > > 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 70: 00 00 00 00 00 00 00 00 00 62 17 00 00 00 00 0a > > 80: 01 90 03 c8 08 00 00 00 0d 80 00 00 28 10 be 07 > > 90: 05 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > a0: 10 00 42 01 01 80 00 00 20 00 00 00 03 ad 61 02 > > b0: 40 00 01 d1 80 25 0c 00 00 00 08 00 00 00 00 00 > > c0: 00 00 00 00 80 0b 08 00 00 64 00 00 0e 00 00 00 > > d0: 43 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > f0: 00 40 01 00 4e 01 01 22 00 00 00 00 e0 00 10 00 > > > > lspci -vvxxx -s 1:00.00 > > 01:00.0 3D controller: NVIDIA Corporation GP107M [GeForce GTX 1050 > > Mobile] (rev ff) (prog-if ff) > > !!! Unknown header type 7f > > Kernel driver in use: nouveau > > Kernel modules: nouveau > > 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > 40: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > 50: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > 60: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > 70: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > 80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > 90: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > > > On Tue, May 21, 2019 at 4:30 PM Karol Herbst <kherbst at redhat.com> wrote: > > > > > > On Tue, May 21, 2019 at 4:13 PM Bjorn Helgaas <helgaas at kernel.org> wrote: > > > > > > > > On Tue, May 21, 2019 at 03:28:48PM +0200, Karol Herbst wrote: > > > > > On Tue, May 21, 2019 at 3:11 PM Bjorn Helgaas <helgaas at kernel.org> wrote: > > > > > > On Tue, May 21, 2019 at 12:30:38AM +0200, Karol Herbst wrote: > > > > > > > On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas <helgaas at kernel.org> wrote: > > > > > > > > On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote: > > > > > > > > > Apperantly things go south if we suspend the device with a different PCIE > > > > > > > > > link speed set than it got booted with. Fixes runtime suspend on my gp107. > > > > > > > > > > > > > > > > > > This all looks like some bug inside the pci subsystem and I would prefer a > > > > > > > > > fix there instead of nouveau, but maybe there is no real nice way of doing > > > > > > > > > that outside of drivers? > > > > > > > > > > > > > > > > I agree it would be nice to fix this in the PCI core if that's > > > > > > > > feasible. > > > > > > > > > > > > > > > > It looks like this driver changes the PCIe link speed using some > > > > > > > > device-specific mechanism. When we suspend, we put the device in > > > > > > > > D3cold, so it loses all its state. When we resume, the link probably > > > > > > > > comes up at the boot speed because nothing did that device-specific > > > > > > > > magic to change it, so you probably end up with the link being slow > > > > > > > > but the driver thinking it's configured to be fast, and maybe that > > > > > > > > combination doesn't work. > > > > > > > > > > > > > > > > If it requires something device-specific to change that link speed, I > > > > > > > > don't know how to put that in the PCI core. But maybe I'm missing > > > > > > > > something? > > > > > > > > > > > > > > > > Per the PCIe spec (r4.0, sec 1.2): > > > > > > > > > > > > > > > > Initialization – During hardware initialization, each PCI Express > > > > > > > > Link is set up following a negotiation of Lane widths and frequency > > > > > > > > of operation by the two agents at each end of the Link. No firmware > > > > > > > > or operating system software is involved. > > > > > > > > > > > > > > > > I have been assuming that this means device-specific link speed > > > > > > > > management is out of spec, but it seems pretty common that devices > > > > > > > > don't come up by themselves at the fastest possible link speed. So > > > > > > > > maybe the spec just intends that devices can operate at *some* valid > > > > > > > > speed. > > > > > > > > > > > > > > I would expect that devices kind of have to figure out what they can > > > > > > > operate on and the operating system kind of just checks what the > > > > > > > current state is and doesn't try to "restore" the old state or > > > > > > > something? > > > > > > > > > > > > The devices at each end of the link negotiate the width and speed of > > > > > > the link. This is done directly by the hardware without any help from > > > > > > the OS. > > > > > > > > > > > > The OS can read the current link state (Current Link Speed and > > > > > > Negotiated Link Width, both in the Link Status register). The OS has > > > > > > very little control over that state. It can't directly restore the > > > > > > state because the hardware has to negotiate a width & speed that > > > > > > result in reliable operation. > > > > > > > > > > > > > We don't do anything in the driver after the device was suspended. And > > > > > > > the 0x88000 is a mirror of the PCI config space, but we also got some > > > > > > > PCIe stuff at 0x8c000 which is used by newer GPUs for gen3 stuff > > > > > > > essentially. I have no idea how much of this is part of the actual pci > > > > > > > standard and how much is driver specific. But the driver also wants to > > > > > > > have some control over the link speed as it's tight to performance > > > > > > > states on GPU. > > > > > > > > > > > > As far as I'm aware, there is no generic PCIe way for the OS to > > > > > > influence the link width or speed. If the GPU driver needs to do > > > > > > that, it would be via some device-specific mechanism. > > > > > > > > > > > > > The big issue here is just, that the GPU boots with 8.0, some on-gpu > > > > > > > init mechanism decreases it to 2.5. If we suspend, the GPU or at least > > > > > > > the communication with the controller is broken. But if we set it to > > > > > > > the boot speed, resuming the GPU just works. So my assumption was, > > > > > > > that _something_ (might it be the controller or the pci subsystem) > > > > > > > tries to force to operate on an invalid link speed and because the > > > > > > > bridge controller is actually powered down as well (as all children > > > > > > > are in D3cold) I could imagine that something in the pci subsystem > > > > > > > actually restores the state which lets the controller fail to > > > > > > > establish communication again? > > > > > > > > > > > > 1) At boot-time, the Port and the GPU hardware negotiate 8.0 GT/s > > > > > > without OS/driver intervention. > > > > > > > > > > > > 2) Some mechanism reduces link speed to 2.5 GT/s. This probably > > > > > > requires driver intervention or at least some ACPI method. > > > > > > > > > > there is no driver intervention and Nouveau doesn't care at all. It's > > > > > all done on the GPU. We just upload a script and some firmware on to > > > > > the GPU. The script runs then on the PMU inside the GPU and this > > > > > script also causes changing the PCIe link settings. But from a Nouveau > > > > > point of view we don't care about the link before or after that script > > > > > was invoked. Also there is no ACPI method involved. > > > > > > > > > > But if there is something we should notify pci core about, maybe > > > > > that's something we have to do then? > > > > > > > > I don't think there's anything the PCI core could do with that > > > > information anyway. The PCI core doesn't care at all about the link > > > > speed, and it really can't influence it directly. > > > > > > > > > > 3) Suspend puts GPU into D3cold (powered off). > > > > > > > > > > > > 4) Resume restores GPU to D0, and the Port and GPU hardware again > > > > > > negotiate 8.0 GT/s without OS/driver intervention, just like at > > > > > > initial boot. > > > > > > > > > > No, that negotiation fails apparently as any attempt to read anything > > > > > from the device just fails inside pci core. Or something goes wrong > > > > > when resuming the bridge controller. > > > > > > > > I'm surprised the negotiation would fail even after a power cycle of > > > > the device. But if you can avoid the issue by running another script > > > > on the PMU before suspend, that's probably what you'll have to do. > > > > > > > > > > there is none as far as we know. Or at least nothing inside the vbios. > > > We still have to get signed PMU firmware images from Nvidia for full > > > support, but this still would be a hacky issue as we would depend on > > > those then (and without having those in redistributable form, there > > > isn't much we can do about except fixing it on the kernel side). > > > > > > > > > 5) Now the driver thinks the GPU is at 2.5 GT/s but it's actually at > > > > > > 8.0 GT/s. > > > > > > > > > > what is actually meant by "driver" here? The pci subsystem or Nouveau? > > > > > > > > I was thinking Nouveau because the PCI core doesn't care about the > > > > link speed. > > > > > > > > > > Without knowing more about the transition to 2.5 GT/s, I can't guess > > > > > > why the GPU wouldn't work after resume. From a PCIe point of view, > > > > > > the link is supposed to work and the device should be reachable > > > > > > independent of the link speed. But maybe there's some weird > > > > > > dependency between the GPU and the driver here. > > > > > > > > > > but the device isn't reachable at all, not even from the pci > > > > > subsystem. All reads fail/return a default error value (0xffffffff). > > > > > > > > Are these PCI config reads that return 0xffffffff? Or MMIO reads? > > > > "lspci -vvxxxx" of the bridge and the GPU might have a clue about > > > > whether a PCI error occurred. > > > > > > > > > > that's kind of problematic as it might just lock up my machine... but > > > let me try that. > > > > > > > > > It sounds like things work if you return to 8.0 GT/s before suspend, > > > > > > things work. That would make sense to me because then the driver's > > > > > > idea of the link state after resume would match the actual state. > > > > > > > > > > depends on what is meant by the driver here. Inside Nouveau we don't > > > > > care one bit about the current link speed, so I assume you mean > > > > > something inside the pci core code? > > > > > > > > > > > But I don't see a way to deal with this in the PCI core. The PCI core > > > > > > does save and restore most of the architected config space around > > > > > > suspend/resume, but since this appears to be a device-specific thing, > > > > > > the PCI core would have no idea how to save/restore it. > > > > > > > > > > if we assume that the negotiation on a device level works as intended, > > > > > then I would expect this to be a pci core issue, which might actually > > > > > be not fixable there. But if it's not, then we would have to put > > > > > something like that inside the runpm documentation to tell drivers > > > > > they have to do something about it. > > > > >lspci -vvxxxx > > > > > But again, for me it just sounds like the negotiation on the device > > > > > level fails or something inside pci core messes it up. > > > > > > > > To me it sounds like the PMU script messed something up, and the PCI > > > > core has no way to know what that was or how to fix it. > > > > > > > > > > sure, I am mainly wondering why it doesn't work after we power cycled > > > the GPU and the host bridge controller, because no matter what the > > > state was before, we have to reprobe instead of relying on a known > > > state, no? > > > > > > > > > > > > Signed-off-by: Karol Herbst <kherbst at redhat.com> > > > > > > > > > Reviewed-by: Lyude Paul <lyude at redhat.com> > > > > > > > > > --- > > > > > > > > > drm/nouveau/include/nvkm/subdev/pci.h | 5 +++-- > > > > > > > > > drm/nouveau/nvkm/subdev/pci/base.c | 9 +++++++-- > > > > > > > > > drm/nouveau/nvkm/subdev/pci/pcie.c | 24 ++++++++++++++++++++---- > > > > > > > > > drm/nouveau/nvkm/subdev/pci/priv.h | 2 ++ > > > > > > > > > 4 files changed, 32 insertions(+), 8 deletions(-) > > > > > > > > > > > > > > > > > > diff --git a/drm/nouveau/include/nvkm/subdev/pci.h b/drm/nouveau/include/nvkm/subdev/pci.h > > > > > > > > > index 1fdf3098..b23793a2 100644 > > > > > > > > > --- a/drm/nouveau/include/nvkm/subdev/pci.h > > > > > > > > > +++ b/drm/nouveau/include/nvkm/subdev/pci.h > > > > > > > > > @@ -26,8 +26,9 @@ struct nvkm_pci { > > > > > > > > > } agp; > > > > > > > > > > > > > > > > > > struct { > > > > > > > > > - enum nvkm_pcie_speed speed; > > > > > > > > > - u8 width; > > > > > > > > > + enum nvkm_pcie_speed cur_speed; > > > > > > > > > + enum nvkm_pcie_speed def_speed; > > > > > > > > > + u8 cur_width; > > > > > > > > > } pcie; > > > > > > > > > > > > > > > > > > bool msi; > > > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/base.c b/drm/nouveau/nvkm/subdev/pci/base.c > > > > > > > > > index ee2431a7..d9fb5a83 100644 > > > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/base.c > > > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/base.c > > > > > > > > > @@ -90,6 +90,8 @@ nvkm_pci_fini(struct nvkm_subdev *subdev, bool suspend) > > > > > > > > > > > > > > > > > > if (pci->agp.bridge) > > > > > > > > > nvkm_agp_fini(pci); > > > > > > > > > + else if (pci_is_pcie(pci->pdev)) > > > > > > > > > + nvkm_pcie_fini(pci); > > > > > > > > > > > > > > > > > > return 0; > > > > > > > > > } > > > > > > > > > @@ -100,6 +102,8 @@ nvkm_pci_preinit(struct nvkm_subdev *subdev) > > > > > > > > > struct nvkm_pci *pci = nvkm_pci(subdev); > > > > > > > > > if (pci->agp.bridge) > > > > > > > > > nvkm_agp_preinit(pci); > > > > > > > > > + else if (pci_is_pcie(pci->pdev)) > > > > > > > > > + nvkm_pcie_preinit(pci); > > > > > > > > > return 0; > > > > > > > > > } > > > > > > > > > > > > > > > > > > @@ -193,8 +197,9 @@ nvkm_pci_new_(const struct nvkm_pci_func *func, struct nvkm_device *device, > > > > > > > > > pci->func = func; > > > > > > > > > pci->pdev = device->func->pci(device)->pdev; > > > > > > > > > pci->irq = -1; > > > > > > > > > - pci->pcie.speed = -1; > > > > > > > > > - pci->pcie.width = -1; > > > > > > > > > + pci->pcie.cur_speed = -1; > > > > > > > > > + pci->pcie.def_speed = -1; > > > > > > > > > + pci->pcie.cur_width = -1; > > > > > > > > > > > > > > > > > > if (device->type == NVKM_DEVICE_AGP) > > > > > > > > > nvkm_agp_ctor(pci); > > > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/pcie.c b/drm/nouveau/nvkm/subdev/pci/pcie.c > > > > > > > > > index 70ccbe0d..731dd30e 100644 > > > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/pcie.c > > > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/pcie.c > > > > > > > > > @@ -85,6 +85,13 @@ nvkm_pcie_oneinit(struct nvkm_pci *pci) > > > > > > > > > return 0; > > > > > > > > > } > > > > > > > > > > > > > > > > > > +int > > > > > > > > > +nvkm_pcie_preinit(struct nvkm_pci *pci) > > > > > > > > > +{ > > > > > > > > > + pci->pcie.def_speed = nvkm_pcie_get_speed(pci); > > > > > > > > > + return 0; > > > > > > > > > +} > > > > > > > > > + > > > > > > > > > int > > > > > > > > > nvkm_pcie_init(struct nvkm_pci *pci) > > > > > > > > > { > > > > > > > > > @@ -105,12 +112,21 @@ nvkm_pcie_init(struct nvkm_pci *pci) > > > > > > > > > if (pci->func->pcie.init) > > > > > > > > > pci->func->pcie.init(pci); > > > > > > > > > > > > > > > > > > - if (pci->pcie.speed != -1) > > > > > > > > > - nvkm_pcie_set_link(pci, pci->pcie.speed, pci->pcie.width); > > > > > > > > > + if (pci->pcie.cur_speed != -1) > > > > > > > > > + nvkm_pcie_set_link(pci, pci->pcie.cur_speed, > > > > > > > > > + pci->pcie.cur_width); > > > > > > > > > > > > > > > > > > return 0; > > > > > > > > > } > > > > > > > > > > > > > > > > > > +int > > > > > > > > > +nvkm_pcie_fini(struct nvkm_pci *pci) > > > > > > > > > +{ > > > > > > > > > + if (!IS_ERR_VALUE(pci->pcie.def_speed)) > > > > > > > > > + return nvkm_pcie_set_link(pci, pci->pcie.def_speed, 16); > > > > > > > > > + return 0; > > > > > > > > > +} > > > > > > > > > + > > > > > > > > > int > > > > > > > > > nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width) > > > > > > > > > { > > > > > > > > > @@ -146,8 +162,8 @@ nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width) > > > > > > > > > speed = max_speed; > > > > > > > > > } > > > > > > > > > > > > > > > > > > - pci->pcie.speed = speed; > > > > > > > > > - pci->pcie.width = width; > > > > > > > > > + pci->pcie.cur_speed = speed; > > > > > > > > > + pci->pcie.cur_width = width; > > > > > > > > > > > > > > > > > > if (speed == cur_speed) { > > > > > > > > > nvkm_debug(subdev, "requested matches current speed\n"); > > > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/priv.h b/drm/nouveau/nvkm/subdev/pci/priv.h > > > > > > > > > index a0d4c007..e7744671 100644 > > > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/priv.h > > > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/priv.h > > > > > > > > > @@ -60,5 +60,7 @@ enum nvkm_pcie_speed gk104_pcie_max_speed(struct nvkm_pci *); > > > > > > > > > int gk104_pcie_version_supported(struct nvkm_pci *); > > > > > > > > > > > > > > > > > > int nvkm_pcie_oneinit(struct nvkm_pci *); > > > > > > > > > +int nvkm_pcie_preinit(struct nvkm_pci *); > > > > > > > > > int nvkm_pcie_init(struct nvkm_pci *); > > > > > > > > > +int nvkm_pcie_fini(struct nvkm_pci *); > > > > > > > > > #endif > > > > > > > > > -- > > > > > > > > > 2.21.0 > > > > > > > > >
Reasonably Related Threads
- [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini
- [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini
- [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini
- [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini
- [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini