thr3ads.net - Nouveau - [Nouveau] [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini [Jun 2019]

Karol Herbst

2019-May-21 17:35 UTC

[Nouveau] [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini

I was able to get the lspci output via ssh. The machine rebooted
automatically each time, though.

relevant dmesg:
kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff

(the last one is a 64-bit MMIO read to get the on-GPU timer value)

# lspci -vvxxx -s 0:01.00
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th
Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal
decode])
       Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
       Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
       Latency: 0
       Interrupt: pin A routed to IRQ 16
       Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
       I/O behind bridge: 0000e000-0000efff [size=4K]
       Memory behind bridge: ec000000-ed0fffff [size=17M]
       Prefetchable memory behind bridge:
00000000c0000000-00000000d1ffffff [size=288M]
       Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort+ <SERR- <PERR-
       BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
               PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
       Capabilities: [88] Subsystem: Dell Device 07be
       Capabilities: [80] Power Management version 3
               Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
               Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
       Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
               Address: 00000000  Data: 0000
       Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
               DevCap: MaxPayload 256 bytes, PhantFunc 0
                       ExtTag- RBE+
               DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                       RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                       MaxPayload 256 bytes, MaxReadReq 128 bytes
               DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
AuxPwr- TransPend-
               LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1,
Exit Latency L0s <256ns, L1 <8us
                       ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
               LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
               LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok)
                       TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+
               SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
HotPlug- Surprise-
                       Slot #1, PowerLimit 75.000W; Interlock- NoCompl+
               SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt-
HPIrq- LinkChg-
                       Control: AttnInd Unknown, PwrInd Unknown,
Power- Interlock-
               SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
PresDet- Interlock-
                       Changed: MRL- PresDet+ LinkState-
               RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
PMEIntEna- CRSVisible-
               RootCap: CRSVisible-
               RootSta: PME ReqID 0000, PMEStatus- PMEPending-
               DevCap2: Completion Timeout: Not Supported,
TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
                        AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+
               DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-,
LTR+, OBFF Via WAKE# ARIFwd-
                        AtomicOpsCtl: ReqEn- EgressBlck-
               LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                        Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
                        Compliance De-emphasis: -6dB
               LnkSta2: Current De-emphasis Level: -6dB,
EqualizationComplete-, EqualizationPhase1-
                        EqualizationPhase2-, EqualizationPhase3-,
LinkEqualizationRequest-
       Capabilities: [100 v1] Virtual Channel
               Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
               Arb:    Fixed- WRR32- WRR64- WRR128-
               Ctrl:   ArbSelect=Fixed
               Status: InProgress-
               VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                       Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
                       Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                       Status: NegoPending+ InProgress-
       Capabilities: [140 v1] Root Complex Link
               Desc:   PortNumber=02 ComponentID=01 EltType=Config
               Link0:  Desc:   TargetPort=00 TargetComponent=01
AssocRCRB- LinkType=MemMapped LinkValid+
                       Addr:   00000000fed19000
       Capabilities: [d94 v1] Secondary PCI Express <?>
       Kernel driver in use: pcieport
00: 86 80 01 19 07 00 10 00 05 00 04 06 00 00 81 00
10: 00 00 00 00 00 00 00 00 00 01 01 00 e0 e0 00 20
20: 00 ec 00 ed 01 c0 f1 d1 00 00 00 00 00 00 00 00
30: 00 00 00 00 88 00 00 00 00 00 00 00 ff 01 10 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 62 17 00 00 00 00 0a
80: 01 90 03 c8 08 00 00 00 0d 80 00 00 28 10 be 07
90: 05 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 10 00 42 01 01 80 00 00 20 00 00 00 03 ad 61 02
b0: 40 00 01 d1 80 25 0c 00 00 00 08 00 00 00 00 00
c0: 00 00 00 00 80 0b 08 00 00 64 00 00 0e 00 00 00
d0: 43 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 40 01 00 4e 01 01 22 00 00 00 00 e0 00 10 00

# lspci -vvxxx -s 1:00.00
01:00.0 3D controller: NVIDIA Corporation GP107M [GeForce GTX 1050
Mobile] (rev ff) (prog-if ff)
       !!! Unknown header type 7f
       Kernel driver in use: nouveau
       Kernel modules: nouveau
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
40: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
50: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
60: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
70: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
90: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
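
For anyone reading along: a dump of nothing but ff, together with the
"Unknown header type 7f" line, is what config reads look like when the
device does not answer at all, since 0xffff is not a valid vendor ID.
A minimal userspace sketch of that check follows. The helper names are
made up for illustration and are not part of nouveau or pciutils; the
buffer is assumed to hold raw config space bytes such as the ones above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when every byte of a config space dump reads
 * back as 0xff, i.e. no completion ever came back from the device.
 */
static bool cfg_dump_is_all_ones(const uint8_t *cfg, size_t len)
{
	size_t i;

	assert(cfg != NULL);
	for (i = 0; i < len; i++) {
		if (cfg[i] != 0xff)
			return false;
	}
	return len > 0;
}

/*
 * Hypothetical helper: the vendor ID is the little-endian 16-bit word at
 * offset 0; 0xffff there means the function is not reachable.
 */
static bool cfg_vendor_missing(const uint8_t *cfg)
{
	assert(cfg != NULL);
	return cfg[0] == 0xff && cfg[1] == 0xff;
}
```

Fed the 256 bytes above, both helpers would return true. The bridge dump
starts with 86 80 (vendor 0x8086), so neither would trigger there.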

On Tue, May 21, 2019 at 4:30 PM Karol Herbst <kherbst at redhat.com> wrote:
>
> On Tue, May 21, 2019 at 4:13 PM Bjorn Helgaas <helgaas at kernel.org> wrote:
> >
> > On Tue, May 21, 2019 at 03:28:48PM +0200, Karol Herbst wrote:
> > > On Tue, May 21, 2019 at 3:11 PM Bjorn Helgaas <helgaas at kernel.org> wrote:
> > > > On Tue, May 21, 2019 at 12:30:38AM +0200, Karol Herbst wrote:
> > > > > On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas <helgaas at kernel.org> wrote:
> > > > > > On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> > > > > > > Apparently things go south if we suspend the device with a different PCIE
> > > > > > > link speed set than it got booted with. Fixes runtime suspend on my gp107.
> > > > > > >
> > > > > > > This all looks like some bug inside the pci subsystem and I would prefer a
> > > > > > > fix there instead of nouveau, but maybe there is no real nice way of doing
> > > > > > > that outside of drivers?
> > > > > >
> > > > > > I agree it would be nice to fix this in the PCI core if that's
> > > > > > feasible.
> > > > > >
> > > > > > It looks like this driver changes the PCIe link speed using some
> > > > > > device-specific mechanism.  When we suspend, we put the device in
> > > > > > D3cold, so it loses all its state.  When we resume, the link probably
> > > > > > comes up at the boot speed because nothing did that device-specific
> > > > > > magic to change it, so you probably end up with the link being slow
> > > > > > but the driver thinking it's configured to be fast, and maybe that
> > > > > > combination doesn't work.
> > > > > >
> > > > > > If it requires something device-specific to change that link speed, I
> > > > > > don't know how to put that in the PCI core.  But maybe I'm missing
> > > > > > something?
> > > > > >
> > > > > > Per the PCIe spec (r4.0, sec 1.2):
> > > > > >
> > > > > >   Initialization – During hardware initialization, each PCI Express
> > > > > >   Link is set up following a negotiation of Lane widths and frequency
> > > > > >   of operation by the two agents at each end of the Link. No firmware
> > > > > >   or operating system software is involved.
> > > > > >
> > > > > > I have been assuming that this means device-specific link speed
> > > > > > management is out of spec, but it seems pretty common that devices
> > > > > > don't come up by themselves at the fastest possible link speed.  So
> > > > > > maybe the spec just intends that devices can operate at *some* valid
> > > > > > speed.
> > > > >
> > > > > I would expect that devices kind of have to figure out what they can
> > > > > operate on and the operating system kind of just checks what the
> > > > > current state is and doesn't try to "restore" the old state or
> > > > > something?
> > > >
> > > > The devices at each end of the link negotiate the width and speed of
> > > > the link.  This is done directly by the hardware without any help from
> > > > the OS.
> > > >
> > > > The OS can read the current link state (Current Link Speed and
> > > > Negotiated Link Width, both in the Link Status register).  The OS has
> > > > very little control over that state.  It can't directly restore the
> > > > state because the hardware has to negotiate a width & speed that
> > > > result in reliable operation.
> > > >
> > > > > We don't do anything in the driver after the device was suspended. And
> > > > > the 0x88000 is a mirror of the PCI config space, but we also got some
> > > > > PCIe stuff at 0x8c000 which is used by newer GPUs for gen3 stuff
> > > > > essentially. I have no idea how much of this is part of the actual pci
> > > > > standard and how much is driver specific. But the driver also wants to
> > > > > have some control over the link speed as it's tied to performance
> > > > > states on GPU.
> > > >
> > > > As far as I'm aware, there is no generic PCIe way for the OS to
> > > > influence the link width or speed.  If the GPU driver needs to do
> > > > that, it would be via some device-specific mechanism.
> > > >
> > > > > The big issue here is just, that the GPU boots with 8.0, some on-gpu
> > > > > init mechanism decreases it to 2.5. If we suspend, the GPU or at least
> > > > > the communication with the controller is broken. But if we set it to
> > > > > the boot speed, resuming the GPU just works. So my assumption was,
> > > > > that _something_ (might it be the controller or the pci subsystem)
> > > > > tries to force to operate on an invalid link speed and because the
> > > > > bridge controller is actually powered down as well (as all children
> > > > > are in D3cold) I could imagine that something in the pci subsystem
> > > > > actually restores the state which lets the controller fail to
> > > > > establish communication again?
> > > >
> > > >   1) At boot-time, the Port and the GPU hardware negotiate 8.0 GT/s
> > > >      without OS/driver intervention.
> > > >
> > > >   2) Some mechanism reduces link speed to 2.5 GT/s.  This probably
> > > >      requires driver intervention or at least some ACPI method.
> > >
> > > there is no driver intervention and Nouveau doesn't care at all. It's
> > > all done on the GPU. We just upload a script and some firmware on to
> > > the GPU. The script runs then on the PMU inside the GPU and this
> > > script also causes changing the PCIe link settings. But from a Nouveau
> > > point of view we don't care about the link before or after that script
> > > was invoked. Also there is no ACPI method involved.
> > >
> > > But if there is something we should notify pci core about, maybe
> > > that's something we have to do then?
> >
> > I don't think there's anything the PCI core could do with that
> > information anyway.  The PCI core doesn't care at all about the link
> > speed, and it really can't influence it directly.
> >
> > > >   3) Suspend puts GPU into D3cold (powered off).
> > > >
> > > >   4) Resume restores GPU to D0, and the Port and GPU hardware again
> > > >      negotiate 8.0 GT/s without OS/driver intervention, just like at
> > > >      initial boot.
> > >
> > > No, that negotiation fails apparently as any attempt to read anything
> > > from the device just fails inside pci core. Or something goes wrong
> > > when resuming the bridge controller.
> >
> > I'm surprised the negotiation would fail even after a power cycle of
> > the device.  But if you can avoid the issue by running another script
> > on the PMU before suspend, that's probably what you'll have to do.
> >
>
> there is none as far as we know. Or at least nothing inside the vbios.
> We still have to get signed PMU firmware images from Nvidia for full
> support, but this still would be a hacky issue as we would depend on
> those then (and without having those in redistributable form, there
> isn't much we can do about it except fixing it on the kernel side).
>
> > > >   5) Now the driver thinks the GPU is at 2.5 GT/s but it's actually at
> > > >      8.0 GT/s.
> > >
> > > what is actually meant by "driver" here? The pci subsystem or Nouveau?
> >
> > I was thinking Nouveau because the PCI core doesn't care about the
> > link speed.
> >
> > > > Without knowing more about the transition to 2.5 GT/s, I can't guess
> > > > why the GPU wouldn't work after resume.  From a PCIe point of view,
> > > > the link is supposed to work and the device should be reachable
> > > > independent of the link speed.  But maybe there's some weird
> > > > dependency between the GPU and the driver here.
> > >
> > > but the device isn't reachable at all, not even from the pci
> > > subsystem. All reads fail/return a default error value (0xffffffff).
> >
> > Are these PCI config reads that return 0xffffffff?  Or MMIO reads?
> > "lspci -vvxxxx" of the bridge and the GPU might have a clue about
> > whether a PCI error occurred.
> >
>
> that's kind of problematic as it might just lock up my machine... but
> let me try that.
>
> > > > It sounds like things work if you return to 8.0 GT/s before suspend.
> > > > That would make sense to me because then the driver's idea of the
> > > > link state after resume would match the actual state.
> > >
> > > depends on what is meant by the driver here. Inside Nouveau we don't
> > > care one bit about the current link speed, so I assume you mean
> > > something inside the pci core code?
> > >
> > > > But I don't see a way to deal with this in the PCI core.  The PCI core
> > > > does save and restore most of the architected config space around
> > > > suspend/resume, but since this appears to be a device-specific thing,
> > > > the PCI core would have no idea how to save/restore it.
> > >
> > > if we assume that the negotiation on a device level works as intended,
> > > then I would expect this to be a pci core issue, which might actually
> > > be not fixable there. But if it's not, then we would have to put
> > > something like that inside the runpm documentation to tell drivers
> > > they have to do something about it.
> > >
> > > But again, for me it just sounds like the negotiation on the device
> > > level fails or something inside pci core messes it up.
> >
> > To me it sounds like the PMU script messed something up, and the PCI
> > core has no way to know what that was or how to fix it.
> >
>
> sure, I am mainly wondering why it doesn't work after we power cycled
> the GPU and the host bridge controller, because no matter what the
> state was before, we have to reprobe instead of relying on a known
> state, no?
>
> > > > > > > Signed-off-by: Karol Herbst <kherbst at redhat.com>
> > > > > > > Reviewed-by: Lyude Paul <lyude at redhat.com>
> > > > > > > ---
> > > > > > >  drm/nouveau/include/nvkm/subdev/pci.h |  5 +++--
> > > > > > >  drm/nouveau/nvkm/subdev/pci/base.c    |  9 +++++++--
> > > > > > >  drm/nouveau/nvkm/subdev/pci/pcie.c    | 24 ++++++++++++++++++++----
> > > > > > >  drm/nouveau/nvkm/subdev/pci/priv.h    |  2 ++
> > > > > > >  4 files changed, 32 insertions(+), 8 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drm/nouveau/include/nvkm/subdev/pci.h b/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > > index 1fdf3098..b23793a2 100644
> > > > > > > --- a/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > > +++ b/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > > @@ -26,8 +26,9 @@ struct nvkm_pci {
> > > > > > >       } agp;
> > > > > > >
> > > > > > >       struct {
> > > > > > > -             enum nvkm_pcie_speed speed;
> > > > > > > -             u8 width;
> > > > > > > +             enum nvkm_pcie_speed cur_speed;
> > > > > > > +             enum nvkm_pcie_speed def_speed;
> > > > > > > +             u8 cur_width;
> > > > > > >       } pcie;
> > > > > > >
> > > > > > >       bool msi;
> > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/base.c b/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > > index ee2431a7..d9fb5a83 100644
> > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > > @@ -90,6 +90,8 @@ nvkm_pci_fini(struct nvkm_subdev *subdev, bool suspend)
> > > > > > >
> > > > > > >       if (pci->agp.bridge)
> > > > > > >               nvkm_agp_fini(pci);
> > > > > > > +     else if (pci_is_pcie(pci->pdev))
> > > > > > > +             nvkm_pcie_fini(pci);
> > > > > > >
> > > > > > >       return 0;
> > > > > > >  }
> > > > > > > @@ -100,6 +102,8 @@ nvkm_pci_preinit(struct nvkm_subdev *subdev)
> > > > > > >       struct nvkm_pci *pci = nvkm_pci(subdev);
> > > > > > >       if (pci->agp.bridge)
> > > > > > >               nvkm_agp_preinit(pci);
> > > > > > > +     else if (pci_is_pcie(pci->pdev))
> > > > > > > +             nvkm_pcie_preinit(pci);
> > > > > > >       return 0;
> > > > > > >  }
> > > > > > >
> > > > > > > @@ -193,8 +197,9 @@ nvkm_pci_new_(const struct nvkm_pci_func *func, struct nvkm_device *device,
> > > > > > >       pci->func = func;
> > > > > > >       pci->pdev = device->func->pci(device)->pdev;
> > > > > > >       pci->irq = -1;
> > > > > > > -     pci->pcie.speed = -1;
> > > > > > > -     pci->pcie.width = -1;
> > > > > > > +     pci->pcie.cur_speed = -1;
> > > > > > > +     pci->pcie.def_speed = -1;
> > > > > > > +     pci->pcie.cur_width = -1;
> > > > > > >
> > > > > > >       if (device->type == NVKM_DEVICE_AGP)
> > > > > > >               nvkm_agp_ctor(pci);
> > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/pcie.c b/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > > index 70ccbe0d..731dd30e 100644
> > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > > @@ -85,6 +85,13 @@ nvkm_pcie_oneinit(struct nvkm_pci *pci)
> > > > > > >       return 0;
> > > > > > >  }
> > > > > > >
> > > > > > > +int
> > > > > > > +nvkm_pcie_preinit(struct nvkm_pci *pci)
> > > > > > > +{
> > > > > > > +     pci->pcie.def_speed = nvkm_pcie_get_speed(pci);
> > > > > > > +     return 0;
> > > > > > > +}
> > > > > > > +
> > > > > > >  int
> > > > > > >  nvkm_pcie_init(struct nvkm_pci *pci)
> > > > > > >  {
> > > > > > > @@ -105,12 +112,21 @@ nvkm_pcie_init(struct nvkm_pci *pci)
> > > > > > >       if (pci->func->pcie.init)
> > > > > > >               pci->func->pcie.init(pci);
> > > > > > >
> > > > > > > -     if (pci->pcie.speed != -1)
> > > > > > > -             nvkm_pcie_set_link(pci, pci->pcie.speed, pci->pcie.width);
> > > > > > > +     if (pci->pcie.cur_speed != -1)
> > > > > > > +             nvkm_pcie_set_link(pci, pci->pcie.cur_speed,
> > > > > > > +                                pci->pcie.cur_width);
> > > > > > >
> > > > > > >       return 0;
> > > > > > >  }
> > > > > > >
> > > > > > > +int
> > > > > > > +nvkm_pcie_fini(struct nvkm_pci *pci)
> > > > > > > +{
> > > > > > > +     if (!IS_ERR_VALUE(pci->pcie.def_speed))
> > > > > > > +             return nvkm_pcie_set_link(pci, pci->pcie.def_speed, 16);
> > > > > > > +     return 0;
> > > > > > > +}
> > > > > > > +
> > > > > > >  int
> > > > > > >  nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
> > > > > > >  {
> > > > > > > @@ -146,8 +162,8 @@ nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
> > > > > > >               speed = max_speed;
> > > > > > >       }
> > > > > > >
> > > > > > > -     pci->pcie.speed = speed;
> > > > > > > -     pci->pcie.width = width;
> > > > > > > +     pci->pcie.cur_speed = speed;
> > > > > > > +     pci->pcie.cur_width = width;
> > > > > > >
> > > > > > >       if (speed == cur_speed) {
> > > > > > >               nvkm_debug(subdev, "requested matches current speed\n");
> > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/priv.h b/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > > index a0d4c007..e7744671 100644
> > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > > @@ -60,5 +60,7 @@ enum nvkm_pcie_speed gk104_pcie_max_speed(struct nvkm_pci *);
> > > > > > >  int gk104_pcie_version_supported(struct nvkm_pci *);
> > > > > > >
> > > > > > >  int nvkm_pcie_oneinit(struct nvkm_pci *);
> > > > > > > +int nvkm_pcie_preinit(struct nvkm_pci *);
> > > > > > >  int nvkm_pcie_init(struct nvkm_pci *);
> > > > > > > +int nvkm_pcie_fini(struct nvkm_pci *);
> > > > > > >  #endif
> > > > > > > --
> > > > > > > 2.21.0
> > > > > > >

Karol Herbst

2019-May-21 17:48 UTC

[Nouveau] [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini

doing the same on the bridge controller with my workarounds applied:

please note some differences (with workarounds vs. without):
LnkSta: Speed 8GT/s (ok) vs Speed 2.5GT/s (downgraded)
SltSta: PresDet+ vs PresDet-
LnkSta2: EqualizationComplete+ and EqualizationPhase1/2/3+ vs all four cleared (-)
Virtual channel: NegoPending- vs NegoPending+

both times I executed lspci while the GPU was still suspended.

00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th
Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal
decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 16
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 0000e000-0000efff [size=4K]
        Memory behind bridge: ec000000-ed0fffff [size=17M]
        Prefetchable memory behind bridge:
00000000c0000000-00000000d1ffffff [size=288M]
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort+ <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [88] Subsystem: Dell Device 07be
        Capabilities: [80] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
                Address: 00000000  Data: 0000
        Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0
                        ExtTag- RBE+
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 256 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
AuxPwr- TransPend-
                LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1,
Exit Latency L0s <256ns, L1 <8us
                        ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s (ok), Width x16 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
HotPlug- Surprise-
                        Slot #1, PowerLimit 75.000W; Interlock- NoCompl+
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet-
CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Unknown, PwrInd Unknown,
Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
PresDet+ Interlock-
                        Changed: MRL- PresDet+ LinkState-
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
PMEIntEna- CRSVisible-
                RootCap: CRSVisible-
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                DevCap2: Completion Timeout: Not Supported,
TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
                         AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+
                DevCtl2: Completion Timeout: 50us to 50ms,
TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
                         AtomicOpsCtl: ReqEn- EgressBlck-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB,
EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+,
LinkEqualizationRequest-
        Capabilities: [100 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [140 v1] Root Complex Link
                Desc:   PortNumber=02 ComponentID=01 EltType=Config
                Link0:  Desc:   TargetPort=00 TargetComponent=01
AssocRCRB- LinkType=MemMapped LinkValid+
                        Addr:   00000000fed19000
        Capabilities: [d94 v1] Secondary PCI Express <?>
        Kernel driver in use: pcieport
00: 86 80 01 19 07 00 10 00 05 00 04 06 00 00 81 00
10: 00 00 00 00 00 00 00 00 00 01 01 00 e0 e0 00 20
20: 00 ec 00 ed 01 c0 f1 d1 00 00 00 00 00 00 00 00
30: 00 00 00 00 88 00 00 00 00 00 00 00 ff 01 10 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 62 17 00 00 00 00 0a
80: 01 90 03 c8 08 00 00 00 0d 80 00 00 28 10 be 07
90: 05 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 10 00 42 01 01 80 00 00 20 00 00 00 03 ad 61 02
b0: 40 00 03 d1 80 25 0c 00 00 00 48 00 00 00 00 00
c0: 00 00 00 00 80 0b 08 00 00 64 00 00 0e 00 00 00
d0: 43 00 1e 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 84 4e 01 01 20 00 00 00 00 e0 00 10 00
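
The LnkSta difference can also be read straight from the hex dumps: the
Express capability sits at 0xa0, so Link Status is the 16-bit
little-endian word at 0xa0 + 0x12 = 0xb2, which is "01 d1" in the failing
dump and "03 d1" in this one. Below is a minimal sketch of the decoding,
assuming the standard field layout (Current Link Speed in bits 3:0,
Negotiated Link Width in bits 9:4). The helper names are hypothetical and
only there for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Standard PCIe Link Status layout. The values match PCI_EXP_LNKSTA,
 * PCI_EXP_LNKSTA_CLS and PCI_EXP_LNKSTA_NLW in linux/pci_regs.h and are
 * repeated here so the sketch builds in userspace.
 */
#define LNKSTA_OFFSET    0x12    /* relative to the Express capability */
#define LNKSTA_CLS_MASK  0x000f  /* Current Link Speed */
#define LNKSTA_NLW_MASK  0x03f0  /* Negotiated Link Width */
#define LNKSTA_NLW_SHIFT 4

/* Read the little-endian Link Status word; cap is the capability offset. */
static uint16_t read_lnksta(const uint8_t *cfg, size_t cap)
{
	assert(cfg != NULL);
	return (uint16_t)(cfg[cap + LNKSTA_OFFSET] |
			  (cfg[cap + LNKSTA_OFFSET + 1] << 8));
}

static unsigned int lnksta_speed_code(uint16_t lnksta)
{
	return lnksta & LNKSTA_CLS_MASK;
}

static unsigned int lnksta_width(uint16_t lnksta)
{
	return (lnksta & LNKSTA_NLW_MASK) >> LNKSTA_NLW_SHIFT;
}

/* Map the speed code to the string lspci prints for it. */
static const char *lnksta_speed_name(unsigned int code)
{
	switch (code) {
	case 1: return "2.5GT/s";
	case 2: return "5GT/s";
	case 3: return "8GT/s";
	default: return "unknown";
	}
}
```

With cap = 0xa0, the first dump gives 0xd101 (speed code 1, 2.5GT/s, x16)
and this one gives 0xd103 (speed code 3, 8GT/s, x16), which matches what
lspci decoded in both cases.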

On Tue, May 21, 2019 at 7:35 PM Karol Herbst <kherbst at redhat.com> wrote:
>
> was able to get the lspci prints via ssh. Machine rebooted
> automatically each time though.
>
> relevant dmesg:
> kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
> kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
> kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
> kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff
>
> (last one is a 64 bit mmio read to get the on GPU timer value)
>
> # lspci -vvxxx -s 0:01.00
> 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th
> Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal
> decode])
>        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR- FastB2B- DisINTx-
>        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR- INTx-
>        Latency: 0
>        Interrupt: pin A routed to IRQ 16
>        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
>        I/O behind bridge: 0000e000-0000efff [size=4K]
>        Memory behind bridge: ec000000-ed0fffff [size=17M]
>        Prefetchable memory behind bridge:
> 00000000c0000000-00000000d1ffffff [size=288M]
>        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort+ <SERR- <PERR-
>        BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
>                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
>        Capabilities: [88] Subsystem: Dell Device 07be
>        Capabilities: [80] Power Management version 3
>                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> PME(D0+,D1-,D2-,D3hot+,D3cold+)
>                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>        Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
>                Address: 00000000  Data: 0000
>        Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
>                DevCap: MaxPayload 256 bytes, PhantFunc 0
>                        ExtTag- RBE+
>                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
>                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>                        MaxPayload 256 bytes, MaxReadReq 128 bytes
>                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> AuxPwr- TransPend-
>                LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1,
> Exit Latency L0s <256ns, L1 <8us
>                        ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
>                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
>                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok)
>                        TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+
>                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
> HotPlug- Surprise-
>                        Slot #1, PowerLimit 75.000W; Interlock- NoCompl+
>                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt-
> HPIrq- LinkChg-
>                        Control: AttnInd Unknown, PwrInd Unknown,
> Power- Interlock-
>                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
> PresDet- Interlock-
>                        Changed: MRL- PresDet+ LinkState-
>                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
> PMEIntEna- CRSVisible-
>                RootCap: CRSVisible-
>                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
>                DevCap2: Completion Timeout: Not Supported,
> TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
>                         AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+
>                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-,
> LTR+, OBFF Via WAKE# ARIFwd-
>                         AtomicOpsCtl: ReqEn- EgressBlck-
>                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
>                         Transmit Margin: Normal Operating Range,
> EnterModifiedCompliance- ComplianceSOS-
>                         Compliance De-emphasis: -6dB
>                LnkSta2: Current De-emphasis Level: -6dB,
> EqualizationComplete-, EqualizationPhase1-
>                         EqualizationPhase2-, EqualizationPhase3-,
> LinkEqualizationRequest-
>        Capabilities: [100 v1] Virtual Channel
>                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
>                Arb:    Fixed- WRR32- WRR64- WRR128-
>                Ctrl:   ArbSelect=Fixed
>                Status: InProgress-
>                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>                        Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
>                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
>                        Status: NegoPending+ InProgress-
>        Capabilities: [140 v1] Root Complex Link
>                Desc:   PortNumber=02 ComponentID=01 EltType=Config
>                Link0:  Desc:   TargetPort=00 TargetComponent=01
> AssocRCRB- LinkType=MemMapped LinkValid+
>                        Addr:   00000000fed19000
>        Capabilities: [d94 v1] Secondary PCI Express <?>
>        Kernel driver in use: pcieport
> 00: 86 80 01 19 07 00 10 00 05 00 04 06 00 00 81 00
> 10: 00 00 00 00 00 00 00 00 00 01 01 00 e0 e0 00 20
> 20: 00 ec 00 ed 01 c0 f1 d1 00 00 00 00 00 00 00 00
> 30: 00 00 00 00 88 00 00 00 00 00 00 00 ff 01 10 00
> 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 70: 00 00 00 00 00 00 00 00 00 62 17 00 00 00 00 0a
> 80: 01 90 03 c8 08 00 00 00 0d 80 00 00 28 10 be 07
> 90: 05 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> a0: 10 00 42 01 01 80 00 00 20 00 00 00 03 ad 61 02
> b0: 40 00 01 d1 80 25 0c 00 00 00 08 00 00 00 00 00
> c0: 00 00 00 00 80 0b 08 00 00 64 00 00 0e 00 00 00
> d0: 43 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> f0: 00 40 01 00 4e 01 01 22 00 00 00 00 e0 00 10 00
>
> lspci -vvxxx -s 1:00.00
> 01:00.0 3D controller: NVIDIA Corporation GP107M [GeForce GTX 1050
> Mobile] (rev ff) (prog-if ff)
>        !!! Unknown header type 7f
>        Kernel driver in use: nouveau
>        Kernel modules: nouveau
> 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 40: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 50: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 60: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 70: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 90: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>
> On Tue, May 21, 2019 at 4:30 PM Karol Herbst <kherbst at redhat.com> wrote:
> >
> > On Tue, May 21, 2019 at 4:13 PM Bjorn Helgaas <helgaas at kernel.org> wrote:
> > >
> > > On Tue, May 21, 2019 at 03:28:48PM +0200, Karol Herbst wrote:
> > > > On Tue, May 21, 2019 at 3:11 PM Bjorn Helgaas <helgaas at kernel.org> wrote:
> > > > > On Tue, May 21, 2019 at 12:30:38AM +0200, Karol Herbst wrote:
> > > > > > On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas <helgaas at kernel.org> wrote:
> > > > > > > On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> > > > > > > > Apparently things go south if we suspend the device with a different PCIE
> > > > > > > > link speed set than it got booted with. Fixes runtime suspend on my gp107.
> > > > > > > >
> > > > > > > > This all looks like some bug inside the pci subsystem and I would prefer a
> > > > > > > > fix there instead of nouveau, but maybe there is no real nice way of doing
> > > > > > > > that outside of drivers?
> > > > > > >
> > > > > > > I agree it would be nice to fix this in the PCI core if that's
> > > > > > > feasible.
> > > > > > >
> > > > > > > It looks like this driver changes the PCIe link speed using some
> > > > > > > device-specific mechanism.  When we suspend, we put the device in
> > > > > > > D3cold, so it loses all its state.  When we resume, the link probably
> > > > > > > comes up at the boot speed because nothing did that device-specific
> > > > > > > magic to change it, so you probably end up with the link being slow
> > > > > > > but the driver thinking it's configured to be fast, and maybe that
> > > > > > > combination doesn't work.
> > > > > > >
> > > > > > > If it requires something device-specific to change that link speed, I
> > > > > > > don't know how to put that in the PCI core.  But maybe I'm missing
> > > > > > > something?
> > > > > > >
> > > > > > > Per the PCIe spec (r4.0, sec 1.2):
> > > > > > >
> > > > > > >   Initialization – During hardware initialization, each PCI Express
> > > > > > >   Link is set up following a negotiation of Lane widths and frequency
> > > > > > >   of operation by the two agents at each end of the Link. No firmware
> > > > > > >   or operating system software is involved.
> > > > > > >
> > > > > > > I have been assuming that this means device-specific link speed
> > > > > > > management is out of spec, but it seems pretty common that devices
> > > > > > > don't come up by themselves at the fastest possible link speed.  So
> > > > > > > maybe the spec just intends that devices can operate at *some* valid
> > > > > > > speed.
> > > > > >
> > > > > > I would expect that devices kind of have to figure out what they can
> > > > > > operate on and the operating system kind of just checks what the
> > > > > > current state is and doesn't try to "restore" the old state or
> > > > > > something?
> > > > >
> > > > > The devices at each end of the link negotiate the width and speed of
> > > > > the link.  This is done directly by the hardware without any help from
> > > > > the OS.
> > > > >
> > > > > The OS can read the current link state (Current Link Speed and
> > > > > Negotiated Link Width, both in the Link Status register).  The OS has
> > > > > very little control over that state.  It can't directly restore the
> > > > > state because the hardware has to negotiate a width & speed that
> > > > > result in reliable operation.
> > > > >
> > > > > > We don't do anything in the driver after the device was suspended. And
> > > > > > the 0x88000 is a mirror of the PCI config space, but we also got some
> > > > > > PCIe stuff at 0x8c000 which is used by newer GPUs for gen3 stuff
> > > > > > essentially. I have no idea how much of this is part of the actual pci
> > > > > > standard and how much is driver specific. But the driver also wants to
> > > > > > have some control over the link speed as it's tied to performance
> > > > > > states on GPU.
> > > > >
> > > > > As far as I'm aware, there is no generic PCIe way for the OS to
> > > > > influence the link width or speed.  If the GPU driver needs to do
> > > > > that, it would be via some device-specific mechanism.
> > > > >
> > > > > > The big issue here is just, that the GPU boots with 8.0, some on-gpu
> > > > > > init mechanism decreases it to 2.5. If we suspend, the GPU or at least
> > > > > > the communication with the controller is broken. But if we set it to
> > > > > > the boot speed, resuming the GPU just works. So my assumption was,
> > > > > > that _something_ (might it be the controller or the pci subsystem)
> > > > > > tries to force to operate on an invalid link speed and because the
> > > > > > bridge controller is actually powered down as well (as all children
> > > > > > are in D3cold) I could imagine that something in the pci subsystem
> > > > > > actually restores the state which lets the controller fail to
> > > > > > establish communication again?
> > > > >
> > > > >   1) At boot-time, the Port and the GPU hardware negotiate 8.0 GT/s
> > > > >      without OS/driver intervention.
> > > > >
> > > > >   2) Some mechanism reduces link speed to 2.5 GT/s.  This probably
> > > > >      requires driver intervention or at least some ACPI method.
> > > >
> > > > there is no driver intervention and Nouveau doesn't care at all. It's
> > > > all done on the GPU. We just upload a script and some firmware on to
> > > > the GPU. The script runs then on the PMU inside the GPU and this
> > > > script also causes changing the PCIe link settings. But from a Nouveau
> > > > point of view we don't care about the link before or after that script
> > > > was invoked. Also there is no ACPI method involved.
> > > >
> > > > But if there is something we should notify pci core about, maybe
> > > > that's something we have to do then?
> > >
> > > I don't think there's anything the PCI core could do with that
> > > information anyway.  The PCI core doesn't care at all about the link
> > > speed, and it really can't influence it directly.
> > >
> > > > >   3) Suspend puts GPU into D3cold (powered off).
> > > > >
> > > > >   4) Resume restores GPU to D0, and the Port and GPU hardware again
> > > > >      negotiate 8.0 GT/s without OS/driver intervention, just like at
> > > > >      initial boot.
> > > >
> > > > No, that negotiation fails apparently as any attempt to read anything
> > > > from the device just fails inside pci core. Or something goes wrong
> > > > when resuming the bridge controller.
> > >
> > > I'm surprised the negotiation would fail even after a power cycle of
> > > the device.  But if you can avoid the issue by running another script
> > > on the PMU before suspend, that's probably what you'll have to do.
> > >
> >
> > there is none as far as we know. Or at least nothing inside the vbios.
> > We still have to get signed PMU firmware images from Nvidia for full
> > support, but this still would be a hacky issue as we would depend on
> > those then (and without having those in redistributable form, there
> > isn't much we can do about it except fixing it on the kernel side).
> >
> > > > >   5) Now the driver thinks the GPU is at 2.5 GT/s but it's actually at
> > > > >      8.0 GT/s.
> > > >
> > > > what is actually meant by "driver" here? The pci subsystem or Nouveau?
> > >
> > > I was thinking Nouveau because the PCI core doesn't care about the
> > > link speed.
> > >
> > > > > Without knowing more about the transition to 2.5 GT/s, I can't guess
> > > > > why the GPU wouldn't work after resume.  From a PCIe point of view,
> > > > > the link is supposed to work and the device should be reachable
> > > > > independent of the link speed.  But maybe there's some weird
> > > > > dependency between the GPU and the driver here.
> > > >
> > > > but the device isn't reachable at all, not even from the pci
> > > > subsystem. All reads fail/return a default error value (0xffffffff).
> > >
> > > Are these PCI config reads that return 0xffffffff?  Or MMIO reads?
> > > "lspci -vvxxxx" of the bridge and the GPU might have a clue about
> > > whether a PCI error occurred.
> > >
> >
> > that's kind of problematic as it might just lock up my machine... but
> > let me try that.
> >
> > > > > It sounds like things work if you return to 8.0 GT/s before suspend.
> > > > > That would make sense to me because then the driver's idea of the
> > > > > link state after resume would match the actual state.
> > > >
> > > > depends on what is meant by the driver here. Inside Nouveau we don't
> > > > care one bit about the current link speed, so I assume you mean
> > > > something inside the pci core code?
> > > >
> > > > > But I don't see a way to deal with this in the PCI core.  The PCI core
> > > > > does save and restore most of the architected config space around
> > > > > suspend/resume, but since this appears to be a device-specific thing,
> > > > > the PCI core would have no idea how to save/restore it.
> > > >
> > > > if we assume that the negotiation on a device level works as intended,
> > > > then I would expect this to be a pci core issue, which might actually
> > > > be not fixable there. But if it's not, then we would have to put
> > > > something like that inside the runpm documentation to tell drivers
> > > > they have to do something about it.
> > > >
> > > > But again, for me it just sounds like the negotiation on the device
> > > > level fails or something inside pci core messes it up.
> > >
> > > To me it sounds like the PMU script messed something up, and the PCI
> > > core has no way to know what that was or how to fix it.
> > >
> >
> > sure, I am mainly wondering why it doesn't work after we power cycled
> > the GPU and the host bridge controller, because no matter what the
> > state was before, we have to reprobe instead of relying on a known
> > state, no?
> >
> > > > > > > > Signed-off-by: Karol Herbst <kherbst at redhat.com>
> > > > > > > > Reviewed-by: Lyude Paul <lyude at redhat.com>
> > > > > > > > ---
> > > > > > > >  drm/nouveau/include/nvkm/subdev/pci.h |  5 +++--
> > > > > > > >  drm/nouveau/nvkm/subdev/pci/base.c    |  9 +++++++--
> > > > > > > >  drm/nouveau/nvkm/subdev/pci/pcie.c    | 24 ++++++++++++++++++++----
> > > > > > > >  drm/nouveau/nvkm/subdev/pci/priv.h    |  2 ++
> > > > > > > >  4 files changed, 32 insertions(+), 8 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drm/nouveau/include/nvkm/subdev/pci.h b/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > > > index 1fdf3098..b23793a2 100644
> > > > > > > > --- a/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > > > +++ b/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > > > @@ -26,8 +26,9 @@ struct nvkm_pci {
> > > > > > > >       } agp;
> > > > > > > >
> > > > > > > >       struct {
> > > > > > > > -             enum nvkm_pcie_speed speed;
> > > > > > > > -             u8 width;
> > > > > > > > +             enum nvkm_pcie_speed cur_speed;
> > > > > > > > +             enum nvkm_pcie_speed def_speed;
> > > > > > > > +             u8 cur_width;
> > > > > > > >       } pcie;
> > > > > > > >
> > > > > > > >       bool msi;
> > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/base.c b/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > > > index ee2431a7..d9fb5a83 100644
> > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > > > @@ -90,6 +90,8 @@ nvkm_pci_fini(struct nvkm_subdev *subdev, bool suspend)
> > > > > > > >
> > > > > > > >       if (pci->agp.bridge)
> > > > > > > >               nvkm_agp_fini(pci);
> > > > > > > > +     else if (pci_is_pcie(pci->pdev))
> > > > > > > > +             nvkm_pcie_fini(pci);
> > > > > > > >
> > > > > > > >       return 0;
> > > > > > > >  }
> > > > > > > > @@ -100,6 +102,8 @@ nvkm_pci_preinit(struct nvkm_subdev *subdev)
> > > > > > > >       struct nvkm_pci *pci = nvkm_pci(subdev);
> > > > > > > >       if (pci->agp.bridge)
> > > > > > > >               nvkm_agp_preinit(pci);
> > > > > > > > +     else if (pci_is_pcie(pci->pdev))
> > > > > > > > +             nvkm_pcie_preinit(pci);
> > > > > > > >       return 0;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > @@ -193,8 +197,9 @@ nvkm_pci_new_(const struct nvkm_pci_func *func, struct nvkm_device *device,
> > > > > > > >       pci->func = func;
> > > > > > > >       pci->pdev = device->func->pci(device)->pdev;
> > > > > > > >       pci->irq = -1;
> > > > > > > > -     pci->pcie.speed = -1;
> > > > > > > > -     pci->pcie.width = -1;
> > > > > > > > +     pci->pcie.cur_speed = -1;
> > > > > > > > +     pci->pcie.def_speed = -1;
> > > > > > > > +     pci->pcie.cur_width = -1;
> > > > > > > >
> > > > > > > >       if (device->type == NVKM_DEVICE_AGP)
> > > > > > > >               nvkm_agp_ctor(pci);
> > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/pcie.c b/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > > > index 70ccbe0d..731dd30e 100644
> > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > > > @@ -85,6 +85,13 @@ nvkm_pcie_oneinit(struct nvkm_pci *pci)
> > > > > > > >       return 0;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +int
> > > > > > > > +nvkm_pcie_preinit(struct nvkm_pci *pci)
> > > > > > > > +{
> > > > > > > > +     pci->pcie.def_speed = nvkm_pcie_get_speed(pci);
> > > > > > > > +     return 0;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  int
> > > > > > > >  nvkm_pcie_init(struct nvkm_pci *pci)
> > > > > > > >  {
> > > > > > > > @@ -105,12 +112,21 @@ nvkm_pcie_init(struct nvkm_pci *pci)
> > > > > > > >       if (pci->func->pcie.init)
> > > > > > > >               pci->func->pcie.init(pci);
> > > > > > > >
> > > > > > > > -     if (pci->pcie.speed != -1)
> > > > > > > > -             nvkm_pcie_set_link(pci, pci->pcie.speed, pci->pcie.width);
> > > > > > > > +     if (pci->pcie.cur_speed != -1)
> > > > > > > > +             nvkm_pcie_set_link(pci, pci->pcie.cur_speed,
> > > > > > > > +                                pci->pcie.cur_width);
> > > > > > > >
> > > > > > > >       return 0;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +int
> > > > > > > > +nvkm_pcie_fini(struct nvkm_pci *pci)
> > > > > > > > +{
> > > > > > > > +     if (!IS_ERR_VALUE(pci->pcie.def_speed))
> > > > > > > > +             return nvkm_pcie_set_link(pci, pci->pcie.def_speed, 16);
> > > > > > > > +     return 0;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  int
> > > > > > > >  nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
> > > > > > > >  {
> > > > > > > > @@ -146,8 +162,8 @@ nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
> > > > > > > >               speed = max_speed;
> > > > > > > >       }
> > > > > > > >
> > > > > > > > -     pci->pcie.speed = speed;
> > > > > > > > -     pci->pcie.width = width;
> > > > > > > > +     pci->pcie.cur_speed = speed;
> > > > > > > > +     pci->pcie.cur_width = width;
> > > > > > > >
> > > > > > > >       if (speed == cur_speed) {
> > > > > > > >               nvkm_debug(subdev, "requested matches current speed\n");
> > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/priv.h b/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > > > index a0d4c007..e7744671 100644
> > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > > > @@ -60,5 +60,7 @@ enum nvkm_pcie_speed gk104_pcie_max_speed(struct nvkm_pci *);
> > > > > > > >  int gk104_pcie_version_supported(struct nvkm_pci *);
> > > > > > > >
> > > > > > > >  int nvkm_pcie_oneinit(struct nvkm_pci *);
> > > > > > > > +int nvkm_pcie_preinit(struct nvkm_pci *);
> > > > > > > >  int nvkm_pcie_init(struct nvkm_pci *);
> > > > > > > > +int nvkm_pcie_fini(struct nvkm_pci *);
> > > > > > > >  #endif
> > > > > > > > --
> > > > > > > > 2.21.0
> > > > > > > >
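
The idea of the patch quoted above (record the link speed seen at preinit, request it again on fini) can be modelled outside the kernel. The following is only a minimal sketch under stated assumptions: `model_pci`, `model_speed` and the `model_*` functions are hypothetical stand-ins invented for illustration, not the nvkm types or functions, and the "hardware" is just a struct field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the nvkm types from the patch. */
enum model_speed {
	MODEL_SPEED_UNSET = -1,
	MODEL_2_5GT,
	MODEL_5_0GT,
	MODEL_8_0GT
};

struct model_pci {
	enum model_speed hw_speed;  /* what the simulated link currently runs at */
	enum model_speed cur_speed; /* last speed requested through the driver */
	enum model_speed def_speed; /* speed seen at preinit, i.e. the boot speed */
};

/* Request a link speed and remember it as the current driver setting. */
static void model_set_link(struct model_pci *pci, enum model_speed speed)
{
	assert(pci != NULL);
	pci->cur_speed = speed;
	pci->hw_speed = speed;
}

/* Same idea as nvkm_pcie_preinit() in the patch: remember the boot speed. */
static void model_preinit(struct model_pci *pci)
{
	assert(pci != NULL);
	pci->def_speed = pci->hw_speed;
}

/* Same idea as nvkm_pcie_fini() in the patch: return to the boot speed,
 * but only if one was recorded. */
static void model_fini(struct model_pci *pci)
{
	assert(pci != NULL);
	if (pci->def_speed != MODEL_SPEED_UNSET)
		model_set_link(pci, pci->def_speed);
}
```

In this model, a link that boots at `MODEL_8_0GT`, is lowered behind the driver's back by writing `hw_speed = MODEL_2_5GT` (standing in for the PMU script), and then goes through `model_fini()` ends up at `MODEL_8_0GT` again before the simulated suspend.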

Karol Herbst

2019-Jun-03 13:18 UTC

[Nouveau] [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini

@bjorn: any further ideas? Otherwise I'd like to just go ahead and fix
this issue inside Nouveau and leave it there until we have a better
understanding of it or see non-Nouveau cases of this issue.

On Tue, May 21, 2019 at 7:48 PM Karol Herbst <kherbst at redhat.com> wrote:
>
> doing the same on the bridge controller with my workarounds applied:
>
> please note some differences:
> LnkSta: Speed 8GT/s (ok) vs Speed 2.5GT/s (downgraded)
> SltSta: PresDet+ vs PresDet-
> LnkSta2: Equalization stuff
> Virtual channel: NegoPending- vs NegoPending+
>
> both times I executed lspci while the GPU was still suspended.
>
> 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th
> Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal
> decode])
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR- FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0
>         Interrupt: pin A routed to IRQ 16
>         Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
>         I/O behind bridge: 0000e000-0000efff [size=4K]
>         Memory behind bridge: ec000000-ed0fffff [size=17M]
>         Prefetchable memory behind bridge:
> 00000000c0000000-00000000d1ffffff [size=288M]
>         Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort+ <SERR- <PERR-
>         BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
>                 PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
>         Capabilities: [88] Subsystem: Dell Device 07be
>         Capabilities: [80] Power Management version 3
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> PME(D0+,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
>                 Address: 00000000  Data: 0000
>         Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
>                 DevCap: MaxPayload 256 bytes, PhantFunc 0
>                         ExtTag- RBE+
>                 DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
>                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>                         MaxPayload 256 bytes, MaxReadReq 128 bytes
>                 DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> AuxPwr- TransPend-
>                 LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1,
> Exit Latency L0s <256ns, L1 <8us
>                         ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
>                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                 LnkSta: Speed 8GT/s (ok), Width x16 (ok)
>                         TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+
>                 SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
> HotPlug- Surprise-
>                         Slot #1, PowerLimit 75.000W; Interlock- NoCompl+
>                 SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet-
> CmdCplt- HPIrq- LinkChg-
>                         Control: AttnInd Unknown, PwrInd Unknown,
> Power- Interlock-
>                 SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
> PresDet+ Interlock-
>                         Changed: MRL- PresDet+ LinkState-
>                 RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
> PMEIntEna- CRSVisible-
>                 RootCap: CRSVisible-
>                 RootSta: PME ReqID 0000, PMEStatus- PMEPending-
>                 DevCap2: Completion Timeout: Not Supported,
> TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
>                          AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+
>                 DevCtl2: Completion Timeout: 50us to 50ms,
> TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
>                          AtomicOpsCtl: ReqEn- EgressBlck-
>                 LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
>                          Transmit Margin: Normal Operating Range,
> EnterModifiedCompliance- ComplianceSOS-
>                          Compliance De-emphasis: -6dB
>                 LnkSta2: Current De-emphasis Level: -6dB,
> EqualizationComplete+, EqualizationPhase1+
>                          EqualizationPhase2+, EqualizationPhase3+,
> LinkEqualizationRequest-
>         Capabilities: [100 v1] Virtual Channel
>                 Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
>                 Arb:    Fixed- WRR32- WRR64- WRR128-
>                 Ctrl:   ArbSelect=Fixed
>                 Status: InProgress-
>                 VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>                         Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
>                         Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
>                         Status: NegoPending- InProgress-
>         Capabilities: [140 v1] Root Complex Link
>                 Desc:   PortNumber=02 ComponentID=01 EltType=Config
>                 Link0:  Desc:   TargetPort=00 TargetComponent=01
> AssocRCRB- LinkType=MemMapped LinkValid+
>                         Addr:   00000000fed19000
>         Capabilities: [d94 v1] Secondary PCI Express <?>
>         Kernel driver in use: pcieport
> 00: 86 80 01 19 07 00 10 00 05 00 04 06 00 00 81 00
> 10: 00 00 00 00 00 00 00 00 00 01 01 00 e0 e0 00 20
> 20: 00 ec 00 ed 01 c0 f1 d1 00 00 00 00 00 00 00 00
> 30: 00 00 00 00 88 00 00 00 00 00 00 00 ff 01 10 00
> 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 70: 00 00 00 00 00 00 00 00 00 62 17 00 00 00 00 0a
> 80: 01 90 03 c8 08 00 00 00 0d 80 00 00 28 10 be 07
> 90: 05 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> a0: 10 00 42 01 01 80 00 00 20 00 00 00 03 ad 61 02
> b0: 40 00 03 d1 80 25 0c 00 00 00 48 00 00 00 00 00
> c0: 00 00 00 00 80 0b 08 00 00 64 00 00 0e 00 00 00
> d0: 43 00 1e 00 00 00 00 00 00 00 00 00 00 00 00 00
> e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> f0: 00 00 00 84 4e 01 01 20 00 00 00 00 e0 00 10 00
>
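
The LnkSta difference between the two bridge dumps in this thread can be read straight from the raw bytes: the PCI Express capability of the root port starts at 0xa0, so Link Status is the 16-bit little-endian word at 0xb2, which is `01 d1` in the failing dump and `03 d1` in the dump taken with the workaround. A minimal sketch of that decoding follows; the helper names are hypothetical, and the offsets are taken from these dumps rather than discovered by walking the capability list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets as seen in the dumps of this thread: the PCI Express capability
 * of the root port starts at 0xa0; Link Status sits 0x12 bytes into it. */
#define PCIE_CAP_BASE 0xa0
#define PCIE_LNKSTA   0x12

/* Hypothetical helper: read a little-endian 16-bit value from a copy of
 * the config space, such as the hex dump printed by lspci -xxx. */
static uint16_t cfg_read16(const uint8_t *cfg, size_t off)
{
	assert(cfg != NULL);
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* Current Link Speed, bits 3:0 (1 = 2.5GT/s, 2 = 5GT/s, 3 = 8GT/s). */
static unsigned lnksta_speed(uint16_t lnksta)
{
	return lnksta & 0xf;
}

/* Negotiated Link Width, bits 9:4 (number of lanes). */
static unsigned lnksta_width(uint16_t lnksta)
{
	return (lnksta >> 4) & 0x3f;
}
```

Applied to the two dumps, 0xd101 decodes to speed code 1 at width 16 and 0xd103 to speed code 3 at width 16, which matches the "Speed 2.5GT/s (downgraded), Width x16" and "Speed 8GT/s (ok), Width x16" lines that lspci prints.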
> On Tue, May 21, 2019 at 7:35 PM Karol Herbst <kherbst at redhat.com> wrote:
> >
> > was able to get the lspci prints via ssh. Machine rebooted
> > automatically each time though.
> >
> > relevant dmesg:
> > kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
> > kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
> > kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
> > kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff
> >
> > (last one is a 64 bit mmio read to get the on GPU timer value)
> >
> > # lspci -vvxxx -s 0:01.00
> > 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th
> > Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal
> > decode])
> >        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > ParErr- Stepping- SERR- FastB2B- DisINTx-
> >        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > <TAbort- <MAbort- >SERR- <PERR- INTx-
> >        Latency: 0
> >        Interrupt: pin A routed to IRQ 16
> >        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> >        I/O behind bridge: 0000e000-0000efff [size=4K]
> >        Memory behind bridge: ec000000-ed0fffff [size=17M]
> >        Prefetchable memory behind bridge:
> > 00000000c0000000-00000000d1ffffff [size=288M]
> >        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > <TAbort- <MAbort+ <SERR- <PERR-
> >        BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
> >                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> >        Capabilities: [88] Subsystem: Dell Device 07be
> >        Capabilities: [80] Power Management version 3
> >                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> > PME(D0+,D1-,D2-,D3hot+,D3cold+)
> >                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> >        Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
> >                Address: 00000000  Data: 0000
> >        Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
> >                DevCap: MaxPayload 256 bytes, PhantFunc 0
> >                        ExtTag- RBE+
> >                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> >                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >                        MaxPayload 256 bytes, MaxReadReq 128 bytes
> >                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > AuxPwr- TransPend-
> >                LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1,
> > Exit Latency L0s <256ns, L1 <8us
> >                        ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
> >                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> >                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >                LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok)
> >                        TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+
> >                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
> > HotPlug- Surprise-
> >                        Slot #1, PowerLimit 75.000W; Interlock- NoCompl+
> >                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt-
> > HPIrq- LinkChg-
> >                        Control: AttnInd Unknown, PwrInd Unknown,
> > Power- Interlock-
> >                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
> > PresDet- Interlock-
> >                        Changed: MRL- PresDet+ LinkState-
> >                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
> > PMEIntEna- CRSVisible-
> >                RootCap: CRSVisible-
> >                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
> >                DevCap2: Completion Timeout: Not Supported,
> > TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
> >                         AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+
> >                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-,
> > LTR+, OBFF Via WAKE# ARIFwd-
> >                         AtomicOpsCtl: ReqEn- EgressBlck-
> >                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> >                         Transmit Margin: Normal Operating Range,
> > EnterModifiedCompliance- ComplianceSOS-
> >                         Compliance De-emphasis: -6dB
> >                LnkSta2: Current De-emphasis Level: -6dB,
> > EqualizationComplete-, EqualizationPhase1-
> >                         EqualizationPhase2-, EqualizationPhase3-,
> > LinkEqualizationRequest-
> >        Capabilities: [100 v1] Virtual Channel
> >                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
> >                Arb:    Fixed- WRR32- WRR64- WRR128-
> >                Ctrl:   ArbSelect=Fixed
> >                Status: InProgress-
> >                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> >                        Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
> >                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
> >                        Status: NegoPending+ InProgress-
> >        Capabilities: [140 v1] Root Complex Link
> >                Desc:   PortNumber=02 ComponentID=01 EltType=Config
> >                Link0:  Desc:   TargetPort=00 TargetComponent=01
> > AssocRCRB- LinkType=MemMapped LinkValid+
> >                        Addr:   00000000fed19000
> >        Capabilities: [d94 v1] Secondary PCI Express <?>
> >        Kernel driver in use: pcieport
> > 00: 86 80 01 19 07 00 10 00 05 00 04 06 00 00 81 00
> > 10: 00 00 00 00 00 00 00 00 00 01 01 00 e0 e0 00 20
> > 20: 00 ec 00 ed 01 c0 f1 d1 00 00 00 00 00 00 00 00
> > 30: 00 00 00 00 88 00 00 00 00 00 00 00 ff 01 10 00
> > 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 70: 00 00 00 00 00 00 00 00 00 62 17 00 00 00 00 0a
> > 80: 01 90 03 c8 08 00 00 00 0d 80 00 00 28 10 be 07
> > 90: 05 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > a0: 10 00 42 01 01 80 00 00 20 00 00 00 03 ad 61 02
> > b0: 40 00 01 d1 80 25 0c 00 00 00 08 00 00 00 00 00
> > c0: 00 00 00 00 80 0b 08 00 00 64 00 00 0e 00 00 00
> > d0: 43 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > f0: 00 40 01 00 4e 01 01 22 00 00 00 00 e0 00 10 00
> >
> > lspci -vvxxx -s 1:00.00
> > 01:00.0 3D controller: NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] (rev ff) (prog-if ff)
> >        !!! Unknown header type 7f
> >        Kernel driver in use: nouveau
> >        Kernel modules: nouveau
> > 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > 40: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > 50: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > 60: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > 70: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > 80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > 90: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >
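
The GPU dump above reads back 0xff at every offset, which is why lspci reports "Unknown header type 7f": the header type byte at offset 0x0e is 0xff, and masking off the multi-function bit leaves 0x7f. A minimal sketch of how a tool could recognise such a dump follows. The helper names are hypothetical and are not taken from lspci or the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when every byte of a config-space dump is 0xff, i.e. the reads
 * returned the all-ones pattern instead of real register contents. */
static bool cfg_dump_is_all_ones(const uint8_t *cfg, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (cfg[i] != 0xff)
            return false;
    return len > 0;
}

/* Header layout type: byte at offset 0x0e with the multi-function
 * bit (bit 7) masked off. */
static uint8_t cfg_header_type(const uint8_t *cfg)
{
    return cfg[0x0e] & 0x7f;
}

/* Vendor ID: little-endian 16-bit value at offset 0x00.
 * 0xffff is not a valid vendor ID. */
static uint16_t cfg_vendor_id(const uint8_t *cfg)
{
    return (uint16_t)(cfg[0] | (cfg[1] << 8));
}
```

Applied to the 256 bytes of the GPU dump, the first helper returns true and the header type comes out as 0x7f. For the bridge dump the vendor ID decodes to 0x8086, so the bridge itself is still answering config reads.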
> > On Tue, May 21, 2019 at 4:30 PM Karol Herbst <kherbst at redhat.com> wrote:
> > >
> > > On Tue, May 21, 2019 at 4:13 PM Bjorn Helgaas <helgaas at kernel.org> wrote:
> > > >
> > > > On Tue, May 21, 2019 at 03:28:48PM +0200, Karol Herbst wrote:
> > > > > On Tue, May 21, 2019 at 3:11 PM Bjorn Helgaas <helgaas at kernel.org> wrote:
> > > > > > On Tue, May 21, 2019 at 12:30:38AM +0200, Karol Herbst wrote:
> > > > > > > On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas <helgaas at kernel.org> wrote:
> > > > > > > > On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> > > > > > > > > Apparently things go south if we suspend the device with a different PCIE
> > > > > > > > > link speed set than it got booted with. Fixes runtime suspend on my gp107.
> > > > > > > > >
> > > > > > > > > This all looks like some bug inside the pci subsystem and I would prefer a
> > > > > > > > > fix there instead of nouveau, but maybe there is no real nice way of doing
> > > > > > > > > that outside of drivers?
> > > > > > > >
> > > > > > > > I agree it would be nice to fix this in the PCI core if that's
> > > > > > > > feasible.
> > > > > > > >
> > > > > > > > It looks like this driver changes the PCIe link speed using some
> > > > > > > > device-specific mechanism.  When we suspend, we put the device in
> > > > > > > > D3cold, so it loses all its state.  When we resume, the link probably
> > > > > > > > comes up at the boot speed because nothing did that device-specific
> > > > > > > > magic to change it, so you probably end up with the link being slow
> > > > > > > > but the driver thinking it's configured to be fast, and maybe that
> > > > > > > > combination doesn't work.
> > > > > > > >
> > > > > > > > If it requires something device-specific to change that link speed, I
> > > > > > > > don't know how to put that in the PCI core.  But maybe I'm missing
> > > > > > > > something?
> > > > > > > >
> > > > > > > > Per the PCIe spec (r4.0, sec 1.2):
> > > > > > > >
> > > > > > > >   Initialization – During hardware initialization, each PCI Express
> > > > > > > >   Link is set up following a negotiation of Lane widths and frequency
> > > > > > > >   of operation by the two agents at each end of the Link. No firmware
> > > > > > > >   or operating system software is involved.
> > > > > > > >
> > > > > > > > I have been assuming that this means device-specific link speed
> > > > > > > > management is out of spec, but it seems pretty common that devices
> > > > > > > > don't come up by themselves at the fastest possible link speed.  So
> > > > > > > > maybe the spec just intends that devices can operate at *some* valid
> > > > > > > > speed.
> > > > > > >
> > > > > > > I would expect that devices kind of have to figure out what they can
> > > > > > > operate on and the operating system kind of just checks what the
> > > > > > > current state is and doesn't try to "restore" the old state or
> > > > > > > something?
> > > > > >
> > > > > > The devices at each end of the link negotiate the width and speed of
> > > > > > the link.  This is done directly by the hardware without any help from
> > > > > > the OS.
> > > > > >
> > > > > > The OS can read the current link state (Current Link Speed and
> > > > > > Negotiated Link Width, both in the Link Status register).  The OS has
> > > > > > very little control over that state.  It can't directly restore the
> > > > > > state because the hardware has to negotiate a width & speed that
> > > > > > result in reliable operation.
> > > > > >
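
As an illustration of reading the link state from config space, the sketch below walks the capability list of the bridge dump quoted earlier in this mail and decodes the Link Capabilities and Link Status registers. The register offsets and bit fields are the standard PCI/PCIe ones (the macro names mirror those in the Linux pci_regs.h header). The array holds only the bytes of the dump that the walk touches, and the function names are hypothetical.

```c
#include <stdint.h>

#define PCI_STATUS           0x06
#define PCI_STATUS_CAP_LIST  0x10
#define PCI_CAPABILITY_LIST  0x34
#define PCI_CAP_ID_EXP       0x10
#define PCI_EXP_LNKCAP       0x0c  /* offset inside the PCIe capability */
#define PCI_EXP_LNKSTA       0x12  /* offset inside the PCIe capability */

/* Bytes copied from the bridge dump (rows 00, 30, 80, 90, a0, b0);
 * every offset not listed here is left as zero. */
static const uint8_t bridge_cfg[256] = {
    [0x06] = 0x10, 0x00,
    [0x34] = 0x88,
    [0x80] = 0x01, 0x90,
    [0x88] = 0x0d, 0x80,
    [0x90] = 0x05, 0xa0,
    [0xa0] = 0x10, 0x00, 0x42, 0x01, 0x01, 0x80, 0x00, 0x00,
             0x20, 0x00, 0x00, 0x00, 0x03, 0xad, 0x61, 0x02,
             0x40, 0x00, 0x01, 0xd1,
};

/* Walk the capability list; return the offset of cap_id or -1. */
static int find_cap(const uint8_t *cfg, uint8_t cap_id)
{
    if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
        return -1;
    uint8_t pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
    for (int ttl = 48; pos >= 0x40 && ttl > 0; ttl--) {
        if (cfg[pos] == cap_id)
            return pos;
        pos = cfg[pos + 1] & 0xfc;
    }
    return -1;
}

/* Current Link Speed code: Link Status bits 3:0 (1 = 2.5 GT/s, 3 = 8 GT/s). */
static int link_speed_code(const uint8_t *cfg)
{
    int pos = find_cap(cfg, PCI_CAP_ID_EXP);
    return pos < 0 ? -1 : cfg[pos + PCI_EXP_LNKSTA] & 0x0f;
}

/* Negotiated Link Width: Link Status bits 9:4. */
static int link_width(const uint8_t *cfg)
{
    int pos = find_cap(cfg, PCI_CAP_ID_EXP);
    if (pos < 0)
        return -1;
    int sta = cfg[pos + PCI_EXP_LNKSTA] | (cfg[pos + PCI_EXP_LNKSTA + 1] << 8);
    return (sta >> 4) & 0x3f;
}

/* Max Link Speed code: Link Capabilities bits 3:0. */
static int max_link_speed_code(const uint8_t *cfg)
{
    int pos = find_cap(cfg, PCI_CAP_ID_EXP);
    return pos < 0 ? -1 : cfg[pos + PCI_EXP_LNKCAP] & 0x0f;
}
```

With these bytes the PCIe capability is found at offset 0xa0, the Link Status word is 0xd101 (speed code 1, width 16) and the maximum speed code in Link Capabilities is 3. The code only reads these registers, which matches the point that the OS observes the negotiated state rather than setting it.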
> > > > > > > We don't do anything in the driver after the device was suspended. And
> > > > > > > the 0x88000 is a mirror of the PCI config space, but we also got some
> > > > > > > PCIe stuff at 0x8c000 which is used by newer GPUs for gen3 stuff
> > > > > > > essentially. I have no idea how much of this is part of the actual pci
> > > > > > > standard and how much is driver specific. But the driver also wants to
> > > > > > > have some control over the link speed as it's tied to performance
> > > > > > > states on GPU.
> > > > > >
> > > > > > As far as I'm aware, there is no generic PCIe way for the OS to
> > > > > > influence the link width or speed.  If the GPU driver needs to do
> > > > > > that, it would be via some device-specific mechanism.
> > > > > >
> > > > > > > The big issue here is just, that the GPU boots with 8.0, some on-gpu
> > > > > > > init mechanism decreases it to 2.5. If we suspend, the GPU or at least
> > > > > > > the communication with the controller is broken. But if we set it to
> > > > > > > the boot speed, resuming the GPU just works. So my assumption was,
> > > > > > > that _something_ (might it be the controller or the pci subsystem)
> > > > > > > tries to force to operate on an invalid link speed and because the
> > > > > > > bridge controller is actually powered down as well (as all children
> > > > > > > are in D3cold) I could imagine that something in the pci subsystem
> > > > > > > actually restores the state which lets the controller fail to
> > > > > > > establish communication again?
> > > > > >
> > > > > >   1) At boot-time, the Port and the GPU hardware negotiate 8.0 GT/s
> > > > > >      without OS/driver intervention.
> > > > > >
> > > > > >   2) Some mechanism reduces link speed to 2.5 GT/s.  This probably
> > > > > >      requires driver intervention or at least some ACPI method.
> > > > >
> > > > > there is no driver intervention and Nouveau doesn't care at all. It's
> > > > > all done on the GPU. We just upload a script and some firmware on to
> > > > > the GPU. The script runs then on the PMU inside the GPU and this
> > > > > script also causes changing the PCIe link settings. But from a Nouveau
> > > > > point of view we don't care about the link before or after that script
> > > > > was invoked. Also there is no ACPI method involved.
> > > > >
> > > > > But if there is something we should notify pci core about, maybe
> > > > > that's something we have to do then?
> > > >
> > > > I don't think there's anything the PCI core could do with that
> > > > information anyway.  The PCI core doesn't care at all about the link
> > > > speed, and it really can't influence it directly.
> > > >
> > > > > >   3) Suspend puts GPU into D3cold (powered off).
> > > > > >
> > > > > >   4) Resume restores GPU to D0, and the Port and GPU hardware again
> > > > > >      negotiate 8.0 GT/s without OS/driver intervention, just like at
> > > > > >      initial boot.
> > > > >
> > > > > No, that negotiation fails apparently as any attempt to read anything
> > > > > from the device just fails inside pci core. Or something goes wrong
> > > > > when resuming the bridge controller.
> > > >
> > > > I'm surprised the negotiation would fail even after a power cycle of
> > > > the device.  But if you can avoid the issue by running another script
> > > > on the PMU before suspend, that's probably what you'll have to do.
> > > >
> > >
> > > there is none as far as we know. Or at least nothing inside the vbios.
> > > We still have to get signed PMU firmware images from Nvidia for full
> > > support, but this still would be a hacky issue as we would depend on
> > > those then (and without having those in redistributable form, there
> > > isn't much we can do about except fixing it on the kernel side).
> > >
> > > > > >   5) Now the driver thinks the GPU is at 2.5 GT/s but it's actually at
> > > > > >      8.0 GT/s.
> > > > >
> > > > > what is actually meant by "driver" here? The pci subsystem or Nouveau?
> > > >
> > > > I was thinking Nouveau because the PCI core doesn't care about the
> > > > link speed.
> > > >
> > > > > > Without knowing more about the transition to 2.5 GT/s, I can't guess
> > > > > > why the GPU wouldn't work after resume.  From a PCIe point of view,
> > > > > > the link is supposed to work and the device should be reachable
> > > > > > independent of the link speed.  But maybe there's some weird
> > > > > > dependency between the GPU and the driver here.
> > > > >
> > > > > but the device isn't reachable at all, not even from the pci
> > > > > subsystem. All reads fail/return a default error value (0xffffffff).
> > > >
> > > > Are these PCI config reads that return 0xffffffff?  Or MMIO reads?
> > > > "lspci -vvxxxx" of the bridge and the GPU might have a clue about
> > > > whether a PCI error occurred.
> > > >
> > >
> > > that's kind of problematic as it might just lock up my machine... but
> > > let me try that.
> > >
> > > > > > It sounds like things work if you return to 8.0 GT/s before suspend.
> > > > > > That would make sense to me because then the driver's idea of the
> > > > > > link state after resume would match the actual state.
> > > > >
> > > > > depends on what is meant by the driver here. Inside Nouveau we don't
> > > > > care one bit about the current link speed, so I assume you mean
> > > > > something inside the pci core code?
> > > > >
> > > > > > But I don't see a way to deal with this in the PCI core.  The PCI core
> > > > > > does save and restore most of the architected config space around
> > > > > > suspend/resume, but since this appears to be a device-specific thing,
> > > > > > the PCI core would have no idea how to save/restore it.
> > > > >
> > > > > if we assume that the negotiation on a device level works as intended,
> > > > > then I would expect this to be a pci core issue, which might actually
> > > > > be not fixable there. But if it's not, then we would have to put
> > > > > something like that inside the runpm documentation to tell drivers
> > > > > they have to do something about it.
> > > > >
> > > > > But again, for me it just sounds like the negotiation on the device
> > > > > level fails or something inside pci core messes it up.
> > > >
> > > > To me it sounds like the PMU script messed something up, and the PCI
> > > > core has no way to know what that was or how to fix it.
> > > >
> > >
> > > sure, I am mainly wondering why it doesn't work after we power cycled
> > > the GPU and the host bridge controller, because no matter what the
> > > state was before, we have to reprobe instead of relying on a known
> > > state, no?
> > >
> > > > > > > > > Signed-off-by: Karol Herbst <kherbst at redhat.com>
> > > > > > > > > Reviewed-by: Lyude Paul <lyude at redhat.com>
> > > > > > > > > ---
> > > > > > > > >  drm/nouveau/include/nvkm/subdev/pci.h |  5 +++--
> > > > > > > > >  drm/nouveau/nvkm/subdev/pci/base.c    |  9 +++++++--
> > > > > > > > >  drm/nouveau/nvkm/subdev/pci/pcie.c    | 24 ++++++++++++++++++++----
> > > > > > > > >  drm/nouveau/nvkm/subdev/pci/priv.h    |  2 ++
> > > > > > > > >  4 files changed, 32 insertions(+), 8 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drm/nouveau/include/nvkm/subdev/pci.h b/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > > > > index 1fdf3098..b23793a2 100644
> > > > > > > > > --- a/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > > > > +++ b/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > > > > @@ -26,8 +26,9 @@ struct nvkm_pci {
> > > > > > > > >       } agp;
> > > > > > > > >
> > > > > > > > >       struct {
> > > > > > > > > -             enum nvkm_pcie_speed speed;
> > > > > > > > > -             u8 width;
> > > > > > > > > +             enum nvkm_pcie_speed cur_speed;
> > > > > > > > > +             enum nvkm_pcie_speed def_speed;
> > > > > > > > > +             u8 cur_width;
> > > > > > > > >       } pcie;
> > > > > > > > >
> > > > > > > > >       bool msi;
> > > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/base.c b/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > > > > index ee2431a7..d9fb5a83 100644
> > > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > > > > @@ -90,6 +90,8 @@ nvkm_pci_fini(struct nvkm_subdev *subdev, bool suspend)
> > > > > > > > >
> > > > > > > > >       if (pci->agp.bridge)
> > > > > > > > >               nvkm_agp_fini(pci);
> > > > > > > > > +     else if (pci_is_pcie(pci->pdev))
> > > > > > > > > +             nvkm_pcie_fini(pci);
> > > > > > > > >
> > > > > > > > >       return 0;
> > > > > > > > >  }
> > > > > > > > > @@ -100,6 +102,8 @@ nvkm_pci_preinit(struct nvkm_subdev *subdev)
> > > > > > > > >       struct nvkm_pci *pci = nvkm_pci(subdev);
> > > > > > > > >       if (pci->agp.bridge)
> > > > > > > > >               nvkm_agp_preinit(pci);
> > > > > > > > > +     else if (pci_is_pcie(pci->pdev))
> > > > > > > > > +             nvkm_pcie_preinit(pci);
> > > > > > > > >       return 0;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > @@ -193,8 +197,9 @@ nvkm_pci_new_(const struct nvkm_pci_func *func, struct nvkm_device *device,
> > > > > > > > >       pci->func = func;
> > > > > > > > >       pci->pdev = device->func->pci(device)->pdev;
> > > > > > > > >       pci->irq = -1;
> > > > > > > > > -     pci->pcie.speed = -1;
> > > > > > > > > -     pci->pcie.width = -1;
> > > > > > > > > +     pci->pcie.cur_speed = -1;
> > > > > > > > > +     pci->pcie.def_speed = -1;
> > > > > > > > > +     pci->pcie.cur_width = -1;
> > > > > > > > >
> > > > > > > > >       if (device->type == NVKM_DEVICE_AGP)
> > > > > > > > >               nvkm_agp_ctor(pci);
> > > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/pcie.c b/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > > > > index 70ccbe0d..731dd30e 100644
> > > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > > > > @@ -85,6 +85,13 @@ nvkm_pcie_oneinit(struct nvkm_pci *pci)
> > > > > > > > >       return 0;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > +int
> > > > > > > > > +nvkm_pcie_preinit(struct nvkm_pci *pci)
> > > > > > > > > +{
> > > > > > > > > +     pci->pcie.def_speed = nvkm_pcie_get_speed(pci);
> > > > > > > > > +     return 0;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > >  int
> > > > > > > > >  nvkm_pcie_init(struct nvkm_pci *pci)
> > > > > > > > >  {
> > > > > > > > > @@ -105,12 +112,21 @@ nvkm_pcie_init(struct nvkm_pci *pci)
> > > > > > > > >       if (pci->func->pcie.init)
> > > > > > > > >               pci->func->pcie.init(pci);
> > > > > > > > >
> > > > > > > > > -     if (pci->pcie.speed != -1)
> > > > > > > > > -             nvkm_pcie_set_link(pci, pci->pcie.speed, pci->pcie.width);
> > > > > > > > > +     if (pci->pcie.cur_speed != -1)
> > > > > > > > > +             nvkm_pcie_set_link(pci, pci->pcie.cur_speed,
> > > > > > > > > +                                pci->pcie.cur_width);
> > > > > > > > >
> > > > > > > > >       return 0;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > +int
> > > > > > > > > +nvkm_pcie_fini(struct nvkm_pci *pci)
> > > > > > > > > +{
> > > > > > > > > +     if (!IS_ERR_VALUE(pci->pcie.def_speed))
> > > > > > > > > +             return nvkm_pcie_set_link(pci, pci->pcie.def_speed, 16);
> > > > > > > > > +     return 0;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > >  int
> > > > > > > > >  nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
> > > > > > > > >  {
> > > > > > > > > @@ -146,8 +162,8 @@ nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
> > > > > > > > >               speed = max_speed;
> > > > > > > > >       }
> > > > > > > > >
> > > > > > > > > -     pci->pcie.speed = speed;
> > > > > > > > > -     pci->pcie.width = width;
> > > > > > > > > +     pci->pcie.cur_speed = speed;
> > > > > > > > > +     pci->pcie.cur_width = width;
> > > > > > > > >
> > > > > > > > >       if (speed == cur_speed) {
> > > > > > > > >               nvkm_debug(subdev, "requested matches current speed\n");
> > > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/priv.h b/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > > > > index a0d4c007..e7744671 100644
> > > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > > > > @@ -60,5 +60,7 @@ enum nvkm_pcie_speed gk104_pcie_max_speed(struct nvkm_pci *);
> > > > > > > > >  int gk104_pcie_version_supported(struct nvkm_pci *);
> > > > > > > > >
> > > > > > > > >  int nvkm_pcie_oneinit(struct nvkm_pci *);
> > > > > > > > > +int nvkm_pcie_preinit(struct nvkm_pci *);
> > > > > > > > >  int nvkm_pcie_init(struct nvkm_pci *);
> > > > > > > > > +int nvkm_pcie_fini(struct nvkm_pci *);
> > > > > > > > >  #endif
> > > > > > > > > --
> > > > > > > > > 2.21.0
> > > > > > > > >
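
The ordering the patch relies on can be modelled outside the kernel. The following is a simplified sketch, not the nouveau code: `struct fake_gpu` stands in for the hardware link, and all type and function names are hypothetical. It only shows the idea of recording the link speed at preinit and writing it back at fini, with -1 meaning "nothing recorded" as in the patch.

```c
/* Simplified link speeds; SPEED_UNKNOWN plays the role of the -1
 * initial value used in the patch. */
enum link_speed { SPEED_UNKNOWN = -1, SPEED_2_5, SPEED_5_0, SPEED_8_0 };

/* Hypothetical stand-in for the hardware: only the current link speed. */
struct fake_gpu {
    enum link_speed hw_speed;
};

/* Driver-side bookkeeping, loosely following the cur_speed/def_speed
 * fields the patch adds. */
struct pcie_state {
    enum link_speed cur_speed;
    enum link_speed def_speed;
};

static void pcie_ctor(struct pcie_state *s)
{
    s->cur_speed = SPEED_UNKNOWN;
    s->def_speed = SPEED_UNKNOWN;
}

/* preinit: remember the speed the link came up with at boot. */
static void pcie_preinit(struct pcie_state *s, const struct fake_gpu *gpu)
{
    s->def_speed = gpu->hw_speed;
}

/* Change the link speed and track what was last requested. */
static void pcie_set_link(struct pcie_state *s, struct fake_gpu *gpu,
                          enum link_speed speed)
{
    gpu->hw_speed = speed;
    s->cur_speed = speed;
}

/* fini: restore the boot speed before suspend, if one was recorded. */
static void pcie_fini(struct pcie_state *s, struct fake_gpu *gpu)
{
    if (s->def_speed != SPEED_UNKNOWN)
        pcie_set_link(s, gpu, s->def_speed);
}
```

In the scenario from the thread, the link comes up at 8.0 GT/s, preinit records it, the on-GPU init script lowers the link to 2.5 GT/s without the driver being involved (modelled by writing `hw_speed` directly), and fini puts it back to 8.0 GT/s before the device is suspended.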
