Karol Herbst
2019-Sep-27 21:53 UTC
[Nouveau] [RFC PATCH] pci: prevent putting pcie devices into lower device states on certain intel bridges
On Fri, Sep 27, 2019 at 11:42 PM Bjorn Helgaas <helgaas at kernel.org> wrote:
>
> [+cc Rafael, Mika, linux-pm]
>
> On Fri, Sep 27, 2019 at 04:44:21PM +0200, Karol Herbst wrote:
> > Fixes runpm breakage mainly on Nvidia GPUs as they are not able to resume.
>
> I don't know what runpm is. Some userspace utility? Module
> parameter?
>

runpm aka runtime power management aka runtime_resume and runtime_suspend

> > Works perfectly with this workaround applied.
> >
> > RFC comment:
> > We are quite sure that there is a higher amount of bridges affected by this,
> > but I was only testing it on my own machine for now.
> >
> > I've stresstested runpm by doing 5000 runpm cycles with that patch applied
> > and never saw it fail.
> >
> > I mainly wanted to get a discussion going on if that's a feasable workaround
> > indeed or if we need something better.
> >
> > I am also sure, that the nouveau driver itself isn't at fault as I am able
> > to reproduce the same issue by poking into some PCI registers on the PCIe
> > bridge to put the GPU into D3cold as it's done in ACPI code.
> >
> > I've written a little python script to reproduce this issue without the need
> > of loading nouveau:
> > https://raw.githubusercontent.com/karolherbst/pci-stub-runpm/master/nv_runpm_bug_test.py
>
> Nice script, thanks for sharing it :) I could learn a lot of useful
> python by studying it.
>
> > Signed-off-by: Karol Herbst <kherbst at redhat.com>
> > Cc: Bjorn Helgaas <bhelgaas at google.com>
> > Cc: Lyude Paul <lyude at redhat.com>
> > Cc: linux-pci at vger.kernel.org
> > Cc: dri-devel at lists.freedesktop.org
> > Cc: nouveau at lists.freedesktop.org
> > ---
> >  drivers/pci/pci.c | 39 +++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 39 insertions(+)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 088fcdc8d2b4..9dbd29ced1ac 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -799,6 +799,42 @@ static inline bool platform_pci_bridge_d3(struct pci_dev *dev)
> >       return pci_platform_pm ? pci_platform_pm->bridge_d3(dev) : false;
> >  }
> >
> > +/*
> > + * some intel bridges cause serious issues with runpm if the client device
> > + * is put into D1/D2/D3hot before putting the client into D3cold via
> > + * platform means (generally ACPI).
>
> You mention Nvidia GPUs above, but I guess the same issue may affect
> other devices? I would really like to chase this down to a more
> specific issue, e.g., a hardware defect with erratum, an ACPI defect,
> or a Linux defect. Without the specifics, this is just a band-aid.
>

yep.. but we were trying to talk to Nvidia and Intel about it and had
no luck getting anything out of them so far. We gave up on Nvidia,
Intel is still pending.

> I don't see any relevant requirements in the _OFF description, but I
> don't know much about ACPI power control.
>
> Your script allows several scenarios; I *guess* the one that causes
> the problem is:
>
>   - write 3 (D3hot) to GPU PowerState (PCIE_PM_REG == 0x64, I assume
>     PM Capability Control Register)

correct

>   - write 3 (D3hot) to bridge PowerState (0x84, I assume PM Capability
>     Control Register)

correct, but this seems to be fine and doesn't fix the issue if that
part is skipped

>   - run _OFF on the power resource for the bridge
>
> From your script I assume you do:
>
>   - run _ON on the power resource for the bridge
>   - write 0 (D0) to the bridge PowerState
>
> You do *not* write the GPU PowerState (which we can't do if the GPU is
> in D3cold). Is there some assumption that it comes out of D3cold via
> some other mechanism, e.g., is the _ON supposed to wake up the GPU?

if the "link" is powered up again (via 0x248, 0xbc or 0xb0 on the GPU
bridge, or the ACPI _ON method) the GPU comes up in the D0 state and is
fully operational, well, besides the little issue we've got with the D3
write. It can also be worked around by putting the PCIe link into 5.0
and 8.0 mode, but that's not reliable enough as it fails in around 10%
of all tries.

>
> What exactly is the serious issue? I guess it's that the rescan
> doesn't detect the GPU, which means it's not responding to config
> accesses? Is there any timing component here, e.g., maybe we're
> missing some delay like the ones Mika is adding to the reset paths?
>

When I was checking up on some of the PCI registers of the bridge
controller, the slot detection told me that there is no device
recognized anymore. I don't remember which register it was, though I
guess one could look it up in Intel's SoC spec documents.

My guess is that the bridge controller fails to detect the GPU being
there, or actively threw it off the bus, or something like that. But a
normal system suspend/resume cycle brings the GPU back online (doing a
rescan via sysfs gets the device detected again).

> > + *
> > + * skipping this makes runpm work perfectly fine on such devices.
> > + *
> > + * As far as we know only skylake and kaby lake SoCs are affected.
> > + */
> > +static unsigned short intel_broken_d3_bridges[] = {
> > +     /* kbl */
> > +     0x1901,
> > +};
> > +
> > +static inline bool intel_broken_pci_pm(struct pci_bus *bus)
> > +{
> > +     struct pci_dev *bridge;
> > +     int i;
> > +
> > +     if (!bus || !bus->self)
> > +             return false;
> > +
> > +     bridge = bus->self;
> > +     if (bridge->vendor != PCI_VENDOR_ID_INTEL)
> > +             return false;
> > +
> > +     for (i = 0; i < ARRAY_SIZE(intel_broken_d3_bridges); i++) {
> > +             if (bridge->device == intel_broken_d3_bridges[i]) {
> > +                     pci_err(bridge, "found broken intel bridge\n");
>
> If this ends up being a hardware defect, we should use a quirk to set
> a bit in the pci_dev once, as we do for broken_intx_masking and
> similar bits.

okay, if you think this is the preferred way then I can change the
patch accordingly.

> > +                     return true;
> > +             }
> > +     }
> > +
> > +     return false;
> > +}
> > +
> >  /**
> >   * pci_raw_set_power_state - Use PCI PM registers to set the power state of
> >   *                           given PCI device
> > @@ -827,6 +863,9 @@ static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
> >       if (state < PCI_D0 || state > PCI_D3hot)
> >               return -EINVAL;
> >
> > +     if (state != PCI_D0 && intel_broken_pci_pm(dev->bus))
> > +             return 0;
> > +
> >       /*
> >        * Validate current state:
> >        * Can enter D0 from any state, but if we can only go deeper
> > --
> > 2.21.0
> >
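[Editor's note: below is a rough sketch of what the quirk-based rework Bjorn suggests could look like, modeled on the broken_intx_masking pattern he mentions. It is not the actual follow-up patch; the flag name broken_nv_runpm and its placement are assumptions made purely for illustration.]

/*
 * Hedged sketch only: mark the affected bridge once via the PCI fixup
 * machinery instead of walking an ID table on every
 * pci_raw_set_power_state() call.  The "broken_nv_runpm" bit does not
 * exist in struct pci_dev; it is a hypothetical name following the
 * broken_intx_masking pattern.
 */

/* include/linux/pci.h: new bitfield next to broken_intx_masking */
	unsigned int	broken_nv_runpm:1;	/* hypothetical flag */

/* drivers/pci/quirks.c: set the flag once at enumeration time */
static void quirk_broken_nv_runpm(struct pci_dev *bridge)
{
	pci_info(bridge, "disabling D1/D2/D3hot below this bridge (runpm workaround)\n");
	bridge->broken_nv_runpm = 1;
}
/* 0x1901 is the KBL bridge ID from the RFC patch; other affected
 * SKL/KBL IDs would presumably get their own lines here. */
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x1901, quirk_broken_nv_runpm);

/* drivers/pci/pci.c, pci_raw_set_power_state(): check the flag */
	if (state != PCI_D0 && dev->bus && dev->bus->self &&
	    dev->bus->self->broken_nv_runpm)
		return 0;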
Mika Westerberg
2019-Sep-30 08:05 UTC
[Nouveau] [RFC PATCH] pci: prevent putting pcie devices into lower device states on certain intel bridges
Hi Karol,

On Fri, Sep 27, 2019 at 11:53:48PM +0200, Karol Herbst wrote:
> > What exactly is the serious issue? I guess it's that the rescan
> > doesn't detect the GPU, which means it's not responding to config
> > accesses? Is there any timing component here, e.g., maybe we're
> > missing some delay like the ones Mika is adding to the reset paths?
>
> When I was checking up on some of the PCI registers of the bridge
> controller, the slot detection told me that there is no device
> recognized anymore. I don't remember which register it was, though I
> guess one could look it up in Intel's SoC spec documents.
>
> My guess is that the bridge controller fails to detect the GPU being
> there, or actively threw it off the bus, or something like that. But a
> normal system suspend/resume cycle brings the GPU back online (doing a
> rescan via sysfs gets the device detected again).

Can you elaborate a bit on the scenario in which the issue happens (e.g.
the steps to reproduce it)? It was not 100% clear from the changelog.
Also, what is the result when the failure happens?

I see there is a script that does something but unfortunately I'm not
fluent in Python so can't extract the steps how the issue can be
reproduced ;-)

One thing that I'm working on is that the Linux PCI subsystem misses
certain delays that are needed after a D3cold -> D0 transition; otherwise
the device and/or link may not be ready before we access it. What you are
experiencing sounds similar. I wonder if you could try the following
patch and see if it makes any difference?

https://patchwork.kernel.org/patch/11106611/
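[Editor's note: to make the timing theory concrete, here is a rough illustration of the kind of delay Mika describes. This is not the patch linked above; the helper name is invented, and the sketch only illustrates the idea of waiting for the downstream link before touching the device after a D3cold -> D0 transition.]

/*
 * Illustration only.  Assumes the PCI core's internal helper
 * pcie_wait_for_link() (drivers/pci/pci.h) and msleep() from
 * <linux/delay.h>.  The 100 ms figure is the PCIe-spec wait after the
 * link comes up before configuration requests may be issued.
 */
static void example_wait_after_d3cold(struct pci_dev *bridge)
{
	if (!pci_is_pcie(bridge)) {
		/* conventional PCI: give the bus a generous settle time */
		msleep(1000);
		return;
	}

	/* wait for the downstream link, then the mandated delay before
	 * the first config access to the device below the bridge */
	pcie_wait_for_link(bridge, true);
	msleep(100);
}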
Karol Herbst
2019-Sep-30 09:15 UTC
[Nouveau] [RFC PATCH] pci: prevent putting pcie devices into lower device states on certain intel bridges
On Mon, Sep 30, 2019 at 10:05 AM Mika Westerberg
<mika.westerberg at linux.intel.com> wrote:
>
> Hi Karol,
>
> On Fri, Sep 27, 2019 at 11:53:48PM +0200, Karol Herbst wrote:
> > > What exactly is the serious issue? I guess it's that the rescan
> > > doesn't detect the GPU, which means it's not responding to config
> > > accesses? Is there any timing component here, e.g., maybe we're
> > > missing some delay like the ones Mika is adding to the reset paths?
> >
> > When I was checking up on some of the PCI registers of the bridge
> > controller, the slot detection told me that there is no device
> > recognized anymore. I don't remember which register it was, though I
> > guess one could look it up in Intel's SoC spec documents.
> >
> > My guess is that the bridge controller fails to detect the GPU being
> > there, or actively threw it off the bus, or something like that. But a
> > normal system suspend/resume cycle brings the GPU back online (doing a
> > rescan via sysfs gets the device detected again).
>
> Can you elaborate a bit on the scenario in which the issue happens (e.g.
> the steps to reproduce it)? It was not 100% clear from the changelog.
> Also, what is the result when the failure happens?
>

yeah, I already have an updated patch in the works which also does the
rework Bjorn suggested. Had no time yet to test whether I messed it up.
I am also thinking of adding a kernel parameter to enable this
workaround on demand, but I'm not quite sure on that one yet.

> I see there is a script that does something but unfortunately I'm not
> fluent in Python so can't extract the steps how the issue can be
> reproduced ;-)
>
> One thing that I'm working on is that the Linux PCI subsystem misses
> certain delays that are needed after a D3cold -> D0 transition; otherwise
> the device and/or link may not be ready before we access it. What you are
> experiencing sounds similar. I wonder if you could try the following
> patch and see if it makes any difference?
>
> https://patchwork.kernel.org/patch/11106611/

I think I already tried this patch. The problem isn't that the device
becomes accessible too late, but that it seems the device completely
falls off the bus. But I can retest again just to be sure.
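[Editor's note: the opt-in kernel parameter Karol mentions is not part of any posted patch; the sketch below is a hypothetical illustration of how it could be wired into the existing "pci=" option parsing in pci_setup() in drivers/pci/pci.c. The option name and variable are invented here.]

/*
 * Hypothetical sketch of an opt-in switch for the workaround.
 */
static bool pci_nv_runpm_workaround;

/* inside the option loop of pci_setup(): */
	} else if (!strcmp(str, "nv_runpm_workaround")) {
		pci_nv_runpm_workaround = true;

/* and the check in pci_raw_set_power_state() becomes opt-in: */
	if (pci_nv_runpm_workaround && state != PCI_D0 &&
	    intel_broken_pci_pm(dev->bus))
		return 0;

Booting with pci=nv_runpm_workaround would then enable the check while leaving default behaviour unchanged.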