Karol Herbst
2019-Sep-27 14:44 UTC
[Nouveau] [RFC PATCH] pci: prevent putting pcie devices into lower device states on certain intel bridges
Fixes runpm breakage mainly on Nvidia GPUs as they are not able to resume.

Works perfectly with this workaround applied.

RFC comment:
We are quite sure that there is a larger number of bridges affected by
this, but I was only testing it on my own machine for now.

I've stress-tested runpm by doing 5000 runpm cycles with that patch
applied and never saw it fail.

I mainly wanted to get a discussion going on whether that's a feasible
workaround or if we need something better.

I am also sure that the nouveau driver itself isn't at fault as I am
able to reproduce the same issue by poking into some PCI registers on
the PCIe bridge to put the GPU into D3cold as it's done in ACPI code.

I've written a little python script to reproduce this issue without the
need of loading nouveau:
https://raw.githubusercontent.com/karolherbst/pci-stub-runpm/master/nv_runpm_bug_test.py

Signed-off-by: Karol Herbst <kherbst at redhat.com>
Cc: Bjorn Helgaas <bhelgaas at google.com>
Cc: Lyude Paul <lyude at redhat.com>
Cc: linux-pci at vger.kernel.org
Cc: dri-devel at lists.freedesktop.org
Cc: nouveau at lists.freedesktop.org
---
 drivers/pci/pci.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 088fcdc8d2b4..9dbd29ced1ac 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -799,6 +799,42 @@ static inline bool platform_pci_bridge_d3(struct pci_dev *dev)
 	return pci_platform_pm ? pci_platform_pm->bridge_d3(dev) : false;
 }
 
+/*
+ * some intel bridges cause serious issues with runpm if the client device
+ * is put into D1/D2/D3hot before putting the client into D3cold via
+ * platform means (generally ACPI).
+ *
+ * skipping this makes runpm work perfectly fine on such devices.
+ *
+ * As far as we know only skylake and kaby lake SoCs are affected.
+ */
+static unsigned short intel_broken_d3_bridges[] = {
+	/* kbl */
+	0x1901,
+};
+
+static inline bool intel_broken_pci_pm(struct pci_bus *bus)
+{
+	struct pci_dev *bridge;
+	int i;
+
+	if (!bus || !bus->self)
+		return false;
+
+	bridge = bus->self;
+	if (bridge->vendor != PCI_VENDOR_ID_INTEL)
+		return false;
+
+	for (i = 0; i < ARRAY_SIZE(intel_broken_d3_bridges); i++) {
+		if (bridge->device == intel_broken_d3_bridges[i]) {
+			pci_err(bridge, "found broken intel bridge\n");
+			return true;
+		}
+	}
+
+	return false;
+}
+
 /**
  * pci_raw_set_power_state - Use PCI PM registers to set the power state of
  *			     given PCI device
@@ -827,6 +863,9 @@ static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
 	if (state < PCI_D0 || state > PCI_D3hot)
 		return -EINVAL;
 
+	if (state != PCI_D0 && intel_broken_pci_pm(dev->bus))
+		return 0;
+
 	/*
 	 * Validate current state:
 	 * Can enter D0 from any state, but if we can only go deeper
-- 
2.21.0
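For context on what the early return added above is skipping: the normal
job of pci_raw_set_power_state() is to update the device's PMCSR (PM
Control/Status) register. A simplified sketch of that write, using the
register and mask names from include/uapi/linux/pci_regs.h; this
illustrates the mechanism only and is not the exact kernel code:

	/* Assumes 'dev' and 'state' as in pci_raw_set_power_state(). */
	u16 pmcsr;

	/* Read the PM Control/Status register from the PM capability. */
	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);

	/* The low two bits select the power state: 0 = D0 ... 3 = D3hot. */
	pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
	pmcsr |= state;

	/* This is the config write the workaround refuses to perform for
	 * devices below an affected bridge when state != PCI_D0. */
	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);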
Bjorn Helgaas
2019-Sep-27 21:42 UTC
[Nouveau] [RFC PATCH] pci: prevent putting pcie devices into lower device states on certain intel bridges
[+cc Rafael, Mika, linux-pm]

On Fri, Sep 27, 2019 at 04:44:21PM +0200, Karol Herbst wrote:
> Fixes runpm breakage mainly on Nvidia GPUs as they are not able to resume.

I don't know what runpm is.  Some userspace utility?  Module parameter?

> Works perfectly with this workaround applied.
>
> RFC comment:
> We are quite sure that there is a larger number of bridges affected by
> this, but I was only testing it on my own machine for now.
>
> I've stress-tested runpm by doing 5000 runpm cycles with that patch
> applied and never saw it fail.
>
> I mainly wanted to get a discussion going on whether that's a feasible
> workaround or if we need something better.
>
> I am also sure that the nouveau driver itself isn't at fault as I am
> able to reproduce the same issue by poking into some PCI registers on
> the PCIe bridge to put the GPU into D3cold as it's done in ACPI code.
>
> I've written a little python script to reproduce this issue without the
> need of loading nouveau:
> https://raw.githubusercontent.com/karolherbst/pci-stub-runpm/master/nv_runpm_bug_test.py

Nice script, thanks for sharing it :) I could learn a lot of useful
python by studying it.

> Signed-off-by: Karol Herbst <kherbst at redhat.com>
> Cc: Bjorn Helgaas <bhelgaas at google.com>
> Cc: Lyude Paul <lyude at redhat.com>
> Cc: linux-pci at vger.kernel.org
> Cc: dri-devel at lists.freedesktop.org
> Cc: nouveau at lists.freedesktop.org
> ---
>  drivers/pci/pci.c | 39 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 39 insertions(+)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 088fcdc8d2b4..9dbd29ced1ac 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -799,6 +799,42 @@ static inline bool platform_pci_bridge_d3(struct pci_dev *dev)
>  	return pci_platform_pm ? pci_platform_pm->bridge_d3(dev) : false;
>  }
>  
> +/*
> + * some intel bridges cause serious issues with runpm if the client device
> + * is put into D1/D2/D3hot before putting the client into D3cold via
> + * platform means (generally ACPI).

You mention Nvidia GPUs above, but I guess the same issue may affect
other devices?  I would really like to chase this down to a more
specific issue, e.g., a hardware defect with erratum, an ACPI defect,
or a Linux defect.  Without the specifics, this is just a band-aid.

I don't see any relevant requirements in the _OFF description, but I
don't know much about ACPI power control.

Your script allows several scenarios; I *guess* the one that causes
the problem is:

  - write 3 (D3hot) to GPU PowerState (PCIE_PM_REG == 0x64, I assume
    PM Capability Control Register)

  - write 3 (D3hot) to bridge PowerState (0x84, I assume PM Capability
    Control Register)

  - run _OFF on the power resource for the bridge

From your script I assume you do:

  - run _ON on the power resource for the bridge

  - write 0 (D0) to the bridge PowerState

You do *not* write the GPU PowerState (which we can't do if the GPU is
in D3cold).  Is there some assumption that it comes out of D3cold via
some other mechanism, e.g., is the _ON supposed to wake up the GPU?

What exactly is the serious issue?  I guess it's that the rescan
doesn't detect the GPU, which means it's not responding to config
accesses?  Is there any timing component here, e.g., maybe we're
missing some delay like the ones Mika is adding to the reset paths?

> + *
> + * skipping this makes runpm work perfectly fine on such devices.
> + *
> + * As far as we know only skylake and kaby lake SoCs are affected.
> + */
> +static unsigned short intel_broken_d3_bridges[] = {
> +	/* kbl */
> +	0x1901,
> +};
> +
> +static inline bool intel_broken_pci_pm(struct pci_bus *bus)
> +{
> +	struct pci_dev *bridge;
> +	int i;
> +
> +	if (!bus || !bus->self)
> +		return false;
> +
> +	bridge = bus->self;
> +	if (bridge->vendor != PCI_VENDOR_ID_INTEL)
> +		return false;
> +
> +	for (i = 0; i < ARRAY_SIZE(intel_broken_d3_bridges); i++) {
> +		if (bridge->device == intel_broken_d3_bridges[i]) {
> +			pci_err(bridge, "found broken intel bridge\n");

If this ends up being a hardware defect, we should use a quirk to set
a bit in the pci_dev once, as we do for broken_intx_masking and
similar bits.

> +			return true;
> +		}
> +	}
> +
> +	return false;
> +}
> +
>  /**
>   * pci_raw_set_power_state - Use PCI PM registers to set the power state of
>   *			     given PCI device
> @@ -827,6 +863,9 @@ static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
>  	if (state < PCI_D0 || state > PCI_D3hot)
>  		return -EINVAL;
>  
> +	if (state != PCI_D0 && intel_broken_pci_pm(dev->bus))
> +		return 0;
> +
>  	/*
>  	 * Validate current state:
>  	 * Can enter D0 from any state, but if we can only go deeper
> --
> 2.21.0
>
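A minimal sketch of the quirk-style approach Bjorn suggests here, for
illustration only: the flag bit, its value, and the function name are
hypothetical (a real patch would add the bit to enum pci_dev_flags in
include/linux/pci.h), and only the 0x1901 device ID is taken from the
RFC patch above.

#include <linux/pci.h>

/* Hypothetical flag bit, not an existing kernel define. */
#define PCI_DEV_FLAGS_BRIDGE_D3_BROKEN (1 << 11)

static void quirk_intel_broken_bridge_d3(struct pci_dev *bridge)
{
	/* Mark the root port once at enumeration time. */
	bridge->dev_flags |= PCI_DEV_FLAGS_BRIDGE_D3_BROKEN;
	pci_info(bridge, "disallowing D1/D2/D3hot for devices below this bridge\n");
}
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x1901,
			 quirk_intel_broken_bridge_d3);

pci_raw_set_power_state() could then test the parent bridge's dev_flags
instead of walking a device-ID table on every call, roughly:
"if (state != PCI_D0 && dev->bus->self &&
(dev->bus->self->dev_flags & PCI_DEV_FLAGS_BRIDGE_D3_BROKEN)) return 0;".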
Karol Herbst
2019-Sep-27 21:53 UTC
[Nouveau] [RFC PATCH] pci: prevent putting pcie devices into lower device states on certain intel bridges
On Fri, Sep 27, 2019 at 11:42 PM Bjorn Helgaas <helgaas at kernel.org> wrote:
>
> [+cc Rafael, Mika, linux-pm]
>
> On Fri, Sep 27, 2019 at 04:44:21PM +0200, Karol Herbst wrote:
> > Fixes runpm breakage mainly on Nvidia GPUs as they are not able to resume.
>
> I don't know what runpm is.  Some userspace utility?  Module parameter?
>

runpm aka runtime power management aka runtime_resume and runtime_suspend

> > Works perfectly with this workaround applied.
> >
> > RFC comment:
> > We are quite sure that there is a larger number of bridges affected by
> > this, but I was only testing it on my own machine for now.
> >
> > I've stress-tested runpm by doing 5000 runpm cycles with that patch
> > applied and never saw it fail.
> >
> > I mainly wanted to get a discussion going on whether that's a feasible
> > workaround or if we need something better.
> >
> > I am also sure that the nouveau driver itself isn't at fault as I am
> > able to reproduce the same issue by poking into some PCI registers on
> > the PCIe bridge to put the GPU into D3cold as it's done in ACPI code.
> >
> > I've written a little python script to reproduce this issue without the
> > need of loading nouveau:
> > https://raw.githubusercontent.com/karolherbst/pci-stub-runpm/master/nv_runpm_bug_test.py
>
> Nice script, thanks for sharing it :) I could learn a lot of useful
> python by studying it.
>
> > Signed-off-by: Karol Herbst <kherbst at redhat.com>
> > Cc: Bjorn Helgaas <bhelgaas at google.com>
> > Cc: Lyude Paul <lyude at redhat.com>
> > Cc: linux-pci at vger.kernel.org
> > Cc: dri-devel at lists.freedesktop.org
> > Cc: nouveau at lists.freedesktop.org
> > ---
> >  drivers/pci/pci.c | 39 +++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 39 insertions(+)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 088fcdc8d2b4..9dbd29ced1ac 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -799,6 +799,42 @@ static inline bool platform_pci_bridge_d3(struct pci_dev *dev)
> >  	return pci_platform_pm ? pci_platform_pm->bridge_d3(dev) : false;
> >  }
> >  
> > +/*
> > + * some intel bridges cause serious issues with runpm if the client device
> > + * is put into D1/D2/D3hot before putting the client into D3cold via
> > + * platform means (generally ACPI).
>
> You mention Nvidia GPUs above, but I guess the same issue may affect
> other devices?  I would really like to chase this down to a more
> specific issue, e.g., a hardware defect with erratum, an ACPI defect,
> or a Linux defect.  Without the specifics, this is just a band-aid.
>

yep.. but we were trying to talk to Nvidia and Intel about it and had
no luck getting anything out of them so far. We gave up on Nvidia,
Intel is still pending.

> I don't see any relevant requirements in the _OFF description, but I
> don't know much about ACPI power control.
>
> Your script allows several scenarios; I *guess* the one that causes
> the problem is:
>
>   - write 3 (D3hot) to GPU PowerState (PCIE_PM_REG == 0x64, I assume
>     PM Capability Control Register)

correct

>   - write 3 (D3hot) to bridge PowerState (0x84, I assume PM Capability
>     Control Register)

correct, but this seems to be fine and doesn't fix the issue if that
part is skipped

>   - run _OFF on the power resource for the bridge
>
> From your script I assume you do:
>
>   - run _ON on the power resource for the bridge
>
>   - write 0 (D0) to the bridge PowerState
>
> You do *not* write the GPU PowerState (which we can't do if the GPU is
> in D3cold).  Is there some assumption that it comes out of D3cold via
> some other mechanism, e.g., is the _ON supposed to wake up the GPU?

if the "link" is powered up again (via 0x248, 0xbc or 0xb0 on the GPU
bridge or the ACPI _ON method) the GPU comes up in the D0 state and is
fully operational, well besides the little issue we've got with the D3
write. It can also be worked around by putting the PCIe link into 5.0
and 8.0 mode, but that's not reliable enough as it fails in around 10%
of all tries.

> What exactly is the serious issue?  I guess it's that the rescan
> doesn't detect the GPU, which means it's not responding to config
> accesses?  Is there any timing component here, e.g., maybe we're
> missing some delay like the ones Mika is adding to the reset paths?
>

When I was checking up on some of the PCI registers of the bridge
controller, the slot detection told me that there is no device
recognized anymore. I don't remember which register it was, though I
guess one could look it up in the SoC spec document from Intel.

My guess is that the bridge controller fails to detect the GPU being
there or actively threw it off the bus or something. But a normal
system suspend/resume cycle brings the GPU back online (doing a rescan
via sysfs gets the device detected again).

> > + *
> > + * skipping this makes runpm work perfectly fine on such devices.
> > + *
> > + * As far as we know only skylake and kaby lake SoCs are affected.
> > + */
> > +static unsigned short intel_broken_d3_bridges[] = {
> > +	/* kbl */
> > +	0x1901,
> > +};
> > +
> > +static inline bool intel_broken_pci_pm(struct pci_bus *bus)
> > +{
> > +	struct pci_dev *bridge;
> > +	int i;
> > +
> > +	if (!bus || !bus->self)
> > +		return false;
> > +
> > +	bridge = bus->self;
> > +	if (bridge->vendor != PCI_VENDOR_ID_INTEL)
> > +		return false;
> > +
> > +	for (i = 0; i < ARRAY_SIZE(intel_broken_d3_bridges); i++) {
> > +		if (bridge->device == intel_broken_d3_bridges[i]) {
> > +			pci_err(bridge, "found broken intel bridge\n");
>
> If this ends up being a hardware defect, we should use a quirk to set
> a bit in the pci_dev once, as we do for broken_intx_masking and
> similar bits.

okay, if you think this is the preferred way then I can change the
patch accordingly.

> > +			return true;
> > +		}
> > +	}
> > +
> > +	return false;
> > +}
> > +
> >  /**
> >   * pci_raw_set_power_state - Use PCI PM registers to set the power state of
> >   *			     given PCI device
> > @@ -827,6 +863,9 @@ static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
> >  	if (state < PCI_D0 || state > PCI_D3hot)
> >  		return -EINVAL;
> >  
> > +	if (state != PCI_D0 && intel_broken_pci_pm(dev->bus))
> > +		return 0;
> > +
> >  	/*
> >  	 * Validate current state:
> >  	 * Can enter D0 from any state, but if we can only go deeper
> > --
> > 2.21.0
> >
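Karol notes above that the bridge's slot logic stopped reporting the
device but doesn't recall the exact register. One plausible candidate is
the Presence Detect State bit in the PCIe Slot Status register; the
sketch below checks it from the kernel side and is a guess at what he
observed, not something confirmed in the thread.

#include <linux/pci.h>

/* Returns true if the port still reports a device present in its slot
 * (Presence Detect State in the Slot Status register). Only meaningful
 * on ports that implement the slot registers. */
static bool bridge_reports_device_present(struct pci_dev *bridge)
{
	u16 sltsta = 0;

	if (pcie_capability_read_word(bridge, PCI_EXP_SLTSTA, &sltsta))
		return false;	/* no PCIe capability or read failed */

	return sltsta & PCI_EXP_SLTSTA_PDS;
}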