thr3ads.net - Nouveau - [Nouveau] [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges [Nov 2019]

If this information is useful, please help other people find it:
Share via:

Karol Herbst

2019-Nov-21 22:50 UTC

[Nouveau] [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

On Thu, Nov 21, 2019 at 11:39 PM Rafael J. Wysocki <rafael at kernel.org>
wrote:>
> On Thu, Nov 21, 2019 at 8:49 PM Mika Westerberg
> <mika.westerberg at intel.com> wrote:
> >
> > On Thu, Nov 21, 2019 at 04:43:24PM +0100, Rafael J. Wysocki wrote:
> > > On Thu, Nov 21, 2019 at 1:52 PM Mika Westerberg
> > > <mika.westerberg at intel.com> wrote:
> > > >
> > > > On Thu, Nov 21, 2019 at 01:46:14PM +0200, Mika Westerberg
wrote:
> > > > > On Thu, Nov 21, 2019 at 12:34:22PM +0100, Rafael J.
Wysocki wrote:
> > > > > > On Thu, Nov 21, 2019 at 12:28 PM Mika Westerberg
> > > > > > <mika.westerberg at intel.com> wrote:
> > > > > > >
> > > > > > > On Wed, Nov 20, 2019 at 11:29:33PM +0100,
Rafael J. Wysocki wrote:
> > > > > > > > > last week or so I found systems
where the GPU was under the "PCI
> > > > > > > > > Express Root Port" (name from
lspci) and on those systems all of that
> > > > > > > > > seems to work. So I am wondering if
it's indeed just the 0x1901 one,
> > > > > > > > > which also explains Mikas case that
Thunderbolt stuff works as devices
> > > > > > > > > never get populated under this
particular bridge controller, but under
> > > > > > > > > those "Root Port"s
> > > > > > > >
> > > > > > > > It always is a PCIe port, but its
location within the SoC may matter.
> > > > > > >
> > > > > > > Exactly. Intel hardware has PCIe ports on CPU
side (these are called
> > > > > > > PEG, PCI Express Graphics, ports), and the
PCH side. I think the IP is
> > > > > > > still the same.
> > > > > > >
> > > > > > > > Also some custom AML-based power
management is involved and that may
> > > > > > > > be making specific assumptions on the
configuration of the SoC and the
> > > > > > > > GPU at the time of its invocation which
unfortunately are not known to
> > > > > > > > us.
> > > > > > > >
> > > > > > > > However, it looks like the AML invoked
to power down the GPU from
> > > > > > > > acpi_pci_set_power_state() gets confused
if it is not in PCI D0 at
> > > > > > > > that point, so it looks like that AML
tries to access device memory on
> > > > > > > > the GPU (beyond the PCI config space) or
similar which is not
> > > > > > > > accessible in PCI power states below D0.
> > > > > > >
> > > > > > > Or the PCI config space of the GPU when the
parent root port is in D3hot
> > > > > > > (as it is the case here). Also then the GPU
config space is not
> > > > > > > accessible.
> > > > > >
> > > > > > Why would the parent port be in D3hot at that
point?  Wouldn't that be
> > > > > > a suspend ordering violation?
> > > > >
> > > > > No. We put the GPU into D3hot first,
> > >
> > > OK
> > >
> > > Does this involve any AML, like a _PS3 under the GPU object?
> >
> > I don't see _PS3 (nor _PS0) for that object. If I read it right
the GPU
> > itself is not described in ACPI tables at all.
>
> OK
>
> > > > > then the root port and then turn
> > > > > off the power resource (which is attached to the root
port) resulting
> > > > > the topology entering D3cold.
> > > >
> > > > I don't see that happening in the AML though.
> > >
> > > Which AML do you mean, specifically?  The _OFF method for the
root
> > > port's _PR3 power resource or something else?
> >
> > The root port's _OFF method for the power resource returned by its
_PR3.
>
> OK, so without the $subject patch we (1) program the downstream
> component (GPU) into D3hot, then we (2) program the port holding it
> into D3hot and then we (3) let the AML (_OFF for the power resource
> listed by _PR3 under the port object) run.
>
> Something strange happens at this point (and I guess that _OFF doesn't
> even reach the point where it removes power from the port which is why
> we see a lock-up).
>
it does though. I will post the data shortly (with the change in power
consumption), as I also want to do the ACPI traces now.
> We know that skipping (1) makes things work and we kind of suspect
> that skipping (3) would make things work either, but what about doing
> (1) and (3) without (2)?
>
> > > > Basically the difference is that when Windows 7 or Linux
(the _REV==5
> > > > check) then we directly do link disable whereas in Windows
8+ we invoke
> > > > LKDS() method that puts the link into L2/L3. None of the
fields they
> > > > access seem to touch the GPU itself.
> > >
> > > So that may be where the problem is.
> > >
> > > Putting the downstream component into PCI D[1-3] is expected to
put
> > > the link into L1, so I'm not sure how that plays with the
later
> > > attempt to put it into L2/L3 Ready.
> >
> > That should be fine. What I've seen the link goes into L1 when
> > downstream component is put to D-state (not D0) and then it is put
back
> > to L0 when L2/3 ready is propagated. Eventually it goes into L2 or L3.
>
> Well, that's the expected behavior, but the observed behavior isn't
as
> expected. :-)
>
> > > Also, L2/L3 Ready is expected to be transient, so finally power
should
> > > be removed somehow.
> >
> > There is GPIO for both power and PERST, I think the line here:
> >
> >   \_SB.SGOV (0x01010004, Zero)
> >
> > is the one that removes power.
>
> OK
>
> > > > LKDS() for the first PEG port looks like this:
> > > >
> > > >    P0L2 = One
> > > >    Sleep (0x10)
> > > >    Local0 = Zero
> > > >    While (P0L2)
> > > >    {
> > > >         If ((Local0 > 0x04))
> > > >         {
> > > >             Break
> > > >         }
> > > >
> > > >         Sleep (0x10)
> > > >         Local0++
> > > >    }
> > > >
> > > > One thing that comes to mind is that the loop can end even
if P0L2 is
> > > > not cleared as it does only 5 iterations with 16 ms sleep
between. Maybe
> > > > Sleep() is implemented differently in Windows? I mean Linux
may be
> > > > "faster" here and return prematurely and if we
leave the port into D0
> > > > this does not happen, or something. I'm just throwing
out ideas :)
> > >
> > > But this actually works for the downstream component in D0,
doesn't it?
> >
> > It does and that leaves the link in L0 so it could be that then the
> > above AML works better or something.
>
> That would be my guess.
>
> > That reminds me, ASPM may have something to do with this as well.
>
> Not really if D-states are involved.
>
> > > Also, if the downstream component is in D0, the port actually
should
> > > stay in D0 too, so what would happen with the $subject patch
applied?
> >
> > Parent port cannot be lower D-state than the child so I agree it
should
> > stay in D0 as well. However, it seems that what happens is that the
> > issue goes away :)
>
> Well, at least this is kind of out of the spec.
>
> Note that pci_pm_suspend_noirq() won't let the port go into D3 if the
> downstream device is in D0, so the $subject patch will not work as
> expected in the suspend-to-idle case.
>
> Also we really should make up our minds on whether or not to force
> PCIe ports to stay in D0 when downstream devices are in D0 and be
> consequent about that.  Right now we do one thing during system-wide
> suspend and the other one in PM-runtime, which is confusing.
>
> The current design is mostly based on the PCI PM Spec 1.2, so it would
> be consequent to follow system-wide suspend in PM-runtime and avoid
> putting PCIe ports holding devices in D0 into any low-power states.
> but that would make the approach in the $subject patch ineffective.
>
> Moreover, the fact that there are separate branches for "Windows
7"
> and "Windows 8+" kind of suggest a change in the expected
behavior
> between Windows 7 and Windows 8, from the AML perspective.  I would
> guess that Windows 7 followed PCI PM 1.2 and Windows 8 (and later)
> does something else.  Now, the structure of the "Windows 8+"
branch
> described by you suggests that, at least in the cases when it is going
> to remove power from the port eventually, it goes straight for the
> link preparation (the L2/L3 Ready transition) and power removal
> without bothering to program the downstream device and port into D3hot
> (because that's kind of redundant).
>
> That hypothetical "Windows 8+" approach may really work
universally,
> because it doesn't seem to break any rules (going straight from D0 to
> D3cold is not disallowed and doing that for both a port and a
> downstream device at the same time is kind of OK either, as long as
> the link is ready for that).
>

Karol Herbst

2019-Nov-22 00:13 UTC

head link

[Nouveau] [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

so while trying to test with d3cold disabled, I noticed that I run
into the exact same error. And I verified that the
\_SB.PCI0.PEG0.PG00._STA returns 1, which indicates it should still be
turned on.

On Thu, Nov 21, 2019 at 11:50 PM Karol Herbst <kherbst at redhat.com>
wrote:>
> On Thu, Nov 21, 2019 at 11:39 PM Rafael J. Wysocki <rafael at
kernel.org> wrote:
> >
> > On Thu, Nov 21, 2019 at 8:49 PM Mika Westerberg
> > <mika.westerberg at intel.com> wrote:
> > >
> > > On Thu, Nov 21, 2019 at 04:43:24PM +0100, Rafael J. Wysocki
wrote:
> > > > On Thu, Nov 21, 2019 at 1:52 PM Mika Westerberg
> > > > <mika.westerberg at intel.com> wrote:
> > > > >
> > > > > On Thu, Nov 21, 2019 at 01:46:14PM +0200, Mika
Westerberg wrote:
> > > > > > On Thu, Nov 21, 2019 at 12:34:22PM +0100, Rafael
J. Wysocki wrote:
> > > > > > > On Thu, Nov 21, 2019 at 12:28 PM Mika
Westerberg
> > > > > > > <mika.westerberg at intel.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, Nov 20, 2019 at 11:29:33PM
+0100, Rafael J. Wysocki wrote:
> > > > > > > > > > last week or so I found
systems where the GPU was under the "PCI
> > > > > > > > > > Express Root Port" (name
from lspci) and on those systems all of that
> > > > > > > > > > seems to work. So I am
wondering if it's indeed just the 0x1901 one,
> > > > > > > > > > which also explains Mikas case
that Thunderbolt stuff works as devices
> > > > > > > > > > never get populated under this
particular bridge controller, but under
> > > > > > > > > > those "Root Port"s
> > > > > > > > >
> > > > > > > > > It always is a PCIe port, but its
location within the SoC may matter.
> > > > > > > >
> > > > > > > > Exactly. Intel hardware has PCIe ports
on CPU side (these are called
> > > > > > > > PEG, PCI Express Graphics, ports), and
the PCH side. I think the IP is
> > > > > > > > still the same.
> > > > > > > >
> > > > > > > > > Also some custom AML-based power
management is involved and that may
> > > > > > > > > be making specific assumptions on
the configuration of the SoC and the
> > > > > > > > > GPU at the time of its invocation
which unfortunately are not known to
> > > > > > > > > us.
> > > > > > > > >
> > > > > > > > > However, it looks like the AML
invoked to power down the GPU from
> > > > > > > > > acpi_pci_set_power_state() gets
confused if it is not in PCI D0 at
> > > > > > > > > that point, so it looks like that
AML tries to access device memory on
> > > > > > > > > the GPU (beyond the PCI config
space) or similar which is not
> > > > > > > > > accessible in PCI power states
below D0.
> > > > > > > >
> > > > > > > > Or the PCI config space of the GPU when
the parent root port is in D3hot
> > > > > > > > (as it is the case here). Also then the
GPU config space is not
> > > > > > > > accessible.
> > > > > > >
> > > > > > > Why would the parent port be in D3hot at that
point?  Wouldn't that be
> > > > > > > a suspend ordering violation?
> > > > > >
> > > > > > No. We put the GPU into D3hot first,
> > > >
> > > > OK
> > > >
> > > > Does this involve any AML, like a _PS3 under the GPU object?
> > >
> > > I don't see _PS3 (nor _PS0) for that object. If I read it
right the GPU
> > > itself is not described in ACPI tables at all.
> >
> > OK
> >
> > > > > > then the root port and then turn
> > > > > > off the power resource (which is attached to the
root port) resulting
> > > > > > the topology entering D3cold.
> > > > >
> > > > > I don't see that happening in the AML though.
> > > >
> > > > Which AML do you mean, specifically?  The _OFF method for
the root
> > > > port's _PR3 power resource or something else?
> > >
> > > The root port's _OFF method for the power resource returned
by its _PR3.
> >
> > OK, so without the $subject patch we (1) program the downstream
> > component (GPU) into D3hot, then we (2) program the port holding it
> > into D3hot and then we (3) let the AML (_OFF for the power resource
> > listed by _PR3 under the port object) run.
> >
> > Something strange happens at this point (and I guess that _OFF
doesn't
> > even reach the point where it removes power from the port which is why
> > we see a lock-up).
> >
>
> it does though. I will post the data shortly (with the change in power
> consumption), as I also want to do the ACPI traces now.
>
> > We know that skipping (1) makes things work and we kind of suspect
> > that skipping (3) would make things work either, but what about doing
> > (1) and (3) without (2)?
> >
> > > > > Basically the difference is that when Windows 7 or
Linux (the _REV==5
> > > > > check) then we directly do link disable whereas in
Windows 8+ we invoke
> > > > > LKDS() method that puts the link into L2/L3. None of
the fields they
> > > > > access seem to touch the GPU itself.
> > > >
> > > > So that may be where the problem is.
> > > >
> > > > Putting the downstream component into PCI D[1-3] is expected
to put
> > > > the link into L1, so I'm not sure how that plays with
the later
> > > > attempt to put it into L2/L3 Ready.
> > >
> > > That should be fine. What I've seen the link goes into L1
when
> > > downstream component is put to D-state (not D0) and then it is
put back
> > > to L0 when L2/3 ready is propagated. Eventually it goes into L2
or L3.
> >
> > Well, that's the expected behavior, but the observed behavior
isn't as
> > expected. :-)
> >
> > > > Also, L2/L3 Ready is expected to be transient, so finally
power should
> > > > be removed somehow.
> > >
> > > There is GPIO for both power and PERST, I think the line here:
> > >
> > >   \_SB.SGOV (0x01010004, Zero)
> > >
> > > is the one that removes power.
> >
> > OK
> >
> > > > > LKDS() for the first PEG port looks like this:
> > > > >
> > > > >    P0L2 = One
> > > > >    Sleep (0x10)
> > > > >    Local0 = Zero
> > > > >    While (P0L2)
> > > > >    {
> > > > >         If ((Local0 > 0x04))
> > > > >         {
> > > > >             Break
> > > > >         }
> > > > >
> > > > >         Sleep (0x10)
> > > > >         Local0++
> > > > >    }
> > > > >
> > > > > One thing that comes to mind is that the loop can end
even if P0L2 is
> > > > > not cleared as it does only 5 iterations with 16 ms
sleep between. Maybe
> > > > > Sleep() is implemented differently in Windows? I mean
Linux may be
> > > > > "faster" here and return prematurely and if
we leave the port into D0
> > > > > this does not happen, or something. I'm just
throwing out ideas :)
> > > >
> > > > But this actually works for the downstream component in D0,
doesn't it?
> > >
> > > It does and that leaves the link in L0 so it could be that then
the
> > > above AML works better or something.
> >
> > That would be my guess.
> >
> > > That reminds me, ASPM may have something to do with this as well.
> >
> > Not really if D-states are involved.
> >
> > > > Also, if the downstream component is in D0, the port
actually should
> > > > stay in D0 too, so what would happen with the $subject patch
applied?
> > >
> > > Parent port cannot be lower D-state than the child so I agree it
should
> > > stay in D0 as well. However, it seems that what happens is that
the
> > > issue goes away :)
> >
> > Well, at least this is kind of out of the spec.
> >
> > Note that pci_pm_suspend_noirq() won't let the port go into D3 if
the
> > downstream device is in D0, so the $subject patch will not work as
> > expected in the suspend-to-idle case.
> >
> > Also we really should make up our minds on whether or not to force
> > PCIe ports to stay in D0 when downstream devices are in D0 and be
> > consequent about that.  Right now we do one thing during system-wide
> > suspend and the other one in PM-runtime, which is confusing.
> >
> > The current design is mostly based on the PCI PM Spec 1.2, so it would
> > be consequent to follow system-wide suspend in PM-runtime and avoid
> > putting PCIe ports holding devices in D0 into any low-power states.
> > but that would make the approach in the $subject patch ineffective.
> >
> > Moreover, the fact that there are separate branches for "Windows
7"
> > and "Windows 8+" kind of suggest a change in the expected
behavior
> > between Windows 7 and Windows 8, from the AML perspective.  I would
> > guess that Windows 7 followed PCI PM 1.2 and Windows 8 (and later)
> > does something else.  Now, the structure of the "Windows 8+"
branch
> > described by you suggests that, at least in the cases when it is going
> > to remove power from the port eventually, it goes straight for the
> > link preparation (the L2/L3 Ready transition) and power removal
> > without bothering to program the downstream device and port into D3hot
> > (because that's kind of redundant).
> >
> > That hypothetical "Windows 8+" approach may really work
universally,
> > because it doesn't seem to break any rules (going straight from D0
to
> > D3cold is not disallowed and doing that for both a port and a
> > downstream device at the same time is kind of OK either, as long as
> > the link is ready for that).
> >

Rafael J. Wysocki

2019-Nov-22 09:07 UTC

head link

[Nouveau] [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

On Fri, Nov 22, 2019 at 1:13 AM Karol Herbst <kherbst at redhat.com>
wrote:>
> so while trying to test with d3cold disabled, I noticed that I run
> into the exact same error.
Does this mean that you disabled d3cold on the GPU via sysfs (the
"d3cold_allowed" attribute was 0) and the original problem still
occurred in that configuration?
> And I verified that the
> \_SB.PCI0.PEG0.PG00._STA returns 1, which indicates it should still be
> turned on.
I don't really understand this comment, so can you explain it a bit to
me, please?

Apparently Analagous Threads

Search for more apparently analagous threads

Nouveau - Nov 2019 - [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

[Nouveau] [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

[Nouveau] [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

[Nouveau] [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

Apparently Analagous Threads