Mika Westerberg
2019-Nov-21 11:46 UTC
[Nouveau] [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges
On Thu, Nov 21, 2019 at 12:34:22PM +0100, Rafael J. Wysocki wrote:
> On Thu, Nov 21, 2019 at 12:28 PM Mika Westerberg
> <mika.westerberg at intel.com> wrote:
> >
> > On Wed, Nov 20, 2019 at 11:29:33PM +0100, Rafael J. Wysocki wrote:
> > > > last week or so I found systems where the GPU was under the "PCI
> > > > Express Root Port" (name from lspci) and on those systems all of that
> > > > seems to work. So I am wondering if it's indeed just the 0x1901 one,
> > > > which also explains Mika's case that Thunderbolt stuff works as devices
> > > > never get populated under this particular bridge controller, but under
> > > > those "Root Port"s
> > >
> > > It always is a PCIe port, but its location within the SoC may matter.
> >
> > Exactly. Intel hardware has PCIe ports on the CPU side (these are called
> > PEG, PCI Express Graphics, ports) and on the PCH side. I think the IP is
> > still the same.
> >
> > > Also some custom AML-based power management is involved and that may
> > > be making specific assumptions on the configuration of the SoC and the
> > > GPU at the time of its invocation which unfortunately are not known to
> > > us.
> > >
> > > However, it looks like the AML invoked to power down the GPU from
> > > acpi_pci_set_power_state() gets confused if it is not in PCI D0 at
> > > that point, so it looks like that AML tries to access device memory on
> > > the GPU (beyond the PCI config space) or similar which is not
> > > accessible in PCI power states below D0.
> >
> > Or the PCI config space of the GPU when the parent root port is in D3hot
> > (as is the case here). Then the GPU config space is not accessible
> > either.
>
> Why would the parent port be in D3hot at that point? Wouldn't that be
> a suspend ordering violation?

No. We put the GPU into D3hot first, then the root port, and then turn
off the power resource (which is attached to the root port), resulting
in the topology entering D3cold.

> > I took a look at the HP Omen ACPI tables, which have a similar problem,
> > and there is also a check for Windows 7 (but not Linux), so I think one
> > alternative workaround would be to add these devices into
> > acpi_osi_dmi_table[] where .callback is set to dmi_disable_osi_win8 (or
> > pass 'acpi_osi="!Windows 2012"' in the kernel command line).
>
> I'd like to understand the facts that have been established so far
> before deciding what to do about them. :-)

Yes, I agree :)
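[Editor's note: for reference, a minimal sketch of what such a DMI quirk could look like, assuming the acpi_osi_dmi_table[] / dmi_disable_osi_win8 mechanism in drivers/acpi/osi.c mentioned above. The vendor and product strings below are placeholders, not a verified match for the HP Omen in question, and this is an illustration only, not a tested patch.]

#include <linux/dmi.h>

/*
 * Hypothetical sketch only: an entry of this shape could be appended to
 * acpi_osi_dmi_table[] in drivers/acpi/osi.c so that _OSI("Windows 2012")
 * and later are not reported on the affected machine.  The DMI strings
 * are placeholders; a real entry would use the strings reported by
 * dmidecode on the affected laptop.
 */
static const struct dmi_system_id gpu_d3cold_osi_quirks[] __initconst = {
	{
		.callback = dmi_disable_osi_win8,	/* defined in drivers/acpi/osi.c */
		.ident = "HP Omen with Nvidia GPU (placeholder)",
		.matches = {
			DMI_MATCH(DMI_SYS_VENDOR, "HP"),
			DMI_MATCH(DMI_PRODUCT_NAME, "OMEN (placeholder)"),
		},
	},
	{}
};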
Mika Westerberg
2019-Nov-21 12:52 UTC
[Nouveau] [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges
On Thu, Nov 21, 2019 at 01:46:14PM +0200, Mika Westerberg wrote:
> On Thu, Nov 21, 2019 at 12:34:22PM +0100, Rafael J. Wysocki wrote:
> > On Thu, Nov 21, 2019 at 12:28 PM Mika Westerberg
> > <mika.westerberg at intel.com> wrote:
> > >
> > > On Wed, Nov 20, 2019 at 11:29:33PM +0100, Rafael J. Wysocki wrote:
> > > > > last week or so I found systems where the GPU was under the "PCI
> > > > > Express Root Port" (name from lspci) and on those systems all of that
> > > > > seems to work. So I am wondering if it's indeed just the 0x1901 one,
> > > > > which also explains Mika's case that Thunderbolt stuff works as devices
> > > > > never get populated under this particular bridge controller, but under
> > > > > those "Root Port"s
> > > >
> > > > It always is a PCIe port, but its location within the SoC may matter.
> > >
> > > Exactly. Intel hardware has PCIe ports on the CPU side (these are called
> > > PEG, PCI Express Graphics, ports) and on the PCH side. I think the IP is
> > > still the same.
> > >
> > > > Also some custom AML-based power management is involved and that may
> > > > be making specific assumptions on the configuration of the SoC and the
> > > > GPU at the time of its invocation which unfortunately are not known to
> > > > us.
> > > >
> > > > However, it looks like the AML invoked to power down the GPU from
> > > > acpi_pci_set_power_state() gets confused if it is not in PCI D0 at
> > > > that point, so it looks like that AML tries to access device memory on
> > > > the GPU (beyond the PCI config space) or similar which is not
> > > > accessible in PCI power states below D0.
> > >
> > > Or the PCI config space of the GPU when the parent root port is in D3hot
> > > (as is the case here). Then the GPU config space is not accessible
> > > either.
> >
> > Why would the parent port be in D3hot at that point? Wouldn't that be
> > a suspend ordering violation?
>
> No. We put the GPU into D3hot first, then the root port, and then turn
> off the power resource (which is attached to the root port), resulting
> in the topology entering D3cold.

I don't see that happening in the AML though.

Basically the difference is that for Windows 7 or Linux (the _REV==5
check) we directly do a link disable, whereas for Windows 8+ we invoke
the LKDS() method that puts the link into L2/L3. None of the fields they
access seem to touch the GPU itself.

LKDS() for the first PEG port looks like this:

    P0L2 = One
    Sleep (0x10)
    Local0 = Zero
    While (P0L2)
    {
        If ((Local0 > 0x04))
        {
            Break
        }

        Sleep (0x10)
        Local0++
    }

One thing that comes to mind is that the loop can end even if P0L2 is
not cleared, as it does only 5 iterations with a 16 ms sleep in between.
Maybe Sleep() is implemented differently in Windows? I mean Linux may be
"faster" here and return prematurely, and if we leave the port in D0
this does not happen, or something. I'm just throwing out ideas :)
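[Editor's note: on the Linux side, AML Sleep() is serviced by acpi_os_sleep(), which at the time of this thread was essentially a wrapper around msleep(); msleep() rounds the delay up to whole jiffies, so it cannot return earlier than requested, only later. A rough sketch of that 2019-era implementation, for context on the "maybe Sleep() is faster on Linux" idea; treat it as an approximation of drivers/acpi/osl.c rather than an exact copy.]

#include <linux/delay.h>	/* msleep() */
#include <linux/types.h>

/*
 * Approximate shape of acpi_os_sleep() around v5.4: the AML Sleep()
 * operand (in milliseconds) is handed straight to msleep(), which
 * sleeps for at least the requested time.
 */
void acpi_os_sleep(u64 ms)
{
	msleep(ms);
}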
Karol Herbst
2019-Nov-21 12:52 UTC
[Nouveau] [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges
On Thu, Nov 21, 2019 at 12:46 PM Mika Westerberg
<mika.westerberg at intel.com> wrote:
>
> On Thu, Nov 21, 2019 at 12:34:22PM +0100, Rafael J. Wysocki wrote:
> > On Thu, Nov 21, 2019 at 12:28 PM Mika Westerberg
> > <mika.westerberg at intel.com> wrote:
> > >
> > > On Wed, Nov 20, 2019 at 11:29:33PM +0100, Rafael J. Wysocki wrote:
> > > > > last week or so I found systems where the GPU was under the "PCI
> > > > > Express Root Port" (name from lspci) and on those systems all of that
> > > > > seems to work. So I am wondering if it's indeed just the 0x1901 one,
> > > > > which also explains Mika's case that Thunderbolt stuff works as devices
> > > > > never get populated under this particular bridge controller, but under
> > > > > those "Root Port"s
> > > >
> > > > It always is a PCIe port, but its location within the SoC may matter.
> > >
> > > Exactly. Intel hardware has PCIe ports on the CPU side (these are called
> > > PEG, PCI Express Graphics, ports) and on the PCH side. I think the IP is
> > > still the same.
> > >

yeah, I meant the bridge controller with the ID 0x1901 is on the CPU
side. And if the Nvidia GPU is on a port on the PCH side it all seems
to work just fine.

> > > > Also some custom AML-based power management is involved and that may
> > > > be making specific assumptions on the configuration of the SoC and the
> > > > GPU at the time of its invocation which unfortunately are not known to
> > > > us.
> > > >
> > > > However, it looks like the AML invoked to power down the GPU from
> > > > acpi_pci_set_power_state() gets confused if it is not in PCI D0 at
> > > > that point, so it looks like that AML tries to access device memory on
> > > > the GPU (beyond the PCI config space) or similar which is not
> > > > accessible in PCI power states below D0.
> > >
> > > Or the PCI config space of the GPU when the parent root port is in D3hot
> > > (as is the case here). Then the GPU config space is not accessible
> > > either.
> >
> > Why would the parent port be in D3hot at that point? Wouldn't that be
> > a suspend ordering violation?
>
> No. We put the GPU into D3hot first, then the root port, and then turn
> off the power resource (which is attached to the root port), resulting
> in the topology entering D3cold.
>

If the kernel does a D0 -> D3hot -> D0 cycle this works as well, but
the power savings are way lower, so I kind of prefer skipping D3hot
instead of D3cold. Skipping D3hot doesn't seem to make any difference
in power savings in my testing.

> > > I took a look at the HP Omen ACPI tables, which have a similar problem,
> > > and there is also a check for Windows 7 (but not Linux), so I think one
> > > alternative workaround would be to add these devices into
> > > acpi_osi_dmi_table[] where .callback is set to dmi_disable_osi_win8 (or
> > > pass 'acpi_osi="!Windows 2012"' in the kernel command line).
> >
> > I'd like to understand the facts that have been established so far
> > before deciding what to do about them. :-)
>
> Yes, I agree :)
>

Yeah.. and I think those would be too many, as we know of several models
with this issue: laptops from Lenovo, Dell and HP, and random other
models from random other OEMs. I think we won't ever be able to
blacklist all models if we go that way as those might be just way too
many. Also I know from some reports on bumblebee bugs (hitting the same
issue essentially) that the acpi_osi override didn't help every user.
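[Editor's note: to make the "D0 -> D3hot -> D0 cycle" concrete: the D-state of a PCI function is selected by the PowerState bits (1:0) of the PMCSR register in its power management capability, and that register can be poked from user space through the device's sysfs config file. Below is a minimal sketch of such a cycle; the BDF path is a placeholder, it bypasses the driver and the kernel's PM bookkeeping, and it is only a generic illustration of the register-level transition, not a description of Karol's test script.]

/*
 * Sketch: force a PCI function through D0 -> D3hot -> D0 by rewriting
 * the PowerState bits of PMCSR via the sysfs config file.  Run as root,
 * with no driver actively using the device.  The device path below is a
 * placeholder.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define CFG "/sys/bus/pci/devices/0000:01:00.0/config"	/* placeholder BDF */

static uint8_t cfg_read8(int fd, off_t off)
{
	uint8_t v = 0;
	pread(fd, &v, 1, off);
	return v;
}

static uint16_t cfg_read16(int fd, off_t off)
{
	uint16_t v = 0;
	pread(fd, &v, 2, off);
	return v;
}

static void cfg_write16(int fd, off_t off, uint16_t v)
{
	pwrite(fd, &v, 2, off);
}

/* Walk the capability list (pointer at 0x34) for cap ID 0x01 (power mgmt). */
static int find_pm_cap(int fd)
{
	uint8_t pos = cfg_read8(fd, 0x34);

	while (pos && pos != 0xff) {
		if (cfg_read8(fd, pos) == 0x01)
			return pos;
		pos = cfg_read8(fd, pos + 1);
	}
	return -1;
}

int main(void)
{
	int fd = open(CFG, O_RDWR);
	int pm;
	uint16_t pmcsr;

	if (fd < 0 || (pm = find_pm_cap(fd)) < 0)
		return 1;

	pmcsr = cfg_read16(fd, pm + 4);			/* PMCSR follows the cap header */
	cfg_write16(fd, pm + 4, (pmcsr & ~0x3) | 0x3);	/* PowerState = D3hot */
	usleep(10000);					/* allow the transition to settle */
	cfg_write16(fd, pm + 4, pmcsr & ~0x3);		/* back to D0 */
	usleep(10000);					/* D3hot -> D0 recovery time */
	printf("PMCSR now 0x%04x\n", cfg_read16(fd, pm + 4));
	close(fd);
	return 0;
}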
Karol Herbst
2019-Nov-21 12:56 UTC
[Nouveau] [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges
On Thu, Nov 21, 2019 at 1:52 PM Mika Westerberg
<mika.westerberg at intel.com> wrote:
>
> On Thu, Nov 21, 2019 at 01:46:14PM +0200, Mika Westerberg wrote:
> > On Thu, Nov 21, 2019 at 12:34:22PM +0100, Rafael J. Wysocki wrote:
> > > On Thu, Nov 21, 2019 at 12:28 PM Mika Westerberg
> > > <mika.westerberg at intel.com> wrote:
> > > >
> > > > On Wed, Nov 20, 2019 at 11:29:33PM +0100, Rafael J. Wysocki wrote:
> > > > > > last week or so I found systems where the GPU was under the "PCI
> > > > > > Express Root Port" (name from lspci) and on those systems all of that
> > > > > > seems to work. So I am wondering if it's indeed just the 0x1901 one,
> > > > > > which also explains Mika's case that Thunderbolt stuff works as devices
> > > > > > never get populated under this particular bridge controller, but under
> > > > > > those "Root Port"s
> > > > >
> > > > > It always is a PCIe port, but its location within the SoC may matter.
> > > >
> > > > Exactly. Intel hardware has PCIe ports on the CPU side (these are called
> > > > PEG, PCI Express Graphics, ports) and on the PCH side. I think the IP is
> > > > still the same.
> > > >
> > > > > Also some custom AML-based power management is involved and that may
> > > > > be making specific assumptions on the configuration of the SoC and the
> > > > > GPU at the time of its invocation which unfortunately are not known to
> > > > > us.
> > > > >
> > > > > However, it looks like the AML invoked to power down the GPU from
> > > > > acpi_pci_set_power_state() gets confused if it is not in PCI D0 at
> > > > > that point, so it looks like that AML tries to access device memory on
> > > > > the GPU (beyond the PCI config space) or similar which is not
> > > > > accessible in PCI power states below D0.
> > > >
> > > > Or the PCI config space of the GPU when the parent root port is in D3hot
> > > > (as is the case here). Then the GPU config space is not accessible
> > > > either.
> > >
> > > Why would the parent port be in D3hot at that point? Wouldn't that be
> > > a suspend ordering violation?
> >
> > No. We put the GPU into D3hot first, then the root port, and then turn
> > off the power resource (which is attached to the root port), resulting
> > in the topology entering D3cold.
>
> I don't see that happening in the AML though.
>
> Basically the difference is that for Windows 7 or Linux (the _REV==5
> check) we directly do a link disable, whereas for Windows 8+ we invoke
> the LKDS() method that puts the link into L2/L3. None of the fields they
> access seem to touch the GPU itself.
>
> LKDS() for the first PEG port looks like this:
>
>     P0L2 = One
>     Sleep (0x10)
>     Local0 = Zero
>     While (P0L2)
>     {
>         If ((Local0 > 0x04))
>         {
>             Break
>         }
>
>         Sleep (0x10)
>         Local0++
>     }
>
> One thing that comes to mind is that the loop can end even if P0L2 is
> not cleared, as it does only 5 iterations with a 16 ms sleep in between.
> Maybe Sleep() is implemented differently in Windows? I mean Linux may be
> "faster" here and return prematurely, and if we leave the port in D0
> this does not happen, or something. I'm just throwing out ideas :)
>

Keep in mind that I am able to hit this bug with my python script:
https://raw.githubusercontent.com/karolherbst/pci-stub-runpm/master/nv_runpm_bug_test.py
Rafael J. Wysocki
2019-Nov-21 15:43 UTC
[Nouveau] [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges
On Thu, Nov 21, 2019 at 1:52 PM Mika Westerberg
<mika.westerberg at intel.com> wrote:
>
> On Thu, Nov 21, 2019 at 01:46:14PM +0200, Mika Westerberg wrote:
> > On Thu, Nov 21, 2019 at 12:34:22PM +0100, Rafael J. Wysocki wrote:
> > > On Thu, Nov 21, 2019 at 12:28 PM Mika Westerberg
> > > <mika.westerberg at intel.com> wrote:
> > > >
> > > > On Wed, Nov 20, 2019 at 11:29:33PM +0100, Rafael J. Wysocki wrote:
> > > > > > last week or so I found systems where the GPU was under the "PCI
> > > > > > Express Root Port" (name from lspci) and on those systems all of that
> > > > > > seems to work. So I am wondering if it's indeed just the 0x1901 one,
> > > > > > which also explains Mika's case that Thunderbolt stuff works as devices
> > > > > > never get populated under this particular bridge controller, but under
> > > > > > those "Root Port"s
> > > > >
> > > > > It always is a PCIe port, but its location within the SoC may matter.
> > > >
> > > > Exactly. Intel hardware has PCIe ports on the CPU side (these are called
> > > > PEG, PCI Express Graphics, ports) and on the PCH side. I think the IP is
> > > > still the same.
> > > >
> > > > > Also some custom AML-based power management is involved and that may
> > > > > be making specific assumptions on the configuration of the SoC and the
> > > > > GPU at the time of its invocation which unfortunately are not known to
> > > > > us.
> > > > >
> > > > > However, it looks like the AML invoked to power down the GPU from
> > > > > acpi_pci_set_power_state() gets confused if it is not in PCI D0 at
> > > > > that point, so it looks like that AML tries to access device memory on
> > > > > the GPU (beyond the PCI config space) or similar which is not
> > > > > accessible in PCI power states below D0.
> > > >
> > > > Or the PCI config space of the GPU when the parent root port is in D3hot
> > > > (as is the case here). Then the GPU config space is not accessible
> > > > either.
> > >
> > > Why would the parent port be in D3hot at that point? Wouldn't that be
> > > a suspend ordering violation?
> >
> > No. We put the GPU into D3hot first,

OK

Does this involve any AML, like a _PS3 under the GPU object?

> > then the root port, and then turn
> > off the power resource (which is attached to the root port), resulting
> > in the topology entering D3cold.
>
> I don't see that happening in the AML though.

Which AML do you mean, specifically? The _OFF method for the root
port's _PR3 power resource or something else?

> Basically the difference is that for Windows 7 or Linux (the _REV==5
> check) we directly do a link disable, whereas for Windows 8+ we invoke
> the LKDS() method that puts the link into L2/L3. None of the fields they
> access seem to touch the GPU itself.

So that may be where the problem is.

Putting the downstream component into PCI D[1-3] is expected to put the
link into L1, so I'm not sure how that plays with the later attempt to
put it into L2/L3 Ready.

Also, L2/L3 Ready is expected to be transient, so finally power should
be removed somehow.

> LKDS() for the first PEG port looks like this:
>
>     P0L2 = One
>     Sleep (0x10)
>     Local0 = Zero
>     While (P0L2)
>     {
>         If ((Local0 > 0x04))
>         {
>             Break
>         }
>
>         Sleep (0x10)
>         Local0++
>     }
>
> One thing that comes to mind is that the loop can end even if P0L2 is
> not cleared, as it does only 5 iterations with a 16 ms sleep in between.
> Maybe Sleep() is implemented differently in Windows? I mean Linux may be
> "faster" here and return prematurely, and if we leave the port in D0
> this does not happen, or something. I'm just throwing out ideas :)

But this actually works for the downstream component in D0, doesn't it?

Also, if the downstream component is in D0, the port actually should
stay in D0 too, so what would happen with the $subject patch applied?
Rafael J. Wysocki
2019-Nov-21 15:47 UTC
[Nouveau] [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges
On Thu, Nov 21, 2019 at 1:53 PM Karol Herbst <kherbst at redhat.com> wrote:
>
> On Thu, Nov 21, 2019 at 12:46 PM Mika Westerberg
> <mika.westerberg at intel.com> wrote:
> >
> > On Thu, Nov 21, 2019 at 12:34:22PM +0100, Rafael J. Wysocki wrote:
> > > On Thu, Nov 21, 2019 at 12:28 PM Mika Westerberg
> > > <mika.westerberg at intel.com> wrote:
> > > >
> > > > On Wed, Nov 20, 2019 at 11:29:33PM +0100, Rafael J. Wysocki wrote:
> > > > > > last week or so I found systems where the GPU was under the "PCI
> > > > > > Express Root Port" (name from lspci) and on those systems all of that
> > > > > > seems to work. So I am wondering if it's indeed just the 0x1901 one,
> > > > > > which also explains Mika's case that Thunderbolt stuff works as devices
> > > > > > never get populated under this particular bridge controller, but under
> > > > > > those "Root Port"s
> > > > >
> > > > > It always is a PCIe port, but its location within the SoC may matter.
> > > >
> > > > Exactly. Intel hardware has PCIe ports on the CPU side (these are called
> > > > PEG, PCI Express Graphics, ports) and on the PCH side. I think the IP is
> > > > still the same.
> > > >
>
> yeah, I meant the bridge controller with the ID 0x1901 is on the CPU
> side. And if the Nvidia GPU is on a port on the PCH side it all seems
> to work just fine.

But that may involve different AML too, may it not?

> > > > > Also some custom AML-based power management is involved and that may
> > > > > be making specific assumptions on the configuration of the SoC and the
> > > > > GPU at the time of its invocation which unfortunately are not known to
> > > > > us.
> > > > >
> > > > > However, it looks like the AML invoked to power down the GPU from
> > > > > acpi_pci_set_power_state() gets confused if it is not in PCI D0 at
> > > > > that point, so it looks like that AML tries to access device memory on
> > > > > the GPU (beyond the PCI config space) or similar which is not
> > > > > accessible in PCI power states below D0.
> > > >
> > > > Or the PCI config space of the GPU when the parent root port is in D3hot
> > > > (as is the case here). Then the GPU config space is not accessible
> > > > either.
> > >
> > > Why would the parent port be in D3hot at that point? Wouldn't that be
> > > a suspend ordering violation?
> >
> > No. We put the GPU into D3hot first, then the root port, and then turn
> > off the power resource (which is attached to the root port), resulting
> > in the topology entering D3cold.
> >
>
> If the kernel does a D0 -> D3hot -> D0 cycle this works as well, but
> the power savings are way lower, so I kind of prefer skipping D3hot
> instead of D3cold. Skipping D3hot doesn't seem to make any difference
> in power savings in my testing.

OK

What exactly did you do to skip D3cold in your testing?