Mika Westerberg
2019-Oct-22 12:44 UTC
[Nouveau] [PATCH v3] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges
On Tue, Oct 22, 2019 at 11:16:14AM +0200, Karol Herbst wrote:
> I think there is something I totally forgot about:
>
> When there was never a driver bound to the GPU, and if runtime power
> management gets enabled on that device, runtime suspend/resume works
> as expected (I am not 100% sure on if that always works, but I will
> recheck that).

AFAIK, if there is no driver bound to the PCI device it is left to D0
regardless of the runtime PM state which could explain why it works in
that case (it is never put into D3hot).

I looked at the acpidump you sent and there is one thing that may
explain the differences between Windows and Linux. Not sure if you were
aware of this already, though. The power resource PGOF() method has
this:

    If (((OSYS <= 0x07D9) || ((OSYS == 0x07DF) && (_REV == 0x05)))) {
        ...
    }

If I read it right, the later condition tries to detect Linux which
fails nowadays but if you have acpi_rev_override in the command line (or
the machine is listed in acpi_rev_dmi_table) this check passes and does
some magic which is not clear to me. There is similar in PGON() side
which is used to turn the device back on.

You can check what actually happens when _ON()/_OFF() is called by
passing something like below to the kernel command line:

    acpi.trace_debug_layer=0x80 acpi.trace_debug_level=0x10 acpi.trace_method_name=\_SB.PCI0.PEG0.PG00._ON acpi.trace_state=method

(See also Documentation/firmware-guide/acpi/method-tracing.rst).

Trace goes to system dmesg.
Karol Herbst
2019-Oct-22 12:51 UTC
[Nouveau] [PATCH v3] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges
On Tue, Oct 22, 2019 at 2:45 PM Mika Westerberg
<mika.westerberg at intel.com> wrote:
>
> On Tue, Oct 22, 2019 at 11:16:14AM +0200, Karol Herbst wrote:
> > I think there is something I totally forgot about:
> >
> > When there was never a driver bound to the GPU, and if runtime power
> > management gets enabled on that device, runtime suspend/resume works
> > as expected (I am not 100% sure on if that always works, but I will
> > recheck that).
>
> AFAIK, if there is no driver bound to the PCI device it is left to D0
> regardless of the runtime PM state which could explain why it works in
> that case (it is never put into D3hot).
>
> I looked at the acpidump you sent and there is one thing that may
> explain the differences between Windows and Linux. Not sure if you were
> aware of this already, though. The power resource PGOF() method has
> this:
>
>     If (((OSYS <= 0x07D9) || ((OSYS == 0x07DF) && (_REV == 0x05)))) {
>         ...
>     }
>

I think this is the fallback to some older method of runtime suspending
the device, and I think it will end up touching different registers on
the bridge controller which do not show the broken behaviour. You'll
find references to following variables which all cause a link to be
powered down: Q0L2 (newest), P0L2, P0LD (oldest, I think).

Maybe I remember incorrectly and have to read the code again... okay,
the fallback path uses P0LD indeed. That's actually the only register of
those being documented by Intel afaik.

> If I read it right, the later condition tries to detect Linux which
> fails nowadays but if you have acpi_rev_override in the command line (or
> the machine is listed in acpi_rev_dmi_table) this check passes and does
> some magic which is not clear to me. There is similar in PGON() side
> which is used to turn the device back on.
>
> You can check what actually happens when _ON()/_OFF() is called by
> passing something like below to the kernel command line:
>
>     acpi.trace_debug_layer=0x80 acpi.trace_debug_level=0x10 acpi.trace_method_name=\_SB.PCI0.PEG0.PG00._ON acpi.trace_state=method
>
> (See also Documentation/firmware-guide/acpi/method-tracing.rst).
>
> Trace goes to system dmesg.

This sounds to be very helpful, I'll give it a try.
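
For anyone trying the tracing suggestion on both power resource methods,
here is a minimal sketch that only formats the acpi.* parameters quoted
above. The helper name acpi_trace_params is made up for illustration and
is not a kernel or ACPICA interface. The _OFF path is assumed to be the
sibling of the _ON path given in the thread. The sketch prints one
fragment per method and does not assume both can be traced in one boot.

```python
# Hypothetical helper, not part of the kernel or of this thread: it only
# formats the acpi.* parameters quoted above for a given control method.

def acpi_trace_params(method_path, layer=0x80, level=0x10):
    """Return the kernel command-line fragment that traces one ACPI method."""
    return " ".join([
        f"acpi.trace_debug_layer={layer:#x}",
        f"acpi.trace_debug_level={level:#x}",
        f"acpi.trace_method_name={method_path}",
        "acpi.trace_state=method",
    ])


if __name__ == "__main__":
    # The _ON path is taken from the message above; _OFF is an assumption.
    for name in ("_ON", "_OFF"):
        print(acpi_trace_params(r"\_SB.PCI0.PEG0.PG00." + name))
```

Depending on the bootloader, the leading backslash in the method path
may need to be escaped when the fragment is put into its configuration.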
Mika Westerberg
2019-Oct-23 09:00 UTC
[Nouveau] [PATCH v3] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges
On Tue, Oct 22, 2019 at 02:51:53PM +0200, Karol Herbst wrote:
> On Tue, Oct 22, 2019 at 2:45 PM Mika Westerberg
> <mika.westerberg at intel.com> wrote:
> >
> > On Tue, Oct 22, 2019 at 11:16:14AM +0200, Karol Herbst wrote:
> > > I think there is something I totally forgot about:
> > >
> > > When there was never a driver bound to the GPU, and if runtime power
> > > management gets enabled on that device, runtime suspend/resume works
> > > as expected (I am not 100% sure on if that always works, but I will
> > > recheck that).
> >
> > AFAIK, if there is no driver bound to the PCI device it is left to D0
> > regardless of the runtime PM state which could explain why it works in
> > that case (it is never put into D3hot).
> >
> > I looked at the acpidump you sent and there is one thing that may
> > explain the differences between Windows and Linux. Not sure if you were
> > aware of this already, though. The power resource PGOF() method has
> > this:
> >
> >     If (((OSYS <= 0x07D9) || ((OSYS == 0x07DF) && (_REV == 0x05)))) {
> >         ...
> >     }
> >
>
> I think this is the fallback to some older method of runtime
> suspending the device, and I think it will end up touching different
> registers on the bridge controller which do not show the broken
> behaviour.

I think it actually tries to identify older Windows and then Linux (the
_REV == 0x05 check comes from that). So at least at some point Dell
people have experimented with this on Linux.
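
To make the reading of that condition concrete, below is a small model
of it in Python. It is only a sketch of the quoted If() test, not of
what PGOF() does inside the branch (that part is elided in the thread).
The function and constant names are made up. The mapping of OSYS values
to Windows releases follows the common DSDT habit of encoding a year and
is an assumption, as is the idea that Linux ends up with OSYS == 0x07DF
on this machine. Linux reports _REV as 2 by default and as 5 when
acpi_rev_override is in effect.

```python
# Illustrative model of the ASL condition quoted in the thread. All names
# here are invented for the sketch; nothing below is kernel or firmware code.

OSYS_2009 = 0x07D9  # 2009; usually means Windows 7 in DSDTs (assumption)
OSYS_2015 = 0x07DF  # 2015; usually means Windows 10 in DSDTs (assumption)


def takes_fallback_path(osys, rev):
    """Mirror: If ((OSYS <= 0x07D9) || ((OSYS == 0x07DF) && (_REV == 0x05)))"""
    return osys <= OSYS_2009 or (osys == OSYS_2015 and rev == 0x05)


if __name__ == "__main__":
    cases = [
        ("older Windows (OSYS <= 2009)", OSYS_2009, 2),
        ("OSYS 2015, _REV 2 (Linux default)", OSYS_2015, 2),
        ("OSYS 2015, _REV 5 (acpi_rev_override)", OSYS_2015, 5),
    ]
    for label, osys, rev in cases:
        print(f"{label}: fallback path = {takes_fallback_path(osys, rev)}")
```

With these inputs only the first and third cases enter the branch, which
matches the observation that the check fails on Linux nowadays unless
_REV is overridden to 5.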