Karol Herbst
2019-Oct-21 16:41 UTC
[Nouveau] [PATCH v3] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges
On Mon, Oct 21, 2019 at 5:46 PM Mika Westerberg <mika.westerberg at intel.com> wrote:
>
> On Mon, Oct 21, 2019 at 04:49:09PM +0200, Karol Herbst wrote:
> > On Mon, Oct 21, 2019 at 4:09 PM Mika Westerberg
> > <mika.westerberg at intel.com> wrote:
> > >
> > > On Mon, Oct 21, 2019 at 03:54:09PM +0200, Karol Herbst wrote:
> > > > > I really would like to provide you more information about such
> > > > > workaround but I'm not aware of any ;-) I have not seen any issues like
> > > > > this when D3cold is properly implemented in the platform. That's why
> > > > > I'm bit skeptical that this has anything to do with specific Intel PCIe
> > > > > ports. More likely it is some power sequence in the _ON/_OFF() methods
> > > > > that is run differently on Windows.
> > > >
> > > > yeah.. maybe. I really don't know what's the actual root cause. I just
> > > > know that with this workaround it works perfectly fine on my and some
> > > > other systems it was tested on. Do you know who would be best to
> > > > approach to get proper documentation about those methods and what are
> > > > the actual prerequisites of those methods?
> > >
> > > Those should be documented in the ACPI spec. Chapter 7 should explain
> > > power resources and the device power methods in detail.
> >
> > either I looked up the wrong spec or the documentation isn't really
> > saying much there.
>
> Well it explains those methods, _PSx, _PRx and _ON()/_OFF(). In case of
> PCIe device you also want to check PCIe spec. PCIe 5.0 section 5.8 "PCI
> Function Power State Transitions" has a picture about the supported
> power state transitions and there we can find that function must be in
> D3hot before it can be transitioned into D3cold so if the _OFF() for
> example blindly assumes that the device is in D0 when it is called, it
> is a bug in the BIOS.
>
> BTW, where can I find acpidump of such system?

I am sure it's uploaded somewhere already. But it's not an issue of
just one system. It's essentially hitting every single laptop with a
Skylake or Kaby Lake CPU + Nvidia GPU. I haven't seen any system where
it's actually working right now (and we have been pestering Nvidia
about this issue for over a year already with no solution).

I've attached an acpidump from my system.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: xps_9560.tar.xz
Type: application/x-xz
Size: 286880 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/nouveau/attachments/20191021/a9975e6c/attachment-0001.xz>
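
The ordering rule quoted above from the PCIe spec (a function must be in D3hot before it can be moved to D3cold) can be illustrated with a small Python sketch. This is only a model of the constraint, not kernel or firmware code: the names PowerState and power_off are hypothetical, and the sketch covers just the three states mentioned in the thread.

```python
from enum import Enum


class PowerState(Enum):
    """Only the device power states mentioned in the thread."""
    D0 = "D0"
    D3HOT = "D3hot"
    D3COLD = "D3cold"


def power_off(current: PowerState) -> PowerState:
    """Hypothetical _OFF()-style step.

    Instead of blindly assuming the device is in D0, check that it has
    already been put into D3hot, the only state from which D3cold may be
    entered according to the rule cited in the message.
    """
    if current is not PowerState.D3HOT:
        raise ValueError(
            f"cannot enter D3cold from {current.value}; expected D3hot"
        )
    return PowerState.D3COLD
```

Calling power_off(PowerState.D3HOT) returns PowerState.D3COLD, while power_off(PowerState.D0) raises ValueError, which is the situation described as a BIOS bug when _OFF() assumes D0.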
Mika Westerberg
2019-Oct-22 12:44 UTC
[Nouveau] [PATCH v3] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges
On Tue, Oct 22, 2019 at 11:16:14AM +0200, Karol Herbst wrote:
> I think there is something I totally forgot about:
>
> When there was never a driver bound to the GPU, and if runtime power
> management gets enabled on that device, runtime suspend/resume works
> as expected (I am not 100% sure on if that always works, but I will
> recheck that).

AFAIK, if there is no driver bound to the PCI device it is left in D0
regardless of the runtime PM state, which could explain why it works in
that case (it is never put into D3hot).

I looked at the acpidump you sent and there is one thing that may
explain the differences between Windows and Linux. Not sure if you were
aware of this already, though. The power resource PGOF() method has
this:

    If (((OSYS <= 0x07D9) || ((OSYS == 0x07DF) && (_REV == 0x05)))) {
        ...
    }

If I read it right, the latter condition tries to detect Linux, which
fails nowadays, but if you have acpi_rev_override on the command line
(or the machine is listed in acpi_rev_dmi_table) this check passes and
does some magic which is not clear to me. There is a similar check on
the PGON() side, which is used to turn the device back on.

You can check what actually happens when _ON()/_OFF() is called by
passing something like the following on the kernel command line:

    acpi.trace_debug_layer=0x80
    acpi.trace_debug_level=0x10
    acpi.trace_method_name=\_SB.PCI0.PEG0.PG00._ON
    acpi.trace_state=method

(See also Documentation/firmware-guide/acpi/method-tracing.rst.) The
trace goes to the system dmesg.
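
To see which combinations of OSYS and _REV take the PGOF() branch quoted above, the condition can be transcribed into Python. This is a minimal sketch: the function name is hypothetical, and the sample inputs rest on the assumption that _REV evaluates to 2 on a current Linux kernel and to 5 when acpi_rev_override is in effect, which the message implies but does not state.

```python
def pgof_branch_taken(osys: int, rev: int) -> bool:
    """Direct transcription of the ASL condition in PGOF():

    (OSYS <= 0x07D9) || ((OSYS == 0x07DF) && (_REV == 0x05))
    """
    return osys <= 0x07D9 or (osys == 0x07DF and rev == 0x05)


# Assumed example inputs (not taken from the acpidump itself):
# OSYS == 0x07DF with _REV == 2 -> branch skipped
# OSYS == 0x07DF with _REV == 5 -> branch taken
# OSYS == 0x07D9 with any _REV  -> branch taken
for osys, rev in [(0x07DF, 2), (0x07DF, 5), (0x07D9, 2)]:
    print(f"OSYS={osys:#06x} _REV={rev}: {pgof_branch_taken(osys, rev)}")
```

Under those assumptions the same firmware runs different power-off code depending only on the reported _REV, which matches the suspicion that the sequence differs between operating systems.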