Mika Westerberg
2016-May-30 13:09 UTC
[Nouveau] [PATCH 4/4] drm/nouveau/acpi: fix lockup with PCIe runtime PM
On Mon, May 30, 2016 at 02:20:10PM +0200, Peter Wu wrote:> On Mon, May 30, 2016 at 12:57:09PM +0300, Mika Westerberg wrote: > > +Rafael > > > > On Fri, May 27, 2016 at 01:10:37PM +0200, Peter Wu wrote: > > > On Wed, May 25, 2016 at 04:55:35PM +0300, Mika Westerberg wrote: > > > > On Wed, May 25, 2016 at 12:53:01AM +0200, Peter Wu wrote: > > > > > Since "PCI: Add runtime PM support for PCIe ports", the parent PCIe port > > > > > can be runtime-suspended which disables power resources via ACPI. This > > > > > is incompatible with DSM, resulting in a GPU device which is still in D3 > > > > > and locks up the kernel on resume. > > > > > > > > > > Mirror the behavior of Windows 8 and newer[1] (as observed via an AMLi > > > > > debugger trace) and stop using the DSM functions for D3cold when power > > > > > resources are available on the parent PCIe port. > > > > > > > > > > [1]: https://msdn.microsoft.com/windows/hardware/drivers/bringup/firmware-requirements-for-d3cold > > > > > > > > > > Signed-off-by: Peter Wu <peter at lekensteyn.nl> > > > > > --- > > > > > drivers/gpu/drm/nouveau/nouveau_acpi.c | 34 ++++++++++++++++++++++++++++++---- > > > > > 1 file changed, 30 insertions(+), 4 deletions(-) > > > > > > > > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_acpi.c b/drivers/gpu/drm/nouveau/nouveau_acpi.c > > > > > index df9f73e..e469df7 100644 > > > > > --- a/drivers/gpu/drm/nouveau/nouveau_acpi.c > > > > > +++ b/drivers/gpu/drm/nouveau/nouveau_acpi.c > > > > > @@ -46,6 +46,7 @@ static struct nouveau_dsm_priv { > > > > > bool dsm_detected; > > > > > bool optimus_detected; > > > > > bool optimus_flags_detected; > > > > > + bool optimus_skip_dsm; > > > > > acpi_handle dhandle; > > > > > acpi_handle rom_handle; > > > > > } nouveau_dsm_priv; > > > > > @@ -212,8 +213,26 @@ static const struct vga_switcheroo_handler nouveau_dsm_handler = { > > > > > .get_client_id = nouveau_dsm_get_client_id, > > > > > }; > > > > > > > > > > +/* Firmware supporting Windows 8 or later do not use _DSM to put the device into > > > > > + * D3cold, they instead rely on disabling power resources on the parent. */ > > > > > +static bool nouveau_pr3_present(struct pci_dev *pdev) > > > > > +{ > > > > > + struct pci_dev *parent_pdev = pci_upstream_bridge(pdev); > > > > > + struct acpi_device *ad; > > > > > > > > Nit: please call this adev instead of ad. > > > > > > Will do. > > > > > > > > + > > > > > + if (!parent_pdev) > > > > > + return false; > > > > > + > > > > > + ad = ACPI_COMPANION(&parent_pdev->dev); > > > > > + if (!ad) > > > > > + return false; > > > > > + > > > > > + return ad->power.flags.power_resources; > > > > > > > > Is this sufficient to tell if the parent device has _PR3? I thought it > > > > returns true if it has power resources in general, not necessarily _PR3. > > > > > > > > Otherwise this looks okay to me. > > > > > > It is indeed set whenever there is any _PRx method. I wonder if it is > > > appropriate to access fields directly like this, perhaps this would be > > > more accurate (based on device_pm.c): > > > > > > /* Check whether the _PR3 method is available. */ > > > return adev->power.states[ACPI_STATE_D3_COLD].flags.valid; > > > > > > I am also considering adding a check in case the pcieport driver does > > > not support D3cold via runtime PM, what do you think of this? > > > > > > if (!parent_pdev) > > > return false; > > > /* If the PCIe port does not support D3cold via runtime PM, allow a > > > * fallback to the Optimus DSM method to put the device in D3cold. */ > > > if (parent_pdev->no_d3cold) > > > return false; > > > > > > This is needed to avoid the regression reported in the cover letter, but > > > also allows pre-2015 systems to (still) have the D3cold possibility. > > > > The _DSM method with 0 as index parameter should return a bit field > > telling which functions are supported. Sane BIOS disables that > > particular function if it detects Windows 8 and newer. Have you checked > > if that's the case? > > > > Then you can call _DSM only if it is supported and otherwise expect the > > parent device's power resources to turn off power when runtime > > suspended. > > The _DSM methods (for the Nvidia device) are often still included and > functions are reported as supported. I guess that vendors just check > whether it is working and do not bother removing legacy functions. The > Acer case below seems exceptional. > > I suggested the no_d3cold check such that DSM can still be called even > though the runtime PM on the PCIe port does nothing.Somehow it does not feel right to poke parent device's fields directly. What if you just check if it has the method like: bool no_dsm = acpi_has_method(parent_adev->handle, "_PR3"); That should follow what Windows is doing.> > > Out of curiosity I looked up an pre-2015 laptop (found Acer V5-573G, > > > apparently from November 2013, Windows 8.1) and extracted the ACPI > > > tables from the BIOS images. BIOS 2.28 (2014/05/13) introduces support > > > for power resources on the parent devicea(\_SB.PCI0.PEG0._PR3 and a > > > related NVP3 device) when _OSI("Windows 2013") is true. (This is added > > > as alternative for the old DSM interface.) > > > > > > Maybe 2014 is also an appropriate cutoff date? I wonder if it is > > > feasible to detect firmware use of _OSI("Windows 2013") and use that > > > instead of the BIOS year. > > > > Using BIOS year works even if there is no ACPI available. > > I thought that you need support from ACPI to put a device in D3cold?It is not just about D3cold but D3 in general (which includes also D3hot). Yes, you need platform support to put the device into D3cold.> > What comes to the cutoff date, I discussed with Rafael and it was > > decided that we use the same year Windows 10 was released to be on the > > safe side. Reading the links you provided here: > > > > https://msdn.microsoft.com/fi-fi/windows/hardware/drivers/bringup/device-power-management > > https://msdn.microsoft.com/en-us/library/windows/hardware/hh967709(v=vs.85).aspx > > > > it seems that from Windows 8 they started transitioning devices into > > D3cold during runtime as well. > > My impression from the ACPI tables I have seen so far is that power > resources support is enabled for Windows 2012 (Win8) or newer.
Peter Wu
2016-May-30 16:13 UTC
[Nouveau] [PATCH 4/4] drm/nouveau/acpi: fix lockup with PCIe runtime PM
On Mon, May 30, 2016 at 04:09:09PM +0300, Mika Westerberg wrote: ...> > > > > > + > > > > > > + if (!parent_pdev) > > > > > > + return false; > > > > > > + > > > > > > + ad = ACPI_COMPANION(&parent_pdev->dev); > > > > > > + if (!ad) > > > > > > + return false; > > > > > > + > > > > > > + return ad->power.flags.power_resources; > > > > > > > > > > Is this sufficient to tell if the parent device has _PR3? I thought it > > > > > returns true if it has power resources in general, not necessarily _PR3. > > > > > > > > > > Otherwise this looks okay to me. > > > > > > > > It is indeed set whenever there is any _PRx method. I wonder if it is > > > > appropriate to access fields directly like this, perhaps this would be > > > > more accurate (based on device_pm.c): > > > > > > > > /* Check whether the _PR3 method is available. */ > > > > return adev->power.states[ACPI_STATE_D3_COLD].flags.valid; > > > > > > > > I am also considering adding a check in case the pcieport driver does > > > > not support D3cold via runtime PM, what do you think of this? > > > > > > > > if (!parent_pdev) > > > > return false; > > > > /* If the PCIe port does not support D3cold via runtime PM, allow a > > > > * fallback to the Optimus DSM method to put the device in D3cold. */ > > > > if (parent_pdev->no_d3cold) > > > > return false; > > > > > > > > This is needed to avoid the regression reported in the cover letter, but > > > > also allows pre-2015 systems to (still) have the D3cold possibility. > > > > > > The _DSM method with 0 as index parameter should return a bit field > > > telling which functions are supported. Sane BIOS disables that > > > particular function if it detects Windows 8 and newer. Have you checked > > > if that's the case? > > > > > > Then you can call _DSM only if it is supported and otherwise expect the > > > parent device's power resources to turn off power when runtime > > > suspended. > > > > The _DSM methods (for the Nvidia device) are often still included and > > functions are reported as supported. I guess that vendors just check > > whether it is working and do not bother removing legacy functions. The > > Acer case below seems exceptional. > > > > I suggested the no_d3cold check such that DSM can still be called even > > though the runtime PM on the PCIe port does nothing. > > Somehow it does not feel right to poke parent device's fields directly. > > What if you just check if it has the method like: > > bool no_dsm = acpi_has_method(parent_adev->handle, "_PR3"); > > That should follow what Windows is doing.Checking for _PR3 was the intention, but it seems that the ACPI core does not really store it somewhere. Your check should be simple enough, I'll use that in the next version. Do you have any suggestions for the case where the pcieport driver refuses to put the bridge in D3 (because the BIOS is too old)? In that case the nouveau driver needs to fallback to the DSM method (but not when runtime PM is deliberately disabled by writing control=on). -- Kind regards, Peter Wu https://lekensteyn.nl
Mika Westerberg
2016-May-31 08:43 UTC
[Nouveau] [PATCH 4/4] drm/nouveau/acpi: fix lockup with PCIe runtime PM
On Mon, May 30, 2016 at 06:13:51PM +0200, Peter Wu wrote:> Do you have any suggestions for the case where the pcieport driver > refuses to put the bridge in D3 (because the BIOS is too old)? In that > case the nouveau driver needs to fallback to the DSM method (but not > when runtime PM is deliberately disabled by writing control=on).Do you know what Windows does then? I think we should do the same if possible. If user has disabled runtime PM from the root port deliberately, there might be good reason to do so. Why we want to fallback to something that could cause problems? I mean _DSM on such systems is probably not that much tested because everybody runs Windows 8+ and using standard ACPI power resources.
Lukas Wunner
2016-May-31 12:20 UTC
[Nouveau] [PATCH 4/4] drm/nouveau/acpi: fix lockup with PCIe runtime PM
On Mon, May 30, 2016 at 06:13:51PM +0200, Peter Wu wrote:> Do you have any suggestions for the case where the pcieport driver > refuses to put the bridge in D3 (because the BIOS is too old)? In that > case the nouveau driver needs to fallback to the DSM method (but not > when runtime PM is deliberately disabled by writing control=on).The BIOS cut-off date is meant to avoid issues when suspending ports on older chipsets. However if the port is used for an Optimus GPU and we can clearly identify that, and there's a _PR3 method provided, it's probably safe to say that the port is *intended* to be suspended. So you may want to consider amending pci_bridge_d3_possible() to allow D3 for such ports regardless of the BIOS date, as I've done for Thunderbolt in this commit: https://github.com/l1k/linux/commit/3cb8549cd4e5 Not sure how to uniquely identify such ports though. Perhaps check if there's a device in slot 0 below the port which has (pdev->class >> 16) == PCI_BASE_CLASS_DISPLAY && (pdev->vendor == PCI_VENDOR_ID_NVIDIA || pdev->vendor == PCI_VENDOR_ID_ATI) Best regards, Lukas
Maybe Matching Threads
- [PATCH 4/4] drm/nouveau/acpi: fix lockup with PCIe runtime PM
- [PATCH 4/4] drm/nouveau/acpi: fix lockup with PCIe runtime PM
- [PATCH 1/9] drm/nouveau: Don't leak runtime pm ref on driver unload
- [PATCH 4/4] drm/nouveau/acpi: fix lockup with PCIe runtime PM
- [PATCH 4/4] drm/nouveau/acpi: fix lockup with PCIe runtime PM