Peter Wu
2016-May-27 01:07 UTC
[Nouveau] [PATCH 1/9] drm/nouveau: Don't leak runtime pm ref on driver unload
On Tue, May 24, 2016 at 06:03:27PM +0200, Lukas Wunner wrote:> nouveau_drm_load() calls pm_runtime_put() if nouveau_runtime_pm != 0, > but nouveau_drm_unload() calls pm_runtime_get_sync() unconditionally. > We therefore leak a runtime pm ref whenever nouveau is loaded with > runpm=0 and then unloaded. The GPU will subsequently never runtime > suspend even if nouveau is loaded again with runpm=1. > > Fix by taking the runtime pm ref under the same condition that it was > released on driver load. > > Fixes: 5addcf0a5f0f ("nouveau: add runtime PM support (v0.9)") > Cc: Dave Airlie <airlied at redhat.com> > Reported-by: Karol Herbst <nouveau at karolherbst.de> > Tested-by: Karol Herbst <nouveau at karolherbst.de> > Signed-off-by: Lukas Wunner <lukas at wunner.de>Looks good, I tested this scenario: ru(){ cat /sys/bus/pci/devices/0000\:01:00.0/power/runtime_usage;} ru # reports 1 modprobe nouveau runpm=0 ru # reports 2 rmmod nouveau ru # reports 1 Without runpm=0 the count drops to 0 in the second step and stays 0 in the third step. After applying patch 2/9, this correctly reports 1 as expected (this is the same as manually setting power/control to on). Peter> --- > drivers/gpu/drm/nouveau/nouveau_drm.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c > index 11f8dd9..faf7438 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_drm.c > +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c > @@ -498,7 +498,10 @@ nouveau_drm_unload(struct drm_device *dev) > { > struct nouveau_drm *drm = nouveau_drm(dev); > > - pm_runtime_get_sync(dev->dev); > + if (nouveau_runtime_pm != 0) { > + pm_runtime_get_sync(dev->dev); > + } > + > nouveau_fbcon_fini(dev); > nouveau_accel_fini(drm); > nouveau_hwmon_fini(dev); > -- > 2.8.1 > > _______________________________________________ > Nouveau mailing list > Nouveau at lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/nouveau-- Kind regards, Peter Wu https://lekensteyn.nl
Lukas Wunner
2016-May-29 15:50 UTC
[Nouveau] [PATCH 1/9] drm/nouveau: Don't leak runtime pm ref on driver unload
Hi Peter, On Fri, May 27, 2016 at 03:07:33AM +0200, Peter Wu wrote:> On Tue, May 24, 2016 at 06:03:27PM +0200, Lukas Wunner wrote: > > nouveau_drm_load() calls pm_runtime_put() if nouveau_runtime_pm != 0, > > but nouveau_drm_unload() calls pm_runtime_get_sync() unconditionally. > > We therefore leak a runtime pm ref whenever nouveau is loaded with > > runpm=0 and then unloaded. The GPU will subsequently never runtime > > suspend even if nouveau is loaded again with runpm=1. > > > > Fix by taking the runtime pm ref under the same condition that it was > > released on driver load. > > > > Fixes: 5addcf0a5f0f ("nouveau: add runtime PM support (v0.9)") > > Cc: Dave Airlie <airlied at redhat.com> > > Reported-by: Karol Herbst <nouveau at karolherbst.de> > > Tested-by: Karol Herbst <nouveau at karolherbst.de> > > Signed-off-by: Lukas Wunner <lukas at wunner.de> > > Looks good, I tested this scenario: > > ru(){ cat /sys/bus/pci/devices/0000\:01:00.0/power/runtime_usage;} > ru # reports 1 > modprobe nouveau runpm=0 > ru # reports 2 > rmmod nouveau > ru # reports 1 > > Without runpm=0 the count drops to 0 in the second step and stays 0 in > the third step. After applying patch 2/9, this correctly reports 1 as > expected (this is the same as manually setting power/control to on).How exactly did you reach the situation where the root port didn't wake up when you tried to load nouveau again? (IRC conversation this week.) What's happening is, the PCI core will keep unbound devices (i.e., without driver) in D0 but the runtime status is allowed to change to "suspended". So it'll appear to the kernel as if it was suspended but in reality it stays in D0. Once runtime pm for PCIe ports gets merged, the root port above the GPU will indeed go to D3 in such a situation because the check pm_children_suspended() (called from rpm_check_suspend_allowed()) returns true. I'm not sure if this is desirable or not. If we keep unbound devices in D0, should we allow ports above them to go to D3? In any case, when nouveau is loaded again, local_pci_probe() will call pm_runtime_get_sync(), which will implicitly set the runtime status to "active" and which should also wake parents. So how did you ever reach a point where you loaded nouveau and the root port stayed asleep? Clearly we have a bug there, question is where. This shouldn't work only if pm_runtime_forbid() was called on driver unload. Thanks for the extensive testing! Lukas> > Peter > > > --- > > drivers/gpu/drm/nouveau/nouveau_drm.c | 5 ++++- > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c > > index 11f8dd9..faf7438 100644 > > --- a/drivers/gpu/drm/nouveau/nouveau_drm.c > > +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c > > @@ -498,7 +498,10 @@ nouveau_drm_unload(struct drm_device *dev) > > { > > struct nouveau_drm *drm = nouveau_drm(dev); > > > > - pm_runtime_get_sync(dev->dev); > > + if (nouveau_runtime_pm != 0) { > > + pm_runtime_get_sync(dev->dev); > > + } > > + > > nouveau_fbcon_fini(dev); > > nouveau_accel_fini(drm); > > nouveau_hwmon_fini(dev); > > -- > > 2.8.1 > > > > _______________________________________________ > > Nouveau mailing list > > Nouveau at lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/nouveau > > -- > Kind regards, > Peter Wu > https://lekensteyn.nl
Peter Wu
2016-May-30 17:03 UTC
[Nouveau] [PATCH 1/9] drm/nouveau: Don't leak runtime pm ref on driver unload
On Sun, May 29, 2016 at 05:50:06PM +0200, Lukas Wunner wrote:> Hi Peter, > > On Fri, May 27, 2016 at 03:07:33AM +0200, Peter Wu wrote: > > On Tue, May 24, 2016 at 06:03:27PM +0200, Lukas Wunner wrote: > > > nouveau_drm_load() calls pm_runtime_put() if nouveau_runtime_pm != 0, > > > but nouveau_drm_unload() calls pm_runtime_get_sync() unconditionally. > > > We therefore leak a runtime pm ref whenever nouveau is loaded with > > > runpm=0 and then unloaded. The GPU will subsequently never runtime > > > suspend even if nouveau is loaded again with runpm=1. > > > > > > Fix by taking the runtime pm ref under the same condition that it was > > > released on driver load. > > > > > > Fixes: 5addcf0a5f0f ("nouveau: add runtime PM support (v0.9)") > > > Cc: Dave Airlie <airlied at redhat.com> > > > Reported-by: Karol Herbst <nouveau at karolherbst.de> > > > Tested-by: Karol Herbst <nouveau at karolherbst.de> > > > Signed-off-by: Lukas Wunner <lukas at wunner.de> > > > > Looks good, I tested this scenario: > > > > ru(){ cat /sys/bus/pci/devices/0000\:01:00.0/power/runtime_usage;} > > ru # reports 1 > > modprobe nouveau runpm=0 > > ru # reports 2 > > rmmod nouveau > > ru # reports 1 > > > > Without runpm=0 the count drops to 0 in the second step and stays 0 in > > the third step. After applying patch 2/9, this correctly reports 1 as > > expected (this is the same as manually setting power/control to on). > > How exactly did you reach the situation where the root port didn't wake > up when you tried to load nouveau again? (IRC conversation this week.)Ensure that the pci/pm patches are applied, then: 0. Unload nouveau (I have blacklisted it for testing). 1. Enable rpm for the root port and children (control = auto). 2. Verify in the kernel logs that the devices are sleeping: pcieport 0000:00:01.0: power state changed by ACPI to D3cold 3. (Optional, to rule out issues with delays:) Disable rpm for the Nvidia device (control = on). 4. modprobe nouveau. The above test with v4.6 + 4 pci/pm patches (8b71f565) gives: 50.245795 MXM: GUID detected in BIOS 50.245948 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.GFX0._DSM] at AML address ffffc90000013b11 length 492 50.246016 ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95) 50.246044 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.GFX0._DSM] at AML address ffffc90000013b11 length 492 50.246110 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0.PEGP._DSM] at AML address ffffc90000018297 length 1F 50.246256 ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95) 50.246289 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0.PEGP._DSM] at AML address ffffc90000018297 length 1F 50.246443 ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95) 50.246457 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0.PEGP._DSM] at AML address ffffc90000018297 length 1F 50.246932 pci 0000:01:00.0: optimus capabilities: enabled, status dynamic power, hda bios codec supported 50.247005 VGA switcheroo: detected Optimus DSM method \_SB_.PCI0.PEG0.PEGP handle 50.247084 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0.PG00._ON] at AML address ffffc9000001086e length 11D 50.390140 pcieport 0000:00:01.0: power state changed by ACPI to D0 50.491893 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0._DSW] at AML address ffffc90000010a2d length 1D 50.492285 pcieport 0000:00:01.0: PME# disabled 50.492583 nouveau 0000:01:00.0: unknown chipset (ffffffff) 50.492687 nouveau: probe of 0000:01:00.0 failed with error -12 50.501990 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0._S0W] at AML address ffffc90000010a8e length 2 50.502403 pcieport 0000:00:01.0: PME# enabled 50.502601 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0._DSW] at AML address ffffc90000010a2d length 1D 50.513005 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0.PG00._OFF] at AML address ffffc90000010994 length 6D 50.533258 pcieport 0000:00:01.0: power state changed by ACPI to D3cold (Note that this patch is not included.) When nouveau is operating normally, I see that _PS0 is also called (which does not happen above). If you think that mixing power resources with DSM causes this issue, I also tried to apply my power resources work for nouveau but it gives the same problem: 20.183306 MXM: GUID detected in BIOS 20.183606 ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95) 20.184158 ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95) 20.184547 ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95) 20.185152 pci 0000:01:00.0: optimus capabilities: enabled, status dynamic power, hda bios codec supported 20.185351 VGA switcheroo: detected Optimus DSM method \_SB_.PCI0.PEG0.PEGP handle 20.185384 nouveau: detected PR support, will not use DSM 20.185552 nouveau 0000:01:00.0: enabling device (0000 -> 0003) 20.185873 nouveau 0000:01:00.0: unknown chipset (ffffffff) 20.185946 nouveau: probe of 0000:01:00.0 failed with error -12> What's happening is, the PCI core will keep unbound devices (i.e., > without driver) in D0 but the runtime status is allowed to change > to "suspended". So it'll appear to the kernel as if it was suspended > but in reality it stays in D0. > > Once runtime pm for PCIe ports gets merged, the root port above the > GPU will indeed go to D3 in such a situation because the check > pm_children_suspended() (called from rpm_check_suspend_allowed()) > returns true. > > I'm not sure if this is desirable or not. If we keep unbound devices > in D0, should we allow ports above them to go to D3?Maybe Rafael (linux-pm / linux-pci) can answer this question better? The comments in local_pci_probe, pci_pm_runtime_suspend and pci_pm_runtime_resume suggest that unbound devices are assumed in D0 which is apparently not the case when runtime PM is enabled.> In any case, when nouveau is loaded again, local_pci_probe() will > call pm_runtime_get_sync(), which will implicitly set the runtime > status to "active" and which should also wake parents. So how did > you ever reach a point where you loaded nouveau and the root port > stayed asleep? Clearly we have a bug there, question is where. > This shouldn't work only if pm_runtime_forbid() was called on > driver unload. > > Thanks for the extensive testing! > LukasBoth devices (root port and Nvidia) were resumed, but somehow the Nvidia card was not fully initialized/ready (as you can see in the above logs). Peter> > > > Peter > > > > > --- > > > drivers/gpu/drm/nouveau/nouveau_drm.c | 5 ++++- > > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c > > > index 11f8dd9..faf7438 100644 > > > --- a/drivers/gpu/drm/nouveau/nouveau_drm.c > > > +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c > > > @@ -498,7 +498,10 @@ nouveau_drm_unload(struct drm_device *dev) > > > { > > > struct nouveau_drm *drm = nouveau_drm(dev); > > > > > > - pm_runtime_get_sync(dev->dev); > > > + if (nouveau_runtime_pm != 0) { > > > + pm_runtime_get_sync(dev->dev); > > > + } > > > + > > > nouveau_fbcon_fini(dev); > > > nouveau_accel_fini(drm); > > > nouveau_hwmon_fini(dev); > > > -- > > > 2.8.1 > > > > > > _______________________________________________ > > > Nouveau mailing list > > > Nouveau at lists.freedesktop.org > > > https://lists.freedesktop.org/mailman/listinfo/nouveau
Reasonably Related Threads
- [PATCH 1/9] drm/nouveau: Don't leak runtime pm ref on driver unload
- [PATCH 1/9] drm/nouveau: Don't leak runtime pm ref on driver unload
- 4.20.0-rc3 nouveau/Quadro P2000 Mobile: runpm causing ACPI errors, lockups
- [Bug 98240] New: Kernel module fails to load on HP Pavilion V3A33AV laptop.
- 4.20.0-rc3 nouveau/Quadro P2000 Mobile: runpm causing ACPI errors, lockups