Lyude Paul
2020-Jul-29  21:36 UTC
[Nouveau] [PATCH 0/9] drm/nouveau/kms: A bunch of runtime_pm fixes
While working on refactoring how we handle connector hotplugging/probing
+ a bunch of misc DP stuff, I found a bunch of runtime_pm errors that
were mostly introduced by me at one point. Oops. A lot of these aren't
triggered consistently on a lot of systems, so it's not terribly
surprising these got swept under the rug.
Lyude Paul (9):
  drm/nouveau/kms: Handle -EINPROGRESS in nouveau_connector_hotplug()
  drm/nouveau/kms: Fix rpm leak in nouveau_connector_hotplug()
  drm/nouveau/kms/fbcon: Correct pm_runtime calls in
    nouveau_fbcon_release()
  drm/nouveau/kms/fbcon: Fix pm_runtime calls in
    nouveau_fbcon_output_poll_changed()
  drm/nouveau/kms/fbcon: Use pm_runtime_put_autosuspend() in suspend
    work
  drm/nouveau/kms: Use pm_runtime_put_autosuspend() in hpd_work
  drm/nouveau/kms: Invert conditionals in nouveau_display_acpi_ntfy()
  drm/nouveau/kms: Fix runtime PM leak in nouveau_display_acpi_ntfy()
  drm/nouveau/kms: Handle -EINPROGRESS in nouveau_display_acpi_ntfy()
 drivers/gpu/drm/nouveau/nouveau_connector.c |  5 +-
 drivers/gpu/drm/nouveau/nouveau_display.c   | 52 ++++++++++-----------
 drivers/gpu/drm/nouveau/nouveau_fbcon.c     | 18 ++++---
 3 files changed, 41 insertions(+), 34 deletions(-)
-- 
2.26.2
Lyude Paul
2020-Jul-29  21:36 UTC
[Nouveau] [PATCH 1/9] drm/nouveau/kms: Handle -EINPROGRESS in nouveau_connector_hotplug()
Looks like that we forgot to handle -EINPROGRESS being returned by
pm_runtime_get(), which can happen if multiple callers try to
asynchronously resume the GPU before it wakes up. This is perfectly
normal and OK, so fix this by treating -EINPROGRESS as success.
Signed-off-by: Lyude Paul <lyude at redhat.com>
Fixes: 3e1a12754d4d ("drm/nouveau: Fix deadlocks in
nouveau_connector_detect()")
Cc: stable at vger.kernel.org
Cc: Ben Skeggs <bskeggs at redhat.com>
Cc: dri-devel at lists.freedesktop.org
Cc: nouveau at lists.freedesktop.org
Cc: <stable at vger.kernel.org> # v4.19+
---
 drivers/gpu/drm/nouveau/nouveau_connector.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c
b/drivers/gpu/drm/nouveau/nouveau_connector.c
index 7674025a4bfe8..38e226b8cfd05 100644
--- a/drivers/gpu/drm/nouveau/nouveau_connector.c
+++ b/drivers/gpu/drm/nouveau/nouveau_connector.c
@@ -1173,7 +1173,7 @@ nouveau_connector_hotplug(struct nvif_notify *notify)
 	}
 
 	ret = pm_runtime_get(drm->dev->dev);
-	if (ret == 0) {
+	if (ret == 0 || ret == -EINPROGRESS) {
 		/* We can't block here if there's a pending PM request
 		 * running, as we'll deadlock nouveau_display_fini() when it
 		 * calls nvif_put() on our nvif_notify struct. So, simply
-- 
2.26.2
Lyude Paul
2020-Jul-29  21:36 UTC
[Nouveau] [PATCH 2/9] drm/nouveau/kms: Fix rpm leak in nouveau_connector_hotplug()
Found another one, we forget to drop the runtime PM reference we grab
here in the event of a failure. So, do that.
Signed-off-by: Lyude Paul <lyude at redhat.com>
Fixes: 3e1a12754d4d ("drm/nouveau: Fix deadlocks in
nouveau_connector_detect()")
Cc: stable at vger.kernel.org
Cc: Ben Skeggs <bskeggs at redhat.com>
Cc: dri-devel at lists.freedesktop.org
Cc: nouveau at lists.freedesktop.org
Cc: <stable at vger.kernel.org> # v4.19+
---
 drivers/gpu/drm/nouveau/nouveau_connector.c | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c
b/drivers/gpu/drm/nouveau/nouveau_connector.c
index 38e226b8cfd05..ab35153e31b72 100644
--- a/drivers/gpu/drm/nouveau/nouveau_connector.c
+++ b/drivers/gpu/drm/nouveau/nouveau_connector.c
@@ -1188,6 +1188,9 @@ nouveau_connector_hotplug(struct nvif_notify *notify)
 	} else if (ret != 1 && ret != -EACCES) {
 		NV_WARN(drm, "HPD on %s dropped due to RPM failure: %d\n",
 			name, ret);
+
+		pm_runtime_mark_last_busy(drm->dev->dev);
+		pm_runtime_put_autosuspend(drm->dev->dev);
 		return NVIF_NOTIFY_DROP;
 	}
 
-- 
2.26.2
Lyude Paul
2020-Jul-29  21:36 UTC
[Nouveau] [PATCH 3/9] drm/nouveau/kms/fbcon: Correct pm_runtime calls in nouveau_fbcon_release()
We want to update the last busy timer for our device and use
pm_runtime_put_autosuspend() here instead so that our GPU can
autosuspend when we're done.
Signed-off-by: Lyude Paul <lyude at redhat.com>
Fixes: f231976c2e89 ("drm/nouveau/fbcon: take runpm reference when
userspace has an open fd")
Cc: Ben Skeggs <bskeggs at redhat.com>
Cc: stable at vger.kernel.org
Cc: dri-devel at lists.freedesktop.org
Cc: nouveau at lists.freedesktop.org
Cc: <stable at vger.kernel.org> # v4.3+
---
 drivers/gpu/drm/nouveau/nouveau_fbcon.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_fbcon.c
b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
index fad8030ec1f81..4d9f3b5ae72d2 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fbcon.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
@@ -200,8 +200,11 @@ static int
 nouveau_fbcon_release(struct fb_info *info, int user)
 {
 	struct nouveau_fbdev *fbcon = info->par;
-	struct nouveau_drm *drm = nouveau_drm(fbcon->helper.dev);
-	pm_runtime_put(drm->dev->dev);
+	struct drm_device *dev = fbcon->helper.dev;
+
+	pm_runtime_mark_last_busy(dev->dev);
+	pm_runtime_put_autosuspend(dev->dev);
+
 	return 0;
 }
 
-- 
2.26.2
Lyude Paul
2020-Jul-29  21:36 UTC
[Nouveau] [PATCH 4/9] drm/nouveau/kms/fbcon: Fix pm_runtime calls in nouveau_fbcon_output_poll_changed()
Noticed two problems here:
* We're not dropping our runtime PM refs after getting an error
* We're not backing off when pm_runtime_get() indicates that there's
  already a resume in progress (-EINPROGRESS) (after which any delayed
  fbcon events will get handled anyway)
So, let's fix those.
Signed-off-by: Lyude Paul <lyude at redhat.com>
Fixes: 7fec8f5379fb ("drm/nouveau/drm/nouveau: Fix deadlock with fb_helper
with async RPM requests")
Cc: stable at vger.kernel.org
Cc: Ben Skeggs <bskeggs at redhat.com>
Cc: dri-devel at lists.freedesktop.org
Cc: nouveau at lists.freedesktop.org
Cc: <stable at vger.kernel.org> # v4.19+
---
 drivers/gpu/drm/nouveau/nouveau_fbcon.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_fbcon.c
b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
index 4d9f3b5ae72d2..b936bf1c14dec 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fbcon.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
@@ -503,10 +503,7 @@ nouveau_fbcon_output_poll_changed(struct drm_device *dev)
 	ret = pm_runtime_get(dev->dev);
 	if (ret == 1 || ret == -EACCES) {
 		drm_fb_helper_hotplug_event(&fbcon->helper);
-
-		pm_runtime_mark_last_busy(dev->dev);
-		pm_runtime_put_autosuspend(dev->dev);
-	} else if (ret == 0) {
+	} else if (ret == 0 || ret == -EINPROGRESS) {
 		/* If the GPU was already in the process of suspending before
 		 * this event happened, then we can't block here as we'll
 		 * deadlock the runtime pmops since they wait for us to
@@ -516,11 +513,15 @@ nouveau_fbcon_output_poll_changed(struct drm_device *dev)
 		NV_DEBUG(drm, "fbcon HPD event deferred until runtime resume\n");
 		fbcon->hotplug_waiting = true;
 		pm_runtime_put_noidle(drm->dev->dev);
+		goto out;
 	} else {
 		DRM_WARN("fbcon HPD event lost due to RPM failure: %d\n",
 			 ret);
 	}
 
+	pm_runtime_mark_last_busy(dev->dev);
+	pm_runtime_put_autosuspend(dev->dev);
+out:
 	mutex_unlock(&fbcon->hotplug_lock);
 }
 
-- 
2.26.2
Lyude Paul
2020-Jul-29  21:36 UTC
[Nouveau] [PATCH 5/9] drm/nouveau/kms/fbcon: Use pm_runtime_put_autosuspend() in suspend work
While I don't know of any problems this has caused, it's definitely not
a great idea for us to potentially block in
nouveau_fbcon_set_suspend_work(). We don't really need to anyway, and
want to simply trigger the autosuspend timer instead.
Signed-off-by: Lyude Paul <lyude at redhat.com>
---
 drivers/gpu/drm/nouveau/nouveau_fbcon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_fbcon.c
b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
index b936bf1c14dec..2234f012ca0c3 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fbcon.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
@@ -468,7 +468,7 @@ nouveau_fbcon_set_suspend_work(struct work_struct *work)
 	if (state == FBINFO_STATE_RUNNING) {
 		nouveau_fbcon_hotplug_resume(drm->fbcon);
 		pm_runtime_mark_last_busy(drm->dev->dev);
-		pm_runtime_put_sync(drm->dev->dev);
+		pm_runtime_put_autosuspend(drm->dev->dev);
 	}
 }
 
-- 
2.26.2
Lyude Paul
2020-Jul-29  21:37 UTC
[Nouveau] [PATCH 6/9] drm/nouveau/kms: Use pm_runtime_put_autosuspend() in hpd_work
Again, we don't have any need to suspend the device synchronously here, and doing so could in theory lead to a deadlock (although it's unlikely since we've called pm_runtime_mark_last_busy() before-hand). Signed-off-by: Lyude Paul <lyude at redhat.com> --- drivers/gpu/drm/nouveau/nouveau_display.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index 63c58f12458c9..cdc5a2200f95e 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -444,7 +444,7 @@ nouveau_display_hpd_work(struct work_struct *work) drm_helper_hpd_irq_event(drm->dev); pm_runtime_mark_last_busy(drm->dev->dev); - pm_runtime_put_sync(drm->dev->dev); + pm_runtime_put_autosuspend(drm->dev->dev); } #ifdef CONFIG_ACPI -- 2.26.2
Lyude Paul
2020-Jul-29  21:37 UTC
[Nouveau] [PATCH 7/9] drm/nouveau/kms: Invert conditionals in nouveau_display_acpi_ntfy()
No functional changes here, just a drive-by cleanup.
Signed-off-by: Lyude Paul <lyude at redhat.com>
[cc'd to stable since the next fix needs this patch to apply]
Fixes: 79e765ad665d ("drm/nouveau/drm/nouveau: Prevent handling ACPI HPD
events too early")
Cc: stable at vger.kernel.org
Cc: Ben Skeggs <bskeggs at redhat.com>
Cc: dri-devel at lists.freedesktop.org
Cc: nouveau at lists.freedesktop.org
Cc: <stable at vger.kernel.org> # v4.19+
---
 drivers/gpu/drm/nouveau/nouveau_display.c | 50 +++++++++++------------
 1 file changed, 24 insertions(+), 26 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c
b/drivers/gpu/drm/nouveau/nouveau_display.c
index cdc5a2200f95e..96c9e7f550537 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -457,34 +457,32 @@ nouveau_display_acpi_ntfy(struct notifier_block *nb,
unsigned long val,
 	struct acpi_bus_event *info = data;
 	int ret;
 
-	if (!strcmp(info->device_class, ACPI_VIDEO_CLASS)) {
-		if (info->type == ACPI_VIDEO_NOTIFY_PROBE) {
-			ret = pm_runtime_get(drm->dev->dev);
-			if (ret == 1 || ret == -EACCES) {
-				/* If the GPU is already awake, or in a state
-				 * where we can't wake it up, it can handle
-				 * it's own hotplug events.
-				 */
-				pm_runtime_put_autosuspend(drm->dev->dev);
-			} else if (ret == 0) {
-				/* This may be the only indication we receive
-				 * of a connector hotplug on a runtime
-				 * suspended GPU, schedule hpd_work to check.
-				 */
-				NV_DEBUG(drm, "ACPI requested connector reprobe\n");
-				schedule_work(&drm->hpd_work);
-				pm_runtime_put_noidle(drm->dev->dev);
-			} else {
-				NV_WARN(drm, "Dropped ACPI reprobe event due to RPM error: %d\n",
-					ret);
-			}
-
-			/* acpi-video should not generate keypresses for this */
-			return NOTIFY_BAD;
-		}
+	if (strcmp(info->device_class, ACPI_VIDEO_CLASS) ||
+	    info->type != ACPI_VIDEO_NOTIFY_PROBE)
+		return NOTIFY_DONE;
+
+	ret = pm_runtime_get(drm->dev->dev);
+	if (ret == 1 || ret == -EACCES) {
+		/* If the GPU is already awake, or in a state
+		 * where we can't wake it up, it can handle
+		 * it's own hotplug events.
+		 */
+		pm_runtime_put_autosuspend(drm->dev->dev);
+	} else if (ret == 0) {
+		/* This may be the only indication we receive
+		 * of a connector hotplug on a runtime
+		 * suspended GPU, schedule hpd_work to check.
+		 */
+		NV_DEBUG(drm, "ACPI requested connector reprobe\n");
+		schedule_work(&drm->hpd_work);
+		pm_runtime_put_noidle(drm->dev->dev);
+	} else {
+		NV_WARN(drm, "Dropped ACPI reprobe event due to RPM error: %d\n",
+			ret);
 	}
 
-	return NOTIFY_DONE;
+	/* acpi-video should not generate keypresses for this */
+	return NOTIFY_BAD;
 }
 #endif
 
-- 
2.26.2
Lyude Paul
2020-Jul-29  21:37 UTC
[Nouveau] [PATCH 8/9] drm/nouveau/kms: Fix runtime PM leak in nouveau_display_acpi_ntfy()
Signed-off-by: Lyude Paul <lyude at redhat.com>
Fixes: 79e765ad665d ("drm/nouveau/drm/nouveau: Prevent handling ACPI HPD
events too early")
Cc: stable at vger.kernel.org
Cc: Ben Skeggs <bskeggs at redhat.com>
Cc: dri-devel at lists.freedesktop.org
Cc: nouveau at lists.freedesktop.org
Cc: <stable at vger.kernel.org> # v4.19+
---
 drivers/gpu/drm/nouveau/nouveau_display.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 96c9e7f550537..ad63d4f052deb 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -479,6 +479,8 @@ nouveau_display_acpi_ntfy(struct notifier_block *nb,
unsigned long val,
 	} else {
 		NV_WARN(drm, "Dropped ACPI reprobe event due to RPM error: %d\n",
 			ret);
+		pm_runtime_mark_last_busy(drm->dev->dev);
+		pm_runtime_put_autosuspend(drm->dev->dev);
 	}
 
 	/* acpi-video should not generate keypresses for this */
-- 
2.26.2
Lyude Paul
2020-Jul-29  21:37 UTC
[Nouveau] [PATCH 9/9] drm/nouveau/kms: Handle -EINPROGRESS in nouveau_display_acpi_ntfy()
This isn't an error, this just means there's multiple asynchronous
resume requests going at the same time. Treat it like a success.
Signed-off-by: Lyude Paul <lyude at redhat.com>
Fixes: 79e765ad665d ("drm/nouveau/drm/nouveau: Prevent handling ACPI HPD
events too early")
Cc: stable at vger.kernel.org
Cc: Karol Herbst <kherbst at redhat.com>
Cc: Ben Skeggs <bskeggs at redhat.com>
Cc: dri-devel at lists.freedesktop.org
Cc: nouveau at lists.freedesktop.org
Cc: <stable at vger.kernel.org> # v4.19+
---
 drivers/gpu/drm/nouveau/nouveau_display.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c
b/drivers/gpu/drm/nouveau/nouveau_display.c
index ad63d4f052deb..691bb8d37759e 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -462,10 +462,10 @@ nouveau_display_acpi_ntfy(struct notifier_block *nb,
unsigned long val,
 		return NOTIFY_DONE;
 
 	ret = pm_runtime_get(drm->dev->dev);
-	if (ret == 1 || ret == -EACCES) {
-		/* If the GPU is already awake, or in a state
-		 * where we can't wake it up, it can handle
-		 * it's own hotplug events.
+	if (ret == 1 || ret == -EACCES || ret == -EINPROGRESS) {
+		/* If the GPU is already awake, is waking up, or is in a state
+		 * where we can't wake it up, it can handle its own hotplug
+		 * events.
 		 */
 		pm_runtime_put_autosuspend(drm->dev->dev);
 	} else if (ret == 0) {
-- 
2.26.2
Reasonably Related Threads
- [PATCH v7 0/5] Fix connector probing deadlocks from RPM bugs
 - [PATCH v8 0/5] Fix connector probing deadlocks from RPM bugs
 - [PATCH v6 0/5] Fix connector probing deadlocks from RPM bugs
 - [PATCH v5 00/13] Fix connector probing deadlocks from RPM bugs
 - [PATCH v7 3/5] drm/nouveau: Fix deadlock with fb_helper with async RPM requests