Lyude Paul
2020-Jul-29 21:36 UTC
[Nouveau] [PATCH 0/9] drm/nouveau/kms: A bunch of runtime_pm fixes
While working on refactoring how we handle connector hotplugging/probing + a bunch of misc DP stuff, I found a bunch of runtime_pm errors that were mostly introduced by me at one point. Oops. A lot of these aren't triggered consistently on a lot of systems, so it's not terribly surprising these got swept under the rug. Lyude Paul (9): drm/nouveau/kms: Handle -EINPROGRESS in nouveau_connector_hotplug() drm/nouveau/kms: Fix rpm leak in nouveau_connector_hotplug() drm/nouveau/kms/fbcon: Correct pm_runtime calls in nouveau_fbcon_release() drm/nouveau/kms/fbcon: Fix pm_runtime calls in nouveau_fbcon_output_poll_changed() drm/nouveau/kms/fbcon: Use pm_runtime_put_autosuspend() in suspend work drm/nouveau/kms: Use pm_runtime_put_autosuspend() in hpd_work drm/nouveau/kms: Invert conditionals in nouveau_display_acpi_ntfy() drm/nouveau/kms: Fix runtime PM leak in nouveau_display_acpi_ntfy() drm/nouveau/kms: Handle -EINPROGRESS in nouveau_display_acpi_ntfy() drivers/gpu/drm/nouveau/nouveau_connector.c | 5 +- drivers/gpu/drm/nouveau/nouveau_display.c | 52 ++++++++++----------- drivers/gpu/drm/nouveau/nouveau_fbcon.c | 18 ++++--- 3 files changed, 41 insertions(+), 34 deletions(-) -- 2.26.2
Lyude Paul
2020-Jul-29 21:36 UTC
[Nouveau] [PATCH 1/9] drm/nouveau/kms: Handle -EINPROGRESS in nouveau_connector_hotplug()
Looks like that we forgot to handle -EINPROGRESS being returned by pm_runtime_get(), which can happen if multiple callers try to asynchronously resume the GPU before it wakes up. This is perfectly normal and OK, so fix this by treating -EINPROGRESS as success. Signed-off-by: Lyude Paul <lyude at redhat.com> Fixes: 3e1a12754d4d ("drm/nouveau: Fix deadlocks in nouveau_connector_detect()") Cc: stable at vger.kernel.org Cc: Ben Skeggs <bskeggs at redhat.com> Cc: dri-devel at lists.freedesktop.org Cc: nouveau at lists.freedesktop.org Cc: <stable at vger.kernel.org> # v4.19+ --- drivers/gpu/drm/nouveau/nouveau_connector.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c b/drivers/gpu/drm/nouveau/nouveau_connector.c index 7674025a4bfe8..38e226b8cfd05 100644 --- a/drivers/gpu/drm/nouveau/nouveau_connector.c +++ b/drivers/gpu/drm/nouveau/nouveau_connector.c @@ -1173,7 +1173,7 @@ nouveau_connector_hotplug(struct nvif_notify *notify) } ret = pm_runtime_get(drm->dev->dev); - if (ret == 0) { + if (ret == 0 || ret == -EINPROGRESS) { /* We can't block here if there's a pending PM request * running, as we'll deadlock nouveau_display_fini() when it * calls nvif_put() on our nvif_notify struct. So, simply -- 2.26.2
Lyude Paul
2020-Jul-29 21:36 UTC
[Nouveau] [PATCH 2/9] drm/nouveau/kms: Fix rpm leak in nouveau_connector_hotplug()
Found another one, we forget to drop the runtime PM reference we grab here in the event of a failure. So, do that. Signed-off-by: Lyude Paul <lyude at redhat.com> Fixes: 3e1a12754d4d ("drm/nouveau: Fix deadlocks in nouveau_connector_detect()") Cc: stable at vger.kernel.org Cc: Ben Skeggs <bskeggs at redhat.com> Cc: dri-devel at lists.freedesktop.org Cc: nouveau at lists.freedesktop.org Cc: <stable at vger.kernel.org> # v4.19+ --- drivers/gpu/drm/nouveau/nouveau_connector.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c b/drivers/gpu/drm/nouveau/nouveau_connector.c index 38e226b8cfd05..ab35153e31b72 100644 --- a/drivers/gpu/drm/nouveau/nouveau_connector.c +++ b/drivers/gpu/drm/nouveau/nouveau_connector.c @@ -1188,6 +1188,9 @@ nouveau_connector_hotplug(struct nvif_notify *notify) } else if (ret != 1 && ret != -EACCES) { NV_WARN(drm, "HPD on %s dropped due to RPM failure: %d\n", name, ret); + + pm_runtime_mark_last_busy(drm->dev->dev); + pm_runtime_put_autosuspend(drm->dev->dev); return NVIF_NOTIFY_DROP; } -- 2.26.2
Lyude Paul
2020-Jul-29 21:36 UTC
[Nouveau] [PATCH 3/9] drm/nouveau/kms/fbcon: Correct pm_runtime calls in nouveau_fbcon_release()
We want to update the last busy timer for our device and use pm_runtime_put_autosuspend() here instead so that our GPU can autosuspend when we're done. Signed-off-by: Lyude Paul <lyude at redhat.com> Fixes: f231976c2e89 ("drm/nouveau/fbcon: take runpm reference when userspace has an open fd") Cc: Ben Skeggs <bskeggs at redhat.com> Cc: stable at vger.kernel.org Cc: dri-devel at lists.freedesktop.org Cc: nouveau at lists.freedesktop.org Cc: <stable at vger.kernel.org> # v4.3+ --- drivers/gpu/drm/nouveau/nouveau_fbcon.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_fbcon.c b/drivers/gpu/drm/nouveau/nouveau_fbcon.c index fad8030ec1f81..4d9f3b5ae72d2 100644 --- a/drivers/gpu/drm/nouveau/nouveau_fbcon.c +++ b/drivers/gpu/drm/nouveau/nouveau_fbcon.c @@ -200,8 +200,11 @@ static int nouveau_fbcon_release(struct fb_info *info, int user) { struct nouveau_fbdev *fbcon = info->par; - struct nouveau_drm *drm = nouveau_drm(fbcon->helper.dev); - pm_runtime_put(drm->dev->dev); + struct drm_device *dev = fbcon->helper.dev; + + pm_runtime_mark_last_busy(dev->dev); + pm_runtime_put_autosuspend(dev->dev); + return 0; } -- 2.26.2
Lyude Paul
2020-Jul-29 21:36 UTC
[Nouveau] [PATCH 4/9] drm/nouveau/kms/fbcon: Fix pm_runtime calls in nouveau_fbcon_output_poll_changed()
Noticed two problems here: * We're not dropping our runtime PM refs after getting an error * We're not backing off when pm_runtime_get() indicates that there's already a resume in progress (-EINPROGRESS) (after which any delayed fbcon events will get handled anyway) So, let's fix those. Signed-off-by: Lyude Paul <lyude at redhat.com> Fixes: 7fec8f5379fb ("drm/nouveau/drm/nouveau: Fix deadlock with fb_helper with async RPM requests") Cc: stable at vger.kernel.org Cc: Ben Skeggs <bskeggs at redhat.com> Cc: dri-devel at lists.freedesktop.org Cc: nouveau at lists.freedesktop.org Cc: <stable at vger.kernel.org> # v4.19+ --- drivers/gpu/drm/nouveau/nouveau_fbcon.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_fbcon.c b/drivers/gpu/drm/nouveau/nouveau_fbcon.c index 4d9f3b5ae72d2..b936bf1c14dec 100644 --- a/drivers/gpu/drm/nouveau/nouveau_fbcon.c +++ b/drivers/gpu/drm/nouveau/nouveau_fbcon.c @@ -503,10 +503,7 @@ nouveau_fbcon_output_poll_changed(struct drm_device *dev) ret = pm_runtime_get(dev->dev); if (ret == 1 || ret == -EACCES) { drm_fb_helper_hotplug_event(&fbcon->helper); - - pm_runtime_mark_last_busy(dev->dev); - pm_runtime_put_autosuspend(dev->dev); - } else if (ret == 0) { + } else if (ret == 0 || ret == -EINPROGRESS) { /* If the GPU was already in the process of suspending before * this event happened, then we can't block here as we'll * deadlock the runtime pmops since they wait for us to @@ -516,11 +513,15 @@ nouveau_fbcon_output_poll_changed(struct drm_device *dev) NV_DEBUG(drm, "fbcon HPD event deferred until runtime resume\n"); fbcon->hotplug_waiting = true; pm_runtime_put_noidle(drm->dev->dev); + goto out; } else { DRM_WARN("fbcon HPD event lost due to RPM failure: %d\n", ret); } + pm_runtime_mark_last_busy(dev->dev); + pm_runtime_put_autosuspend(dev->dev); +out: mutex_unlock(&fbcon->hotplug_lock); } -- 2.26.2
Lyude Paul
2020-Jul-29 21:36 UTC
[Nouveau] [PATCH 5/9] drm/nouveau/kms/fbcon: Use pm_runtime_put_autosuspend() in suspend work
While I don't know of any problems this has caused, it's definitely not a great idea for us to potentially block in nouveau_fbcon_set_suspend_work(). We don't really need to anyway, and want to simply trigger the autosuspend timer instead. Signed-off-by: Lyude Paul <lyude at redhat.com> --- drivers/gpu/drm/nouveau/nouveau_fbcon.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_fbcon.c b/drivers/gpu/drm/nouveau/nouveau_fbcon.c index b936bf1c14dec..2234f012ca0c3 100644 --- a/drivers/gpu/drm/nouveau/nouveau_fbcon.c +++ b/drivers/gpu/drm/nouveau/nouveau_fbcon.c @@ -468,7 +468,7 @@ nouveau_fbcon_set_suspend_work(struct work_struct *work) if (state == FBINFO_STATE_RUNNING) { nouveau_fbcon_hotplug_resume(drm->fbcon); pm_runtime_mark_last_busy(drm->dev->dev); - pm_runtime_put_sync(drm->dev->dev); + pm_runtime_put_autosuspend(drm->dev->dev); } } -- 2.26.2
Lyude Paul
2020-Jul-29 21:37 UTC
[Nouveau] [PATCH 6/9] drm/nouveau/kms: Use pm_runtime_put_autosuspend() in hpd_work
Again, we don't have any need to suspend the device synchronously here, and doing so could in theory lead to a deadlock (although it's unlikely since we've called pm_runtime_mark_last_busy() before-hand). Signed-off-by: Lyude Paul <lyude at redhat.com> --- drivers/gpu/drm/nouveau/nouveau_display.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index 63c58f12458c9..cdc5a2200f95e 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -444,7 +444,7 @@ nouveau_display_hpd_work(struct work_struct *work) drm_helper_hpd_irq_event(drm->dev); pm_runtime_mark_last_busy(drm->dev->dev); - pm_runtime_put_sync(drm->dev->dev); + pm_runtime_put_autosuspend(drm->dev->dev); } #ifdef CONFIG_ACPI -- 2.26.2
Lyude Paul
2020-Jul-29 21:37 UTC
[Nouveau] [PATCH 7/9] drm/nouveau/kms: Invert conditionals in nouveau_display_acpi_ntfy()
No functional changes here, just a drive-by cleanup. Signed-off-by: Lyude Paul <lyude at redhat.com> [cc'd to stable since the next fix needs this patch to apply] Fixes: 79e765ad665d ("drm/nouveau/drm/nouveau: Prevent handling ACPI HPD events too early") Cc: stable at vger.kernel.org Cc: Ben Skeggs <bskeggs at redhat.com> Cc: dri-devel at lists.freedesktop.org Cc: nouveau at lists.freedesktop.org Cc: <stable at vger.kernel.org> # v4.19+ --- drivers/gpu/drm/nouveau/nouveau_display.c | 50 +++++++++++------------ 1 file changed, 24 insertions(+), 26 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index cdc5a2200f95e..96c9e7f550537 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -457,34 +457,32 @@ nouveau_display_acpi_ntfy(struct notifier_block *nb, unsigned long val, struct acpi_bus_event *info = data; int ret; - if (!strcmp(info->device_class, ACPI_VIDEO_CLASS)) { - if (info->type == ACPI_VIDEO_NOTIFY_PROBE) { - ret = pm_runtime_get(drm->dev->dev); - if (ret == 1 || ret == -EACCES) { - /* If the GPU is already awake, or in a state - * where we can't wake it up, it can handle - * it's own hotplug events. - */ - pm_runtime_put_autosuspend(drm->dev->dev); - } else if (ret == 0) { - /* This may be the only indication we receive - * of a connector hotplug on a runtime - * suspended GPU, schedule hpd_work to check. - */ - NV_DEBUG(drm, "ACPI requested connector reprobe\n"); - schedule_work(&drm->hpd_work); - pm_runtime_put_noidle(drm->dev->dev); - } else { - NV_WARN(drm, "Dropped ACPI reprobe event due to RPM error: %d\n", - ret); - } - - /* acpi-video should not generate keypresses for this */ - return NOTIFY_BAD; - } + if (strcmp(info->device_class, ACPI_VIDEO_CLASS) || + info->type != ACPI_VIDEO_NOTIFY_PROBE) + return NOTIFY_DONE; + + ret = pm_runtime_get(drm->dev->dev); + if (ret == 1 || ret == -EACCES) { + /* If the GPU is already awake, or in a state + * where we can't wake it up, it can handle + * it's own hotplug events. + */ + pm_runtime_put_autosuspend(drm->dev->dev); + } else if (ret == 0) { + /* This may be the only indication we receive + * of a connector hotplug on a runtime + * suspended GPU, schedule hpd_work to check. + */ + NV_DEBUG(drm, "ACPI requested connector reprobe\n"); + schedule_work(&drm->hpd_work); + pm_runtime_put_noidle(drm->dev->dev); + } else { + NV_WARN(drm, "Dropped ACPI reprobe event due to RPM error: %d\n", + ret); } - return NOTIFY_DONE; + /* acpi-video should not generate keypresses for this */ + return NOTIFY_BAD; } #endif -- 2.26.2
Lyude Paul
2020-Jul-29 21:37 UTC
[Nouveau] [PATCH 8/9] drm/nouveau/kms: Fix runtime PM leak in nouveau_display_acpi_ntfy()
Signed-off-by: Lyude Paul <lyude at redhat.com> Fixes: 79e765ad665d ("drm/nouveau/drm/nouveau: Prevent handling ACPI HPD events too early") Cc: stable at vger.kernel.org Cc: Ben Skeggs <bskeggs at redhat.com> Cc: dri-devel at lists.freedesktop.org Cc: nouveau at lists.freedesktop.org Cc: <stable at vger.kernel.org> # v4.19+ --- drivers/gpu/drm/nouveau/nouveau_display.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index 96c9e7f550537..ad63d4f052deb 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -479,6 +479,8 @@ nouveau_display_acpi_ntfy(struct notifier_block *nb, unsigned long val, } else { NV_WARN(drm, "Dropped ACPI reprobe event due to RPM error: %d\n", ret); + pm_runtime_mark_last_busy(drm->dev->dev); + pm_runtime_put_autosuspend(drm->dev->dev); } /* acpi-video should not generate keypresses for this */ -- 2.26.2
Lyude Paul
2020-Jul-29 21:37 UTC
[Nouveau] [PATCH 9/9] drm/nouveau/kms: Handle -EINPROGRESS in nouveau_display_acpi_ntfy()
This isn't an error, this just means there's multiple asynchronous resume requests going at the same time. Treat it like a success. Signed-off-by: Lyude Paul <lyude at redhat.com> Fixes: 79e765ad665d ("drm/nouveau/drm/nouveau: Prevent handling ACPI HPD events too early") Cc: stable at vger.kernel.org Cc: Karol Herbst <kherbst at redhat.com> Cc: Ben Skeggs <bskeggs at redhat.com> Cc: dri-devel at lists.freedesktop.org Cc: nouveau at lists.freedesktop.org Cc: <stable at vger.kernel.org> # v4.19+ --- drivers/gpu/drm/nouveau/nouveau_display.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index ad63d4f052deb..691bb8d37759e 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -462,10 +462,10 @@ nouveau_display_acpi_ntfy(struct notifier_block *nb, unsigned long val, return NOTIFY_DONE; ret = pm_runtime_get(drm->dev->dev); - if (ret == 1 || ret == -EACCES) { - /* If the GPU is already awake, or in a state - * where we can't wake it up, it can handle - * it's own hotplug events. + if (ret == 1 || ret == -EACCES || ret == -EINPROGRESS) { + /* If the GPU is already awake, is waking up, or is in a state + * where we can't wake it up, it can handle its own hotplug + * events. */ pm_runtime_put_autosuspend(drm->dev->dev); } else if (ret == 0) { -- 2.26.2
Possibly Parallel Threads
- [PATCH v7 0/5] Fix connector probing deadlocks from RPM bugs
- [PATCH v8 0/5] Fix connector probing deadlocks from RPM bugs
- [PATCH v6 0/5] Fix connector probing deadlocks from RPM bugs
- [PATCH v5 00/13] Fix connector probing deadlocks from RPM bugs
- [PATCH v7 3/5] drm/nouveau: Fix deadlock with fb_helper with async RPM requests