thr3ads.net - Nouveau - [Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers [Feb 2018]

If this information is useful, please help other people find it:
Share via:

Lukas Wunner

2018-Feb-11 09:38 UTC

[Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

Fix a deadlock on hybrid graphics laptops that's been present since 2013:

DRM drivers poll connectors in 10 sec intervals.  The poll worker is
stopped on ->runtime_suspend with cancel_delayed_work_sync().  However
the poll worker invokes the DRM drivers' ->detect callbacks, which call
pm_runtime_get_sync().  If the poll worker starts after runtime suspend
has begun, pm_runtime_get_sync() will wait for runtime suspend to finish
with the intention of runtime resuming the device afterwards.  The result
is a circular wait between poll worker and autosuspend worker.

I'm seeing this deadlock so often it's not funny anymore. I've also
found
3 nouveau bugzillas about it and 1 radeon bugzilla. (See patch [3/5] and
[4/5].)

One theoretical solution would be to add a flag to the ->detect callback
to tell it that it's running in the poll worker's context.  In that case
it doesn't need to call pm_runtime_get_sync() because the poll worker is
only enabled while runtime active and we know that ->runtime_suspend
waits for it to finish.  But this approach would require touching every
single DRM driver's ->detect hook.  Moreover the ->detect hook is
called
from numerous other places, both in the DRM library as well as in each
driver, making it necessary to trace every possible call chain and check
if it's coming from the poll worker or not.  I've tried to do that for
nouveau (see the call sites listed in the commit message of patch [3/5])
and concluded that this approach is a nightmare to implement.

Another reason for the infeasibility of this approach is that ->detect
is called from within the DRM library without driver involvement, e.g.
via DRM's sysfs interface.  In those cases, pm_runtime_get_sync() would
have to be called by the DRM library, but the DRM library is supposed to
stay generic, wheras runtime PM is driver-specific.

So instead, I've come up with this much simpler solution which gleans
from the current task's flags if it's a workqueue worker, retrieves the
work_struct and checks if it's the poll worker.  All that's needed is
one small helper in the workqueue code and another small helper in the
DRM library.  This solution lends itself nicely to stable-backporting.

The patches for radeon and amdgpu are compile-tested only, I only have a
MacBook Pro with an Nvidia GK107 to test.  To test the patches, add an
"msleep(12*1000);" at the top of the driver's ->runtime_suspend
hook.
This ensures that the poll worker runs after ->runtime_suspend has begun.
Wait 12 sec after the GPU has begun runtime suspend, then check
/sys/bus/pci/devices/0000:01:00.0/power/runtime_status.  Without this
series, the status will be stuck at "suspending" and you'll get
hung task
errors in dmesg after a few minutes.

i915, malidp and msm "solved" this issue by not stopping the poll
worker
on runtime suspend.  But this results in the GPU bouncing back and forth
between D0 and D3 continuously.  That's a total no-go for GPUs which
runtime suspend to D3cold since every suspend/resume cycle costs a
significant amount of time and energy.  (i915 and malidp do not seem
to acquire a runtime PM ref in the ->detect callbacks, which seems
questionable.  msm however does and would also deadlock if it disabled
the poll worker on runtime suspend.  cc += Archit, Liviu, intel-gfx)

Please review.  Thanks,

Lukas

Lukas Wunner (5):
  workqueue: Allow retrieval of current task's work struct
  drm: Allow determining if current task is output poll worker
  drm/nouveau: Fix deadlock on runtime suspend
  drm/radeon: Fix deadlock on runtime suspend
  drm/amdgpu: Fix deadlock on runtime suspend

 drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c | 58 +++++++++++++-------
 drivers/gpu/drm/drm_probe_helper.c             | 14 +++++
 drivers/gpu/drm/nouveau/nouveau_connector.c    | 18 +++++--
 drivers/gpu/drm/radeon/radeon_connectors.c     | 74 +++++++++++++++++---------
 include/drm/drm_crtc_helper.h                  |  1 +
 include/linux/workqueue.h                      |  1 +
 kernel/workqueue.c                             | 16 ++++++
 7 files changed, 132 insertions(+), 50 deletions(-)

-- 
2.15.1

Lukas Wunner

2018-Feb-11 09:38 UTC

head link

[Nouveau] [PATCH 4/5] drm/radeon: Fix deadlock on runtime suspend

radeon's ->runtime_suspend hook calls drm_kms_helper_poll_disable(),
which waits for the output poll worker to finish if it's running.

The output poll worker meanwhile calls pm_runtime_get_sync() in
radeon's ->detect hooks, which waits for the ongoing suspend to finish,
causing a deadlock.

Fix by not acquiring a runtime PM ref if the ->detect hooks are called
in the output poll worker's context.  This is safe because the poll
worker is only enabled while runtime active and we know that
->runtime_suspend waits for it to finish.

Stack trace for posterity:

  INFO: task kworker/0:3:31847 blocked for more than 120 seconds
  Workqueue: events output_poll_execute [drm_kms_helper]
  Call Trace:
   schedule+0x3c/0x90
   rpm_resume+0x1e2/0x690
   __pm_runtime_resume+0x3f/0x60
   radeon_lvds_detect+0x39/0xf0 [radeon]
   output_poll_execute+0xda/0x1e0 [drm_kms_helper]
   process_one_work+0x14b/0x440
   worker_thread+0x48/0x4a0

  INFO: task kworker/2:0:10493 blocked for more than 120 seconds.
  Workqueue: pm pm_runtime_work
  Call Trace:
   schedule+0x3c/0x90
   schedule_timeout+0x1b3/0x240
   wait_for_common+0xc2/0x180
   wait_for_completion+0x1d/0x20
   flush_work+0xfc/0x1a0
   __cancel_work_timer+0xa5/0x1d0
   cancel_delayed_work_sync+0x13/0x20
   drm_kms_helper_poll_disable+0x1f/0x30 [drm_kms_helper]
   radeon_pmops_runtime_suspend+0x3d/0xa0 [radeon]
   pci_pm_runtime_suspend+0x61/0x1a0
   vga_switcheroo_runtime_suspend+0x21/0x70
   __rpm_callback+0x32/0x70
   rpm_callback+0x24/0x80
   rpm_suspend+0x12b/0x640
   pm_runtime_work+0x6f/0xb0
   process_one_work+0x14b/0x440
   worker_thread+0x48/0x4a0

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94147
Fixes: 10ebc0bc0934 ("drm/radeon: add runtime PM support (v2)")
Cc: stable at vger.kernel.org # v3.13+: 1234567890ab: workqueue: Allow retrieval
of current task's work struct
Cc: stable at vger.kernel.org # v3.13+: 1234567890ab: drm: Allow determining if
current task is output poll worker
Cc: Ismo Toijala <ismo.toijala at gmail.com>
Cc: Alex Deucher <alexander.deucher at amd.com>
Cc: Dave Airlie <airlied at redhat.com>
Signed-off-by: Lukas Wunner <lukas at wunner.de>
---
 drivers/gpu/drm/radeon/radeon_connectors.c | 74 ++++++++++++++++++++----------
 1 file changed, 49 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_connectors.c
b/drivers/gpu/drm/radeon/radeon_connectors.c
index 5012f5e47a1e..2e2ca3c6b47d 100644
--- a/drivers/gpu/drm/radeon/radeon_connectors.c
+++ b/drivers/gpu/drm/radeon/radeon_connectors.c
@@ -899,9 +899,11 @@ radeon_lvds_detect(struct drm_connector *connector, bool
force)
 	enum drm_connector_status ret = connector_status_disconnected;
 	int r;
 
-	r = pm_runtime_get_sync(connector->dev->dev);
-	if (r < 0)
-		return connector_status_disconnected;
+	if (!drm_kms_helper_is_poll_worker()) {
+		r = pm_runtime_get_sync(connector->dev->dev);
+		if (r < 0)
+			return connector_status_disconnected;
+	}
 
 	if (encoder) {
 		struct radeon_encoder *radeon_encoder = to_radeon_encoder(encoder);
@@ -924,8 +926,12 @@ radeon_lvds_detect(struct drm_connector *connector, bool
force)
 	/* check acpi lid status ??? */
 
 	radeon_connector_update_scratch_regs(connector, ret);
-	pm_runtime_mark_last_busy(connector->dev->dev);
-	pm_runtime_put_autosuspend(connector->dev->dev);
+
+	if (!drm_kms_helper_is_poll_worker()) {
+		pm_runtime_mark_last_busy(connector->dev->dev);
+		pm_runtime_put_autosuspend(connector->dev->dev);
+	}
+
 	return ret;
 }
 
@@ -1039,9 +1045,11 @@ radeon_vga_detect(struct drm_connector *connector, bool
force)
 	enum drm_connector_status ret = connector_status_disconnected;
 	int r;
 
-	r = pm_runtime_get_sync(connector->dev->dev);
-	if (r < 0)
-		return connector_status_disconnected;
+	if (!drm_kms_helper_is_poll_worker()) {
+		r = pm_runtime_get_sync(connector->dev->dev);
+		if (r < 0)
+			return connector_status_disconnected;
+	}
 
 	encoder = radeon_best_single_encoder(connector);
 	if (!encoder)
@@ -1108,8 +1116,10 @@ radeon_vga_detect(struct drm_connector *connector, bool
force)
 	radeon_connector_update_scratch_regs(connector, ret);
 
 out:
-	pm_runtime_mark_last_busy(connector->dev->dev);
-	pm_runtime_put_autosuspend(connector->dev->dev);
+	if (!drm_kms_helper_is_poll_worker()) {
+		pm_runtime_mark_last_busy(connector->dev->dev);
+		pm_runtime_put_autosuspend(connector->dev->dev);
+	}
 
 	return ret;
 }
@@ -1173,9 +1183,11 @@ radeon_tv_detect(struct drm_connector *connector, bool
force)
 	if (!radeon_connector->dac_load_detect)
 		return ret;
 
-	r = pm_runtime_get_sync(connector->dev->dev);
-	if (r < 0)
-		return connector_status_disconnected;
+	if (!drm_kms_helper_is_poll_worker()) {
+		r = pm_runtime_get_sync(connector->dev->dev);
+		if (r < 0)
+			return connector_status_disconnected;
+	}
 
 	encoder = radeon_best_single_encoder(connector);
 	if (!encoder)
@@ -1187,8 +1199,12 @@ radeon_tv_detect(struct drm_connector *connector, bool
force)
 	if (ret == connector_status_connected)
 		ret = radeon_connector_analog_encoder_conflict_solve(connector, encoder, ret,
false);
 	radeon_connector_update_scratch_regs(connector, ret);
-	pm_runtime_mark_last_busy(connector->dev->dev);
-	pm_runtime_put_autosuspend(connector->dev->dev);
+
+	if (!drm_kms_helper_is_poll_worker()) {
+		pm_runtime_mark_last_busy(connector->dev->dev);
+		pm_runtime_put_autosuspend(connector->dev->dev);
+	}
+
 	return ret;
 }
 
@@ -1251,9 +1267,11 @@ radeon_dvi_detect(struct drm_connector *connector, bool
force)
 	enum drm_connector_status ret = connector_status_disconnected;
 	bool dret = false, broken_edid = false;
 
-	r = pm_runtime_get_sync(connector->dev->dev);
-	if (r < 0)
-		return connector_status_disconnected;
+	if (!drm_kms_helper_is_poll_worker()) {
+		r = pm_runtime_get_sync(connector->dev->dev);
+		if (r < 0)
+			return connector_status_disconnected;
+	}
 
 	if (radeon_connector->detected_hpd_without_ddc) {
 		force = true;
@@ -1436,8 +1454,10 @@ radeon_dvi_detect(struct drm_connector *connector, bool
force)
 	}
 
 exit:
-	pm_runtime_mark_last_busy(connector->dev->dev);
-	pm_runtime_put_autosuspend(connector->dev->dev);
+	if (!drm_kms_helper_is_poll_worker()) {
+		pm_runtime_mark_last_busy(connector->dev->dev);
+		pm_runtime_put_autosuspend(connector->dev->dev);
+	}
 
 	return ret;
 }
@@ -1688,9 +1708,11 @@ radeon_dp_detect(struct drm_connector *connector, bool
force)
 	if (radeon_dig_connector->is_mst)
 		return connector_status_disconnected;
 
-	r = pm_runtime_get_sync(connector->dev->dev);
-	if (r < 0)
-		return connector_status_disconnected;
+	if (!drm_kms_helper_is_poll_worker()) {
+		r = pm_runtime_get_sync(connector->dev->dev);
+		if (r < 0)
+			return connector_status_disconnected;
+	}
 
 	if (!force && radeon_check_hpd_status_unchanged(connector)) {
 		ret = connector->status;
@@ -1777,8 +1799,10 @@ radeon_dp_detect(struct drm_connector *connector, bool
force)
 	}
 
 out:
-	pm_runtime_mark_last_busy(connector->dev->dev);
-	pm_runtime_put_autosuspend(connector->dev->dev);
+	if (!drm_kms_helper_is_poll_worker()) {
+		pm_runtime_mark_last_busy(connector->dev->dev);
+		pm_runtime_put_autosuspend(connector->dev->dev);
+	}
 
 	return ret;
 }
-- 
2.15.1

Lukas Wunner

2018-Feb-11 09:38 UTC

head link

[Nouveau] [PATCH 3/5] drm/nouveau: Fix deadlock on runtime suspend

nouveau's ->runtime_suspend hook calls drm_kms_helper_poll_disable(),
which waits for the output poll worker to finish if it's running.

The output poll worker meanwhile calls pm_runtime_get_sync() in
nouveau_connector_detect() which waits for the ongoing suspend to finish,
causing a deadlock.

Fix by not acquiring a runtime PM ref if nouveau_connector_detect() is
called in the output poll worker's context.  This is safe because
the poll worker is only enabled while runtime active and we know that
->runtime_suspend waits for it to finish.

Other contexts calling nouveau_connector_detect() do require a runtime
PM ref, these comprise:

  status_store() drm sysfs interface
  ->fill_modes drm callback
  drm_fb_helper_probe_connector_modes()
  drm_mode_getconnector()
  nouveau_connector_hotplug()
  nouveau_display_hpd_work()
  nv17_tv_set_property()

Stack trace for posterity:

  INFO: task kworker/0:1:58 blocked for more than 120 seconds.
  Workqueue: events output_poll_execute [drm_kms_helper]
  Call Trace:
   schedule+0x28/0x80
   rpm_resume+0x107/0x6e0
   __pm_runtime_resume+0x47/0x70
   nouveau_connector_detect+0x7e/0x4a0 [nouveau]
   nouveau_connector_detect_lvds+0x132/0x180 [nouveau]
   drm_helper_probe_detect_ctx+0x85/0xd0 [drm_kms_helper]
   output_poll_execute+0x11e/0x1c0 [drm_kms_helper]
   process_one_work+0x184/0x380
   worker_thread+0x2e/0x390

  INFO: task kworker/0:2:252 blocked for more than 120 seconds.
  Workqueue: pm pm_runtime_work
  Call Trace:
   schedule+0x28/0x80
   schedule_timeout+0x1e3/0x370
   wait_for_completion+0x123/0x190
   flush_work+0x142/0x1c0
   nouveau_pmops_runtime_suspend+0x7e/0xd0 [nouveau]
   pci_pm_runtime_suspend+0x5c/0x180
   vga_switcheroo_runtime_suspend+0x1e/0xa0
   __rpm_callback+0xc1/0x200
   rpm_callback+0x1f/0x70
   rpm_suspend+0x13c/0x640
   pm_runtime_work+0x6e/0x90
   process_one_work+0x184/0x380
   worker_thread+0x2e/0x390

Bugzilla: https://bugs.archlinux.org/task/53497
Bugzilla: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=870523
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70388#c33
Fixes: 5addcf0a5f0f ("nouveau: add runtime PM support (v0.9)")
Cc: stable at vger.kernel.org # v3.12+: 1234567890ab: workqueue: Allow retrieval
of current task's work struct
Cc: stable at vger.kernel.org # v3.12+: 1234567890ab: drm: Allow determining if
current task is output poll worker
Cc: Ben Skeggs <bskeggs at redhat.com>
Cc: Dave Airlie <airlied at redhat.com>
Signed-off-by: Lukas Wunner <lukas at wunner.de>
---
 drivers/gpu/drm/nouveau/nouveau_connector.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c
b/drivers/gpu/drm/nouveau/nouveau_connector.c
index 69d6e61a01ec..6ed9cb053dfa 100644
--- a/drivers/gpu/drm/nouveau/nouveau_connector.c
+++ b/drivers/gpu/drm/nouveau/nouveau_connector.c
@@ -570,9 +570,15 @@ nouveau_connector_detect(struct drm_connector *connector,
bool force)
 		nv_connector->edid = NULL;
 	}
 
-	ret = pm_runtime_get_sync(connector->dev->dev);
-	if (ret < 0 && ret != -EACCES)
-		return conn_status;
+	/* Outputs are only polled while runtime active, so acquiring a
+	 * runtime PM ref here is unnecessary (and would deadlock upon
+	 * runtime suspend because it waits for polling to finish).
+	 */
+	if (!drm_kms_helper_is_poll_worker()) {
+		ret = pm_runtime_get_sync(connector->dev->dev);
+		if (ret < 0 && ret != -EACCES)
+			return conn_status;
+	}
 
 	nv_encoder = nouveau_connector_ddc_detect(connector);
 	if (nv_encoder && (i2c = nv_encoder->i2c) != NULL) {
@@ -647,8 +653,10 @@ nouveau_connector_detect(struct drm_connector *connector,
bool force)
 
  out:
 
-	pm_runtime_mark_last_busy(connector->dev->dev);
-	pm_runtime_put_autosuspend(connector->dev->dev);
+	if (!drm_kms_helper_is_poll_worker()) {
+		pm_runtime_mark_last_busy(connector->dev->dev);
+		pm_runtime_put_autosuspend(connector->dev->dev);
+	}
 
 	return conn_status;
 }
-- 
2.15.1

Lukas Wunner

2018-Feb-11 09:38 UTC

head link

[Nouveau] [PATCH 5/5] drm/amdgpu: Fix deadlock on runtime suspend

amdgpu's ->runtime_suspend hook calls drm_kms_helper_poll_disable(),
which waits for the output poll worker to finish if it's running.

The output poll worker meanwhile calls pm_runtime_get_sync() in
amdgpu's ->detect hooks, which waits for the ongoing suspend to finish,
causing a deadlock.

Fix by not acquiring a runtime PM ref if the ->detect hooks are called
in the output poll worker's context.  This is safe because the poll
worker is only enabled while runtime active and we know that
->runtime_suspend waits for it to finish.

Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)")
Cc: stable at vger.kernel.org # v4.2+: 1234567890ab: workqueue: Allow retrieval
of current task's work struct
Cc: stable at vger.kernel.org # v4.2+: 1234567890ab: drm: Allow determining if
current task is output poll worker
Cc: Alex Deucher <alexander.deucher at amd.com>
Signed-off-by: Lukas Wunner <lukas at wunner.de>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c | 58 +++++++++++++++++---------
 1 file changed, 38 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
index 8ca3783f2deb..74d2efaec52f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
@@ -736,9 +736,11 @@ amdgpu_connector_lvds_detect(struct drm_connector
*connector, bool force)
 	enum drm_connector_status ret = connector_status_disconnected;
 	int r;
 
-	r = pm_runtime_get_sync(connector->dev->dev);
-	if (r < 0)
-		return connector_status_disconnected;
+	if (!drm_kms_helper_is_poll_worker()) {
+		r = pm_runtime_get_sync(connector->dev->dev);
+		if (r < 0)
+			return connector_status_disconnected;
+	}
 
 	if (encoder) {
 		struct amdgpu_encoder *amdgpu_encoder = to_amdgpu_encoder(encoder);
@@ -757,8 +759,12 @@ amdgpu_connector_lvds_detect(struct drm_connector
*connector, bool force)
 	/* check acpi lid status ??? */
 
 	amdgpu_connector_update_scratch_regs(connector, ret);
-	pm_runtime_mark_last_busy(connector->dev->dev);
-	pm_runtime_put_autosuspend(connector->dev->dev);
+
+	if (!drm_kms_helper_is_poll_worker()) {
+		pm_runtime_mark_last_busy(connector->dev->dev);
+		pm_runtime_put_autosuspend(connector->dev->dev);
+	}
+
 	return ret;
 }
 
@@ -868,9 +874,11 @@ amdgpu_connector_vga_detect(struct drm_connector
*connector, bool force)
 	enum drm_connector_status ret = connector_status_disconnected;
 	int r;
 
-	r = pm_runtime_get_sync(connector->dev->dev);
-	if (r < 0)
-		return connector_status_disconnected;
+	if (!drm_kms_helper_is_poll_worker()) {
+		r = pm_runtime_get_sync(connector->dev->dev);
+		if (r < 0)
+			return connector_status_disconnected;
+	}
 
 	encoder = amdgpu_connector_best_single_encoder(connector);
 	if (!encoder)
@@ -924,8 +932,10 @@ amdgpu_connector_vga_detect(struct drm_connector
*connector, bool force)
 	amdgpu_connector_update_scratch_regs(connector, ret);
 
 out:
-	pm_runtime_mark_last_busy(connector->dev->dev);
-	pm_runtime_put_autosuspend(connector->dev->dev);
+	if (!drm_kms_helper_is_poll_worker()) {
+		pm_runtime_mark_last_busy(connector->dev->dev);
+		pm_runtime_put_autosuspend(connector->dev->dev);
+	}
 
 	return ret;
 }
@@ -988,9 +998,11 @@ amdgpu_connector_dvi_detect(struct drm_connector
*connector, bool force)
 	enum drm_connector_status ret = connector_status_disconnected;
 	bool dret = false, broken_edid = false;
 
-	r = pm_runtime_get_sync(connector->dev->dev);
-	if (r < 0)
-		return connector_status_disconnected;
+	if (!drm_kms_helper_is_poll_worker()) {
+		r = pm_runtime_get_sync(connector->dev->dev);
+		if (r < 0)
+			return connector_status_disconnected;
+	}
 
 	if (!force && amdgpu_connector_check_hpd_status_unchanged(connector))
{
 		ret = connector->status;
@@ -1115,8 +1127,10 @@ amdgpu_connector_dvi_detect(struct drm_connector
*connector, bool force)
 	amdgpu_connector_update_scratch_regs(connector, ret);
 
 exit:
-	pm_runtime_mark_last_busy(connector->dev->dev);
-	pm_runtime_put_autosuspend(connector->dev->dev);
+	if (!drm_kms_helper_is_poll_worker()) {
+		pm_runtime_mark_last_busy(connector->dev->dev);
+		pm_runtime_put_autosuspend(connector->dev->dev);
+	}
 
 	return ret;
 }
@@ -1359,9 +1373,11 @@ amdgpu_connector_dp_detect(struct drm_connector
*connector, bool force)
 	struct drm_encoder *encoder = amdgpu_connector_best_single_encoder(connector);
 	int r;
 
-	r = pm_runtime_get_sync(connector->dev->dev);
-	if (r < 0)
-		return connector_status_disconnected;
+	if (!drm_kms_helper_is_poll_worker()) {
+		r = pm_runtime_get_sync(connector->dev->dev);
+		if (r < 0)
+			return connector_status_disconnected;
+	}
 
 	if (!force && amdgpu_connector_check_hpd_status_unchanged(connector))
{
 		ret = connector->status;
@@ -1429,8 +1445,10 @@ amdgpu_connector_dp_detect(struct drm_connector
*connector, bool force)
 
 	amdgpu_connector_update_scratch_regs(connector, ret);
 out:
-	pm_runtime_mark_last_busy(connector->dev->dev);
-	pm_runtime_put_autosuspend(connector->dev->dev);
+	if (!drm_kms_helper_is_poll_worker()) {
+		pm_runtime_mark_last_busy(connector->dev->dev);
+		pm_runtime_put_autosuspend(connector->dev->dev);
+	}
 
 	return ret;
 }
-- 
2.15.1

Lukas Wunner

2018-Feb-11 09:38 UTC

head link

[Nouveau] [PATCH 2/5] drm: Allow determining if current task is output poll worker

Introduce a helper to determine if the current task is an output poll
worker.

This allows us to fix a long-standing deadlock in several DRM drivers
wherein the ->runtime_suspend callback waits for the output poll worker
to finish and the worker in turn calls a ->detect callback which waits
for runtime suspend to finish.  The ->detect callback is invoked from
multiple call sites and waiting for runtime suspend to finish is the
correct thing to do except if it's executing in the context of the
worker.

Cc: Dave Airlie <airlied at redhat.com>
Cc: Ben Skeggs <bskeggs at redhat.com>
Cc: Alex Deucher <alexander.deucher at amd.com>
Signed-off-by: Lukas Wunner <lukas at wunner.de>
---
 drivers/gpu/drm/drm_probe_helper.c | 14 ++++++++++++++
 include/drm/drm_crtc_helper.h      |  1 +
 2 files changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/drm_probe_helper.c
b/drivers/gpu/drm/drm_probe_helper.c
index 555fbe54d6e2..019881d15ce1 100644
--- a/drivers/gpu/drm/drm_probe_helper.c
+++ b/drivers/gpu/drm/drm_probe_helper.c
@@ -653,6 +653,20 @@ static void output_poll_execute(struct work_struct *work)
 		schedule_delayed_work(delayed_work, DRM_OUTPUT_POLL_PERIOD);
 }
 
+/**
+ * drm_kms_helper_is_poll_worker - is %current task an output poll worker?
+ *
+ * Determine if %current task is an output poll worker.  This can be used
+ * to select distinct code paths for output polling versus other contexts.
+ */
+bool drm_kms_helper_is_poll_worker(void)
+{
+	struct work_struct *work = current_work();
+
+	return work && work->func == output_poll_execute;
+}
+EXPORT_SYMBOL(drm_kms_helper_is_poll_worker);
+
 /**
  * drm_kms_helper_poll_disable - disable output polling
  * @dev: drm_device
diff --git a/include/drm/drm_crtc_helper.h b/include/drm/drm_crtc_helper.h
index 76e237bd989b..6914633037a5 100644
--- a/include/drm/drm_crtc_helper.h
+++ b/include/drm/drm_crtc_helper.h
@@ -77,5 +77,6 @@ void drm_kms_helper_hotplug_event(struct drm_device *dev);
 
 void drm_kms_helper_poll_disable(struct drm_device *dev);
 void drm_kms_helper_poll_enable(struct drm_device *dev);
+bool drm_kms_helper_is_poll_worker(void);
 
 #endif
-- 
2.15.1

Lukas Wunner

2018-Feb-11 09:38 UTC

head link

[Nouveau] [PATCH 1/5] workqueue: Allow retrieval of current task's work struct

Introduce a helper to retrieve the current task's work struct if it is
a workqueue worker.

This allows us to fix a long-standing deadlock in several DRM drivers
wherein the ->runtime_suspend callback waits for a specific worker to
finish and that worker in turn calls a function which waits for runtime
suspend to finish.  That function is invoked from multiple call sites
and waiting for runtime suspend to finish is the correct thing to do
except if it's executing in the context of the worker.

Cc: Tejun Heo <tj at kernel.org>
Cc: Lai Jiangshan <jiangshanlai at gmail.com>
Cc: Dave Airlie <airlied at redhat.com>
Cc: Ben Skeggs <bskeggs at redhat.com>
Cc: Alex Deucher <alexander.deucher at amd.com>
Signed-off-by: Lukas Wunner <lukas at wunner.de>
---
 include/linux/workqueue.h |  1 +
 kernel/workqueue.c        | 16 ++++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 4a54ef96aff5..bc0cda180c8b 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -465,6 +465,7 @@ extern bool cancel_delayed_work_sync(struct delayed_work
*dwork);
 
 extern void workqueue_set_max_active(struct workqueue_struct *wq,
 				     int max_active);
+extern struct work_struct *current_work(void);
 extern bool current_is_workqueue_rescuer(void);
 extern bool workqueue_congested(int cpu, struct workqueue_struct *wq);
 extern unsigned int work_busy(struct work_struct *work);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 017044c26233..bb9a519cbf50 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4179,6 +4179,22 @@ void workqueue_set_max_active(struct workqueue_struct
*wq, int max_active)
 }
 EXPORT_SYMBOL_GPL(workqueue_set_max_active);
 
+/**
+ * current_work - retrieve %current task's work struct
+ *
+ * Determine if %current task is a workqueue worker and what it's working
on.
+ * Useful to find out the context that the %current task is running in.
+ *
+ * Return: work struct if %current task is a workqueue worker, %NULL otherwise.
+ */
+struct work_struct *current_work(void)
+{
+	struct worker *worker = current_wq_worker();
+
+	return worker ? worker->current_work : NULL;
+}
+EXPORT_SYMBOL(current_work);
+
 /**
  * current_is_workqueue_rescuer - is %current workqueue rescuer?
  *
-- 
2.15.1

Mike Lothian

2018-Feb-11 18:58 UTC

head link

[Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

On 11 February 2018 at 09:38, Lukas Wunner <lukas at wunner.de>
wrote:> Fix a deadlock on hybrid graphics laptops that's been present since
2013:
>
> DRM drivers poll connectors in 10 sec intervals.  The poll worker is
> stopped on ->runtime_suspend with cancel_delayed_work_sync().  However
> the poll worker invokes the DRM drivers' ->detect callbacks, which
call
> pm_runtime_get_sync().  If the poll worker starts after runtime suspend
> has begun, pm_runtime_get_sync() will wait for runtime suspend to finish
> with the intention of runtime resuming the device afterwards.  The result
> is a circular wait between poll worker and autosuspend worker.
>
> I'm seeing this deadlock so often it's not funny anymore. I've
also found
> 3 nouveau bugzillas about it and 1 radeon bugzilla. (See patch [3/5] and
> [4/5].)
>
> One theoretical solution would be to add a flag to the ->detect callback
> to tell it that it's running in the poll worker's context.  In that
case
> it doesn't need to call pm_runtime_get_sync() because the poll worker
is
> only enabled while runtime active and we know that ->runtime_suspend
> waits for it to finish.  But this approach would require touching every
> single DRM driver's ->detect hook.  Moreover the ->detect hook is
called
> from numerous other places, both in the DRM library as well as in each
> driver, making it necessary to trace every possible call chain and check
> if it's coming from the poll worker or not.  I've tried to do that
for
> nouveau (see the call sites listed in the commit message of patch [3/5])
> and concluded that this approach is a nightmare to implement.
>
> Another reason for the infeasibility of this approach is that ->detect
> is called from within the DRM library without driver involvement, e.g.
> via DRM's sysfs interface.  In those cases, pm_runtime_get_sync() would
> have to be called by the DRM library, but the DRM library is supposed to
> stay generic, wheras runtime PM is driver-specific.
>
> So instead, I've come up with this much simpler solution which gleans
> from the current task's flags if it's a workqueue worker, retrieves
the
> work_struct and checks if it's the poll worker.  All that's needed
is
> one small helper in the workqueue code and another small helper in the
> DRM library.  This solution lends itself nicely to stable-backporting.
>
> The patches for radeon and amdgpu are compile-tested only, I only have a
> MacBook Pro with an Nvidia GK107 to test.  To test the patches, add an
> "msleep(12*1000);" at the top of the driver's
->runtime_suspend hook.
> This ensures that the poll worker runs after ->runtime_suspend has
begun.
> Wait 12 sec after the GPU has begun runtime suspend, then check
> /sys/bus/pci/devices/0000:01:00.0/power/runtime_status.  Without this
> series, the status will be stuck at "suspending" and you'll
get hung task
> errors in dmesg after a few minutes.
>
> i915, malidp and msm "solved" this issue by not stopping the poll
worker
> on runtime suspend.  But this results in the GPU bouncing back and forth
> between D0 and D3 continuously.  That's a total no-go for GPUs which
> runtime suspend to D3cold since every suspend/resume cycle costs a
> significant amount of time and energy.  (i915 and malidp do not seem
> to acquire a runtime PM ref in the ->detect callbacks, which seems
> questionable.  msm however does and would also deadlock if it disabled
> the poll worker on runtime suspend.  cc += Archit, Liviu, intel-gfx)
>
> Please review.  Thanks,
>
> Lukas
>
> Lukas Wunner (5):
>   workqueue: Allow retrieval of current task's work struct
>   drm: Allow determining if current task is output poll worker
>   drm/nouveau: Fix deadlock on runtime suspend
>   drm/radeon: Fix deadlock on runtime suspend
>   drm/amdgpu: Fix deadlock on runtime suspend
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c | 58 +++++++++++++-------
>  drivers/gpu/drm/drm_probe_helper.c             | 14 +++++
>  drivers/gpu/drm/nouveau/nouveau_connector.c    | 18 +++++--
>  drivers/gpu/drm/radeon/radeon_connectors.c     | 74
+++++++++++++++++---------
>  include/drm/drm_crtc_helper.h                  |  1 +
>  include/linux/workqueue.h                      |  1 +
>  kernel/workqueue.c                             | 16 ++++++
>  7 files changed, 132 insertions(+), 50 deletions(-)
>
> --
> 2.15.1
>
> _______________________________________________
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

I wasn't quite sure where to add that msleep. I've tested the patches
as is on top of agd5f's wip branch without ill effects

I've had a radeon and now a amdgpu PRIME setup and don't believe
I've
ever seen this issue

If you could pop a patch together for the msleep I'll give it a test on
amdgpu

Cheers

Mike

Lukas Wunner

2018-Feb-11 19:23 UTC

head link

[Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

On Sun, Feb 11, 2018 at 06:58:11PM +0000, Mike Lothian
wrote:> On 11 February 2018 at 09:38, Lukas Wunner <lukas at wunner.de>
wrote:
> > The patches for radeon and amdgpu are compile-tested only, I only have
a
> > MacBook Pro with an Nvidia GK107 to test.  To test the patches, add an
> > "msleep(12*1000);" at the top of the driver's
->runtime_suspend hook.
> > This ensures that the poll worker runs after ->runtime_suspend has
begun.
> > Wait 12 sec after the GPU has begun runtime suspend, then check
> > /sys/bus/pci/devices/0000:01:00.0/power/runtime_status.  Without this
> > series, the status will be stuck at "suspending" and
you'll get hung task
> > errors in dmesg after a few minutes.
> 
> I wasn't quite sure where to add that msleep. I've tested the
patches
> as is on top of agd5f's wip branch without ill effects
> 
> I've had a radeon and now a amdgpu PRIME setup and don't believe
I've
> ever seen this issue
> 
> If you could pop a patch together for the msleep I'll give it a test on
> amdgpu
Here you go, this is for all 3 drivers.
Should deadlock without the series.
Thanks!


diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 50afcf6..eefa0d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -718,6 +718,9 @@ static int amdgpu_pmops_runtime_suspend(struct device *dev)
 		return -EBUSY;
 	}
 
+	printk("waiting 12 sec\n");
+	msleep(12*1000);
+	printk("done waiting 12 sec\n");
 	drm_dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
 	drm_kms_helper_poll_disable(drm_dev);
 	vga_switcheroo_set_dynamic_switch(pdev, VGA_SWITCHEROO_OFF);
diff --git a/drivers/gpu/drm/drm_probe_helper.c
b/drivers/gpu/drm/drm_probe_helper.c
index 555fbe5..ee7cf0d 100644
--- a/drivers/gpu/drm/drm_probe_helper.c
+++ b/drivers/gpu/drm/drm_probe_helper.c
@@ -586,6 +586,7 @@ static void output_poll_execute(struct work_struct *work)
 		repoll = true;
 		goto out;
 	}
+	dev_info(&dev->pdev->dev, "begin poll\n");
 
 	drm_connector_list_iter_begin(dev, &conn_iter);
 	drm_for_each_connector_iter(connector, &conn_iter) {
@@ -651,6 +652,7 @@ static void output_poll_execute(struct work_struct *work)
 
 	if (repoll)
 		schedule_delayed_work(delayed_work, DRM_OUTPUT_POLL_PERIOD);
+	dev_info(&dev->pdev->dev, "end poll\n");
 }
 
 /**
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c
b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 3e29302..f9da5bc 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -855,6 +855,9 @@ static int nouveau_drm_probe(struct pci_dev *pdev,
 		return -EBUSY;
 	}
 
+	printk("waiting 12 sec\n");
+	msleep(12*1000);
+	printk("done waiting 12 sec\n");
 	drm_kms_helper_poll_disable(drm_dev);
 	vga_switcheroo_set_dynamic_switch(pdev, VGA_SWITCHEROO_OFF);
 	nouveau_switcheroo_optimus_dsm();
diff --git a/drivers/gpu/drm/radeon/radeon_drv.c
b/drivers/gpu/drm/radeon/radeon_drv.c
index 31dd04f..707b8aa 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -413,6 +413,9 @@ static int radeon_pmops_runtime_suspend(struct device *dev)
 		return -EBUSY;
 	}
 
+	printk("waiting 12 sec\n");
+	msleep(12*1000);
+	printk("done waiting 12 sec\n");
 	drm_dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
 	drm_kms_helper_poll_disable(drm_dev);
 	vga_switcheroo_set_dynamic_switch(pdev, VGA_SWITCHEROO_OFF);

Imre Deak

2018-Feb-12 13:04 UTC

head link

[Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner
wrote:> [...]
> i915, malidp and msm "solved" this issue by not stopping the poll
worker
> on runtime suspend.  But this results in the GPU bouncing back and forth
> between D0 and D3 continuously.  That's a total no-go for GPUs which
> runtime suspend to D3cold since every suspend/resume cycle costs a
> significant amount of time and energy.  (i915 and malidp do not seem
> to acquire a runtime PM ref in the ->detect callbacks, which seems
> questionable.  msm however does and would also deadlock if it disabled
> the poll worker on runtime suspend.  cc += Archit, Liviu, intel-gfx)
In i915 polling is on during runtime suspend only if there are outputs
without hotplug interrupt support. A special case is when an output has
working HPD interrupts when in D0, but no interrupts when runtime
suspended. For these we start polling (from a scheduled work) in the
runtime suspend hook and stop it in the runtime resume hook (again from
a scheduled work).

Imo whether to leave polling on or not for the above purpose is a policy
question (configurable with the drm_kms_helper.poll param).

--Imre
> 
> Please review.  Thanks,
> 
> Lukas
> 
> Lukas Wunner (5):
>   workqueue: Allow retrieval of current task's work struct
>   drm: Allow determining if current task is output poll worker
>   drm/nouveau: Fix deadlock on runtime suspend
>   drm/radeon: Fix deadlock on runtime suspend
>   drm/amdgpu: Fix deadlock on runtime suspend
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c | 58 +++++++++++++-------
>  drivers/gpu/drm/drm_probe_helper.c             | 14 +++++
>  drivers/gpu/drm/nouveau/nouveau_connector.c    | 18 +++++--
>  drivers/gpu/drm/radeon/radeon_connectors.c     | 74
+++++++++++++++++---------
>  include/drm/drm_crtc_helper.h                  |  1 +
>  include/linux/workqueue.h                      |  1 +
>  kernel/workqueue.c                             | 16 ++++++
>  7 files changed, 132 insertions(+), 50 deletions(-)
> 
> -- 
> 2.15.1
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

Tejun Heo

2018-Feb-12 17:07 UTC

head link

[Nouveau] [PATCH 1/5] workqueue: Allow retrieval of current task's work struct

Hello,

On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner
wrote:> Introduce a helper to retrieve the current task's work struct if it is
> a workqueue worker.
> 
> This allows us to fix a long-standing deadlock in several DRM drivers
> wherein the ->runtime_suspend callback waits for a specific worker to
> finish and that worker in turn calls a function which waits for runtime
> suspend to finish.  That function is invoked from multiple call sites
> and waiting for runtime suspend to finish is the correct thing to do
> except if it's executing in the context of the worker.
> 
> Cc: Tejun Heo <tj at kernel.org>
> Cc: Lai Jiangshan <jiangshanlai at gmail.com>
> Cc: Dave Airlie <airlied at redhat.com>
> Cc: Ben Skeggs <bskeggs at redhat.com>
> Cc: Alex Deucher <alexander.deucher at amd.com>
> Signed-off-by: Lukas Wunner <lukas at wunner.de>
I wonder whether it's too generic a name but there are other functions
named in a similar fashion and AFAICS current_work isn't used by
anyone in the tree, so it seems okay.

Acked-by: Tejun Heo <tj at kernel.org>

Please feel free to route as you see fit.

Thanks.

-- 
tejun

Lyude Paul

2018-Feb-12 17:43 UTC

head link

[Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

This patchset is a no-brainer for me. I'm ashamed to say I think I might
have
seen this issue before, but polling gets used so rarely nowadays it slipped
under the rug by accident.

Anyway, for the whole series  (except patch #2, which I will respond to with a
short comment in just a moment):

Reviewed-by: Lyude Paul <lyude at redhat.com>

On Sun, 2018-02-11 at 10:38 +0100, Lukas Wunner wrote:> Fix a deadlock on hybrid graphics laptops that's been present since
2013:
> 
> DRM drivers poll connectors in 10 sec intervals.  The poll worker is
> stopped on ->runtime_suspend with cancel_delayed_work_sync().  However
> the poll worker invokes the DRM drivers' ->detect callbacks, which
call
> pm_runtime_get_sync().  If the poll worker starts after runtime suspend
> has begun, pm_runtime_get_sync() will wait for runtime suspend to finish
> with the intention of runtime resuming the device afterwards.  The result
> is a circular wait between poll worker and autosuspend worker.
> 
> I'm seeing this deadlock so often it's not funny anymore. I've
also found
> 3 nouveau bugzillas about it and 1 radeon bugzilla. (See patch [3/5] and
> [4/5].)
> 
> One theoretical solution would be to add a flag to the ->detect callback
> to tell it that it's running in the poll worker's context.  In that
case
> it doesn't need to call pm_runtime_get_sync() because the poll worker
is
> only enabled while runtime active and we know that ->runtime_suspend
> waits for it to finish.  But this approach would require touching every
> single DRM driver's ->detect hook.  Moreover the ->detect hook is
called
> from numerous other places, both in the DRM library as well as in each
> driver, making it necessary to trace every possible call chain and check
> if it's coming from the poll worker or not.  I've tried to do that
for
> nouveau (see the call sites listed in the commit message of patch [3/5])
> and concluded that this approach is a nightmare to implement.
> 
> Another reason for the infeasibility of this approach is that ->detect
> is called from within the DRM library without driver involvement, e.g.
> via DRM's sysfs interface.  In those cases, pm_runtime_get_sync() would
> have to be called by the DRM library, but the DRM library is supposed to
> stay generic, wheras runtime PM is driver-specific.
> 
> So instead, I've come up with this much simpler solution which gleans
> from the current task's flags if it's a workqueue worker, retrieves
the
> work_struct and checks if it's the poll worker.  All that's needed
is
> one small helper in the workqueue code and another small helper in the
> DRM library.  This solution lends itself nicely to stable-backporting.
> 
> The patches for radeon and amdgpu are compile-tested only, I only have a
> MacBook Pro with an Nvidia GK107 to test.  To test the patches, add an
> "msleep(12*1000);" at the top of the driver's
->runtime_suspend hook.
> This ensures that the poll worker runs after ->runtime_suspend has
begun.
> Wait 12 sec after the GPU has begun runtime suspend, then check
> /sys/bus/pci/devices/0000:01:00.0/power/runtime_status.  Without this
> series, the status will be stuck at "suspending" and you'll
get hung task
> errors in dmesg after a few minutes.
> 
> i915, malidp and msm "solved" this issue by not stopping the poll
worker
> on runtime suspend.  But this results in the GPU bouncing back and forth
> between D0 and D3 continuously.  That's a total no-go for GPUs which
> runtime suspend to D3cold since every suspend/resume cycle costs a
> significant amount of time and energy.  (i915 and malidp do not seem
> to acquire a runtime PM ref in the ->detect callbacks, which seems
> questionable.  msm however does and would also deadlock if it disabled
> the poll worker on runtime suspend.  cc += Archit, Liviu, intel-gfx)
> 
> Please review.  Thanks,
> 
> Lukas
> 
> Lukas Wunner (5):
>   workqueue: Allow retrieval of current task's work struct
>   drm: Allow determining if current task is output poll worker
>   drm/nouveau: Fix deadlock on runtime suspend
>   drm/radeon: Fix deadlock on runtime suspend
>   drm/amdgpu: Fix deadlock on runtime suspend
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c | 58 +++++++++++++-------
>  drivers/gpu/drm/drm_probe_helper.c             | 14 +++++
>  drivers/gpu/drm/nouveau/nouveau_connector.c    | 18 +++++--
>  drivers/gpu/drm/radeon/radeon_connectors.c     | 74 +++++++++++++++++----
> -----
>  include/drm/drm_crtc_helper.h                  |  1 +
>  include/linux/workqueue.h                      |  1 +
>  kernel/workqueue.c                             | 16 ++++++
>  7 files changed, 132 insertions(+), 50 deletions(-)
> -- 
Cheers,
	Lyude Paul

Lyude Paul

2018-Feb-12 17:46 UTC

head link

[Nouveau] [PATCH 2/5] drm: Allow determining if current task is output poll worker

On Sun, 2018-02-11 at 10:38 +0100, Lukas Wunner wrote:> Introduce a helper to determine if the current task is an output poll
> worker.
> 
> This allows us to fix a long-standing deadlock in several DRM drivers
> wherein the ->runtime_suspend callback waits for the output poll worker
> to finish and the worker in turn calls a ->detect callback which waits
> for runtime suspend to finish.  The ->detect callback is invoked from
> multiple call sites and waiting for runtime suspend to finish is the
> correct thing to do except if it's executing in the context of the
> worker.
> 
> Cc: Dave Airlie <airlied at redhat.com>
> Cc: Ben Skeggs <bskeggs at redhat.com>
> Cc: Alex Deucher <alexander.deucher at amd.com>
> Signed-off-by: Lukas Wunner <lukas at wunner.de>
> ---
>  drivers/gpu/drm/drm_probe_helper.c | 14 ++++++++++++++
>  include/drm/drm_crtc_helper.h      |  1 +
>  2 files changed, 15 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_probe_helper.c
> b/drivers/gpu/drm/drm_probe_helper.c
> index 555fbe54d6e2..019881d15ce1 100644
> --- a/drivers/gpu/drm/drm_probe_helper.c
> +++ b/drivers/gpu/drm/drm_probe_helper.c
> @@ -653,6 +653,20 @@ static void output_poll_execute(struct work_struct
> *work)
>  		schedule_delayed_work(delayed_work,
> DRM_OUTPUT_POLL_PERIOD);
>  }
>  
> +/**
> + * drm_kms_helper_is_poll_worker - is %current task an output poll worker?
> + *
> + * Determine if %current task is an output poll worker.  This can be used
> + * to select distinct code paths for output polling versus other contexts.
> + */For this, it would be worth explicitly noting in the comments herethat this
should be called by DRM drivers in order to prevent racing with hotplug
polling workers, so that new drivers in the future can avoid implementing this
race condition in their driver.
> +bool drm_kms_helper_is_poll_worker(void)
> +{
> +	struct work_struct *work = current_work();
> +
> +	return work && work->func == output_poll_execute;
> +}
> +EXPORT_SYMBOL(drm_kms_helper_is_poll_worker);
> +
>  /**
>   * drm_kms_helper_poll_disable - disable output polling
>   * @dev: drm_device
> diff --git a/include/drm/drm_crtc_helper.h b/include/drm/drm_crtc_helper.h
> index 76e237bd989b..6914633037a5 100644
> --- a/include/drm/drm_crtc_helper.h
> +++ b/include/drm/drm_crtc_helper.h
> @@ -77,5 +77,6 @@ void drm_kms_helper_hotplug_event(struct drm_device
*dev);
>  
>  void drm_kms_helper_poll_disable(struct drm_device *dev);
>  void drm_kms_helper_poll_enable(struct drm_device *dev);
> +bool drm_kms_helper_is_poll_worker(void);
>  
>  #endif-- 
Cheers,
	Lyude Paul

Liviu Dudau

2018-Feb-13 10:55 UTC

head link

[Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

Hi Lukas,

On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner
wrote:> Fix a deadlock on hybrid graphics laptops that's been present since
2013:
> 
> DRM drivers poll connectors in 10 sec intervals.  The poll worker is
> stopped on ->runtime_suspend with cancel_delayed_work_sync().  However
> the poll worker invokes the DRM drivers' ->detect callbacks, which
call
> pm_runtime_get_sync().  If the poll worker starts after runtime suspend
> has begun, pm_runtime_get_sync() will wait for runtime suspend to finish
> with the intention of runtime resuming the device afterwards.  The result
> is a circular wait between poll worker and autosuspend worker.
> 
> I'm seeing this deadlock so often it's not funny anymore. I've
also found
> 3 nouveau bugzillas about it and 1 radeon bugzilla. (See patch [3/5] and
> [4/5].)
> 
> One theoretical solution would be to add a flag to the ->detect callback
> to tell it that it's running in the poll worker's context.  In that
case
> it doesn't need to call pm_runtime_get_sync() because the poll worker
is
> only enabled while runtime active and we know that ->runtime_suspend
> waits for it to finish.  But this approach would require touching every
> single DRM driver's ->detect hook.  Moreover the ->detect hook is
called
> from numerous other places, both in the DRM library as well as in each
> driver, making it necessary to trace every possible call chain and check
> if it's coming from the poll worker or not.  I've tried to do that
for
> nouveau (see the call sites listed in the commit message of patch [3/5])
> and concluded that this approach is a nightmare to implement.
> 
> Another reason for the infeasibility of this approach is that ->detect
> is called from within the DRM library without driver involvement, e.g.
> via DRM's sysfs interface.  In those cases, pm_runtime_get_sync() would
> have to be called by the DRM library, but the DRM library is supposed to
> stay generic, wheras runtime PM is driver-specific.
> 
> So instead, I've come up with this much simpler solution which gleans
> from the current task's flags if it's a workqueue worker, retrieves
the
> work_struct and checks if it's the poll worker.  All that's needed
is
> one small helper in the workqueue code and another small helper in the
> DRM library.  This solution lends itself nicely to stable-backporting.
> 
> The patches for radeon and amdgpu are compile-tested only, I only have a
> MacBook Pro with an Nvidia GK107 to test.  To test the patches, add an
> "msleep(12*1000);" at the top of the driver's
->runtime_suspend hook.
> This ensures that the poll worker runs after ->runtime_suspend has
begun.
> Wait 12 sec after the GPU has begun runtime suspend, then check
> /sys/bus/pci/devices/0000:01:00.0/power/runtime_status.  Without this
> series, the status will be stuck at "suspending" and you'll
get hung task
> errors in dmesg after a few minutes.
> 
> i915, malidp and msm "solved" this issue by not stopping the poll
worker
> on runtime suspend.  But this results in the GPU bouncing back and forth
> between D0 and D3 continuously.  That's a total no-go for GPUs which
> runtime suspend to D3cold since every suspend/resume cycle costs a
> significant amount of time and energy.  (i915 and malidp do not seem
> to acquire a runtime PM ref in the ->detect callbacks, which seems
> questionable.  msm however does and would also deadlock if it disabled
> the poll worker on runtime suspend.  cc += Archit, Liviu, intel-gfx)
I think I understand the problem you are trying to solve, but I'm
struggling to understand where malidp makes any specific mistakes. First
of all, malidp is only a display engine, so there is no GPU attached to
it, but that is only a small clarification. Second, malidp doesn't use
directly any of the callbacks that you are referring to, it uses the
drm_cma_xxxx() API plus the generic drm_xxxx() call. So if there are any
issues there (as they might well be) I think they would apply to a lot
more drivers and the fix will involve more than just malidp, i915 and
msm.

Would love to get any clarifications on what I might have misunderstood.

Best regards,
Liviu

> 
> Please review.  Thanks,
> 
> Lukas
> 
> Lukas Wunner (5):
>   workqueue: Allow retrieval of current task's work struct
>   drm: Allow determining if current task is output poll worker
>   drm/nouveau: Fix deadlock on runtime suspend
>   drm/radeon: Fix deadlock on runtime suspend
>   drm/amdgpu: Fix deadlock on runtime suspend
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c | 58 +++++++++++++-------
>  drivers/gpu/drm/drm_probe_helper.c             | 14 +++++
>  drivers/gpu/drm/nouveau/nouveau_connector.c    | 18 +++++--
>  drivers/gpu/drm/radeon/radeon_connectors.c     | 74
+++++++++++++++++---------
>  include/drm/drm_crtc_helper.h                  |  1 +
>  include/linux/workqueue.h                      |  1 +
>  kernel/workqueue.c                             | 16 ++++++
>  7 files changed, 132 insertions(+), 50 deletions(-)
> 
> -- 
> 2.15.1
> 
-- 
===================| I would like to |
| fix the world,  |
| but they're not |
| giving me the   |
 \ source code!  /
  ---------------
    ¯\_(ツ)_/¯

Lukas Wunner

2018-Feb-13 11:52 UTC

head link

[Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

On Tue, Feb 13, 2018 at 10:55:06AM +0000, Liviu Dudau
wrote:> On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner wrote:
> > DRM drivers poll connectors in 10 sec intervals.  The poll worker is
> > stopped on ->runtime_suspend with cancel_delayed_work_sync(). 
However
> > the poll worker invokes the DRM drivers' ->detect callbacks,
which call
> > pm_runtime_get_sync().  If the poll worker starts after runtime
suspend
> > has begun, pm_runtime_get_sync() will wait for runtime suspend to
finish
> > with the intention of runtime resuming the device afterwards.  The
result
> > is a circular wait between poll worker and autosuspend worker.
> 
> I think I understand the problem you are trying to solve, but I'm
> struggling to understand where malidp makes any specific mistakes. First
> of all, malidp is only a display engine, so there is no GPU attached to
> it, but that is only a small clarification. Second, malidp doesn't use
> directly any of the callbacks that you are referring to, it uses the
> drm_cma_xxxx() API plus the generic drm_xxxx() call. So if there are any
> issues there (as they might well be) I think they would apply to a lot
> more drivers and the fix will involve more than just malidp, i915 and
> msm.
I don't know if malidp makes any specific mistakes and didn't mean to
cast it in a negative light, sorry.

So a lot of DRM drivers acquire a runtime PM ref in the connector ->detect
hook because they can't probe connectors while runtime suspended.
E.g. for a PCI device, probing might require mmio access, which isn't
possible outside of power state D0.  There are no ->detect hooks declared
in drivers/gpu/drm/arm/, so it's unclear to me whether you're able to
probe
during runtime suspend.

hdlcd_drv.c and malidp_drv.c both enable output polling.  Output polling
is only necessary if you don't get HPD interrupts.

You're not disabling polling upon runtime suspend.  Thus, if a runtime PM
ref is acquired during polling (such as in a ->detect hook), the GPU will
be runtime resumed every 10 secs.  You may want to verify that's not the
case.  If you decide that you do want to stop polling during runtime
suspend because it runtime resumes the GPU continuously, you'll need the
helper introduced in this series.  So by cc'ing you I just wanted to keep
you in the loop about an issue that may potentially affect your driver.

Let me know if there are any questions.

Thanks,

Lukas

Lukas Wunner

2018-Feb-14 07:41 UTC

head link

[Nouveau] [PATCH v2] drm: Allow determining if current task is output poll worker

Introduce a helper to determine if the current task is an output poll
worker.

This allows us to fix a long-standing deadlock in several DRM drivers
wherein the ->runtime_suspend callback waits for the output poll worker
to finish and the worker in turn calls a ->detect callback which waits
for runtime suspend to finish.  The ->detect callback is invoked from
multiple call sites and waiting for runtime suspend to finish is the
correct thing to do except if it's executing in the context of the
worker.

v2: Expand kerneldoc to specifically mention deadlock between
    output poll worker and autosuspend worker as use case. (Lyude)

Cc: Dave Airlie <airlied at redhat.com>
Cc: Ben Skeggs <bskeggs at redhat.com>
Cc: Alex Deucher <alexander.deucher at amd.com>
Reviewed-by: Lyude Paul <lyude at redhat.com>
Signed-off-by: Lukas Wunner <lukas at wunner.de>
---
 drivers/gpu/drm/drm_probe_helper.c | 20 ++++++++++++++++++++
 include/drm/drm_crtc_helper.h      |  1 +
 2 files changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/drm_probe_helper.c
b/drivers/gpu/drm/drm_probe_helper.c
index 6dc2dde5b672..7a6b2dc08913 100644
--- a/drivers/gpu/drm/drm_probe_helper.c
+++ b/drivers/gpu/drm/drm_probe_helper.c
@@ -654,6 +654,26 @@ static void output_poll_execute(struct work_struct *work)
 		schedule_delayed_work(delayed_work, DRM_OUTPUT_POLL_PERIOD);
 }
 
+/**
+ * drm_kms_helper_is_poll_worker - is %current task an output poll worker?
+ *
+ * Determine if %current task is an output poll worker.  This can be used
+ * to select distinct code paths for output polling versus other contexts.
+ *
+ * One use case is to avoid a deadlock between the output poll worker and
+ * the autosuspend worker wherein the latter waits for polling to finish
+ * upon calling drm_kms_helper_poll_disable(), while the former waits for
+ * runtime suspend to finish upon calling pm_runtime_get_sync() in a
+ * connector ->detect hook.
+ */
+bool drm_kms_helper_is_poll_worker(void)
+{
+	struct work_struct *work = current_work();
+
+	return work && work->func == output_poll_execute;
+}
+EXPORT_SYMBOL(drm_kms_helper_is_poll_worker);
+
 /**
  * drm_kms_helper_poll_disable - disable output polling
  * @dev: drm_device
diff --git a/include/drm/drm_crtc_helper.h b/include/drm/drm_crtc_helper.h
index 76e237bd989b..6914633037a5 100644
--- a/include/drm/drm_crtc_helper.h
+++ b/include/drm/drm_crtc_helper.h
@@ -77,5 +77,6 @@ void drm_kms_helper_hotplug_event(struct drm_device *dev);
 
 void drm_kms_helper_poll_disable(struct drm_device *dev);
 void drm_kms_helper_poll_enable(struct drm_device *dev);
+bool drm_kms_helper_is_poll_worker(void);
 
 #endif
-- 
2.15.1

Lukas Wunner

2018-Feb-14 08:46 UTC

head link

[Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

Dear drm-misc maintainers,

On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner
wrote:> Fix a deadlock on hybrid graphics laptops that's been present since
2013:
This series has been reviewed, consent has been expressed by the most
interested parties, patch [1/5] which touches files outside drivers/gpu
has been acked and I've just out a v2 addressing the only objection
raised.  My plan is thus to wait another two days for comments and,
barring further objections, push to drm-misc this weekend.

However I'm struggling with the decision whether to push to next or
fixes.  The series is marked for stable, however the number of
affected machines is limited and for an issue that's been present
for 5 years it probably doesn't matter if it soaks another two months
in linux-next befor it gets backported.  Hence I tend to err on the
side of caution and push to next, however a case could be made that
fixes is more appropriate.

I'm lacking experience making such decisions and would be interested
to learn how you'd handle this.

Thanks,

Lukas

Maarten Lankhorst

2018-Feb-14 09:26 UTC

head link

[Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

Op 14-02-18 om 09:46 schreef Lukas Wunner:> Dear drm-misc maintainers,
>
> On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner wrote:
>> Fix a deadlock on hybrid graphics laptops that's been present since
2013:
> This series has been reviewed, consent has been expressed by the most
> interested parties, patch [1/5] which touches files outside drivers/gpu
> has been acked and I've just out a v2 addressing the only objection
> raised.  My plan is thus to wait another two days for comments and,
> barring further objections, push to drm-misc this weekend.
>
> However I'm struggling with the decision whether to push to next or
> fixes.  The series is marked for stable, however the number of
> affected machines is limited and for an issue that's been present
> for 5 years it probably doesn't matter if it soaks another two months
> in linux-next befor it gets backported.  Hence I tend to err on the
> side of caution and push to next, however a case could be made that
> fixes is more appropriate.
>
> I'm lacking experience making such decisions and would be interested
> to learn how you'd handle this.
>
> Thanks,
>
> Lukas
I would say fixes, it doesn't look particularly scary. :)

~Maarten

Lukas Wunner

2018-Feb-17 10:38 UTC

head link

[Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner
wrote:>   workqueue: Allow retrieval of current task's work struct
>   drm: Allow determining if current task is output poll worker
>   drm/nouveau: Fix deadlock on runtime suspend
>   drm/radeon: Fix deadlock on runtime suspend
>   drm/amdgpu: Fix deadlock on runtime suspend
Pushed to drm-misc-fixes, thanks a lot everyone for the acks,
reviews, testing and comments.

drm-misc maintainers, heads-up:

drm-misc-fixes is still based on 4.15-rc8.  The present series
applies cleanly to both 4.15 and 4.16, so I had no need to have
4.16-rc1 backmerged, but that may be necessary sooner or later.
I did a local test pull into drm-fixes, the shortlog looked sane
and it merged and compiled cleanly.

Please note two other commits are still pending in drm-misc-fixes:

    commit 745fd50f3b044db6a3922e1718306555613164b0
    Author: Daniel Vetter <daniel.vetter at ffwll.ch>
    Date:   Wed Jan 31 12:04:50 2018 +0100

    drm/cirrus: Load lut in crtc_commit

Gustavo sent a pull request for this one on Jan 31 but erroneously
based it on the wrong commit and it ended up not being pulled by Dave.

    commit 54f809cfbd6b4a43959039f5d33596ed3297ce16
    Author: Leo (Sunpeng) Li <sunpeng.li at amd.com>
    Date:   Wed Jan 17 12:51:08 2018 +0100

    drm/atomic: Fix memleak on ERESTARTSYS during non-blocking commits

This one has already been pulled by Dave into drm-next for 4.17
as commit 1c6ceeee6ebb but Maarten subsequently cherry-picked
it onto drm-misc-fixes.

Let me know if I've made any mistakes.

Thanks,

Lukas

Daniel Vetter

2018-Feb-19 11:34 UTC

head link

[Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner
wrote:> Fix a deadlock on hybrid graphics laptops that's been present since
2013:
> 
> DRM drivers poll connectors in 10 sec intervals.  The poll worker is
> stopped on ->runtime_suspend with cancel_delayed_work_sync().  However
> the poll worker invokes the DRM drivers' ->detect callbacks, which
call
> pm_runtime_get_sync().  If the poll worker starts after runtime suspend
> has begun, pm_runtime_get_sync() will wait for runtime suspend to finish
> with the intention of runtime resuming the device afterwards.  The result
> is a circular wait between poll worker and autosuspend worker.
Don't shut down the poll worker when runtime suspending, that'
doesn't
work. If you need the poll work, then that means waking up the gpu every
few seconds. If you don't need it, then make sure the DRM_CONNECTOR_POLL
flags are set correctly (you can update them at runtime, the poll worker
will pick that up).

That should fix the deadlock, and it's how we do it in i915 (where igt in
CI totally hammers the runtime pm support, and it seems to hold up).

And I guess we need to improve the poll worker docs about this.
> I'm seeing this deadlock so often it's not funny anymore. I've
also found
> 3 nouveau bugzillas about it and 1 radeon bugzilla. (See patch [3/5] and
> [4/5].)
> 
> One theoretical solution would be to add a flag to the ->detect callback
> to tell it that it's running in the poll worker's context.  In that
case
> it doesn't need to call pm_runtime_get_sync() because the poll worker
is
> only enabled while runtime active and we know that ->runtime_suspend
> waits for it to finish.  But this approach would require touching every
> single DRM driver's ->detect hook.  Moreover the ->detect hook is
called
> from numerous other places, both in the DRM library as well as in each
> driver, making it necessary to trace every possible call chain and check
> if it's coming from the poll worker or not.  I've tried to do that
for
> nouveau (see the call sites listed in the commit message of patch [3/5])
> and concluded that this approach is a nightmare to implement.
> 
> Another reason for the infeasibility of this approach is that ->detect
> is called from within the DRM library without driver involvement, e.g.
> via DRM's sysfs interface.  In those cases, pm_runtime_get_sync() would
> have to be called by the DRM library, but the DRM library is supposed to
> stay generic, wheras runtime PM is driver-specific.
> 
> So instead, I've come up with this much simpler solution which gleans
> from the current task's flags if it's a workqueue worker, retrieves
the
> work_struct and checks if it's the poll worker.  All that's needed
is
> one small helper in the workqueue code and another small helper in the
> DRM library.  This solution lends itself nicely to stable-backporting.
> 
> The patches for radeon and amdgpu are compile-tested only, I only have a
> MacBook Pro with an Nvidia GK107 to test.  To test the patches, add an
> "msleep(12*1000);" at the top of the driver's
->runtime_suspend hook.
> This ensures that the poll worker runs after ->runtime_suspend has
begun.
> Wait 12 sec after the GPU has begun runtime suspend, then check
> /sys/bus/pci/devices/0000:01:00.0/power/runtime_status.  Without this
> series, the status will be stuck at "suspending" and you'll
get hung task
> errors in dmesg after a few minutes.
> 
> i915, malidp and msm "solved" this issue by not stopping the poll
worker
> on runtime suspend.  But this results in the GPU bouncing back and forth
> between D0 and D3 continuously.  That's a total no-go for GPUs which
> runtime suspend to D3cold since every suspend/resume cycle costs a
> significant amount of time and energy.  (i915 and malidp do not seem
> to acquire a runtime PM ref in the ->detect callbacks, which seems
> questionable.  msm however does and would also deadlock if it disabled
> the poll worker on runtime suspend.  cc += Archit, Liviu, intel-gfx)
Well, userspace expects hotplug events, even when we runtime suspend
stuff. Hence waking shit up with polling. Imo ignoring hotplug events is a
pretty serious policy decision which is ok in the context of
vga_switcheroo, but not really as an automatic thing. E.g. usb also wakes
up if you plug something in, even with all the runtime pm stuff enabled.
Same for sata and everything else.
-Daniel
> Please review.  Thanks,
> 
> Lukas
> 
> Lukas Wunner (5):
>   workqueue: Allow retrieval of current task's work struct
>   drm: Allow determining if current task is output poll worker
>   drm/nouveau: Fix deadlock on runtime suspend
>   drm/radeon: Fix deadlock on runtime suspend
>   drm/amdgpu: Fix deadlock on runtime suspend
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c | 58 +++++++++++++-------
>  drivers/gpu/drm/drm_probe_helper.c             | 14 +++++
>  drivers/gpu/drm/nouveau/nouveau_connector.c    | 18 +++++--
>  drivers/gpu/drm/radeon/radeon_connectors.c     | 74
+++++++++++++++++---------
>  include/drm/drm_crtc_helper.h                  |  1 +
>  include/linux/workqueue.h                      |  1 +
>  kernel/workqueue.c                             | 16 ++++++
>  7 files changed, 132 insertions(+), 50 deletions(-)
> 
> -- 
> 2.15.1
> 
> _______________________________________________
> Nouveau mailing list
> Nouveau at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/nouveau
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Lukas Wunner

2018-Feb-19 11:58 UTC

head link

[Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

On Mon, Feb 19, 2018 at 12:34:43PM +0100, Daniel Vetter
wrote:> On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner wrote:
> > Fix a deadlock on hybrid graphics laptops that's been present
since 2013:
> > 
> > DRM drivers poll connectors in 10 sec intervals.  The poll worker is
> > stopped on ->runtime_suspend with cancel_delayed_work_sync(). 
However
> > the poll worker invokes the DRM drivers' ->detect callbacks,
which call
> > pm_runtime_get_sync().  If the poll worker starts after runtime
suspend
> > has begun, pm_runtime_get_sync() will wait for runtime suspend to
finish
> > with the intention of runtime resuming the device afterwards.  The
result
> > is a circular wait between poll worker and autosuspend worker.
> 
> Don't shut down the poll worker when runtime suspending, that'
doesn't
> work. If you need the poll work, then that means waking up the gpu every
> few seconds. If you don't need it, then make sure the
DRM_CONNECTOR_POLL
> flags are set correctly (you can update them at runtime, the poll worker
> will pick that up).
> 
> That should fix the deadlock, and it's how we do it in i915 (where igt
in
> CI totally hammers the runtime pm support, and it seems to hold up).
It would fix the deadlock but it's not an option on dual GPU laptops.
Power consumption of the discrete GPU is massive (9 W on my machine).

> > i915, malidp and msm "solved" this issue by not stopping the
poll worker
> > on runtime suspend.  But this results in the GPU bouncing back and
forth
> > between D0 and D3 continuously.  That's a total no-go for GPUs
which
> > runtime suspend to D3cold since every suspend/resume cycle costs a
> > significant amount of time and energy.  (i915 and malidp do not seem
> > to acquire a runtime PM ref in the ->detect callbacks, which seems
> > questionable.  msm however does and would also deadlock if it disabled
> > the poll worker on runtime suspend.  cc += Archit, Liviu, intel-gfx)
> 
> Well, userspace expects hotplug events, even when we runtime suspend
> stuff. Hence waking shit up with polling. Imo ignoring hotplug events is a
> pretty serious policy decision which is ok in the context of
> vga_switcheroo, but not really as an automatic thing. E.g. usb also wakes
> up if you plug something in, even with all the runtime pm stuff enabled.
> Same for sata and everything else.
On the MacBook Pro, the HPD pin of external DP and HDMI ports goes into
the gmux controller, which sends an interrupt on hotplug even if the GPU
is powered down.

Apparently Optimus has a similar functionality, cf. 3a6536c51d5d.

For the rare cases where an external VGA or DVI-A port is present, I guess
it's reasonable to have the user wake up the machine manually.

I'm not sure why nouveau polls ports on my laptop, the GK107 only has an
LVDS and three DP ports, need to investigate.

Thanks,

Lukas

Reasonably Related Threads

Search for more reasonably related threads

Nouveau - Feb 2018 - [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

[Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

[Nouveau] [PATCH 4/5] drm/radeon: Fix deadlock on runtime suspend

[Nouveau] [PATCH 3/5] drm/nouveau: Fix deadlock on runtime suspend

[Nouveau] [PATCH 5/5] drm/amdgpu: Fix deadlock on runtime suspend

[Nouveau] [PATCH 2/5] drm: Allow determining if current task is output poll worker

[Nouveau] [PATCH 1/5] workqueue: Allow retrieval of current task's work struct

[Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

[Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

[Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

[Nouveau] [PATCH 1/5] workqueue: Allow retrieval of current task's work struct

[Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

[Nouveau] [PATCH 2/5] drm: Allow determining if current task is output poll worker

[Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

[Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

[Nouveau] [PATCH v2] drm: Allow determining if current task is output poll worker

[Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

[Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

[Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

[Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

[Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

Reasonably Related Threads