thr3ads.net - Nouveau - [Nouveau] [PATCH 0/5] drm/nouveau: Fix a lot of nasty RPM bugs and deadlocks [Jul 2018]

Lyude Paul

2018-Jul-16 23:59 UTC

[Nouveau] [PATCH 0/5] drm/nouveau: Fix a lot of nasty RPM bugs and deadlocks

This fixes quite a number of runtime PM bugs I found that have been
causing some pretty nasty issues such as:
 - Deadlocking on boot
 - Connector probing potentially not working while the GPU is in runtime
   suspend
 - i2c char dev not working while the GPU is in runtime suspend
 - aux char dev not working while the GPU is in runtime suspend

There are definitely more parts of nouveau that need to be fixed to use
runtime power management correctly, such as the hwmon portions, but this
series just handles the more important fixes that should get into stable
for the time being.

Cc: Karol Herbst <karolherbst at gmail.com>
Cc: stable at vger.kernel.org

Lyude Paul (5):
  drm/nouveau: Prevent RPM callback recursion in suspend/resume paths
  drm/nouveau: Grab RPM ref while probing outputs
  drm/nouveau: Add missing RPM get/put() when probing connectors
  drm/nouveau: Grab RPM ref when i2c bus is in use
  drm/nouveau: Grab RPM ref when aux bus is in use

 drivers/gpu/drm/nouveau/dispnv50/disp.c       | 12 +++++++++--
 drivers/gpu/drm/nouveau/nouveau_connector.c   | 21 +++++++++++++++++--
 drivers/gpu/drm/nouveau/nouveau_connector.h   |  3 +++
 drivers/gpu/drm/nouveau/nouveau_drm.c         | 10 ++++++++-
 drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c | 12 ++++++++++-
 drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c | 12 ++++++++++-
 6 files changed, 63 insertions(+), 7 deletions(-)

-- 
2.17.1

Lyude Paul

2018-Jul-16 23:59 UTC

[Nouveau] [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion in suspend/resume paths

In order to fix all of the spots that need to have runtime PM get/puts()
added, we need to ensure that it's possible for us to call
pm_runtime_get/put() in any context, regardless of how deep, since
almost all of the spots that are currently missing refs can potentially
get called in the runtime suspend/resume path. Otherwise, we'll try to
resume the GPU as we're trying to resume the GPU (and vice-versa) and
cause the kernel to deadlock.

With this, it should be safe to call the pm runtime functions in any
context in nouveau with one condition: any point in the driver that
calls pm_runtime_get*() cannot hold any locks owned by nouveau that
would be acquired anywhere inside nouveau_pmops_runtime_resume().
This includes modesetting locks, i2c bus locks, etc.

Signed-off-by: Lyude Paul <lyude at redhat.com>
Cc: Karol Herbst <karolherbst at gmail.com>
Cc: stable at vger.kernel.org
---
 drivers/gpu/drm/nouveau/nouveau_drm.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index c7ec86d6c3c9..e851ef7b6373 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -835,6 +835,8 @@ nouveau_pmops_runtime_suspend(struct device *dev)
 		return -EBUSY;
 	}
 
+	dev->power.disable_depth++;
+
 	drm_kms_helper_poll_disable(drm_dev);
 	nouveau_switcheroo_optimus_dsm();
 	ret = nouveau_do_suspend(drm_dev, true);
@@ -843,6 +845,8 @@ nouveau_pmops_runtime_suspend(struct device *dev)
 	pci_ignore_hotplug(pdev);
 	pci_set_power_state(pdev, PCI_D3cold);
 	drm_dev->switch_power_state = DRM_SWITCH_POWER_DYNAMIC_OFF;
+
+	dev->power.disable_depth--;
 	return ret;
 }
 
@@ -859,11 +863,13 @@ nouveau_pmops_runtime_resume(struct device *dev)
 		return -EBUSY;
 	}
 
+	dev->power.disable_depth++;
+
 	pci_set_power_state(pdev, PCI_D0);
 	pci_restore_state(pdev);
 	ret = pci_enable_device(pdev);
 	if (ret)
-		return ret;
+		goto out;
 	pci_set_master(pdev);
 
 	ret = nouveau_do_resume(drm_dev, true);
@@ -875,6 +881,8 @@ nouveau_pmops_runtime_resume(struct device *dev)
 	/* Monitors may have been connected / disconnected during suspend */
 	schedule_work(&nouveau_drm(drm_dev)->hpd_work);
 
+out:
+	dev->power.disable_depth--;
 	return ret;
 }
 
-- 
2.17.1
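
Note on the guard in patch 1/5: the following is a minimal userspace model of the idea, not kernel code. It assumes that a runtime PM "get" on a device whose disable_depth is non-zero returns -EACCES instead of waiting for the transition already in progress, which is why the later patches in this series treat -EACCES as non-fatal. The names model_dev, model_get_sync(), model_put(), model_runtime_resume() and model_demo() are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the bits of struct device that matter here. */
struct model_dev {
	int disable_depth; /* > 0: runtime PM calls are refused */
	int usage_count;   /* references held on the device */
	int active;        /* 1 once the device is powered up */
	int resume_calls;  /* times the resume callback has run */
};

static int model_runtime_resume(struct model_dev *dev);

/* Models a synchronous get: always takes a reference, then resumes. */
static int model_get_sync(struct model_dev *dev)
{
	dev->usage_count++;
	if (dev->disable_depth > 0)
		return -EACCES; /* refused: the caller decides if that is fatal */
	if (dev->active)
		return 1;       /* already active, nothing to do */
	return model_runtime_resume(dev);
}

static void model_put(struct model_dev *dev)
{
	dev->usage_count--;
}

/* Models a resume callback whose body takes a nested reference. */
static int model_runtime_resume(struct model_dev *dev)
{
	int ret;

	dev->resume_calls++;
	dev->disable_depth++; /* guard: nested gets now return -EACCES */

	ret = model_get_sync(dev); /* e.g. a bus transfer during resume */
	model_put(dev);
	if (ret < 0 && ret != -EACCES)
		goto out;

	dev->active = 1;
	ret = 0;
out:
	dev->disable_depth--;
	return ret;
}

/* Walks through one resume and checks that the callback ran only once. */
static void model_demo(void)
{
	struct model_dev dev = { 0 };

	assert(model_get_sync(&dev) == 0);
	assert(dev.resume_calls == 1);
	assert(dev.disable_depth == 0);
}
```

Without the disable_depth++ line, the nested model_get_sync() call would re-enter model_runtime_resume() indefinitely. In the real driver the equivalent nested call blocks instead, which is the deadlock the commit message describes.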

Lyude Paul

2018-Jul-16 23:59 UTC

[Nouveau] [PATCH 2/5] drm/nouveau: Grab RPM ref while probing outputs

When DP MST hubs get confused, they can occasionally stop responding for
a good while, until the DRM driver manages to do the right DPCD accesses
to get them responding again. In the worst case, however, this process
can take 10 seconds or more.

Currently we use the default output_poll_changed handler
drm_fb_helper_output_poll_changed() to handle output polling, which
doesn't happen to grab any power references on the device when polling.
If we're unlucky enough to have a hub (such as Lenovo's infamous laptop
docks for the P5x/P7x series) that's easily startled and confused, this
can lead to a pretty nasty deadlock situation that looks like this:

- Hotplug event from hub happens, we enter
  drm_fb_helper_output_poll_changed() and start communicating with the
  hub
- While we're in drm_fb_helper_output_poll_changed() and attempting to
  communicate with the hub, we end up confusing it and cause it to stop
  responding for at least 10 seconds
- After 5 seconds of being in drm_fb_helper_output_poll_changed(), the
  pm core attempts to put the GPU into autosuspend, which ends up
  calling drm_kms_helper_poll_disable()
- While the runtime PM core is waiting in drm_kms_helper_poll_disable()
  for the output poll to finish, we end up finally detecting an MST
  display
- We notice the new display and try to enable it, which triggers
  an atomic commit, which in turn triggers a call to pm_runtime_get_sync()
- The output poll thread and the pm core deadlock: the output poll thread
  waits for the pm core to finish the autosuspend request, while the pm
  core waits for the output poll thread to finish

Sample:
[  246.669625] INFO: task kworker/4:0:37 blocked for more than 120 seconds.
[  246.673398]       Not tainted 4.18.0-rc5Lyude-Test+ #2
[  246.675271] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.676527] kworker/4:0     D    0    37      2 0x80000000
[  246.677580] Workqueue: events output_poll_execute [drm_kms_helper]
[  246.678704] Call Trace:
[  246.679753]  __schedule+0x322/0xaf0
[  246.680916]  schedule+0x33/0x90
[  246.681924]  schedule_preempt_disabled+0x15/0x20
[  246.683023]  __mutex_lock+0x569/0x9a0
[  246.684035]  ? kobject_uevent_env+0x117/0x7b0
[  246.685132]  ? drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
[  246.686179]  mutex_lock_nested+0x1b/0x20
[  246.687278]  ? mutex_lock_nested+0x1b/0x20
[  246.688307]  drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
[  246.689420]  drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper]
[  246.690462]  drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper]
[  246.691570]  output_poll_execute+0x198/0x1c0 [drm_kms_helper]
[  246.692611]  process_one_work+0x231/0x620
[  246.693725]  worker_thread+0x214/0x3a0
[  246.694756]  kthread+0x12b/0x150
[  246.695856]  ? wq_pool_ids_show+0x140/0x140
[  246.696888]  ? kthread_create_worker_on_cpu+0x70/0x70
[  246.697998]  ret_from_fork+0x3a/0x50
[  246.699034] INFO: task kworker/0:1:60 blocked for more than 120 seconds.
[  246.700153]       Not tainted 4.18.0-rc5Lyude-Test+ #2
[  246.701182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.702278] kworker/0:1     D    0    60      2 0x80000000
[  246.703293] Workqueue: pm pm_runtime_work
[  246.704393] Call Trace:
[  246.705403]  __schedule+0x322/0xaf0
[  246.706439]  ? wait_for_completion+0x104/0x190
[  246.707393]  schedule+0x33/0x90
[  246.708375]  schedule_timeout+0x3a5/0x590
[  246.709289]  ? mark_held_locks+0x58/0x80
[  246.710208]  ? _raw_spin_unlock_irq+0x2c/0x40
[  246.711222]  ? wait_for_completion+0x104/0x190
[  246.712134]  ? trace_hardirqs_on_caller+0xf4/0x190
[  246.713094]  ? wait_for_completion+0x104/0x190
[  246.713964]  wait_for_completion+0x12c/0x190
[  246.714895]  ? wake_up_q+0x80/0x80
[  246.715727]  ? get_work_pool+0x90/0x90
[  246.716649]  flush_work+0x1c9/0x280
[  246.717483]  ? flush_workqueue_prep_pwqs+0x1b0/0x1b0
[  246.718442]  __cancel_work_timer+0x146/0x1d0
[  246.719247]  cancel_delayed_work_sync+0x13/0x20
[  246.720043]  drm_kms_helper_poll_disable+0x1f/0x30 [drm_kms_helper]
[  246.721123]  nouveau_pmops_runtime_suspend+0x3d/0xb0 [nouveau]
[  246.721897]  pci_pm_runtime_suspend+0x6b/0x190
[  246.722825]  ? pci_has_legacy_pm_support+0x70/0x70
[  246.723737]  __rpm_callback+0x7a/0x1d0
[  246.724721]  ? pci_has_legacy_pm_support+0x70/0x70
[  246.725607]  rpm_callback+0x24/0x80
[  246.726553]  ? pci_has_legacy_pm_support+0x70/0x70
[  246.727376]  rpm_suspend+0x142/0x6b0
[  246.728185]  pm_runtime_work+0x97/0xc0
[  246.728938]  process_one_work+0x231/0x620
[  246.729796]  worker_thread+0x44/0x3a0
[  246.730614]  kthread+0x12b/0x150
[  246.731395]  ? wq_pool_ids_show+0x140/0x140
[  246.732202]  ? kthread_create_worker_on_cpu+0x70/0x70
[  246.732878]  ret_from_fork+0x3a/0x50
[  246.733768] INFO: task kworker/4:2:422 blocked for more than 120 seconds.
[  246.734587]       Not tainted 4.18.0-rc5Lyude-Test+ #2
[  246.735393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.736113] kworker/4:2     D    0   422      2 0x80000080
[  246.736789] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
[  246.737665] Call Trace:
[  246.738490]  __schedule+0x322/0xaf0
[  246.739250]  schedule+0x33/0x90
[  246.739908]  rpm_resume+0x19c/0x850
[  246.740750]  ? finish_wait+0x90/0x90
[  246.741541]  __pm_runtime_resume+0x4e/0x90
[  246.742370]  nv50_disp_atomic_commit+0x31/0x210 [nouveau]
[  246.743124]  drm_atomic_commit+0x4a/0x50 [drm]
[  246.743775]  restore_fbdev_mode_atomic+0x1c8/0x240 [drm_kms_helper]
[  246.744603]  restore_fbdev_mode+0x31/0x140 [drm_kms_helper]
[  246.745373]  drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0xb0 [drm_kms_helper]
[  246.746220]  drm_fb_helper_set_par+0x2d/0x50 [drm_kms_helper]
[  246.746884]  drm_fb_helper_hotplug_event.part.28+0x96/0xb0 [drm_kms_helper]
[  246.747675]  drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper]
[  246.748544]  drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper]
[  246.749439]  nv50_mstm_hotplug+0x15/0x20 [nouveau]
[  246.750111]  drm_dp_send_link_address+0x177/0x1c0 [drm_kms_helper]
[  246.750764]  drm_dp_check_and_send_link_address+0xa8/0xd0 [drm_kms_helper]
[  246.751602]  drm_dp_mst_link_probe_work+0x51/0x90 [drm_kms_helper]
[  246.752314]  process_one_work+0x231/0x620
[  246.752979]  worker_thread+0x44/0x3a0
[  246.753838]  kthread+0x12b/0x150
[  246.754619]  ? wq_pool_ids_show+0x140/0x140
[  246.755386]  ? kthread_create_worker_on_cpu+0x70/0x70
[  246.756162]  ret_from_fork+0x3a/0x50
[  246.756847]
               Showing all locks held in the system:
[  246.758261] 3 locks held by kworker/4:0/37:
[  246.759016]  #0: 00000000f8df4d2d ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620
[  246.759856]  #1: 00000000e6065461 ((work_completion)(&(&dev->mode_config.output_poll_work)->work)){+.+.}, at: process_one_work+0x1b3/0x620
[  246.760670]  #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
[  246.761516] 2 locks held by kworker/0:1/60:
[  246.762274]  #0: 00000000fff6be0f ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620
[  246.762982]  #1: 000000005ab44fb4 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620
[  246.763890] 1 lock held by khungtaskd/64:
[  246.764664]  #0: 000000008cb8b5c3 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185
[  246.765588] 5 locks held by kworker/4:2/422:
[  246.766440]  #0: 00000000232f0959 ((wq_completion)"events_long"){+.+.}, at: process_one_work+0x1b3/0x620
[  246.767390]  #1: 00000000bb59b134 ((work_completion)(&mgr->work)){+.+.}, at: process_one_work+0x1b3/0x620
[  246.768154]  #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_restore_fbdev_mode_unlocked+0x4c/0xb0 [drm_kms_helper]
[  246.768966]  #3: 000000004c8f0b6b (crtc_ww_class_acquire){+.+.}, at: restore_fbdev_mode_atomic+0x4b/0x240 [drm_kms_helper]
[  246.769921]  #4: 000000004c34a296 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_backoff+0x8a/0x1b0 [drm]
[  246.770839] 1 lock held by dmesg/1038:
[  246.771739] 2 locks held by zsh/1172:
[  246.772650]  #0: 00000000836d0438 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40
[  246.773680]  #1: 000000001f4f4d48 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870

[  246.775522] ============================================

So, to fix this (and any other possible deadlock issues like this that
could occur in the output_poll_changed path) we make sure that
nouveau's output_poll_changed functions grab a runtime power ref before
sending any hotplug events, and hold it until we're finished.

This fixes deadlock issues when in fbcon with nouveau on my P50, and
should fix it for everyone else's as well!

Signed-off-by: Lyude Paul <lyude at redhat.com>
Cc: Karol Herbst <karolherbst at gmail.com>
Cc: stable at vger.kernel.org
---
 drivers/gpu/drm/nouveau/dispnv50/disp.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index eb3e41a78806..ea2a886854fe 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -2012,10 +2012,18 @@ nv50_disp_atomic_state_alloc(struct drm_device *dev)
 	return &atom->state;
 }
 
+static void
+nouveau_output_poll_changed(struct drm_device *dev)
+{
+	pm_runtime_get_sync(dev->dev);
+	drm_fb_helper_hotplug_event(dev->fb_helper);
+	pm_runtime_put_autosuspend(dev->dev);
+}
+
 static const struct drm_mode_config_funcs
 nv50_disp_func = {
 	.fb_create = nouveau_user_framebuffer_create,
-	.output_poll_changed = drm_fb_helper_output_poll_changed,
+	.output_poll_changed = nouveau_output_poll_changed,
 	.atomic_check = nv50_disp_atomic_check,
 	.atomic_commit = nv50_disp_atomic_commit,
 	.atomic_state_alloc = nv50_disp_atomic_state_alloc,
-- 
2.17.1

Lyude Paul

2018-Jul-16 23:59 UTC

[Nouveau] [PATCH 3/5] drm/nouveau: Add missing RPM get/put() when probing connectors

While the GPU is guaranteed to be on when a hotplug has been received,
the same assertion does not hold true if a connector probe has been
started by userspace without having received a sysfs event. So
ensure that any connector probing keeps the GPU alive for the duration
of the probe.

Signed-off-by: Lyude Paul <lyude at redhat.com>
Cc: Karol Herbst <karolherbst at gmail.com>
Cc: stable at vger.kernel.org
---
 drivers/gpu/drm/nouveau/dispnv50/disp.c     |  2 +-
 drivers/gpu/drm/nouveau/nouveau_connector.c | 21 +++++++++++++++++++--
 drivers/gpu/drm/nouveau/nouveau_connector.h |  3 +++
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index ea2a886854fe..0f283ca75189 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -858,7 +858,7 @@ static const struct drm_connector_funcs
 nv50_mstc = {
 	.reset = nouveau_conn_reset,
 	.detect = nv50_mstc_detect,
-	.fill_modes = drm_helper_probe_single_connector_modes,
+	.fill_modes = nouveau_connector_probe_single_connector_modes,
 	.destroy = nv50_mstc_destroy,
 	.atomic_duplicate_state = nouveau_conn_atomic_duplicate_state,
 	.atomic_destroy_state = nouveau_conn_atomic_destroy_state,
diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c b/drivers/gpu/drm/nouveau/nouveau_connector.c
index 2a45b4c2ceb0..feb142fb7a8a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_connector.c
+++ b/drivers/gpu/drm/nouveau/nouveau_connector.c
@@ -770,6 +770,23 @@ nouveau_connector_force(struct drm_connector *connector)
 	nouveau_connector_set_encoder(connector, nv_encoder);
 }
 
+int
+nouveau_connector_probe_single_connector_modes(struct drm_connector *connector,
+					       uint32_t maxX, uint32_t maxY)
+{
+	struct device *dev = connector->dev->dev;
+	int ret;
+
+	ret = pm_runtime_get_sync(dev);
+	if (WARN_ON(ret < 0 && ret != -EACCES))
+		return 0;
+
+	ret = drm_helper_probe_single_connector_modes(connector, maxX, maxY);
+
+	pm_runtime_put_autosuspend(dev);
+	return ret;
+}
+
 static int
 nouveau_connector_set_property(struct drm_connector *connector,
 			       struct drm_property *property, uint64_t value)
@@ -1088,7 +1105,7 @@ nouveau_connector_funcs = {
 	.reset = nouveau_conn_reset,
 	.detect = nouveau_connector_detect,
 	.force = nouveau_connector_force,
-	.fill_modes = drm_helper_probe_single_connector_modes,
+	.fill_modes = nouveau_connector_probe_single_connector_modes,
 	.set_property = nouveau_connector_set_property,
 	.destroy = nouveau_connector_destroy,
 	.atomic_duplicate_state = nouveau_conn_atomic_duplicate_state,
@@ -1103,7 +1120,7 @@ nouveau_connector_funcs_lvds = {
 	.reset = nouveau_conn_reset,
 	.detect = nouveau_connector_detect_lvds,
 	.force = nouveau_connector_force,
-	.fill_modes = drm_helper_probe_single_connector_modes,
+	.fill_modes = nouveau_connector_probe_single_connector_modes,
 	.set_property = nouveau_connector_set_property,
 	.destroy = nouveau_connector_destroy,
 	.atomic_duplicate_state = nouveau_conn_atomic_duplicate_state,
diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.h b/drivers/gpu/drm/nouveau/nouveau_connector.h
index 2d9d35a146a4..e9ffc6eebda2 100644
--- a/drivers/gpu/drm/nouveau/nouveau_connector.h
+++ b/drivers/gpu/drm/nouveau/nouveau_connector.h
@@ -106,6 +106,9 @@ nouveau_crtc_connector_get(struct nouveau_crtc *nv_crtc)
 
 struct drm_connector *
 nouveau_connector_create(struct drm_device *, const struct dcb_output *);
+int
+nouveau_connector_probe_single_connector_modes(struct drm_connector *,
+					       uint32_t, uint32_t);
 
 extern int nouveau_tv_disable;
 extern int nouveau_ignorelid;
-- 
2.17.1
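
Note on the get/put bracket used in patches 2/5 and 3/5: a minimal userspace sketch of the pattern, under stated assumptions. The names model_dev, model_get_sync(), model_put_autosuspend(), model_probe_modes() and model_guarded_probe() are hypothetical stand-ins, and model_probe_modes() simply pretends that three modes were found. One deliberate difference from the patch: pm_runtime_get_sync() leaves the usage count incremented even when it returns an error, so this sketch also drops the reference on the failure path.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the device's runtime PM state. */
struct model_dev {
	int usage_count; /* runtime PM references currently held */
	int get_result;  /* value the modelled get call should return */
};

/* Models a synchronous get: the count goes up even when it fails. */
static int model_get_sync(struct model_dev *dev)
{
	dev->usage_count++;
	return dev->get_result;
}

static void model_put_autosuspend(struct model_dev *dev)
{
	dev->usage_count--;
}

/* Stand-in for the real mode probe helper: pretends 3 modes were found. */
static int model_probe_modes(void)
{
	return 3;
}

/* Keeps the device powered for the duration of the probe. */
static int model_guarded_probe(struct model_dev *dev)
{
	int ret;

	ret = model_get_sync(dev);
	if (ret < 0 && ret != -EACCES) {
		/* Real failure: report no modes, but balance the get. */
		model_put_autosuspend(dev);
		return 0;
	}

	/* -EACCES means runtime PM is disabled; the probe still runs. */
	ret = model_probe_modes();

	model_put_autosuspend(dev);
	return ret;
}

/* Checks that the reference count is balanced on the success path. */
static void model_probe_demo(void)
{
	struct model_dev dev = { 0, 0 };

	assert(model_guarded_probe(&dev) == 3);
	assert(dev.usage_count == 0);
}
```

Every exit path leaves usage_count where it started, which is the property that lets autosuspend kick in again once the probe is done.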

Lyude Paul

2018-Jul-16 23:59 UTC

[Nouveau] [PATCH 4/5] drm/nouveau: Grab RPM ref when i2c bus is in use

The i2c bus can be accessed both by DRM itself and through any of its
devnodes (/sys/class/i2c). So, we need to make sure that all codepaths
using the i2c bus keep the GPU resumed.

Signed-off-by: Lyude Paul <lyude at redhat.com>
Cc: Karol Herbst <karolherbst at gmail.com>
Cc: stable at vger.kernel.org
---
 drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c b/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c
index 807a2b67bd64..1de48c990b80 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c
@@ -119,18 +119,28 @@ nvkm_i2c_bus_release(struct nvkm_i2c_bus *bus)
 	BUS_TRACE(bus, "release");
 	nvkm_i2c_pad_release(pad);
 	mutex_unlock(&bus->mutex);
+	pm_runtime_put_autosuspend(pad->i2c->subdev.device->dev);
 }
 
 int
 nvkm_i2c_bus_acquire(struct nvkm_i2c_bus *bus)
 {
 	struct nvkm_i2c_pad *pad = bus->pad;
+	struct device *dev = pad->i2c->subdev.device->dev;
 	int ret;
+
 	BUS_TRACE(bus, "acquire");
+
+	ret = pm_runtime_get_sync(dev);
+	if (ret < 0 && ret != -EACCES)
+		return ret;
+
 	mutex_lock(&bus->mutex);
 	ret = nvkm_i2c_pad_acquire(pad, NVKM_I2C_PAD_I2C);
-	if (ret)
+	if (ret) {
 		mutex_unlock(&bus->mutex);
+		pm_runtime_put_autosuspend(dev);
+	}
 	return ret;
 }
 
-- 
2.17.1
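
Note on the ordering in patches 4/5 and 5/5: the acquire path takes the power reference before the bus mutex, and the release path drops the mutex before the power reference, which matches the rule from patch 1/5 that no nouveau lock may be held while calling pm_runtime_get*(). The following is a minimal userspace sketch of that ordering and of the unwind on failure. The names model_bus, model_bus_acquire(), model_bus_release() and model_bus_demo() are hypothetical, a plain flag stands in for the mutex, and unlike the patch the sketch also drops the reference when the get itself fails.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for an i2c or aux bus plus its parent device. */
struct model_bus {
	int pm_refs;    /* runtime PM references held */
	int locked;     /* 1 while the bus "mutex" is held */
	int get_result; /* value the modelled PM get should return */
	int pad_result; /* value the modelled pad acquire should return */
};

static int model_bus_acquire(struct model_bus *bus)
{
	int ret;

	/* 1. Power reference first, with no bus lock held. */
	bus->pm_refs++;
	ret = bus->get_result;
	if (ret < 0 && ret != -EACCES) {
		bus->pm_refs--;
		return ret;
	}

	/* 2. Then the bus lock and the pad. */
	bus->locked = 1;
	ret = bus->pad_result;
	if (ret) {
		/* Unwind in reverse order: lock first, then the reference. */
		bus->locked = 0;
		bus->pm_refs--;
	}
	return ret;
}

static void model_bus_release(struct model_bus *bus)
{
	bus->locked = 0; /* drop the lock before the power reference */
	bus->pm_refs--;
}

/* Checks one successful acquire and release cycle. */
static void model_bus_demo(void)
{
	struct model_bus bus = { 0 };

	assert(model_bus_acquire(&bus) == 0);
	assert(bus.pm_refs == 1 && bus.locked == 1);
	model_bus_release(&bus);
	assert(bus.pm_refs == 0 && bus.locked == 0);
}
```

A failed acquire leaves both pm_refs and locked at zero, so callers only need to call the release function after a successful acquire.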

Lyude Paul

2018-Jul-16 23:59 UTC

[Nouveau] [PATCH 5/5] drm/nouveau: Grab RPM ref when aux bus is in use

DP AUX buses can be accessed both by DRM and through any of the
userspace dev nodes in /dev/drm_dp_auxN. We need to make sure the GPU
stays on in all of these codepaths.

Signed-off-by: Lyude Paul <lyude at redhat.com>
Cc: Karol Herbst <karolherbst at gmail.com>
Cc: stable at vger.kernel.org
---
 drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c b/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c
index 4c1f547da463..6276b113065c 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c
@@ -98,18 +98,28 @@ nvkm_i2c_aux_release(struct nvkm_i2c_aux *aux)
 	AUX_TRACE(aux, "release");
 	nvkm_i2c_pad_release(pad);
 	mutex_unlock(&aux->mutex);
+	pm_runtime_put_autosuspend(pad->i2c->subdev.device->dev);
 }
 
 int
 nvkm_i2c_aux_acquire(struct nvkm_i2c_aux *aux)
 {
 	struct nvkm_i2c_pad *pad = aux->pad;
+	struct device *dev = pad->i2c->subdev.device->dev;
 	int ret;
+
 	AUX_TRACE(aux, "acquire");
+
+	ret = pm_runtime_get_sync(dev);
+	if (ret < 0 && ret != -EACCES)
+		return ret;
+
 	mutex_lock(&aux->mutex);
 	ret = nvkm_i2c_pad_acquire(pad, NVKM_I2C_PAD_AUX);
-	if (ret)
+	if (ret) {
 		mutex_unlock(&aux->mutex);
+		pm_runtime_put_autosuspend(dev);
+	}
 	return ret;
 }
 
-- 
2.17.1

Lukas Wunner

2018-Jul-17 07:16 UTC

[Nouveau] [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion in suspend/resume paths

[cc += linux-pm]

Hi Lyude,

First of all, thanks a lot for looking into this. 

On Mon, Jul 16, 2018 at 07:59:25PM -0400, Lyude Paul wrote:
> In order to fix all of the spots that need to have runtime PM get/puts()
> added, we need to ensure that it's possible for us to call
> pm_runtime_get/put() in any context, regardless of how deep, since
> almost all of the spots that are currently missing refs can potentially
> get called in the runtime suspend/resume path. Otherwise, we'll try to
> resume the GPU as we're trying to resume the GPU (and vice-versa) and
> cause the kernel to deadlock.
> 
> With this, it should be safe to call the pm runtime functions in any
> context in nouveau with one condition: any point in the driver that
> calls pm_runtime_get*() cannot hold any locks owned by nouveau that
> would be acquired anywhere inside nouveau_pmops_runtime_resume().
> This includes modesetting locks, i2c bus locks, etc.
[snip]
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -835,6 +835,8 @@ nouveau_pmops_runtime_suspend(struct device *dev)
>  		return -EBUSY;
>  	}
>  
> +	dev->power.disable_depth++;
> +

I'm not sure if that variable is actually private to the PM core.
Grepping through the tree I only find a single occurrence where it's
accessed outside the PM core and that's in amdgpu.  So this looks
a little fishy TBH.  It may make sense to cc such patches to linux-pm
to get Rafael & other folks involved with the PM core to comment.

Also, the disable_depth variable only exists if the kernel was
compiled with CONFIG_PM enabled, but I can't find a "depends on PM"
or something like that in nouveau's Kconfig.  Actually, if PM is
not selected, all the nouveau_pmops_*() functions should be #ifdef'ed
away, but oddly there's no #ifdef CONFIG_PM anywhere in nouveau_drm.c.

Anyway, if I understand the commit message correctly, you're hitting a
pm_runtime_get_sync() in a code path that itself is called during a
pm_runtime_get_sync().  Could you include stack traces in the commit
message?  My gut feeling is that this patch masks a deeper issue,
e.g. if the runtime_resume code path does in fact directly poll outputs,
that would seem wrong.  Runtime resume should merely make the card
accessible, i.e. reinstate power if necessary, put into PCI_D0,
restore registers, etc.  Output polling should be scheduled
asynchronously.

Thanks,

Lukas

Lukas Wunner

2018-Jul-17 07:21 UTC

[Nouveau] [PATCH 2/5] drm/nouveau: Grab RPM ref while probing outputs

On Mon, Jul 16, 2018 at 07:59:26PM -0400, Lyude Paul wrote:
> --- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
> +++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
> @@ -2012,10 +2012,18 @@ nv50_disp_atomic_state_alloc(struct drm_device *dev)
>  	return &atom->state;
>  }
>  
> +static void
> +nouveau_output_poll_changed(struct drm_device *dev)
> +{
> +	pm_runtime_get_sync(dev->dev);
> +	drm_fb_helper_hotplug_event(dev->fb_helper);
> +	pm_runtime_put_autosuspend(dev->dev);
> +}
> +
>  static const struct drm_mode_config_funcs
>  nv50_disp_func = {
>  	.fb_create = nouveau_user_framebuffer_create,
> -	.output_poll_changed = drm_fb_helper_output_poll_changed,
> +	.output_poll_changed = nouveau_output_poll_changed,

It might make sense to provide a generic DRM helper for this.
Same for patch 3 in this series.

Thanks,

Lukas

Karol Herbst

2018-Jul-17 10:11 UTC

[Nouveau] [PATCH 3/5] drm/nouveau: Add missing RPM get/put() when probing connectors

Reviewed-by: Karol Herbst <karolherbst at gmail.com>

On Tue, Jul 17, 2018 at 1:59 AM, Lyude Paul <lyude at redhat.com> wrote:
> While the GPU is guaranteed to be on when a hotplug has been received,
> the same assertion does not hold true if a connector probe has been
> started by userspace without having had received a sysfs event. So
> ensure that any connector probing keeps the GPU alive for the duration
> of the probe.
>
> Signed-off-by: Lyude Paul <lyude at redhat.com>
> Cc: Karol Herbst <karolherbst at gmail.com>
> Cc: stable at vger.kernel.org
> ---
>  drivers/gpu/drm/nouveau/dispnv50/disp.c     |  2 +-
>  drivers/gpu/drm/nouveau/nouveau_connector.c | 21 +++++++++++++++++++--
>  drivers/gpu/drm/nouveau/nouveau_connector.h |  3 +++
>  3 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c
> index ea2a886854fe..0f283ca75189 100644
> --- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
> +++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
> @@ -858,7 +858,7 @@ static const struct drm_connector_funcs
>  nv50_mstc = {
>         .reset = nouveau_conn_reset,
>         .detect = nv50_mstc_detect,
> -       .fill_modes = drm_helper_probe_single_connector_modes,
> +       .fill_modes = nouveau_connector_probe_single_connector_modes,
>         .destroy = nv50_mstc_destroy,
>         .atomic_duplicate_state = nouveau_conn_atomic_duplicate_state,
>         .atomic_destroy_state = nouveau_conn_atomic_destroy_state,
> diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c b/drivers/gpu/drm/nouveau/nouveau_connector.c
> index 2a45b4c2ceb0..feb142fb7a8a 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_connector.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_connector.c
> @@ -770,6 +770,23 @@ nouveau_connector_force(struct drm_connector *connector)
>         nouveau_connector_set_encoder(connector, nv_encoder);
>  }
>
> +int
> +nouveau_connector_probe_single_connector_modes(struct drm_connector *connector,
> +                                              uint32_t maxX, uint32_t maxY)
> +{
> +       struct device *dev = connector->dev->dev;
> +       int ret;
> +
> +       ret = pm_runtime_get_sync(dev);
> +       if (WARN_ON(ret < 0 && ret != -EACCES))
> +               return 0;
> +
> +       ret = drm_helper_probe_single_connector_modes(connector, maxX, maxY);
> +
> +       pm_runtime_put_autosuspend(dev);
> +       return ret;
> +}
> +
>  static int
>  nouveau_connector_set_property(struct drm_connector *connector,
>                                struct drm_property *property, uint64_t value)
> @@ -1088,7 +1105,7 @@ nouveau_connector_funcs = {
>         .reset = nouveau_conn_reset,
>         .detect = nouveau_connector_detect,
>         .force = nouveau_connector_force,
> -       .fill_modes = drm_helper_probe_single_connector_modes,
> +       .fill_modes = nouveau_connector_probe_single_connector_modes,
>         .set_property = nouveau_connector_set_property,
>         .destroy = nouveau_connector_destroy,
>         .atomic_duplicate_state = nouveau_conn_atomic_duplicate_state,
> @@ -1103,7 +1120,7 @@ nouveau_connector_funcs_lvds = {
>         .reset = nouveau_conn_reset,
>         .detect = nouveau_connector_detect_lvds,
>         .force = nouveau_connector_force,
> -       .fill_modes = drm_helper_probe_single_connector_modes,
> +       .fill_modes = nouveau_connector_probe_single_connector_modes,
>         .set_property = nouveau_connector_set_property,
>         .destroy = nouveau_connector_destroy,
>         .atomic_duplicate_state = nouveau_conn_atomic_duplicate_state,
> diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.h b/drivers/gpu/drm/nouveau/nouveau_connector.h
> index 2d9d35a146a4..e9ffc6eebda2 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_connector.h
> +++ b/drivers/gpu/drm/nouveau/nouveau_connector.h
> @@ -106,6 +106,9 @@ nouveau_crtc_connector_get(struct nouveau_crtc *nv_crtc)
>
>  struct drm_connector *
>  nouveau_connector_create(struct drm_device *, const struct dcb_output *);
> +int
> +nouveau_connector_probe_single_connector_modes(struct drm_connector *,
> +                                              uint32_t, uint32_t);
>
>  extern int nouveau_tv_disable;
>  extern int nouveau_ignorelid;
> --
> 2.17.1
>
> _______________________________________________
> Nouveau mailing list
> Nouveau at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/nouveau

Karol Herbst

2018-Jul-17 10:17 UTC

[Nouveau] [PATCH 4/5] drm/nouveau: Grab RPM ref when i2c bus is in use

Hmm, we shouldn't call Linux APIs from within nvkm. We should rather
guard the Linux glue code for the i2c stuff instead, but that is all
done inside nvkm. I think we should move it out into
drm/nouveau/nouveau_i2c.c and do the handling there.

On Tue, Jul 17, 2018 at 1:59 AM, Lyude Paul <lyude at redhat.com> wrote:
> The i2c bus can be both accessed by DRM itself, along with any of it's
> devnodes (/sys/class/i2c). So, we need to make sure that all codepaths
> using the i2c bus keep the GPU resumed.
>
> Signed-off-by: Lyude Paul <lyude at redhat.com>
> Cc: Karol Herbst <karolherbst at gmail.com>
> Cc: stable at vger.kernel.org
> ---
>  drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c b/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c
> index 807a2b67bd64..1de48c990b80 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c
> @@ -119,18 +119,28 @@ nvkm_i2c_bus_release(struct nvkm_i2c_bus *bus)
>         BUS_TRACE(bus, "release");
>         nvkm_i2c_pad_release(pad);
>         mutex_unlock(&bus->mutex);
> +       pm_runtime_put_autosuspend(pad->i2c->subdev.device->dev);
>  }
>
>  int
>  nvkm_i2c_bus_acquire(struct nvkm_i2c_bus *bus)
>  {
>         struct nvkm_i2c_pad *pad = bus->pad;
> +       struct device *dev = pad->i2c->subdev.device->dev;
>         int ret;
> +
>         BUS_TRACE(bus, "acquire");
> +
> +       ret = pm_runtime_get_sync(dev);
> +       if (ret < 0 && ret != -EACCES)
> +               return ret;
> +
>         mutex_lock(&bus->mutex);
>         ret = nvkm_i2c_pad_acquire(pad, NVKM_I2C_PAD_I2C);
> -       if (ret)
> +       if (ret) {
>                 mutex_unlock(&bus->mutex);
> +               pm_runtime_put_autosuspend(dev);
> +       }
>         return ret;
>  }
>
> --
> 2.17.1
>
> _______________________________________________
> Nouveau mailing list
> Nouveau at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/nouveau
