Lyude Paul
2018-Jul-16 23:59 UTC
[Nouveau] [PATCH 0/5] drm/nouveau: Fix a lot of nasty RPM bugs and deadlocks
This fixes quite a number of runtime PM bugs I found that have been causing some pretty nasty issues such as: - Deadlocking on boot - Connector probing potentially not working while the GPU is in runtime suspend - i2c char dev not working while the GPU is in runtime suspend - aux char dev not working while the GPU is in runtime suspend There's definitely more parts of nouveau that need to be fixed to use runtime power management correctly, such as the hwmon portions, but this series just handles the more important fixes that should get into stable for the time being. Cc: Karol Herbst <karolherbst at gmail.com> Cc: stable at vger.kernel.org Lyude Paul (5): drm/nouveau: Prevent RPM callback recursion in suspend/resume paths drm/nouveau: Grab RPM ref while probing outputs drm/nouveau: Add missing RPM get/put() when probing connectors drm/nouveau: Grab RPM ref when i2c bus is in use drm/nouveau: Grab RPM ref when aux bus is in use drivers/gpu/drm/nouveau/dispnv50/disp.c | 12 +++++++++-- drivers/gpu/drm/nouveau/nouveau_connector.c | 21 +++++++++++++++++-- drivers/gpu/drm/nouveau/nouveau_connector.h | 3 +++ drivers/gpu/drm/nouveau/nouveau_drm.c | 10 ++++++++- drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c | 12 ++++++++++- drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c | 12 ++++++++++- 6 files changed, 63 insertions(+), 7 deletions(-) -- 2.17.1
Lyude Paul
2018-Jul-16 23:59 UTC
[Nouveau] [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion in suspend/resume paths
In order to fix all of the spots that need to have runtime PM get/puts() added, we need to ensure that it's possible for us to call pm_runtime_get/put() in any context, regardless of how deep, since almost all of the spots that are currently missing refs can potentially get called in the runtime suspend/resume path. Otherwise, we'll try to resume the GPU as we're trying to resume the GPU (and vice-versa) and cause the kernel to deadlock. With this, it should be safe to call the pm runtime functions in any context in nouveau with one condition: any point in the driver that calls pm_runtime_get*() cannot hold any locks owned by nouveau that would be acquired anywhere inside nouveau_pmops_runtime_resume(). This includes modesetting locks, i2c bus locks, etc. Signed-off-by: Lyude Paul <lyude at redhat.com> Cc: Karol Herbst <karolherbst at gmail.com> Cc: stable at vger.kernel.org --- drivers/gpu/drm/nouveau/nouveau_drm.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c index c7ec86d6c3c9..e851ef7b6373 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drm.c +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c @@ -835,6 +835,8 @@ nouveau_pmops_runtime_suspend(struct device *dev) return -EBUSY; } + dev->power.disable_depth++; + drm_kms_helper_poll_disable(drm_dev); nouveau_switcheroo_optimus_dsm(); ret = nouveau_do_suspend(drm_dev, true); @@ -843,6 +845,8 @@ nouveau_pmops_runtime_suspend(struct device *dev) pci_ignore_hotplug(pdev); pci_set_power_state(pdev, PCI_D3cold); drm_dev->switch_power_state = DRM_SWITCH_POWER_DYNAMIC_OFF; + + dev->power.disable_depth--; return ret; } @@ -859,11 +863,13 @@ nouveau_pmops_runtime_resume(struct device *dev) return -EBUSY; } + dev->power.disable_depth++; + pci_set_power_state(pdev, PCI_D0); pci_restore_state(pdev); ret = pci_enable_device(pdev); if (ret) - return ret; + goto out; pci_set_master(pdev); ret = nouveau_do_resume(drm_dev, true); @@ -875,6 +881,8 @@ nouveau_pmops_runtime_resume(struct device *dev) /* Monitors may have been connected / disconnected during suspend */ schedule_work(&nouveau_drm(drm_dev)->hpd_work); +out: + dev->power.disable_depth--; return ret; } -- 2.17.1
Lyude Paul
2018-Jul-16 23:59 UTC
[Nouveau] [PATCH 2/5] drm/nouveau: Grab RPM ref while probing outputs
When DP MST hubs get confused, they can occasionally stop responding for a good bit of time up until the point where the DRM driver manages to do the right DPCD accesses to get it to start responding again. In a worst case scenario however, this process can take upwards of 10+ seconds. Currently we use the default output_poll_changed handler drm_fb_helper_output_poll_changed() to handle output polling, which doesn't happen to grab any power references on the device when polling. If we're unlucky enough to have a hub (such as Lenovo's infamous laptop docks for the P5x/P7x series) that's easily startled and confused, this can lead to a pretty nasty deadlock situation that looks like this: - Hotplug event from hub happens, we enter drm_fb_helper_output_poll_changed() and start communicating with the hub - While we're in drm_fb_helper_output_poll_changed() and attempting to communicate with the hub, we end up confusing it and cause it to stop responding for at least 10 seconds - After 5 seconds of being in drm_fb_helper_output_poll_changed(), the pm core attempts to put the GPU into autosuspend, which ends up calling drm_kms_helper_poll_disable() - While the runtime PM core is waiting in drm_kms_helper_poll_disable() for the output poll to finish, we end up finally detecting an MST display - We notice the new display and tries to enable it, which triggers an atomic commit which triggers a call to pm_runtime_get_sync() - the output poll thread deadlocks the pm core waiting for the pm core to finish the autosuspend request while the pm core waits for the output poll thread to finish Sample: [ 246.669625] INFO: task kworker/4:0:37 blocked for more than 120 seconds. [ 246.673398] Not tainted 4.18.0-rc5Lyude-Test+ #2 [ 246.675271] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 246.676527] kworker/4:0 D 0 37 2 0x80000000 [ 246.677580] Workqueue: events output_poll_execute [drm_kms_helper] [ 246.678704] Call Trace: [ 246.679753] __schedule+0x322/0xaf0 [ 246.680916] schedule+0x33/0x90 [ 246.681924] schedule_preempt_disabled+0x15/0x20 [ 246.683023] __mutex_lock+0x569/0x9a0 [ 246.684035] ? kobject_uevent_env+0x117/0x7b0 [ 246.685132] ? drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper] [ 246.686179] mutex_lock_nested+0x1b/0x20 [ 246.687278] ? mutex_lock_nested+0x1b/0x20 [ 246.688307] drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper] [ 246.689420] drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper] [ 246.690462] drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper] [ 246.691570] output_poll_execute+0x198/0x1c0 [drm_kms_helper] [ 246.692611] process_one_work+0x231/0x620 [ 246.693725] worker_thread+0x214/0x3a0 [ 246.694756] kthread+0x12b/0x150 [ 246.695856] ? wq_pool_ids_show+0x140/0x140 [ 246.696888] ? kthread_create_worker_on_cpu+0x70/0x70 [ 246.697998] ret_from_fork+0x3a/0x50 [ 246.699034] INFO: task kworker/0:1:60 blocked for more than 120 seconds. [ 246.700153] Not tainted 4.18.0-rc5Lyude-Test+ #2 [ 246.701182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 246.702278] kworker/0:1 D 0 60 2 0x80000000 [ 246.703293] Workqueue: pm pm_runtime_work [ 246.704393] Call Trace: [ 246.705403] __schedule+0x322/0xaf0 [ 246.706439] ? wait_for_completion+0x104/0x190 [ 246.707393] schedule+0x33/0x90 [ 246.708375] schedule_timeout+0x3a5/0x590 [ 246.709289] ? mark_held_locks+0x58/0x80 [ 246.710208] ? _raw_spin_unlock_irq+0x2c/0x40 [ 246.711222] ? wait_for_completion+0x104/0x190 [ 246.712134] ? trace_hardirqs_on_caller+0xf4/0x190 [ 246.713094] ? wait_for_completion+0x104/0x190 [ 246.713964] wait_for_completion+0x12c/0x190 [ 246.714895] ? wake_up_q+0x80/0x80 [ 246.715727] ? get_work_pool+0x90/0x90 [ 246.716649] flush_work+0x1c9/0x280 [ 246.717483] ? flush_workqueue_prep_pwqs+0x1b0/0x1b0 [ 246.718442] __cancel_work_timer+0x146/0x1d0 [ 246.719247] cancel_delayed_work_sync+0x13/0x20 [ 246.720043] drm_kms_helper_poll_disable+0x1f/0x30 [drm_kms_helper] [ 246.721123] nouveau_pmops_runtime_suspend+0x3d/0xb0 [nouveau] [ 246.721897] pci_pm_runtime_suspend+0x6b/0x190 [ 246.722825] ? pci_has_legacy_pm_support+0x70/0x70 [ 246.723737] __rpm_callback+0x7a/0x1d0 [ 246.724721] ? pci_has_legacy_pm_support+0x70/0x70 [ 246.725607] rpm_callback+0x24/0x80 [ 246.726553] ? pci_has_legacy_pm_support+0x70/0x70 [ 246.727376] rpm_suspend+0x142/0x6b0 [ 246.728185] pm_runtime_work+0x97/0xc0 [ 246.728938] process_one_work+0x231/0x620 [ 246.729796] worker_thread+0x44/0x3a0 [ 246.730614] kthread+0x12b/0x150 [ 246.731395] ? wq_pool_ids_show+0x140/0x140 [ 246.732202] ? kthread_create_worker_on_cpu+0x70/0x70 [ 246.732878] ret_from_fork+0x3a/0x50 [ 246.733768] INFO: task kworker/4:2:422 blocked for more than 120 seconds. [ 246.734587] Not tainted 4.18.0-rc5Lyude-Test+ #2 [ 246.735393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 246.736113] kworker/4:2 D 0 422 2 0x80000080 [ 246.736789] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper] [ 246.737665] Call Trace: [ 246.738490] __schedule+0x322/0xaf0 [ 246.739250] schedule+0x33/0x90 [ 246.739908] rpm_resume+0x19c/0x850 [ 246.740750] ? finish_wait+0x90/0x90 [ 246.741541] __pm_runtime_resume+0x4e/0x90 [ 246.742370] nv50_disp_atomic_commit+0x31/0x210 [nouveau] [ 246.743124] drm_atomic_commit+0x4a/0x50 [drm] [ 246.743775] restore_fbdev_mode_atomic+0x1c8/0x240 [drm_kms_helper] [ 246.744603] restore_fbdev_mode+0x31/0x140 [drm_kms_helper] [ 246.745373] drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0xb0 [drm_kms_helper] [ 246.746220] drm_fb_helper_set_par+0x2d/0x50 [drm_kms_helper] [ 246.746884] drm_fb_helper_hotplug_event.part.28+0x96/0xb0 [drm_kms_helper] [ 246.747675] drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper] [ 246.748544] drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper] [ 246.749439] nv50_mstm_hotplug+0x15/0x20 [nouveau] [ 246.750111] drm_dp_send_link_address+0x177/0x1c0 [drm_kms_helper] [ 246.750764] drm_dp_check_and_send_link_address+0xa8/0xd0 [drm_kms_helper] [ 246.751602] drm_dp_mst_link_probe_work+0x51/0x90 [drm_kms_helper] [ 246.752314] process_one_work+0x231/0x620 [ 246.752979] worker_thread+0x44/0x3a0 [ 246.753838] kthread+0x12b/0x150 [ 246.754619] ? wq_pool_ids_show+0x140/0x140 [ 246.755386] ? kthread_create_worker_on_cpu+0x70/0x70 [ 246.756162] ret_from_fork+0x3a/0x50 [ 246.756847] Showing all locks held in the system: [ 246.758261] 3 locks held by kworker/4:0/37: [ 246.759016] #0: 00000000f8df4d2d ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.759856] #1: 00000000e6065461 ((work_completion)(&(&dev->mode_config.output_poll_work)->work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.760670] #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper] [ 246.761516] 2 locks held by kworker/0:1/60: [ 246.762274] #0: 00000000fff6be0f ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.762982] #1: 000000005ab44fb4 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.763890] 1 lock held by khungtaskd/64: [ 246.764664] #0: 000000008cb8b5c3 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185 [ 246.765588] 5 locks held by kworker/4:2/422: [ 246.766440] #0: 00000000232f0959 ((wq_completion)"events_long"){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.767390] #1: 00000000bb59b134 ((work_completion)(&mgr->work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.768154] #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_restore_fbdev_mode_unlocked+0x4c/0xb0 [drm_kms_helper] [ 246.768966] #3: 000000004c8f0b6b (crtc_ww_class_acquire){+.+.}, at: restore_fbdev_mode_atomic+0x4b/0x240 [drm_kms_helper] [ 246.769921] #4: 000000004c34a296 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_backoff+0x8a/0x1b0 [drm] [ 246.770839] 1 lock held by dmesg/1038: [ 246.771739] 2 locks held by zsh/1172: [ 246.772650] #0: 00000000836d0438 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40 [ 246.773680] #1: 000000001f4f4d48 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870 [ 246.775522] ============================================ So, to fix this (and any other possible deadlock issues like this that could occur in the output_poll_changed patch) we make sure that nouveau's output_poll_changed functions grab a runtime power ref before sending any hotplug events, and hold it until we're finished. This fixes deadlock issues when in fbcon with nouveau on my P50, and should fix it for everyone else's as well! Signed-off-by: Lyude Paul <lyude at redhat.com> Cc: Karol Herbst <karolherbst at gmail.com> Cc: stable at vger.kernel.org --- drivers/gpu/drm/nouveau/dispnv50/disp.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c index eb3e41a78806..ea2a886854fe 100644 --- a/drivers/gpu/drm/nouveau/dispnv50/disp.c +++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c @@ -2012,10 +2012,18 @@ nv50_disp_atomic_state_alloc(struct drm_device *dev) return &atom->state; } +static void +nouveau_output_poll_changed(struct drm_device *dev) +{ + pm_runtime_get_sync(dev->dev); + drm_fb_helper_hotplug_event(dev->fb_helper); + pm_runtime_put_autosuspend(dev->dev); +} + static const struct drm_mode_config_funcs nv50_disp_func = { .fb_create = nouveau_user_framebuffer_create, - .output_poll_changed = drm_fb_helper_output_poll_changed, + .output_poll_changed = nouveau_output_poll_changed, .atomic_check = nv50_disp_atomic_check, .atomic_commit = nv50_disp_atomic_commit, .atomic_state_alloc = nv50_disp_atomic_state_alloc, -- 2.17.1
Lyude Paul
2018-Jul-16 23:59 UTC
[Nouveau] [PATCH 3/5] drm/nouveau: Add missing RPM get/put() when probing connectors
While the GPU is guaranteed to be on when a hotplug has been received, the same assertion does not hold true if a connector probe has been started by userspace without having had received a sysfs event. So ensure that any connector probing keeps the GPU alive for the duration of the probe. Signed-off-by: Lyude Paul <lyude at redhat.com> Cc: Karol Herbst <karolherbst at gmail.com> Cc: stable at vger.kernel.org --- drivers/gpu/drm/nouveau/dispnv50/disp.c | 2 +- drivers/gpu/drm/nouveau/nouveau_connector.c | 21 +++++++++++++++++++-- drivers/gpu/drm/nouveau/nouveau_connector.h | 3 +++ 3 files changed, 23 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c index ea2a886854fe..0f283ca75189 100644 --- a/drivers/gpu/drm/nouveau/dispnv50/disp.c +++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c @@ -858,7 +858,7 @@ static const struct drm_connector_funcs nv50_mstc = { .reset = nouveau_conn_reset, .detect = nv50_mstc_detect, - .fill_modes = drm_helper_probe_single_connector_modes, + .fill_modes = nouveau_connector_probe_single_connector_modes, .destroy = nv50_mstc_destroy, .atomic_duplicate_state = nouveau_conn_atomic_duplicate_state, .atomic_destroy_state = nouveau_conn_atomic_destroy_state, diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c b/drivers/gpu/drm/nouveau/nouveau_connector.c index 2a45b4c2ceb0..feb142fb7a8a 100644 --- a/drivers/gpu/drm/nouveau/nouveau_connector.c +++ b/drivers/gpu/drm/nouveau/nouveau_connector.c @@ -770,6 +770,23 @@ nouveau_connector_force(struct drm_connector *connector) nouveau_connector_set_encoder(connector, nv_encoder); } +int +nouveau_connector_probe_single_connector_modes(struct drm_connector *connector, + uint32_t maxX, uint32_t maxY) +{ + struct device *dev = connector->dev->dev; + int ret; + + ret = pm_runtime_get_sync(dev); + if (WARN_ON(ret < 0 && ret != -EACCES)) + return 0; + + ret = drm_helper_probe_single_connector_modes(connector, maxX, maxY); + + pm_runtime_put_autosuspend(dev); + return ret; +} + static int nouveau_connector_set_property(struct drm_connector *connector, struct drm_property *property, uint64_t value) @@ -1088,7 +1105,7 @@ nouveau_connector_funcs = { .reset = nouveau_conn_reset, .detect = nouveau_connector_detect, .force = nouveau_connector_force, - .fill_modes = drm_helper_probe_single_connector_modes, + .fill_modes = nouveau_connector_probe_single_connector_modes, .set_property = nouveau_connector_set_property, .destroy = nouveau_connector_destroy, .atomic_duplicate_state = nouveau_conn_atomic_duplicate_state, @@ -1103,7 +1120,7 @@ nouveau_connector_funcs_lvds = { .reset = nouveau_conn_reset, .detect = nouveau_connector_detect_lvds, .force = nouveau_connector_force, - .fill_modes = drm_helper_probe_single_connector_modes, + .fill_modes = nouveau_connector_probe_single_connector_modes, .set_property = nouveau_connector_set_property, .destroy = nouveau_connector_destroy, .atomic_duplicate_state = nouveau_conn_atomic_duplicate_state, diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.h b/drivers/gpu/drm/nouveau/nouveau_connector.h index 2d9d35a146a4..e9ffc6eebda2 100644 --- a/drivers/gpu/drm/nouveau/nouveau_connector.h +++ b/drivers/gpu/drm/nouveau/nouveau_connector.h @@ -106,6 +106,9 @@ nouveau_crtc_connector_get(struct nouveau_crtc *nv_crtc) struct drm_connector * nouveau_connector_create(struct drm_device *, const struct dcb_output *); +int +nouveau_connector_probe_single_connector_modes(struct drm_connector *, + uint32_t, uint32_t); extern int nouveau_tv_disable; extern int nouveau_ignorelid; -- 2.17.1
Lyude Paul
2018-Jul-16 23:59 UTC
[Nouveau] [PATCH 4/5] drm/nouveau: Grab RPM ref when i2c bus is in use
The i2c bus can be both accessed by DRM itself, along with any of it's devnodes (/sys/class/i2c). So, we need to make sure that all codepaths using the i2c bus keep the GPU resumed. Signed-off-by: Lyude Paul <lyude at redhat.com> Cc: Karol Herbst <karolherbst at gmail.com> Cc: stable at vger.kernel.org --- drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c b/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c index 807a2b67bd64..1de48c990b80 100644 --- a/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c @@ -119,18 +119,28 @@ nvkm_i2c_bus_release(struct nvkm_i2c_bus *bus) BUS_TRACE(bus, "release"); nvkm_i2c_pad_release(pad); mutex_unlock(&bus->mutex); + pm_runtime_put_autosuspend(pad->i2c->subdev.device->dev); } int nvkm_i2c_bus_acquire(struct nvkm_i2c_bus *bus) { struct nvkm_i2c_pad *pad = bus->pad; + struct device *dev = pad->i2c->subdev.device->dev; int ret; + BUS_TRACE(bus, "acquire"); + + ret = pm_runtime_get_sync(dev); + if (ret < 0 && ret != -EACCES) + return ret; + mutex_lock(&bus->mutex); ret = nvkm_i2c_pad_acquire(pad, NVKM_I2C_PAD_I2C); - if (ret) + if (ret) { mutex_unlock(&bus->mutex); + pm_runtime_put_autosuspend(dev); + } return ret; } -- 2.17.1
Lyude Paul
2018-Jul-16 23:59 UTC
[Nouveau] [PATCH 5/5] drm/nouveau: Grab RPM ref when aux bus is in use
DP AUX busses can both be accessed by DRM, and through any of the userspace dev nodes in /dev/drm_dp_auxN. We need to make sure the GPU stays on in all of these codepaths. Signed-off-by: Lyude Paul <lyude at redhat.com> Cc: Karol Herbst <karolherbst at gmail.com> Cc: stable at vger.kernel.org --- drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c b/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c index 4c1f547da463..6276b113065c 100644 --- a/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c @@ -98,18 +98,28 @@ nvkm_i2c_aux_release(struct nvkm_i2c_aux *aux) AUX_TRACE(aux, "release"); nvkm_i2c_pad_release(pad); mutex_unlock(&aux->mutex); + pm_runtime_put_autosuspend(pad->i2c->subdev.device->dev); } int nvkm_i2c_aux_acquire(struct nvkm_i2c_aux *aux) { struct nvkm_i2c_pad *pad = aux->pad; + struct device *dev = pad->i2c->subdev.device->dev; int ret; + AUX_TRACE(aux, "acquire"); + + ret = pm_runtime_get_sync(dev); + if (ret < 0 && ret != -EACCES) + return ret; + mutex_lock(&aux->mutex); ret = nvkm_i2c_pad_acquire(pad, NVKM_I2C_PAD_AUX); - if (ret) + if (ret) { mutex_unlock(&aux->mutex); + pm_runtime_put_autosuspend(dev); + } return ret; } -- 2.17.1
Lukas Wunner
2018-Jul-17 07:16 UTC
[Nouveau] [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion in suspend/resume paths
[cc += linux-pm] Hi Lyude, First of all, thanks a lot for looking into this. On Mon, Jul 16, 2018 at 07:59:25PM -0400, Lyude Paul wrote:> In order to fix all of the spots that need to have runtime PM get/puts() > added, we need to ensure that it's possible for us to call > pm_runtime_get/put() in any context, regardless of how deep, since > almost all of the spots that are currently missing refs can potentially > get called in the runtime suspend/resume path. Otherwise, we'll try to > resume the GPU as we're trying to resume the GPU (and vice-versa) and > cause the kernel to deadlock. > > With this, it should be safe to call the pm runtime functions in any > context in nouveau with one condition: any point in the driver that > calls pm_runtime_get*() cannot hold any locks owned by nouveau that > would be acquired anywhere inside nouveau_pmops_runtime_resume(). > This includes modesetting locks, i2c bus locks, etc.[snip]> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c > +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c > @@ -835,6 +835,8 @@ nouveau_pmops_runtime_suspend(struct device *dev) > return -EBUSY; > } > > + dev->power.disable_depth++; > +I'm not sure if that variable is actually private to the PM core. Grepping through the tree I only find a single occurrence where it's accessed outside the PM core and that's in amdgpu. So this looks a little fishy TBH. It may make sense to cc such patches to linux-pm to get Rafael & other folks involved with the PM core to comment. Also, the disable_depth variable only exists if the kernel was compiled with CONFIG_PM enabled, but I can't find a "depends on PM" or something like that in nouveau's Kconfig. Actually, if PM is not selected, all the nouveau_pmops_*() functions should be #ifdef'ed away, but oddly there's no #ifdef CONFIG_PM anywhere in nouveau_drm.c. Anywayn, if I understand the commit message correctly, you're hitting a pm_runtime_get_sync() in a code path that itself is called during a pm_runtime_get_sync(). Could you include stack traces in the commit message? My gut feeling is that this patch masks a deeper issue, e.g. if the runtime_resume code path does in fact directly poll outputs, that would seem wrong. Runtime resume should merely make the card accessible, i.e. reinstate power if necessary, put into PCI_D0, restore registers, etc. Output polling should be scheduled asynchronously. Thanks, Lukas
Lukas Wunner
2018-Jul-17 07:21 UTC
[Nouveau] [PATCH 2/5] drm/nouveau: Grab RPM ref while probing outputs
On Mon, Jul 16, 2018 at 07:59:26PM -0400, Lyude Paul wrote:> --- a/drivers/gpu/drm/nouveau/dispnv50/disp.c > +++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c > @@ -2012,10 +2012,18 @@ nv50_disp_atomic_state_alloc(struct drm_device *dev) > return &atom->state; > } > > +static void > +nouveau_output_poll_changed(struct drm_device *dev) > +{ > + pm_runtime_get_sync(dev->dev); > + drm_fb_helper_hotplug_event(dev->fb_helper); > + pm_runtime_put_autosuspend(dev->dev); > +} > + > static const struct drm_mode_config_funcs > nv50_disp_func = { > .fb_create = nouveau_user_framebuffer_create, > - .output_poll_changed = drm_fb_helper_output_poll_changed, > + .output_poll_changed = nouveau_output_poll_changed,It might make sense to provide a generic DRM helper for this. Same for patch 3 in this series. Thanks, Lukas
Karol Herbst
2018-Jul-17 10:11 UTC
[Nouveau] [PATCH 3/5] drm/nouveau: Add missing RPM get/put() when probing connectors
Reviewed-by: Karol Herbst <karolherbst at gmail.com> On Tue, Jul 17, 2018 at 1:59 AM, Lyude Paul <lyude at redhat.com> wrote:> While the GPU is guaranteed to be on when a hotplug has been received, > the same assertion does not hold true if a connector probe has been > started by userspace without having had received a sysfs event. So > ensure that any connector probing keeps the GPU alive for the duration > of the probe. > > Signed-off-by: Lyude Paul <lyude at redhat.com> > Cc: Karol Herbst <karolherbst at gmail.com> > Cc: stable at vger.kernel.org > --- > drivers/gpu/drm/nouveau/dispnv50/disp.c | 2 +- > drivers/gpu/drm/nouveau/nouveau_connector.c | 21 +++++++++++++++++++-- > drivers/gpu/drm/nouveau/nouveau_connector.h | 3 +++ > 3 files changed, 23 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c > index ea2a886854fe..0f283ca75189 100644 > --- a/drivers/gpu/drm/nouveau/dispnv50/disp.c > +++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c > @@ -858,7 +858,7 @@ static const struct drm_connector_funcs > nv50_mstc = { > .reset = nouveau_conn_reset, > .detect = nv50_mstc_detect, > - .fill_modes = drm_helper_probe_single_connector_modes, > + .fill_modes = nouveau_connector_probe_single_connector_modes, > .destroy = nv50_mstc_destroy, > .atomic_duplicate_state = nouveau_conn_atomic_duplicate_state, > .atomic_destroy_state = nouveau_conn_atomic_destroy_state, > diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c b/drivers/gpu/drm/nouveau/nouveau_connector.c > index 2a45b4c2ceb0..feb142fb7a8a 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_connector.c > +++ b/drivers/gpu/drm/nouveau/nouveau_connector.c > @@ -770,6 +770,23 @@ nouveau_connector_force(struct drm_connector *connector) > nouveau_connector_set_encoder(connector, nv_encoder); > } > > +int > +nouveau_connector_probe_single_connector_modes(struct drm_connector *connector, > + uint32_t maxX, uint32_t maxY) > +{ > + struct device *dev = connector->dev->dev; > + int ret; > + > + ret = pm_runtime_get_sync(dev); > + if (WARN_ON(ret < 0 && ret != -EACCES)) > + return 0; > + > + ret = drm_helper_probe_single_connector_modes(connector, maxX, maxY); > + > + pm_runtime_put_autosuspend(dev); > + return ret; > +} > + > static int > nouveau_connector_set_property(struct drm_connector *connector, > struct drm_property *property, uint64_t value) > @@ -1088,7 +1105,7 @@ nouveau_connector_funcs = { > .reset = nouveau_conn_reset, > .detect = nouveau_connector_detect, > .force = nouveau_connector_force, > - .fill_modes = drm_helper_probe_single_connector_modes, > + .fill_modes = nouveau_connector_probe_single_connector_modes, > .set_property = nouveau_connector_set_property, > .destroy = nouveau_connector_destroy, > .atomic_duplicate_state = nouveau_conn_atomic_duplicate_state, > @@ -1103,7 +1120,7 @@ nouveau_connector_funcs_lvds = { > .reset = nouveau_conn_reset, > .detect = nouveau_connector_detect_lvds, > .force = nouveau_connector_force, > - .fill_modes = drm_helper_probe_single_connector_modes, > + .fill_modes = nouveau_connector_probe_single_connector_modes, > .set_property = nouveau_connector_set_property, > .destroy = nouveau_connector_destroy, > .atomic_duplicate_state = nouveau_conn_atomic_duplicate_state, > diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.h b/drivers/gpu/drm/nouveau/nouveau_connector.h > index 2d9d35a146a4..e9ffc6eebda2 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_connector.h > +++ b/drivers/gpu/drm/nouveau/nouveau_connector.h > @@ -106,6 +106,9 @@ nouveau_crtc_connector_get(struct nouveau_crtc *nv_crtc) > > struct drm_connector * > nouveau_connector_create(struct drm_device *, const struct dcb_output *); > +int > +nouveau_connector_probe_single_connector_modes(struct drm_connector *, > + uint32_t, uint32_t); > > extern int nouveau_tv_disable; > extern int nouveau_ignorelid; > -- > 2.17.1 > > _______________________________________________ > Nouveau mailing list > Nouveau at lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/nouveau
Karol Herbst
2018-Jul-17 10:17 UTC
[Nouveau] [PATCH 4/5] drm/nouveau: Grab RPM ref when i2c bus is in use
mhh, we shouldn't call to Linux APIs from within of nvkm. Rather gaurd the Linux glue code to the i2c stuff instead, but this is all done from inside of nvkm. I think we should move it out into drm/nouveau/nouveau_i2c.c and do the handling there. On Tue, Jul 17, 2018 at 1:59 AM, Lyude Paul <lyude at redhat.com> wrote:> The i2c bus can be both accessed by DRM itself, along with any of it's > devnodes (/sys/class/i2c). So, we need to make sure that all codepaths > using the i2c bus keep the GPU resumed. > > Signed-off-by: Lyude Paul <lyude at redhat.com> > Cc: Karol Herbst <karolherbst at gmail.com> > Cc: stable at vger.kernel.org > --- > drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c | 12 +++++++++++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c b/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c > index 807a2b67bd64..1de48c990b80 100644 > --- a/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c > +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c > @@ -119,18 +119,28 @@ nvkm_i2c_bus_release(struct nvkm_i2c_bus *bus) > BUS_TRACE(bus, "release"); > nvkm_i2c_pad_release(pad); > mutex_unlock(&bus->mutex); > + pm_runtime_put_autosuspend(pad->i2c->subdev.device->dev); > } > > int > nvkm_i2c_bus_acquire(struct nvkm_i2c_bus *bus) > { > struct nvkm_i2c_pad *pad = bus->pad; > + struct device *dev = pad->i2c->subdev.device->dev; > int ret; > + > BUS_TRACE(bus, "acquire"); > + > + ret = pm_runtime_get_sync(dev); > + if (ret < 0 && ret != -EACCES) > + return ret; > + > mutex_lock(&bus->mutex); > ret = nvkm_i2c_pad_acquire(pad, NVKM_I2C_PAD_I2C); > - if (ret) > + if (ret) { > mutex_unlock(&bus->mutex); > + pm_runtime_put_autosuspend(dev); > + } > return ret; > } > > -- > 2.17.1 > > _______________________________________________ > Nouveau mailing list > Nouveau at lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/nouveau
Apparently Analagous Threads
- [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion in suspend/resume paths
- [PATCH 0/5] drm/nouveau: Fix a lot of nasty RPM bugs and deadlocks
- [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion in suspend/resume paths
- [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion in suspend/resume paths
- [PATCH v8 0/5] Fix connector probing deadlocks from RPM bugs