Lukas Wunner
2018-Feb-19 11:58 UTC
[Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers
On Mon, Feb 19, 2018 at 12:34:43PM +0100, Daniel Vetter wrote:> On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner wrote: > > Fix a deadlock on hybrid graphics laptops that's been present since 2013: > > > > DRM drivers poll connectors in 10 sec intervals. The poll worker is > > stopped on ->runtime_suspend with cancel_delayed_work_sync(). However > > the poll worker invokes the DRM drivers' ->detect callbacks, which call > > pm_runtime_get_sync(). If the poll worker starts after runtime suspend > > has begun, pm_runtime_get_sync() will wait for runtime suspend to finish > > with the intention of runtime resuming the device afterwards. The result > > is a circular wait between poll worker and autosuspend worker. > > Don't shut down the poll worker when runtime suspending, that' doesn't > work. If you need the poll work, then that means waking up the gpu every > few seconds. If you don't need it, then make sure the DRM_CONNECTOR_POLL > flags are set correctly (you can update them at runtime, the poll worker > will pick that up). > > That should fix the deadlock, and it's how we do it in i915 (where igt in > CI totally hammers the runtime pm support, and it seems to hold up).It would fix the deadlock but it's not an option on dual GPU laptops. Power consumption of the discrete GPU is massive (9 W on my machine).> > i915, malidp and msm "solved" this issue by not stopping the poll worker > > on runtime suspend. But this results in the GPU bouncing back and forth > > between D0 and D3 continuously. That's a total no-go for GPUs which > > runtime suspend to D3cold since every suspend/resume cycle costs a > > significant amount of time and energy. (i915 and malidp do not seem > > to acquire a runtime PM ref in the ->detect callbacks, which seems > > questionable. msm however does and would also deadlock if it disabled > > the poll worker on runtime suspend. cc += Archit, Liviu, intel-gfx) > > Well, userspace expects hotplug events, even when we runtime suspend > stuff. Hence waking shit up with polling. Imo ignoring hotplug events is a > pretty serious policy decision which is ok in the context of > vga_switcheroo, but not really as an automatic thing. E.g. usb also wakes > up if you plug something in, even with all the runtime pm stuff enabled. > Same for sata and everything else.On the MacBook Pro, the HPD pin of external DP and HDMI ports goes into the gmux controller, which sends an interrupt on hotplug even if the GPU is powered down. Apparently Optimus has a similar functionality, cf. 3a6536c51d5d. For the rare cases where an external VGA or DVI-A port is present, I guess it's reasonable to have the user wake up the machine manually. I'm not sure why nouveau polls ports on my laptop, the GK107 only has an LVDS and three DP ports, need to investigate. Thanks, Lukas
Daniel Vetter
2018-Feb-19 14:05 UTC
[Nouveau] [Intel-gfx] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers
On Mon, Feb 19, 2018 at 12:58:17PM +0100, Lukas Wunner wrote:> On Mon, Feb 19, 2018 at 12:34:43PM +0100, Daniel Vetter wrote: > > On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner wrote: > > > Fix a deadlock on hybrid graphics laptops that's been present since 2013: > > > > > > DRM drivers poll connectors in 10 sec intervals. The poll worker is > > > stopped on ->runtime_suspend with cancel_delayed_work_sync(). However > > > the poll worker invokes the DRM drivers' ->detect callbacks, which call > > > pm_runtime_get_sync(). If the poll worker starts after runtime suspend > > > has begun, pm_runtime_get_sync() will wait for runtime suspend to finish > > > with the intention of runtime resuming the device afterwards. The result > > > is a circular wait between poll worker and autosuspend worker. > > > > Don't shut down the poll worker when runtime suspending, that' doesn't > > work. If you need the poll work, then that means waking up the gpu every > > few seconds. If you don't need it, then make sure the DRM_CONNECTOR_POLL > > flags are set correctly (you can update them at runtime, the poll worker > > will pick that up). > > > > That should fix the deadlock, and it's how we do it in i915 (where igt in > > CI totally hammers the runtime pm support, and it seems to hold up). > > It would fix the deadlock but it's not an option on dual GPU laptops. > Power consumption of the discrete GPU is massive (9 W on my machine). > > > > > i915, malidp and msm "solved" this issue by not stopping the poll worker > > > on runtime suspend. But this results in the GPU bouncing back and forth > > > between D0 and D3 continuously. That's a total no-go for GPUs which > > > runtime suspend to D3cold since every suspend/resume cycle costs a > > > significant amount of time and energy. (i915 and malidp do not seem > > > to acquire a runtime PM ref in the ->detect callbacks, which seems > > > questionable. msm however does and would also deadlock if it disabled > > > the poll worker on runtime suspend. cc += Archit, Liviu, intel-gfx) > > > > Well, userspace expects hotplug events, even when we runtime suspend > > stuff. Hence waking shit up with polling. Imo ignoring hotplug events is a > > pretty serious policy decision which is ok in the context of > > vga_switcheroo, but not really as an automatic thing. E.g. usb also wakes > > up if you plug something in, even with all the runtime pm stuff enabled. > > Same for sata and everything else. > > On the MacBook Pro, the HPD pin of external DP and HDMI ports goes into > the gmux controller, which sends an interrupt on hotplug even if the GPU > is powered down. > > Apparently Optimus has a similar functionality, cf. 3a6536c51d5d.Yeah, for vga_switcheroo and gmux setups shutting down polling explicitly makes sense. I think ideally we'd stop polling in the gmux handler somehow (maybe by dropping the relevant DRM_CONNECTOR_POLL flags, or outright stopping it all). But not when runtime suspending the entire gpu (e.g. idle system that shuts down the screen and everything, before it decides a few minutes later to do a full system suspend). I also think that this approach would lead to cleaner code, having explicit checks for the (locking) execution context all over the place tends to result in regrets eventually ime.> For the rare cases where an external VGA or DVI-A port is present, I guess > it's reasonable to have the user wake up the machine manually. > > I'm not sure why nouveau polls ports on my laptop, the GK107 only has an > LVDS and three DP ports, need to investigate.Yeah, that'd be good to figure out. The probe helpers should shut down the worker if there's no connector that needs probing. We use that to enable/disable the poll worker when there's a hotplug storm on the irq line. Once that's fixed we can perhaps also untangle the poll-vs-gmux story. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Lukas Wunner
2018-Feb-19 14:47 UTC
[Nouveau] [Intel-gfx] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers
On Mon, Feb 19, 2018 at 03:05:53PM +0100, Daniel Vetter wrote:> On Mon, Feb 19, 2018 at 12:58:17PM +0100, Lukas Wunner wrote: > > On Mon, Feb 19, 2018 at 12:34:43PM +0100, Daniel Vetter wrote: > > > Well, userspace expects hotplug events, even when we runtime suspend > > > stuff. Hence waking shit up with polling. Imo ignoring hotplug events is a > > > pretty serious policy decision which is ok in the context of > > > vga_switcheroo, but not really as an automatic thing. E.g. usb also wakes > > > up if you plug something in, even with all the runtime pm stuff enabled. > > > Same for sata and everything else. > > > > On the MacBook Pro, the HPD pin of external DP and HDMI ports goes into > > the gmux controller, which sends an interrupt on hotplug even if the GPU > > is powered down. > > > > Apparently Optimus has a similar functionality, cf. 3a6536c51d5d. > > Yeah, for vga_switcheroo and gmux setups shutting down polling explicitly > makes sense. I think ideally we'd stop polling in the gmux handler somehow > (maybe by dropping the relevant DRM_CONNECTOR_POLL flags, or outright > stopping it all). But not when runtime suspending the entire gpu (e.g. > idle system that shuts down the screen and everything, before it decides > a few minutes later to do a full system suspend).nouveau, radeon and amdgpu currently use runtime PM *only* on hybrid graphics laptops. Should the drivers later be extended to also use runtime PM in other scenarios (desktop machines, eGPUs), they can easily detect whether to disable polling on runtime suspend by calling apple_gmux_present() on Macs or the equivalent for Optimus/ATPX. Thanks, Lukas
Seemingly Similar Threads
- [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers
- [PATCH v6 1/2] vga_switcheroo: Add helper for deferred probing
- [PATCH v5] vga_switcheroo: Add helper for deferred probing
- [PATCH v5 00/12] Enable GPU switching on pre-retina MacBook Pro
- [Intel-gfx] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers