Lukas Wunner
2018-Jul-18 08:25 UTC
[Nouveau] [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion in suspend/resume paths
On Wed, Jul 18, 2018 at 09:38:41AM +0200, Rafael J. Wysocki wrote:> On Tue, Jul 17, 2018 at 8:20 PM, Lukas Wunner <lukas at wunner.de> wrote: > > Okay, the PCI device is suspending and the nvkm_i2c_aux_acquire() > > wants it in resumed state, so is waiting forever for the device to > > runtime suspend in order to resume it again immediately afterwards. > > > > The deadlock in the stack trace you've posted could be resolved using > > the technique I used in d61a5c106351 by adding the following to > > include/linux/pm_runtime.h: > > > > static inline bool pm_runtime_status_suspending(struct device *dev) > > { > > return dev->power.runtime_status == RPM_SUSPENDING; > > } > > > > static inline bool is_pm_work(struct device *dev) > > { > > struct work_struct *work = current_work(); > > > > return work && work->func == dev->power.work; > > } > > > > Then adding this to nvkm_i2c_aux_acquire(): > > > > struct device *dev = pad->i2c->subdev.device->dev; > > > > if (!(is_pm_work(dev) && pm_runtime_status_suspending(dev))) { > > ret = pm_runtime_get_sync(dev); > > if (ret < 0 && ret != -EACCES) > > return ret; > > }[snip]> > For the record, I don't quite like this approach as it seems to be > working around a broken dependency graph. > > If you need to resume device A from within the runtime resume callback > of device B, then clearly B depends on A and there should be a link > between them. > > That said, I do realize that it may be the path of least resistance, > but then I wonder if we can do better than this.The GPU contains an i2c subdevice for each connector with DDC lines. I believe those are modelled as children of the GPU's PCI device as they're accessed via mmio of the PCI device. The problem here is that when the GPU's PCI device runtime suspends, its i2c child device needs to be runtime active to suspend the MST topology. Catch-22. I don't know whether or not it's necessary to suspend the MST topology. I'm not an expert on DisplayPort MultiStream transport. BTW Lyude, in patch 4 and 5 of this series, you're runtime resuming pad->i2c->subdev.device->dev. Is this the PCI device or is it the i2c device? I'm always confused by nouveau's structs. In nvkm_i2c_bus_ctor() I can see that the device you're runtime resuming is the parent of the i2c_adapter: struct nvkm_device *device = pad->i2c->subdev.device; [...] bus->i2c.dev.parent = device->dev; If the i2c_adapter is a child of the PCI device, it's sufficient to runtime resume the i2c_adapter, i.e. bus->i2c.dev, and this will implicitly runtime resume its parent. Thanks, Lukas
Rafael J. Wysocki
2018-Jul-18 08:35 UTC
[Nouveau] [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion in suspend/resume paths
On Wed, Jul 18, 2018 at 10:25 AM, Lukas Wunner <lukas at wunner.de> wrote:> On Wed, Jul 18, 2018 at 09:38:41AM +0200, Rafael J. Wysocki wrote: >> On Tue, Jul 17, 2018 at 8:20 PM, Lukas Wunner <lukas at wunner.de> wrote: >> > Okay, the PCI device is suspending and the nvkm_i2c_aux_acquire() >> > wants it in resumed state, so is waiting forever for the device to >> > runtime suspend in order to resume it again immediately afterwards. >> > >> > The deadlock in the stack trace you've posted could be resolved using >> > the technique I used in d61a5c106351 by adding the following to >> > include/linux/pm_runtime.h: >> > >> > static inline bool pm_runtime_status_suspending(struct device *dev) >> > { >> > return dev->power.runtime_status == RPM_SUSPENDING; >> > } >> > >> > static inline bool is_pm_work(struct device *dev) >> > { >> > struct work_struct *work = current_work(); >> > >> > return work && work->func == dev->power.work; >> > } >> > >> > Then adding this to nvkm_i2c_aux_acquire(): >> > >> > struct device *dev = pad->i2c->subdev.device->dev; >> > >> > if (!(is_pm_work(dev) && pm_runtime_status_suspending(dev))) { >> > ret = pm_runtime_get_sync(dev); >> > if (ret < 0 && ret != -EACCES) >> > return ret; >> > } > [snip] >> >> For the record, I don't quite like this approach as it seems to be >> working around a broken dependency graph. >> >> If you need to resume device A from within the runtime resume callback >> of device B, then clearly B depends on A and there should be a link >> between them. >> >> That said, I do realize that it may be the path of least resistance, >> but then I wonder if we can do better than this. > > The GPU contains an i2c subdevice for each connector with DDC lines. > I believe those are modelled as children of the GPU's PCI device as > they're accessed via mmio of the PCI device. > > The problem here is that when the GPU's PCI device runtime suspends, > its i2c child device needs to be runtime active to suspend the MST > topology. Catch-22.I see. This sounds like a case for the ignore_children flag, maybe in a slightly modified form, that will allow the parent to be suspended regardless of the state of the children. I wonder what happens to the I2C subdevices when the PCI device goes into D3. They are not accessible through MMIO any more then, so how can they be suspended then? Or do they need to be suspended at all?> I don't know whether or not it's necessary to suspend the MST topology. > I'm not an expert on DisplayPort MultiStream transport.Me neither. :-)
Lukas Wunner
2018-Jul-18 08:36 UTC
[Nouveau] [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion in suspend/resume paths
On Wed, Jul 18, 2018 at 10:25:05AM +0200, Lukas Wunner wrote:> The GPU contains an i2c subdevice for each connector with DDC lines. > I believe those are modelled as children of the GPU's PCI device as > they're accessed via mmio of the PCI device. > > The problem here is that when the GPU's PCI device runtime suspends, > its i2c child device needs to be runtime active to suspend the MST > topology. Catch-22. > > I don't know whether or not it's necessary to suspend the MST topology. > I'm not an expert on DisplayPort MultiStream transport. > > BTW Lyude, in patch 4 and 5 of this series, you're runtime resuming > pad->i2c->subdev.device->dev. Is this the PCI device or is it the i2c > device? I'm always confused by nouveau's structs. In nvkm_i2c_bus_ctor() > I can see that the device you're runtime resuming is the parent of the > i2c_adapter: > > struct nvkm_device *device = pad->i2c->subdev.device; > [...] > bus->i2c.dev.parent = device->dev; > > If the i2c_adapter is a child of the PCI device, it's sufficient > to runtime resume the i2c_adapter, i.e. bus->i2c.dev, and this will > implicitly runtime resume its parent.Actually, having written all this I just remembered that we have this in the documentation: 8. "No-Callback" Devices Some "devices" are only logical sub-devices of their parent and cannot be power-managed on their own. [...] Subsystems can tell the PM core about these devices by calling pm_runtime_no_callbacks(). So it might actually be sufficient to just call pm_runtime_no_callbacks() for the i2c devices... Lukas
Lyude Paul
2018-Jul-18 20:11 UTC
[Nouveau] [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion in suspend/resume paths
On Wed, 2018-07-18 at 10:36 +0200, Lukas Wunner wrote:> On Wed, Jul 18, 2018 at 10:25:05AM +0200, Lukas Wunner wrote: > > The GPU contains an i2c subdevice for each connector with DDC lines. > > I believe those are modelled as children of the GPU's PCI device as > > they're accessed via mmio of the PCI device. > > > > The problem here is that when the GPU's PCI device runtime suspends, > > its i2c child device needs to be runtime active to suspend the MST > > topology. Catch-22. > > > > I don't know whether or not it's necessary to suspend the MST topology. > > I'm not an expert on DisplayPort MultiStream transport. > > > > BTW Lyude, in patch 4 and 5 of this series, you're runtime resuming > > pad->i2c->subdev.device->dev. Is this the PCI device or is it the i2c > > device? I'm always confused by nouveau's structs. In nvkm_i2c_bus_ctor() > > I can see that the device you're runtime resuming is the parent of the > > i2c_adapter: > > > > struct nvkm_device *device = pad->i2c->subdev.device; > > [...] > > bus->i2c.dev.parent = device->dev; > > > > If the i2c_adapter is a child of the PCI device, it's sufficient > > to runtime resume the i2c_adapter, i.e. bus->i2c.dev, and this will > > implicitly runtime resume its parent. > > Actually, having written all this I just remembered that we have this > in the documentation: > > 8. "No-Callback" Devices > > Some "devices" are only logical sub-devices of their parent and cannot > be > power-managed on their own. [...] > > Subsystems can tell the PM core about these devices by calling > pm_runtime_no_callbacks(). > > So it might actually be sufficient to just call pm_runtime_no_callbacks()I would have hoped so, but unfortunately it seems that pm_runtime_no_callbacks() is already called by default for i2c adapters in i2c_register_adapter(). Unfortunately this really can't fix the problem though, because it will still try to runtime resume the parent device of the i2c adapter, which still leads to deadlocking in the runtime suspend/resume path. Additionally; I did play around with ignore_children, but unfortunately this isn't good enough either as it just means that our i2c devices won't wake the GPU up on access. I'm pretty stumped here on trying to figure out any clean way to handle this in the PM core if recursive resume calls are off the table. The only possible solution I could see to this is if we could disable execution of runtime callbacks in the context of a certain task (while all other tasks have to honor the runtime PM callbacks), do what we need to do in suspend, then re- enable them> for the i2c devices... > > Lukas-- Cheers, Lyude Paul
Seemingly Similar Threads
- [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion in suspend/resume paths
- [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion in suspend/resume paths
- [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion in suspend/resume paths
- [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion in suspend/resume paths
- [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion in suspend/resume paths