Danilo Krummrich
2024-Jul-28 21:34 UTC
[PATCH v2 02/37] drm/nouveau: handle pci/tegra drm_dev_{alloc, register} from common code
On Sun, Jul 28, 2024 at 03:13:08PM -0300, Jason Gunthorpe wrote:> On Fri, Jul 26, 2024 at 11:07:19PM +1000, Ben Skeggs wrote: > > > > Right, I think I took that too literally. > > > > > > The lifetime of the DRM device (or more precisely one of its references) is > > > bound to the binding between the parent device and its corresponding driver. > > > > > > But the lifetime of the parent device itself is bound to the DRM device. > > > > > > So, yes this doesn't work, and proves the point that initializing the DRM device > > > with the parent's parent is just a workaround. > > > > You're greatly overstating the "complexity" that's added here. It's a minor > > inconvenience that doesn't require much code at all to implement, and is > > essentially irrelevant outside of module load/unload. > > > > I agree it's not ideal, and userspace should gain auxiliary bus support > > before a new driver implements a similar architecture, but it's really not > > that big a deal. > > Ben asked me to share what other places are doing this stuff. > > To recap, when converting a legacy driver into an aux split we've > found in several places that there is existing userspace that has > hardwired certain sysfs paths. ie an assumption that an infiniband > device appears under the sys/../pci/ directory. > > Argubaly this userspace is not in good shape, but we have to preserve > it. > > So the approach is to make the sysfs visible elements tied to the > original sysfs location (ie the pci device) and continue to use aux > otherwise for discovery, probing and tying subsystems together. > > Obviously you have to be careful about the difference between the > sysfs parent (for owning a subordinate struct device, sysfs files, > etc) and the probe time parent (for owning devres, and other tasks)I think we're on the same page with all that. As clarified in [1], that's not a big concern, I was referring to the changes required to integrate the auxbus stuff. [1] https://lore.kernel.org/nouveau/ZqRTY1GjPE6CZqL3 at pollux.localdomain/> > We've been fortunate enough that subsystems so far have had a clean > enough setup that this is easy enough to do. It sounds like DRM is the > same if it just requires calling a put in .remove() - that is prettyYes, I already mentioned that that DRM devices should still work with this workaround.> normal (though most subsystems would call that unregister, not put)A DRM device is reference counted and can out-live the driver, hence the drm_dev_put() call in .remove(). There is also a special drm_dev_unplug() function, which does not only unregister the DRM device, but also sets a guard to be able prevent HW accesses after the HW is accessible anymore.> > Jason >
Jason Gunthorpe
2024-Jul-28 23:04 UTC
[PATCH v2 02/37] drm/nouveau: handle pci/tegra drm_dev_{alloc, register} from common code
On Sun, Jul 28, 2024 at 11:34:14PM +0200, Danilo Krummrich wrote:> On Sun, Jul 28, 2024 at 03:13:08PM -0300, Jason Gunthorpe wrote:> I think we're on the same page with all that. As clarified in [1], that's not a > big concern, I was referring to the changes required to integrate the auxbus > stuff.Well, I see this thread having the realization that things are not setup proeprly to use devres. To be fair devres creates almost as many bugs as it solves :\ cleanup.h is possibly a better option for most simple things and harder to misuse...> > normal (though most subsystems would call that unregister, not put) > > A DRM device is reference counted and can out-live the driver, hence the > drm_dev_put() call in .remove(). There is also a special drm_dev_unplug() > function, which does not only unregister the DRM device, but also sets a guard > to be able prevent HW accesses after the HW is accessible anymore.Every subsystem has a refcounted object, struct device is inherently refcounted. You call the thing driver calls during .remove() 'unregister' because it is special. Once it returns the subsystem has to promise no more code is running in driver callbacks and the driver is permitted to start destroying anything it might need to use when processing any callbacks. This is really tricky and people routinely misunderstand the requirements and get this wrong. The consequence is UAF problems in obscure cases with unbind races (that few actually care about), but getting it right starts with labeling things properly :) We went through this long ago in RDMA because someone actually had a usecase of live driver unbind, making that work reliably under a full active work load took some thoughtfulness. Jason
Apparently Analagous Threads
- [PATCH v2 02/37] drm/nouveau: handle pci/tegra drm_dev_{alloc, register} from common code
- [PATCH v2 02/37] drm/nouveau: handle pci/tegra drm_dev_{alloc, register} from common code
- [PATCH v2 02/37] drm/nouveau: handle pci/tegra drm_dev_{alloc, register} from common code
- [PATCH v4] drm/bochs: add drm_driver.release callback.
- [PATCH v4] drm/bochs: add drm_driver.release callback.