Danilo Krummrich
2025-Feb-26 01:16 UTC
[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
On Tue, Feb 25, 2025 at 08:49:16PM -0400, Jason Gunthorpe wrote:> I'm pointing out the fundamental different in approachs. The typical > widely used pattern results in __device_release_driver() completing > with no concurrent driver code running.Typically yes, but there are exceptions, such as DRM.> > DRM achieves this, in part, by using drm_dev_unplug().No, DRM can have concurrent driver code running, which is why drm_dev_enter() returns whether the device is unplugged already, such that subsequent operations, (e.g. I/O) can be omitted.> > The Rust approach ends up with __device_release_driver() completing > and leaving driver code still running in other threads.No, this has nothing to do with Rust device / driver or I/O abstractions. It entirely depends on the driver you implement. If you register a DRM device, then yes, there may be concurrent driver code running after __device_release_driver() completes. But this is specific to the DRM implementation, *not* to Rust. Again, the reason a pci::Bar needs to be revocable in Rust is that we can't have the driver potentially keep the pci::Bar alive (or even access it) after the device is unbound. A driver can also be unbound without the module being removed, and if the driver would be able to keep the pci::Bar alive, it would mean that the resource region is not freed and the MMIO mapping is not unmapped, because the resource region and the MMIO mapping is bound to the lifetime of the pci::Bar object. This would not be acceptable for a Rust driver. That this also comes in handy for subsystems like DRM, where we could have attempts to access to the pci::Bar object after the device is unbound by design, can be seen as a nice side effect.
Jason Gunthorpe
2025-Feb-26 17:21 UTC
[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
On Wed, Feb 26, 2025 at 02:16:58AM +0100, Danilo Krummrich wrote:> > DRM achieves this, in part, by using drm_dev_unplug(). > > No, DRM can have concurrent driver code running, which is why drm_dev_enter() > returns whether the device is unplugged already, such that subsequent > operations, (e.g. I/O) can be omitted.Ah, I did notice that the driver was the one providing the file_operations struct so of course the core code can't protect the driver ops. Yuk :\> Again, the reason a pci::Bar needs to be revocable in Rust is that we can't have > the driver potentially keep the pci::Bar alive (or even access it) after the > device is unbound.My impression is that nobody has yet come up with a Rust way to implement the normal kernel design pattern of revoke threads then free objects in safe rust. Yes, this is a peculiar lifetime model, but it is pretty important in the kernel. I'm not convinced you can just fully ignore it in Rust as a design pattern. We use it pretty much everywhere a function pointer is involved. For instance, I'm looking at workqueue.rs and wondering why is it safe against Execute After Free races. I see none of the C functions I would expect to be used to prevent those races in the code. Even the simple example: //! fn print_later(val: Arc<MyStruct>) { //! let _ = workqueue::system().enqueue(val); //! } Seems to be missing the EAF prevention ie WorkItem::run() is in .text of THIS_MODULE and I see nothing is preventing THIS_MODULE from being unloaded. The expectation of work queues is to follow the above revoke threads then free pattern. A module should do that sequence in the driver remove() or module __exit function. Jason