Danilo Krummrich
2025-Feb-26 21:31 UTC
[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
On Wed, Feb 26, 2025 at 01:21:20PM -0400, Jason Gunthorpe wrote:> On Wed, Feb 26, 2025 at 02:16:58AM +0100, Danilo Krummrich wrote: > > Again, the reason a pci::Bar needs to be revocable in Rust is that we can't have > > the driver potentially keep the pci::Bar alive (or even access it) after the > > device is unbound. > > My impression is that nobody has yet come up with a Rust way to > implement the normal kernel design pattern of revoke threads then free > objects in safe rust.I get where you're coming from (and I agree), but that is a different issue. Let's take a step back and look again why we have Devres (and Revocable) for e.g. pci::Bar. The device / driver model requires that device resources are only held by a driver, as long as the driver is bound to the device. For instance, in C we achieve this by calling pci_iounmap() pci_release_region() from remove(). We rely on this, we trust drivers to actually do this. We also trust drivers that they don't access the pointer originally returned by pci_iomap() after remove(). Typically, drivers do this by shutting down all asynchronous execution paths, e.g. workqueues. Some other drivers might still run code after remove() and hence needs some synchronization, like DRM. In Rust pci_iounmap() and pci_release_region() are called when the pci::Bar object is dropped. But we don't want to trust the driver to actually do this. Instead, we want to ensure that the driver can *not* do something that is not allowed by the device / driver model. Therefore, we never hand out a raw pci::Bar to driver, but a Devres<pci::Bar>. With this a driver can't prevent the pci::Bar being dropped once the device is unbound. So, the main objective here is to ensure that a driver can't keep the pci::Bar (and hence the memory mapping) alive arbitrarily. Now, let's get back to concurrent code that might still attempt to use the pci::Bar. Surely, we need mechanisms to shut down all asynchronous execution paths (e.g. workqueues) once the device is unbound. But that's not the job of Devres<pci::Bar>. The job of Devres<pci::Bar> is to be robust against misuse. Again, that the revocable characteristic comes in handy for drivers that still run code after remove() intentionally, is a nice coincidence.> > Yes, this is a peculiar lifetime model, but it is pretty important in > the kernel. I'm not convinced you can just fully ignore it in Rust as > a design pattern. We use it pretty much everywhere a function pointer > is involved. > > For instance, I'm looking at workqueue.rs and wondering why is it safe > against Execute After Free races. I see none of the C functions I > would expect to be used to prevent those races in the code. > > Even the simple example: > > //! fn print_later(val: Arc<MyStruct>) { > //! let _ = workqueue::system().enqueue(val); > //! } > > Seems to be missing the EAF prevention ie WorkItem::run() is in .text > of THIS_MODULE and I see nothing is preventing THIS_MODULE from being > unloaded. > > The expectation of work queues is to follow the above revoke threads > then free pattern. A module should do that sequence in the driver > remove() or module __exit function.Fully agree with that. I guess you're referring to cancel_work_sync() and friends as well as destroy_workqueue(), etc. They're indeed missing, this is because the workqueue work originates from the Rust binder efforts and binder is only used builtin, so there was no need so far. But yes, once people start using workqueues for other modules, we surely need to extend the abstraction accordingly. Other abstractions do consider this though, e.g. the upcoming hrtimer work. [1] In terms of IOCTLs it depends on the particular subsystem, but this is (or will be) also reflected by the corresponding abstraction. Dropping a MiscDeviceRegistration [2] on module_exit() for instance will ensure that there are no concurrent IOCTLs, just like the corresponding C code. [1] https://lore.kernel.org/rust-for-linux/20250224-hrtimer-v3-v6-12-rc2-v9-0-5bd3bf0ce6cc at kernel.org/ [2] https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/rust/kernel/miscdevice.rs#n50
Jason Gunthorpe
2025-Feb-26 23:47 UTC
[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
On Wed, Feb 26, 2025 at 10:31:10PM +0100, Danilo Krummrich wrote:> Let's take a step back and look again why we have Devres (and Revocable) for > e.g. pci::Bar. > > The device / driver model requires that device resources are only held by a > driver, as long as the driver is bound to the device. > > For instance, in C we achieve this by calling > > pci_iounmap() > pci_release_region() > > from remove(). > > We rely on this, we trust drivers to actually do this.Right, exactly But it is not just PCI bar. There are a *huge* number of kernel APIs that have built in to them the same sort of requirement - teardown MUST run with remove, and once done the resource cannot be used by another thread. Basically most things involving function pointers has this sort of lifecycle requirement because it is a common process that prevents a EAF of module unload. This is all incredibly subtle and driver writers never seem to understand it properly.. See below for my thoughts on hrtimer bindings having the same EAF issue. My fear, that is intensifying as we go through this discussion, is that rust binding authors have not fully comprehended what the kernel life cycle model and common design pattern actually is, and have not fully thought through issues like module unload creating a lifetime cycle for *function pointers*. This stuff is really hard. C programers rarely understand it. Existing drivers tend to frequenly have these bug classes. Without an obvious easy to use Rust framework to, effectively, revoke function pointers and synchronously destroy objects during remove, I think this will be a recurring problem going forward. I assume that Rust philsophy should be quite concerned if it does not protect against function pointers becoming asynchronously invalid due to module unload races. That sounds like a safety problem to me??> We also trust drivers that they don't access the pointer originally > returned by pci_iomap() after remove().Yes, I get it, you are trying to use a reference tracking type design pattern when the C API is giving you a fencing design pattern, they are not compatible and it is hard to interwork them.> Now, let's get back to concurrent code that might still attempt to use the > pci::Bar. Surely, we need mechanisms to shut down all asynchronous execution > paths (e.g. workqueues) once the device is unbound. But that's not the job of > Devres<pci::Bar>. The job of Devres<pci::Bar> is to be robust against misuse.The thing is once you have a mechanism to shutdown all the stuff you don't need the overhead of this revocable checking on the normal paths. What you need is a way to bring your pci::Bar into a safety contract that remove will shootdown concurrency and that directly denies references to pci::Bar, and the same contract will guarentee it frees pci::Bar memory. A more fancy version of devm, if you will.> I guess you're referring to cancel_work_sync() and friends as well as > destroy_workqueue(), etc.Yes, and flush, and you often need to design special locking to avoid work-self-requeing. It is tricky stuff, again I've seen lots and lots of bugs in these remove paths here. Hopefully rust can describe this adequately without limiting work queue functionality :\> But yes, once people start using workqueues for other modules, we > surely need to extend the abstraction accordingly.You say that like it will be easy, but it is exactly the same type of lifetime issue as pci_iomap, and that seems to be quite a challenge here???> Other abstractions do consider this though, e.g. the upcoming hrtimer work. [1]Does it??? hrtimer uses function pointers. Any time you take a function pointer you have to reason about how does the .text lifetime work relative to the usage of the function pointer. So how does [1] guarentee that the hrtimer C code will not call the function pointer after driver remove() completes? My rust is aweful, but it looks to me like the timer lifetime is linked to the HrTimerHandle lifetime, but nothing seems to hard link that to the driver bound, or module lifetime? This is what I'm talking about, the design pattern you are trying to fix with revocable is *everywhere* in the C APIs, it is very subtle, but must be considered. One solution would be to force hrtimer into a revocable too. And on and on for basically every kernel API that uses function pointers. This does not seem reasonable to me at all, it certainly isn't better than the standard pattern.> be) also reflected by the corresponding abstraction. Dropping a > MiscDeviceRegistration [2] on module_exit() for instance will ensure that there > are no concurrent IOCTLs, just like the corresponding C code.The way misc device works you can't unload the module until all the FDs are closed and the misc code directly handles races with opening new FDs while modules are unloading. It is quite a different scheme than discussed in this thread. Jason