Jason Gunthorpe
2025-Feb-27 14:46 UTC
[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
On Wed, Feb 26, 2025 at 04:41:08PM -0800, Boqun Feng wrote:> And if you don't store the HrTimerHandle anywhere, like you drop() it > right after start a hrtimer, it will immediately stop the timer. Does > this make sense?Oh, I understand that, but it is not sufficient in the kernel. You are making an implicit argument that something external to the rust universe will hold the module alive until all rust destructors are run. That is trivialy obvious in your example above. However, make it more complex. Run the destructor call for your hrtimer in a workqueue thread. Use workqueue.rs. Now you don't have this implicit argument anymore, and it will EAF things. Danilo argues this is a bug in workqueue.rs. Regardless, it seems like EAF is an overlooked topic in the safety analysis. Further, you and Danilo are making opposing correctness arguments: 1) all rust destructors run before module __exit completes 2) rust destructors can run after driver removal completes I understand the technical underpinnings why these are different, but I feel that if you can make #1 reliably true for __exit then it is highly desirable to use the same techniques to make it true for remove() too. Jason
Boqun Feng
2025-Feb-27 15:18 UTC
[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
On Thu, Feb 27, 2025 at 10:46:18AM -0400, Jason Gunthorpe wrote:> On Wed, Feb 26, 2025 at 04:41:08PM -0800, Boqun Feng wrote: > > And if you don't store the HrTimerHandle anywhere, like you drop() it > > right after start a hrtimer, it will immediately stop the timer. Does > > this make sense? > > Oh, I understand that, but it is not sufficient in the kernel. > > You are making an implicit argument that something external to the > rust universe will hold the module alive until all rust destructors > are run. That is trivialy obvious in your example above. >The question in your previous email is about function pointer of hrtimer EAF because of module unload, are you moving to a broader topic here? If no, the for module unload, the argument is not implicit because in rust/macro/module.rs the module __exit() function is generated by Rust, and in that function, `assume_init_drop()` will call these destructors.> However, make it more complex. Run the destructor call for your > hrtimer in a workqueue thread. Use workqueue.rs. Now you don't have > this implicit argument anymore, and it will EAF things. >Note that HrTimerHandle holds a "reference" (could be a normal reference, or an refcounted reference, like Arc) to the hrtimer (and the struct contains it), therefore as long as HrTimerHandle exists, the destructor call of the hrtimer won't be call. Hence the argument is not implicit, it literally is: * If a HrTimerHandle exists, it means the timer has been started, and since the timer has been started, the existence of HrTimerHandle will prevent the destructors of the hrtimer. * drop() on HrTimerHandle will 1) stop the timer and 2) release the reference to the hrtimer, so then the destructors could be called.> Danilo argues this is a bug in workqueue.rs. > > Regardless, it seems like EAF is an overlooked topic in the safety > analysis. >Well, no. See above.> Further, you and Danilo are making opposing correctness arguments: > > 1) all rust destructors run before module __exit completesWhat do you mean by "all rust destructor"? In my previous email, I was talking about the particular destructors of fields in module struct, right?> 2) rust destructors can run after driver removal completes >I will defer this to Danilo, because I'm not sure that's what he was talking about. Regards, Boqun> I understand the technical underpinnings why these are different, but > I feel that if you can make #1 reliably true for __exit then it is > highly desirable to use the same techniques to make it true for > remove() too. > > Jason
Jason Gunthorpe
2025-Feb-27 16:17 UTC
[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
On Thu, Feb 27, 2025 at 07:18:02AM -0800, Boqun Feng wrote:> On Thu, Feb 27, 2025 at 10:46:18AM -0400, Jason Gunthorpe wrote: > > On Wed, Feb 26, 2025 at 04:41:08PM -0800, Boqun Feng wrote: > > > And if you don't store the HrTimerHandle anywhere, like you drop() it > > > right after start a hrtimer, it will immediately stop the timer. Does > > > this make sense? > > > > Oh, I understand that, but it is not sufficient in the kernel. > > > > You are making an implicit argument that something external to the > > rust universe will hold the module alive until all rust destructors > > are run. That is trivialy obvious in your example above. > > > > The question in your previous email is about function pointer of hrtimer > EAF because of module unload, are you moving to a broader topic > here?No> If no, the for module unload, the argument is not implicit because in > rust/macro/module.rs the module __exit() function is generated by Rust, > and in that function, `assume_init_drop()` will call these > destructors.That is not what I mean. You can be running code in multiple threads from multiple functions in the module those are all being protected implicitly by external C code functions. Rust itself is not managing module life time. Then you are making the argument that everything created by a rust module somehow traces its reference back to the module itself, regardless of what thread, callback or memory was used to create it. So all bindings for everything are expected to clean themselves up, recursively. That does make sense, but then it still raises questions that things like workqueue don't seem to have the cleanup. I still wonder why you couldn't also have these reliable reference counts rooted on the device driver instead of only on the module. Jason