Jason Gunthorpe
2025-Feb-27 16:17 UTC
[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
On Thu, Feb 27, 2025 at 07:18:02AM -0800, Boqun Feng wrote:> On Thu, Feb 27, 2025 at 10:46:18AM -0400, Jason Gunthorpe wrote: > > On Wed, Feb 26, 2025 at 04:41:08PM -0800, Boqun Feng wrote: > > > And if you don't store the HrTimerHandle anywhere, like you drop() it > > > right after start a hrtimer, it will immediately stop the timer. Does > > > this make sense? > > > > Oh, I understand that, but it is not sufficient in the kernel. > > > > You are making an implicit argument that something external to the > > rust universe will hold the module alive until all rust destructors > > are run. That is trivialy obvious in your example above. > > > > The question in your previous email is about function pointer of hrtimer > EAF because of module unload, are you moving to a broader topic > here?No> If no, the for module unload, the argument is not implicit because in > rust/macro/module.rs the module __exit() function is generated by Rust, > and in that function, `assume_init_drop()` will call these > destructors.That is not what I mean. You can be running code in multiple threads from multiple functions in the module those are all being protected implicitly by external C code functions. Rust itself is not managing module life time. Then you are making the argument that everything created by a rust module somehow traces its reference back to the module itself, regardless of what thread, callback or memory was used to create it. So all bindings for everything are expected to clean themselves up, recursively. That does make sense, but then it still raises questions that things like workqueue don't seem to have the cleanup. I still wonder why you couldn't also have these reliable reference counts rooted on the device driver instead of only on the module. Jason
Boqun Feng
2025-Feb-27 16:55 UTC
[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
On Thu, Feb 27, 2025 at 12:17:33PM -0400, Jason Gunthorpe wrote:> On Thu, Feb 27, 2025 at 07:18:02AM -0800, Boqun Feng wrote: > > On Thu, Feb 27, 2025 at 10:46:18AM -0400, Jason Gunthorpe wrote: > > > On Wed, Feb 26, 2025 at 04:41:08PM -0800, Boqun Feng wrote: > > > > And if you don't store the HrTimerHandle anywhere, like you drop() it > > > > right after start a hrtimer, it will immediately stop the timer. Does > > > > this make sense? > > > > > > Oh, I understand that, but it is not sufficient in the kernel. > > > > > > You are making an implicit argument that something external to the > > > rust universe will hold the module alive until all rust destructors > > > are run. That is trivialy obvious in your example above. > > > > > > > The question in your previous email is about function pointer of hrtimer > > EAF because of module unload, are you moving to a broader topic > > here? > > No > > > If no, the for module unload, the argument is not implicit because in > > rust/macro/module.rs the module __exit() function is generated by Rust, > > and in that function, `assume_init_drop()` will call these > > destructors. > > That is not what I mean. You can be running code in multiple threads > from multiple functions in the module those are all being protected > implicitly by external C code functions. Rust itself is not managing > module life time. > > Then you are making the argument that everything created by a rust > module somehow traces its reference back to the module itself, > regardless of what thread, callback or memory was used to create it. > > So all bindings for everything are expected to clean themselves up, > recursively. >Right, that would be the most cases in Rust if you want to control the cleanup orderings.> That does make sense, but then it still raises questions that things > like workqueue don't seem to have the cleanup. >It was because the existing Workqueue was designed for built-in cases, and we should fix that. Thank you for spotting that.> I still wonder why you couldn't also have these reliable reference > counts rooted on the device driver instead of only on the module. >You could put reliable reference counts anywhere you want, as long as it reflects the resource dependencies. Regards, Boqun> Jason
Danilo Krummrich
2025-Feb-27 17:32 UTC
[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
On Thu, Feb 27, 2025 at 08:55:09AM -0800, Boqun Feng wrote:> On Thu, Feb 27, 2025 at 12:17:33PM -0400, Jason Gunthorpe wrote: > > > I still wonder why you couldn't also have these reliable reference > > counts rooted on the device driver instead of only on the module. > > > > You could put reliable reference counts anywhere you want, as long as it > reflects the resource dependencies.Right, as I explained in a different reply, the signature for PCI driver probe() looks like this: fn probe(pdev: &mut pci::Device, _info: &Self::IdInfo) -> Result<Pin<KBox<Self>>> The returned Pin<KBox<Self>> has the lifetime of the driver being bound to the device. Which means a driver can bind things to this lifetime. But, it isn't forced to, it can also put things into an Arc and share it with the rest of the world. If something is crucial to be bound to the lifetime of a driver being bound to a device (i.e. device resources), you have to expose it as Devres<T>. Subsequently, you can put the Devres<T> in an Arc and do whatever you want, the result will still be that T is dropped once the device is unbound.