thr3ads.net - Nouveau - [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation [Feb 2025]

If this information is useful, please help other people find it:
Share via:

Jason Gunthorpe

2025-Feb-27 16:17 UTC

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

On Thu, Feb 27, 2025 at 07:18:02AM -0800, Boqun Feng
wrote:> On Thu, Feb 27, 2025 at 10:46:18AM -0400, Jason Gunthorpe wrote:
> > On Wed, Feb 26, 2025 at 04:41:08PM -0800, Boqun Feng wrote:
> > > And if you don't store the HrTimerHandle anywhere, like you
drop() it
> > > right after start a hrtimer, it will immediately stop the timer.
Does
> > > this make sense?
> > 
> > Oh, I understand that, but it is not sufficient in the kernel.
> > 
> > You are making an implicit argument that something external to the
> > rust universe will hold the module alive until all rust destructors
> > are run. That is trivialy obvious in your example above.
> > 
> 
> The question in your previous email is about function pointer of hrtimer
> EAF because of module unload, are you moving to a broader topic
> here?
No
> If no, the for module unload, the argument is not implicit because in
> rust/macro/module.rs the module __exit() function is generated by Rust,
> and in that function, `assume_init_drop()` will call these
> destructors.
That is not what I mean. You can be running code in multiple threads
from multiple functions in the module those are all being protected
implicitly by external C code functions. Rust itself is not managing
module life time.

Then you are making the argument that everything created by a rust
module somehow traces its reference back to the module itself,
regardless of what thread, callback or memory was used to create it.

So all bindings for everything are expected to clean themselves up,
recursively.

That does make sense, but then it still raises questions that things
like workqueue don't seem to have the cleanup.

I still wonder why you couldn't also have these reliable reference
counts rooted on the device driver instead of only on the module.

Jason

Boqun Feng

2025-Feb-27 16:55 UTC

head link

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

On Thu, Feb 27, 2025 at 12:17:33PM -0400, Jason Gunthorpe
wrote:> On Thu, Feb 27, 2025 at 07:18:02AM -0800, Boqun Feng wrote:
> > On Thu, Feb 27, 2025 at 10:46:18AM -0400, Jason Gunthorpe wrote:
> > > On Wed, Feb 26, 2025 at 04:41:08PM -0800, Boqun Feng wrote:
> > > > And if you don't store the HrTimerHandle anywhere, like
you drop() it
> > > > right after start a hrtimer, it will immediately stop the
timer. Does
> > > > this make sense?
> > > 
> > > Oh, I understand that, but it is not sufficient in the kernel.
> > > 
> > > You are making an implicit argument that something external to
the
> > > rust universe will hold the module alive until all rust
destructors
> > > are run. That is trivialy obvious in your example above.
> > > 
> > 
> > The question in your previous email is about function pointer of
hrtimer
> > EAF because of module unload, are you moving to a broader topic
> > here?
> 
> No
> 
> > If no, the for module unload, the argument is not implicit because in
> > rust/macro/module.rs the module __exit() function is generated by
Rust,
> > and in that function, `assume_init_drop()` will call these
> > destructors.
> 
> That is not what I mean. You can be running code in multiple threads
> from multiple functions in the module those are all being protected
> implicitly by external C code functions. Rust itself is not managing
> module life time.
> 
> Then you are making the argument that everything created by a rust
> module somehow traces its reference back to the module itself,
> regardless of what thread, callback or memory was used to create it.
> 
> So all bindings for everything are expected to clean themselves up,
> recursively.
> 
Right, that would be the most cases in Rust if you want to control the 
cleanup orderings.
> That does make sense, but then it still raises questions that things
> like workqueue don't seem to have the cleanup.
> 
It was because the existing Workqueue was designed for built-in cases,
and we should fix that. Thank you for spotting that.
> I still wonder why you couldn't also have these reliable reference
> counts rooted on the device driver instead of only on the module.
> 
You could put reliable reference counts anywhere you want, as long as it
reflects the resource dependencies.

Regards,
Boqun
> Jason

Danilo Krummrich

2025-Feb-27 17:32 UTC

head link

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

On Thu, Feb 27, 2025 at 08:55:09AM -0800, Boqun Feng
wrote:> On Thu, Feb 27, 2025 at 12:17:33PM -0400, Jason Gunthorpe wrote:
> 
> > I still wonder why you couldn't also have these reliable reference
> > counts rooted on the device driver instead of only on the module.
> > 
> 
> You could put reliable reference counts anywhere you want, as long as it
> reflects the resource dependencies.
Right, as I explained in a different reply, the signature for PCI driver probe()
looks like this:

	fn probe(pdev: &mut pci::Device, _info: &Self::IdInfo) ->
Result<Pin<KBox<Self>>>

The returned Pin<KBox<Self>> has the lifetime of the driver being
bound to the
device.

Which means a driver can bind things to this lifetime. But, it isn't forced
to,
it can also put things into an Arc and share it with the rest of the world.

If something is crucial to be bound to the lifetime of a driver being bound to a
device (i.e. device resources), you have to expose it as Devres<T>.

Subsequently, you can put the Devres<T> in an Arc and do whatever you
want, the
result will still be that T is dropped once the device is unbound.

Nouveau - Feb 2025 - [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation