thr3ads.net - Nouveau - [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation [Feb 2025]

If this information is useful, please help other people find it:
Share via:

Jason Gunthorpe

2025-Feb-26 23:47 UTC

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

On Wed, Feb 26, 2025 at 10:31:10PM +0100, Danilo Krummrich
wrote:> Let's take a step back and look again why we have Devres (and
Revocable) for
> e.g. pci::Bar.
> 
> The device / driver model requires that device resources are only held by a
> driver, as long as the driver is bound to the device.
> 
> For instance, in C we achieve this by calling
> 
> 	pci_iounmap()
> 	pci_release_region()
> 
> from remove().
> 
> We rely on this, we trust drivers to actually do this.
Right, exactly

But it is not just PCI bar. There are a *huge* number of kernel APIs
that have built in to them the same sort of requirement - teardown
MUST run with remove, and once done the resource cannot be used by
another thread.

Basically most things involving function pointers has this sort of
lifecycle requirement because it is a common process that prevents a
EAF of module unload.

This is all incredibly subtle and driver writers never seem to
understand it properly.. See below for my thoughts on hrtimer bindings
having the same EAF issue.

My fear, that is intensifying as we go through this discussion, is
that rust binding authors have not fully comprehended what the kernel
life cycle model and common design pattern actually is, and have not
fully thought through issues like module unload creating a lifetime
cycle for *function pointers*.

This stuff is really hard. C programers rarely understand it. Existing
drivers tend to frequenly have these bug classes. Without an obvious
easy to use Rust framework to, effectively, revoke function pointers
and synchronously destroy objects during remove, I think this will be
a recurring problem going forward.

I assume that Rust philsophy should be quite concerned if it does not
protect against function pointers becoming asynchronously invalid due
to module unload races. That sounds like a safety problem to me??
> We also trust drivers that they don't access the pointer originally
> returned by pci_iomap() after remove().
Yes, I get it, you are trying to use a reference tracking type design
pattern when the C API is giving you a fencing design pattern, they
are not compatible and it is hard to interwork them.
> Now, let's get back to concurrent code that might still attempt to use
the
> pci::Bar. Surely, we need mechanisms to shut down all asynchronous
execution
> paths (e.g. workqueues) once the device is unbound. But that's not the
job of
> Devres<pci::Bar>. The job of Devres<pci::Bar> is to be robust
against misuse.
The thing is once you have a mechanism to shutdown all the stuff you
don't need the overhead of this revocable checking on the normal
paths. What you need is a way to bring your pci::Bar into a safety
contract that remove will shootdown concurrency and that directly
denies references to pci::Bar, and the same contract will guarentee it
frees pci::Bar memory.

A more fancy version of devm, if you will.
> I guess you're referring to cancel_work_sync() and friends as well as
> destroy_workqueue(), etc.
Yes, and flush, and you often need to design special locking to avoid
work-self-requeing. It is tricky stuff, again I've seen lots and lots
of bugs in these remove paths here.

Hopefully rust can describe this adequately without limiting work
queue functionality :\
> But yes, once people start using workqueues for other modules, we
> surely need to extend the abstraction accordingly.
You say that like it will be easy, but it is exactly the same type of
lifetime issue as pci_iomap, and that seems to be quite a challenge
here???
> Other abstractions do consider this though, e.g. the upcoming hrtimer work.
[1]
Does it??? hrtimer uses function pointers. Any time you take a
function pointer you have to reason about how does the .text lifetime
work relative to the usage of the function pointer.

So how does [1] guarentee that the hrtimer C code will not call the
function pointer after driver remove() completes?

My rust is aweful, but it looks to me like the timer lifetime is
linked to the HrTimerHandle lifetime, but nothing seems to hard link
that to the driver bound, or module lifetime?

This is what I'm talking about, the design pattern you are trying to
fix with revocable is *everywhere* in the C APIs, it is very subtle,
but must be considered. One solution would be to force hrtimer into
a revocable too.

And on and on for basically every kernel API that uses function
pointers.

This does not seem reasonable to me at all, it certainly isn't better
than the standard pattern.
> be) also reflected by the corresponding abstraction. Dropping a
> MiscDeviceRegistration [2] on module_exit() for instance will ensure that
there
> are no concurrent IOCTLs, just like the corresponding C code.
The way misc device works you can't unload the module until all the
FDs are closed and the misc code directly handles races with opening
new FDs while modules are unloading. It is quite a different scheme
than discussed in this thread.

Jason

Boqun Feng

2025-Feb-27 00:41 UTC

head link

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

On Wed, Feb 26, 2025 at 07:47:30PM -0400, Jason Gunthorpe wrote:
[...]> 
> > Other abstractions do consider this though, e.g. the upcoming hrtimer
work. [1]
> 
> Does it??? hrtimer uses function pointers. Any time you take a
> function pointer you have to reason about how does the .text lifetime
> work relative to the usage of the function pointer.
> 
> So how does [1] guarentee that the hrtimer C code will not call the
> function pointer after driver remove() completes?
> 
> My rust is aweful, but it looks to me like the timer lifetime is
> linked to the HrTimerHandle lifetime, but nothing seems to hard link
> that to the driver bound, or module lifetime?
> 
So to write a module, normally you need to have a module struct, e.g.

	struct MyModule { ... }

and if a hrtimer is used by MyModule, you can put an HrTimerHandle in
it:

	struct MyModule {
	    ...
	    handle: Option<HrTimerHandle>
	}

, when module unloaded, every field of MyModule will call their drop()
function, and HrTimerHandle's drop() will cancel the hrtimer, so that
the function pointer won't be referenced by hrtimer core.

And if you don't store the HrTimerHandle anywhere, like you drop() it
right after start a hrtimer, it will immediately stop the timer. Does
this make sense?

Regards,
Boqun
> This is what I'm talking about, the design pattern you are trying to
> fix with revocable is *everywhere* in the C APIs, it is very subtle,
> but must be considered. One solution would be to force hrtimer into
> a revocable too.
> 
> And on and on for basically every kernel API that uses function
> pointers.
> 
> This does not seem reasonable to me at all, it certainly isn't better
> than the standard pattern.
> 
> > be) also reflected by the corresponding abstraction. Dropping a
> > MiscDeviceRegistration [2] on module_exit() for instance will ensure
that there
> > are no concurrent IOCTLs, just like the corresponding C code.
> 
> The way misc device works you can't unload the module until all the
> FDs are closed and the misc code directly handles races with opening
> new FDs while modules are unloading. It is quite a different scheme
> than discussed in this thread.
> 
> Jason

Greg KH

2025-Feb-27 01:02 UTC

head link

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

On Wed, Feb 26, 2025 at 07:47:30PM -0400, Jason Gunthorpe
wrote:> The way misc device works you can't unload the module until all the
> FDs are closed and the misc code directly handles races with opening
> new FDs while modules are unloading. It is quite a different scheme
> than discussed in this thread.
And I would argue that is it the _right_ scheme to be following overall
here.  Removing modules with in-flight devices/drivers is to me is odd,
and only good for developers doing work, not for real systems, right?

Yes, networking did add that functionality to allow modules to be
unloaded with network connections open, and I'm guessing RDMA followed
that, but really, why?

What is the requirement that means that you have to do this for function
pointers?  I can understand the disconnect issue between devices and
drivers and open file handles (or sockets), as that is a normal thing,
but not removing code from the system, that is not normal.

thanks,

greg k-h

Danilo Krummrich

2025-Feb-27 11:32 UTC

head link

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

On Wed, Feb 26, 2025 at 07:47:30PM -0400, Jason Gunthorpe
wrote:> On Wed, Feb 26, 2025 at 10:31:10PM +0100, Danilo Krummrich wrote:
> > Let's take a step back and look again why we have Devres (and
Revocable) for
> > e.g. pci::Bar.
> > 
> > The device / driver model requires that device resources are only held
by a
> > driver, as long as the driver is bound to the device.
> > 
> > For instance, in C we achieve this by calling
> > 
> > 	pci_iounmap()
> > 	pci_release_region()
> > 
> > from remove().
> > 
> > We rely on this, we trust drivers to actually do this.
> 
> Right, exactly
> 
> But it is not just PCI bar. There are a *huge* number of kernel APIs
> that have built in to them the same sort of requirement - teardown
> MUST run with remove, and once done the resource cannot be used by
> another thread.
> 
> Basically most things involving function pointers has this sort of
> lifecycle requirement because it is a common process that prevents a
> EAF of module unload.
You're still mixing topics, the whole Devres<pci::Bar> thing as about
limiting
object lifetime to the point where the driver is unbound.

Shutting down asynchronous execution of things, i.e. workqueues, timers, IOCTLs
to prevent unexpected access to the module .text section is a whole different
topic.

You seem to imply that if we ensure the latter, we don't need to enforce the
former, but that isn't true.

In other words, assuming that we properly enforce that there are no async
execution paths after remove() or module_exit() (not necessarily the same),
we still need to ensure that a pci::Bar object does not outlive remove().

Device resources are a bit special, since their lifetime must be cap'd at
device
unbind, *independent* of the object lifetime they reside in. Hence the Devres
container.

For a memory allocation for instance (e.g. KBox<T>), this is different. If
it
outlives remove(), e.g. because it's part of the Module structure,
that's fine.
It's only important that it's dropped eventually.
> 
> This is all incredibly subtle and driver writers never seem to
> understand it properly.. See below for my thoughts on hrtimer bindings
> having the same EAF issue.
I don't think it has, see Boqun's reply [1].
> 
> My fear, that is intensifying as we go through this discussion, is
> that rust binding authors have not fully comprehended what the kernel
> life cycle model and common design pattern actually is, and have not
> fully thought through issues like module unload creating a lifetime
> cycle for *function pointers*.
I do *not* see where you take the evidence from to make such a generic
statement.

Especially because there aren't a lot of abstractions upstream yet that fall
under this category.

If you think that a particular abstraction has a design issue, you're very
welcome to chime in on the particular mail thread and help improve things.

But implying that no one considers this is not representing reality.
> 
> This stuff is really hard. C programers rarely understand it. Existing
> drivers tend to frequenly have these bug classes. Without an obvious
> easy to use Rust framework to, effectively, revoke function pointers
> and synchronously destroy objects during remove, I think this will be
> a recurring problem going forward.
> 
> I assume that Rust philsophy should be quite concerned if it does not
> protect against function pointers becoming asynchronously invalid due
> to module unload races. That sounds like a safety problem to me??
Yes, it would be a safety problem. That's why HrTimer for instance
implements
the corresponding synchronization when dropped.
> 
> > We also trust drivers that they don't access the pointer
originally
> > returned by pci_iomap() after remove().
> 
> Yes, I get it, you are trying to use a reference tracking type design
> pattern when the C API is giving you a fencing design pattern, they
> are not compatible and it is hard to interwork them.
> 
> > Now, let's get back to concurrent code that might still attempt to
use the
> > pci::Bar. Surely, we need mechanisms to shut down all asynchronous
execution
> > paths (e.g. workqueues) once the device is unbound. But that's not
the job of
> > Devres<pci::Bar>. The job of Devres<pci::Bar> is to be
robust against misuse.
> 
> The thing is once you have a mechanism to shutdown all the stuff you
> don't need the overhead of this revocable checking on the normal
> paths. What you need is a way to bring your pci::Bar into a safety
> contract that remove will shootdown concurrency and that directly
> denies references to pci::Bar, and the same contract will guarentee it
> frees pci::Bar memory.
This contract needs to be technically enforced, not by convention as we do in C.

And with embedding the pci::Bar in Devres, such that it is automatically revoked
when the device is unbound, we do exactly that.

(Again, we don't do it primarily to protect against some concurrent access,
we do
it to forcefully ensure that a pci::Bar object can not outlive the device /
driver binding.)

If we don't do that, how else do we do it? How do we prevent a driver from
keeping the pci::Bar (and hence the memory mapping and memory region reservation
alive after the device has been unbound?
> > But yes, once people start using workqueues for other modules, we
> > surely need to extend the abstraction accordingly.
> 
> You say that like it will be easy, but it is exactly the same type of
> lifetime issue as pci_iomap, and that seems to be quite a challenge
> here???
No, the workqueue implementation can so something similar to what Boqun
explained for HrTimer [1].

A workqueue lifetime also does not need a hard cap at device unbind, like device
resources. If it is bound to the module lifetime for instance, that is fine too.

Data that is accessed from a work item can't be freed under the workqueue by
design in Rust.

[1] https://lore.kernel.org/rust-for-linux/Z7-0pOmWO6r_KeQI at boqun-archlinux/

Jason Gunthorpe

2025-Feb-27 14:46 UTC

head link

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

On Wed, Feb 26, 2025 at 04:41:08PM -0800, Boqun Feng
wrote:> And if you don't store the HrTimerHandle anywhere, like you drop() it
> right after start a hrtimer, it will immediately stop the timer. Does
> this make sense?
Oh, I understand that, but it is not sufficient in the kernel.

You are making an implicit argument that something external to the
rust universe will hold the module alive until all rust destructors
are run. That is trivialy obvious in your example above.

However, make it more complex. Run the destructor call for your
hrtimer in a workqueue thread. Use workqueue.rs. Now you don't have
this implicit argument anymore, and it will EAF things.

Danilo argues this is a bug in workqueue.rs.

Regardless, it seems like EAF is an overlooked topic in the safety
analysis.

Further, you and Danilo are making opposing correctness arguments:

 1) all rust destructors run before module __exit completes
 2) rust destructors can run after driver removal completes

I understand the technical underpinnings why these are different, but
I feel that if you can make #1 reliably true for __exit then it is
highly desirable to use the same techniques to make it true for
remove() too.

Jason

Nouveau - Feb 2025 - [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation