thr3ads.net - Nouveau - [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation [Feb 2025]

If this information is useful, please help other people find it:
Share via:

Danilo Krummrich

2025-Feb-27 11:32 UTC

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

On Wed, Feb 26, 2025 at 07:47:30PM -0400, Jason Gunthorpe
wrote:> On Wed, Feb 26, 2025 at 10:31:10PM +0100, Danilo Krummrich wrote:
> > Let's take a step back and look again why we have Devres (and
Revocable) for
> > e.g. pci::Bar.
> > 
> > The device / driver model requires that device resources are only held
by a
> > driver, as long as the driver is bound to the device.
> > 
> > For instance, in C we achieve this by calling
> > 
> > 	pci_iounmap()
> > 	pci_release_region()
> > 
> > from remove().
> > 
> > We rely on this, we trust drivers to actually do this.
> 
> Right, exactly
> 
> But it is not just PCI bar. There are a *huge* number of kernel APIs
> that have built in to them the same sort of requirement - teardown
> MUST run with remove, and once done the resource cannot be used by
> another thread.
> 
> Basically most things involving function pointers has this sort of
> lifecycle requirement because it is a common process that prevents a
> EAF of module unload.
You're still mixing topics, the whole Devres<pci::Bar> thing as about
limiting
object lifetime to the point where the driver is unbound.

Shutting down asynchronous execution of things, i.e. workqueues, timers, IOCTLs
to prevent unexpected access to the module .text section is a whole different
topic.

You seem to imply that if we ensure the latter, we don't need to enforce the
former, but that isn't true.

In other words, assuming that we properly enforce that there are no async
execution paths after remove() or module_exit() (not necessarily the same),
we still need to ensure that a pci::Bar object does not outlive remove().

Device resources are a bit special, since their lifetime must be cap'd at
device
unbind, *independent* of the object lifetime they reside in. Hence the Devres
container.

For a memory allocation for instance (e.g. KBox<T>), this is different. If
it
outlives remove(), e.g. because it's part of the Module structure,
that's fine.
It's only important that it's dropped eventually.
> 
> This is all incredibly subtle and driver writers never seem to
> understand it properly.. See below for my thoughts on hrtimer bindings
> having the same EAF issue.
I don't think it has, see Boqun's reply [1].
> 
> My fear, that is intensifying as we go through this discussion, is
> that rust binding authors have not fully comprehended what the kernel
> life cycle model and common design pattern actually is, and have not
> fully thought through issues like module unload creating a lifetime
> cycle for *function pointers*.
I do *not* see where you take the evidence from to make such a generic
statement.

Especially because there aren't a lot of abstractions upstream yet that fall
under this category.

If you think that a particular abstraction has a design issue, you're very
welcome to chime in on the particular mail thread and help improve things.

But implying that no one considers this is not representing reality.
> 
> This stuff is really hard. C programers rarely understand it. Existing
> drivers tend to frequenly have these bug classes. Without an obvious
> easy to use Rust framework to, effectively, revoke function pointers
> and synchronously destroy objects during remove, I think this will be
> a recurring problem going forward.
> 
> I assume that Rust philsophy should be quite concerned if it does not
> protect against function pointers becoming asynchronously invalid due
> to module unload races. That sounds like a safety problem to me??
Yes, it would be a safety problem. That's why HrTimer for instance
implements
the corresponding synchronization when dropped.
> 
> > We also trust drivers that they don't access the pointer
originally
> > returned by pci_iomap() after remove().
> 
> Yes, I get it, you are trying to use a reference tracking type design
> pattern when the C API is giving you a fencing design pattern, they
> are not compatible and it is hard to interwork them.
> 
> > Now, let's get back to concurrent code that might still attempt to
use the
> > pci::Bar. Surely, we need mechanisms to shut down all asynchronous
execution
> > paths (e.g. workqueues) once the device is unbound. But that's not
the job of
> > Devres<pci::Bar>. The job of Devres<pci::Bar> is to be
robust against misuse.
> 
> The thing is once you have a mechanism to shutdown all the stuff you
> don't need the overhead of this revocable checking on the normal
> paths. What you need is a way to bring your pci::Bar into a safety
> contract that remove will shootdown concurrency and that directly
> denies references to pci::Bar, and the same contract will guarentee it
> frees pci::Bar memory.
This contract needs to be technically enforced, not by convention as we do in C.

And with embedding the pci::Bar in Devres, such that it is automatically revoked
when the device is unbound, we do exactly that.

(Again, we don't do it primarily to protect against some concurrent access,
we do
it to forcefully ensure that a pci::Bar object can not outlive the device /
driver binding.)

If we don't do that, how else do we do it? How do we prevent a driver from
keeping the pci::Bar (and hence the memory mapping and memory region reservation
alive after the device has been unbound?
> > But yes, once people start using workqueues for other modules, we
> > surely need to extend the abstraction accordingly.
> 
> You say that like it will be easy, but it is exactly the same type of
> lifetime issue as pci_iomap, and that seems to be quite a challenge
> here???
No, the workqueue implementation can so something similar to what Boqun
explained for HrTimer [1].

A workqueue lifetime also does not need a hard cap at device unbind, like device
resources. If it is bound to the module lifetime for instance, that is fine too.

Data that is accessed from a work item can't be freed under the workqueue by
design in Rust.

[1] https://lore.kernel.org/rust-for-linux/Z7-0pOmWO6r_KeQI at boqun-archlinux/

Jason Gunthorpe

2025-Feb-27 15:07 UTC

head link

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

On Thu, Feb 27, 2025 at 12:32:45PM +0100, Danilo Krummrich
wrote:> On Wed, Feb 26, 2025 at 07:47:30PM -0400, Jason Gunthorpe wrote:
> > On Wed, Feb 26, 2025 at 10:31:10PM +0100, Danilo Krummrich wrote:
> > > Let's take a step back and look again why we have Devres (and
Revocable) for
> > > e.g. pci::Bar.
> > > 
> > > The device / driver model requires that device resources are only
held by a
> > > driver, as long as the driver is bound to the device.
> > > 
> > > For instance, in C we achieve this by calling
> > > 
> > > 	pci_iounmap()
> > > 	pci_release_region()
> > > 
> > > from remove().
> > > 
> > > We rely on this, we trust drivers to actually do this.
> > 
> > Right, exactly
> > 
> > But it is not just PCI bar. There are a *huge* number of kernel APIs
> > that have built in to them the same sort of requirement - teardown
> > MUST run with remove, and once done the resource cannot be used by
> > another thread.
> > 
> > Basically most things involving function pointers has this sort of
> > lifecycle requirement because it is a common process that prevents a
> > EAF of module unload.
> 
> You're still mixing topics, the whole Devres<pci::Bar> thing as
about limiting
> object lifetime to the point where the driver is unbound.
> 
> Shutting down asynchronous execution of things, i.e. workqueues, timers,
IOCTLs
> to prevent unexpected access to the module .text section is a whole
different
> topic.
Again, the standard kernel design pattern is to put these things
together so that shutdown isolates concurrency which permits free
without UAF.
> In other words, assuming that we properly enforce that there are no async
> execution paths after remove() or module_exit() (not necessarily the same),
> we still need to ensure that a pci::Bar object does not outlive remove().
Yes, you just have to somehow use rust to ensure a call pci_iounmap()
happens during remove, after the isolation.

You are already doing it with devm.  It seems to me the only problem
you have is nobody has invented a way in rust to contract that the devm
won't run until the threads are isolated.

I don't see this as insolvable, you can have some input argument to
any API that creates concurrency that also pushes an ordered
destructor to the struct device lifecycle that ensures it cancels that
concurrency.
> Device resources are a bit special, since their lifetime must be cap'd
at device
> unbind, *independent* of the object lifetime they reside in. Hence the
Devres
> container.
I'd argue many resources should be limited to device unbind. Memory is
perhaps the only exception.
> > My fear, that is intensifying as we go through this discussion, is
> > that rust binding authors have not fully comprehended what the kernel
> > life cycle model and common design pattern actually is, and have not
> > fully thought through issues like module unload creating a lifetime
> > cycle for *function pointers*.
> 
> I do *not* see where you take the evidence from to make such a generic
> statement.
Well, I take the basic insistance that is OK to leak stuff from driver
scope to module scope is not well designed.
> Especially because there aren't a lot of abstractions upstream yet that
fall
> under this category.
And I am thinking forward to other APIs you will need and how they
will interact and not feeling good about this direction.
> > The thing is once you have a mechanism to shutdown all the stuff you
> > don't need the overhead of this revocable checking on the normal
> > paths. What you need is a way to bring your pci::Bar into a safety
> > contract that remove will shootdown concurrency and that directly
> > denies references to pci::Bar, and the same contract will guarentee it
> > frees pci::Bar memory.
> 
> This contract needs to be technically enforced, not by convention as
> we do in C.
People do amazing things with contracts in rust, why is this case so
hard?
> Data that is accessed from a work item can't be freed under the
> workqueue by design in Rust.
What? That's madness, alot of work functions are freeing
something. They are often the terminal point of an object's lifecycle
because you often have to allocate memory to launch the work in the
first place.

Certainly if you restrict workqueues to be very limited then alot of
their challenging problems disappear :\

Jason

Nouveau - Feb 2025 - [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation