thr3ads.net - Nouveau - [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation [Feb 2025]

If this information is useful, please help other people find it:
Share via:

Danilo Krummrich

2025-Feb-25 16:09 UTC

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

On Tue, Feb 25, 2025 at 10:52:41AM -0500, Joel Fernandes
wrote:> 
> 
> On 2/24/2025 6:44 PM, Danilo Krummrich wrote:
> > On Mon, Feb 24, 2025 at 01:45:02PM -0500, Joel Fernandes wrote:
> >> Hi Danilo,
> >>
> >> On Mon, Feb 24, 2025 at 01:11:17PM +0100, Danilo Krummrich wrote:
> >>> On Mon, Feb 24, 2025 at 01:07:19PM +0100, Danilo Krummrich
wrote:
> >>>> CC: Gary
> >>>>
> >>>> On Mon, Feb 24, 2025 at 10:40:00AM +0900, Alexandre
Courbot wrote:
> >>>>> This inability to sleep while we are accessing
registers seems very
> >>>>> constraining to me, if not dangerous. It is pretty
common to have
> >>>>> functions intermingle hardware accesses with other
operations that might
> >>>>> sleep, and this constraint means that in such cases
the caller would
> >>>>> need to perform guard lifetime management manually:
> >>>>>
> >>>>>   let bar_guard = bar.try_access()?;
> >>>>>   /* do something non-sleeping with bar_guard */
> >>>>>   drop(bar_guard);
> >>>>>
> >>>>>   /* do something that might sleep */
> >>>>>
> >>>>>   let bar_guard = bar.try_access()?;
> >>>>>   /* do something non-sleeping with bar_guard */
> >>>>>   drop(bar_guard);
> >>>>>
> >>>>>   ...
> >>>>>
> >>>>> Failure to drop the guard potentially introduces a
race condition, which
> >>>>> will receive no compile-time warning and potentialy
not even a runtime
> >>>>> one unless lockdep is enabled. This problem does not
exist with the
> >>>>> equivalent C code AFAICT
> >>>
> >>> Without klint [1] it is exactly the same as in C, where I have
to remember to
> >>> not call into something that might sleep from atomic context.
> >>>
> >>
> >> Sure, but in C, a sequence of MMIO accesses don't need to be
constrained to
> >> not sleeping?
> > 
> > It's not that MMIO needs to be constrained to not sleeping in Rust
either. It's
> > just that the synchronization mechanism (RCU) used for the Revocable
type
> > implies that.
> > 
> > In C we have something that is pretty similar with drm_dev_enter() /
> > drm_dev_exit() even though it is using SRCU instead and is specialized
to DRM.
> > 
> > In DRM this is used to prevent accesses to device resources after the
device has
> > been unplugged.
> 
> Thanks a lot for the response. Might it make more sense to use SRCU then?
The
> use of RCU seems overly restrictive due to the no-sleep-while-guard-held
thing.
Allowing to hold on to the guard for too long is a bit contradictive to the goal
of detecting hotunplug I guess.

Besides that I don't really see why we can't just re-acquire it after we
sleep?
Rust provides good options to implement it ergonimcally I think.
> 
> Another colleague told me RDMA also uses SRCU for a similar purpose as
well.
See the reasoning against SRCU from Sima [1], what's the reasoning of RDMA?

[1] https://lore.kernel.org/nouveau/Z7XVfnnrRKrtQbB6 at phenom.ffwll.local/
> 
> >> I am fairly new to rust, could you help elaborate more about why
these MMIO
> >> accesses need to have RevocableGuard in Rust? What problem are we
trying to
> >> solve that C has but Rust doesn't with the aid of a RCU
read-side section? I
> >> vaguely understand we are trying to "wait for an MMIO
access" using
> >> synchronize here, but it is just a guest.
> > 
> > Similar to the above, in Rust it's a safety constraint to prevent
MMIO accesses
> > to unplugged devices.
> > 
> > The exact type in Rust in this case is Devres<pci::Bar>. Within
Devres, the
> > pci::Bar is placed in a Revocable. The Revocable is revoked when the
device
> > is detached from the driver (for instance because it has been
unplugged).
> 
> I guess the Devres concept of revoking resources on driver detach is not a
rust
> thing (even for PCI)... but correct me if I'm wrong.
I'm not sure what you mean with that, can you expand a bit?
> 
> > By revoking the Revocable, the pci::Bar is dropped, which implies that
it's also
> > unmapped; a subsequent call to try_access() would fail.
> > 
> > But yes, if the device is unplugged while holding the RCU guard, one
is on their
> > own; that's also why keeping the critical sections short is
desirable.
> 
> I have heard some concern around whether Rust is changing the driver model
when
> it comes to driver detach / driver remove.  Can you elaborate may be a bit
about
> how Rust changes that mechanism versus C, when it comes to that?
I think that one is simple, Rust does *not* change the driver model.

What makes you think so?
> Ideally we
> would not want Rust drivers to have races with user space accesses when
they are
> detached/remove. But we also don't want accesses to be non-sleepable
sections
> where this guard is held, it seems restrictive (though to your point the
> sections are expected to be small).
In the very extreme case, nothing prevents you from implementing a wrapper like:

	fn my_write32(bar: &Devres<pci::Bar>, offset: usize) ->
Result<u32> {
		let bar = bar.try_access()?;
		bar.read32(offset);
	}

Which limits the RCU read side critical section to my_write32().

Similarly you can have custom functions for short sequences of I/O ops, or use
closures. I don't understand the concern.

Joel Fernandes

2025-Feb-25 21:02 UTC

head link

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

On Tue, Feb 25, 2025 at 05:09:35PM +0100, Danilo Krummrich
wrote:> On Tue, Feb 25, 2025 at 10:52:41AM -0500, Joel Fernandes wrote:
> > 
> > 
> > On 2/24/2025 6:44 PM, Danilo Krummrich wrote:
> > > On Mon, Feb 24, 2025 at 01:45:02PM -0500, Joel Fernandes wrote:
> > >> Hi Danilo,
> > >>
> > >> On Mon, Feb 24, 2025 at 01:11:17PM +0100, Danilo Krummrich
wrote:
> > >>> On Mon, Feb 24, 2025 at 01:07:19PM +0100, Danilo
Krummrich wrote:
> > >>>> CC: Gary
> > >>>>
> > >>>> On Mon, Feb 24, 2025 at 10:40:00AM +0900, Alexandre
Courbot wrote:
> > >>>>> This inability to sleep while we are accessing
registers seems very
> > >>>>> constraining to me, if not dangerous. It is
pretty common to have
> > >>>>> functions intermingle hardware accesses with
other operations that might
> > >>>>> sleep, and this constraint means that in such
cases the caller would
> > >>>>> need to perform guard lifetime management
manually:
> > >>>>>
> > >>>>>   let bar_guard = bar.try_access()?;
> > >>>>>   /* do something non-sleeping with bar_guard */
> > >>>>>   drop(bar_guard);
> > >>>>>
> > >>>>>   /* do something that might sleep */
> > >>>>>
> > >>>>>   let bar_guard = bar.try_access()?;
> > >>>>>   /* do something non-sleeping with bar_guard */
> > >>>>>   drop(bar_guard);
> > >>>>>
> > >>>>>   ...
> > >>>>>
> > >>>>> Failure to drop the guard potentially introduces
a race condition, which
> > >>>>> will receive no compile-time warning and
potentialy not even a runtime
> > >>>>> one unless lockdep is enabled. This problem does
not exist with the
> > >>>>> equivalent C code AFAICT
> > >>>
> > >>> Without klint [1] it is exactly the same as in C, where I
have to remember to
> > >>> not call into something that might sleep from atomic
context.
> > >>>
> > >>
> > >> Sure, but in C, a sequence of MMIO accesses don't need to
be constrained to
> > >> not sleeping?
> > > 
> > > It's not that MMIO needs to be constrained to not sleeping in
Rust either. It's
> > > just that the synchronization mechanism (RCU) used for the
Revocable type
> > > implies that.
> > > 
> > > In C we have something that is pretty similar with
drm_dev_enter() /
> > > drm_dev_exit() even though it is using SRCU instead and is
specialized to DRM.
> > > 
> > > In DRM this is used to prevent accesses to device resources after
the device has
> > > been unplugged.
> > 
> > Thanks a lot for the response. Might it make more sense to use SRCU
then? The
> > use of RCU seems overly restrictive due to the
no-sleep-while-guard-held thing.
> 
> Allowing to hold on to the guard for too long is a bit contradictive to the
goal
> of detecting hotunplug I guess.
> 
> Besides that I don't really see why we can't just re-acquire it
after we sleep?
> Rust provides good options to implement it ergonimcally I think.
> 
> > 
> > Another colleague told me RDMA also uses SRCU for a similar purpose as
well.
> 
> See the reasoning against SRCU from Sima [1], what's the reasoning of
RDMA?
> 
> [1] https://lore.kernel.org/nouveau/Z7XVfnnrRKrtQbB6 at phenom.ffwll.local/
Hmm, so you're saying SRCU sections blocking indefinitely is a concern as
per
that thread. But I think SRCU GPs should not be stalled in normal operation.
If it is, that is a bug anyway. Stalling SRCU grace periods is not really a
good thing anyway, you could run out of memory (even though stalling RCU is
even more dangerous).

For RDMA, I will ask Jason Gunthorpe to chime in, I CC'd him. Jason, correct
me if I'm wrong about the RDMA user but this is what I recollect discussing
with you.
> > 
> > >> I am fairly new to rust, could you help elaborate more about
why these MMIO
> > >> accesses need to have RevocableGuard in Rust? What problem
are we trying to
> > >> solve that C has but Rust doesn't with the aid of a RCU
read-side section? I
> > >> vaguely understand we are trying to "wait for an MMIO
access" using
> > >> synchronize here, but it is just a guest.
> > > 
> > > Similar to the above, in Rust it's a safety constraint to
prevent MMIO accesses
> > > to unplugged devices.
> > > 
> > > The exact type in Rust in this case is Devres<pci::Bar>.
Within Devres, the
> > > pci::Bar is placed in a Revocable. The Revocable is revoked when
the device
> > > is detached from the driver (for instance because it has been
unplugged).
> > 
> > I guess the Devres concept of revoking resources on driver detach is
not a rust
> > thing (even for PCI)... but correct me if I'm wrong.
> 
> I'm not sure what you mean with that, can you expand a bit?
I was reading the devres documentation earlier. It mentios that one of its
use is to clean up resources. Maybe I mixed up the meaning of "clean
up" and
"revoke" as I was reading it.

Honestly, I am still confused a bit by the difference between
"revoking" and
"cleaning up".
> > 
> > > By revoking the Revocable, the pci::Bar is dropped, which implies
that it's also
> > > unmapped; a subsequent call to try_access() would fail.
> > > 
> > > But yes, if the device is unplugged while holding the RCU guard,
one is on their
> > > own; that's also why keeping the critical sections short is
desirable.
> > 
> > I have heard some concern around whether Rust is changing the driver
model when
> > it comes to driver detach / driver remove.  Can you elaborate may be a
bit about
> > how Rust changes that mechanism versus C, when it comes to that?
> 
> I think that one is simple, Rust does *not* change the driver model.
> 
> What makes you think so?
Well, the revocable concept for one is rust-only right?

It is also possibly just some paranoia based on discussions, but I'm not
sure
at the moment.
> > Ideally we
> > would not want Rust drivers to have races with user space accesses
when they are
> > detached/remove. But we also don't want accesses to be
non-sleepable sections
> > where this guard is held, it seems restrictive (though to your point
the
> > sections are expected to be small).
> 
> In the very extreme case, nothing prevents you from implementing a wrapper
like:
> 
> 	fn my_write32(bar: &Devres<pci::Bar>, offset: usize) ->
Result<u32> {
> 		let bar = bar.try_access()?;
> 		bar.read32(offset);
> 	}
> 
> Which limits the RCU read side critical section to my_write32().
> 
> Similarly you can have custom functions for short sequences of I/O ops, or
use
> closures. I don't understand the concern.
Yeah, this is certainly possible. I think one concern is similar to what you
raised on the other thread you shared [1]:
"Maybe we even want to replace it with SRCU entirely to ensure that drivers
can't stall the RCU grace period for too long by accident."

[1] https://lore.kernel.org/nouveau/Z7XVfnnrRKrtQbB6 at phenom.ffwll.local/

thanks,

 - Joel

Danilo Krummrich

2025-Feb-25 22:02 UTC

head link

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

On Tue, Feb 25, 2025 at 04:02:28PM -0500, Joel Fernandes
wrote:> On Tue, Feb 25, 2025 at 05:09:35PM +0100, Danilo Krummrich wrote:
> > On Tue, Feb 25, 2025 at 10:52:41AM -0500, Joel Fernandes wrote:
> > > 
> > > 
> > > On 2/24/2025 6:44 PM, Danilo Krummrich wrote:
> > > > On Mon, Feb 24, 2025 at 01:45:02PM -0500, Joel Fernandes
wrote:
> > > >> Hi Danilo,
> > > >>
> > > >> On Mon, Feb 24, 2025 at 01:11:17PM +0100, Danilo
Krummrich wrote:
> > > >>> On Mon, Feb 24, 2025 at 01:07:19PM +0100, Danilo
Krummrich wrote:
> > > >>>> CC: Gary
> > > >>>>
> > > >>>> On Mon, Feb 24, 2025 at 10:40:00AM +0900,
Alexandre Courbot wrote:
> > > >>>>> This inability to sleep while we are
accessing registers seems very
> > > >>>>> constraining to me, if not dangerous. It is
pretty common to have
> > > >>>>> functions intermingle hardware accesses with
other operations that might
> > > >>>>> sleep, and this constraint means that in
such cases the caller would
> > > >>>>> need to perform guard lifetime management
manually:
> > > >>>>>
> > > >>>>>   let bar_guard = bar.try_access()?;
> > > >>>>>   /* do something non-sleeping with
bar_guard */
> > > >>>>>   drop(bar_guard);
> > > >>>>>
> > > >>>>>   /* do something that might sleep */
> > > >>>>>
> > > >>>>>   let bar_guard = bar.try_access()?;
> > > >>>>>   /* do something non-sleeping with
bar_guard */
> > > >>>>>   drop(bar_guard);
> > > >>>>>
> > > >>>>>   ...
> > > >>>>>
> > > >>>>> Failure to drop the guard potentially
introduces a race condition, which
> > > >>>>> will receive no compile-time warning and
potentialy not even a runtime
> > > >>>>> one unless lockdep is enabled. This problem
does not exist with the
> > > >>>>> equivalent C code AFAICT
> > > >>>
> > > >>> Without klint [1] it is exactly the same as in C,
where I have to remember to
> > > >>> not call into something that might sleep from atomic
context.
> > > >>>
> > > >>
> > > >> Sure, but in C, a sequence of MMIO accesses don't
need to be constrained to
> > > >> not sleeping?
> > > > 
> > > > It's not that MMIO needs to be constrained to not
sleeping in Rust either. It's
> > > > just that the synchronization mechanism (RCU) used for the
Revocable type
> > > > implies that.
> > > > 
> > > > In C we have something that is pretty similar with
drm_dev_enter() /
> > > > drm_dev_exit() even though it is using SRCU instead and is
specialized to DRM.
> > > > 
> > > > In DRM this is used to prevent accesses to device resources
after the device has
> > > > been unplugged.
> > > 
> > > Thanks a lot for the response. Might it make more sense to use
SRCU then? The
> > > use of RCU seems overly restrictive due to the
no-sleep-while-guard-held thing.
> > 
> > Allowing to hold on to the guard for too long is a bit contradictive
to the goal
> > of detecting hotunplug I guess.
> > 
> > Besides that I don't really see why we can't just re-acquire
it after we sleep?
> > Rust provides good options to implement it ergonimcally I think.
> > 
> > > 
> > > Another colleague told me RDMA also uses SRCU for a similar
purpose as well.
> > 
> > See the reasoning against SRCU from Sima [1], what's the reasoning
of RDMA?
> > 
> > [1] https://lore.kernel.org/nouveau/Z7XVfnnrRKrtQbB6 at
phenom.ffwll.local/
> 
> Hmm, so you're saying SRCU sections blocking indefinitely is a concern
as per
> that thread. But I think SRCU GPs should not be stalled in normal
operation.
> If it is, that is a bug anyway. Stalling SRCU grace periods is not really a
> good thing anyway, you could run out of memory (even though stalling RCU is
> even more dangerous).
I'm saying that extending the time of critical sections is a concern,
because
it's more likely to miss the unplug event and it's just not necessary.
You grab
the guard, do a few I/O ops and drop it -- simple.

If you want to sleep in between just re-acquire it when you're done
sleeping.
You can easily avoid explicit drop(guard) calls by moving critical sections to
their own function or closures.

I still don't understand why you're thinking that it's crucial to
sleep while
holding the RevocableGuard?
> 
> For RDMA, I will ask Jason Gunthorpe to chime in, I CC'd him. Jason,
correct
> me if I'm wrong about the RDMA user but this is what I recollect
discussing
> with you.
> 
> > > 
> > > >> I am fairly new to rust, could you help elaborate more
about why these MMIO
> > > >> accesses need to have RevocableGuard in Rust? What
problem are we trying to
> > > >> solve that C has but Rust doesn't with the aid of a
RCU read-side section? I
> > > >> vaguely understand we are trying to "wait for an
MMIO access" using
> > > >> synchronize here, but it is just a guest.
> > > > 
> > > > Similar to the above, in Rust it's a safety constraint
to prevent MMIO accesses
> > > > to unplugged devices.
> > > > 
> > > > The exact type in Rust in this case is
Devres<pci::Bar>. Within Devres, the
> > > > pci::Bar is placed in a Revocable. The Revocable is revoked
when the device
> > > > is detached from the driver (for instance because it has
been unplugged).
> > > 
> > > I guess the Devres concept of revoking resources on driver detach
is not a rust
> > > thing (even for PCI)... but correct me if I'm wrong.
> > 
> > I'm not sure what you mean with that, can you expand a bit?
> 
> I was reading the devres documentation earlier. It mentios that one of its
> use is to clean up resources. Maybe I mixed up the meaning of "clean
up" and
> "revoke" as I was reading it.
> 
> Honestly, I am still confused a bit by the difference between
"revoking" and
> "cleaning up".
The Devres [1] implementation implements the devres callback that is called when
the
device is unbound from the driver.

Once that happens, it revokes the underlying resource (e.g. the PCI bar mapping)
by using a Revocable [2] internally. Once the resource is revoked, try_access()
returns None and the resource (e.g. pci::Bar is dropped). By dropping the
pci::Bar the mapping is unmapped and the resource region is removed (which is
typically called cleanup).

[1] https://rust.docs.kernel.org/kernel/devres/struct.Devres.html
[2] https://rust.docs.kernel.org/kernel/revocable/struct.Revocable.html
> 
> > > 
> > > > By revoking the Revocable, the pci::Bar is dropped, which
implies that it's also
> > > > unmapped; a subsequent call to try_access() would fail.
> > > > 
> > > > But yes, if the device is unplugged while holding the RCU
guard, one is on their
> > > > own; that's also why keeping the critical sections short
is desirable.
> > > 
> > > I have heard some concern around whether Rust is changing the
driver model when
> > > it comes to driver detach / driver remove.  Can you elaborate may
be a bit about
> > > how Rust changes that mechanism versus C, when it comes to that?
> > 
> > I think that one is simple, Rust does *not* change the driver model.
> > 
> > What makes you think so?
> 
> Well, the revocable concept for one is rust-only right?
Yes, but that has nothing to do with changing the driver model. It is just an
additional implementation detail to ensure safety.

IIRC there are also have been efforts for a similar mechanism in C.
> 
> It is also possibly just some paranoia based on discussions, but I'm
not sure
> at the moment.
Again there is nothing different to C, except one additional step to ensure
safety. For instance, let's take devm_kzalloc(). Once the device is detached
from the driver the memory allocated with this function is freed automatically.

The additional step in Rust is, that we'd not only free the memory, but also
revoke the access to the pointer that has been returned by devm_kzalloc() for
the driver, such that it can't be used by accident anymore.

Besides that, I'd be interested to what kind of discussion you're
referring to.
> 
> > > Ideally we
> > > would not want Rust drivers to have races with user space
accesses when they are
> > > detached/remove. But we also don't want accesses to be
non-sleepable sections
> > > where this guard is held, it seems restrictive (though to your
point the
> > > sections are expected to be small).
> > 
> > In the very extreme case, nothing prevents you from implementing a
wrapper like:
> > 
> > 	fn my_write32(bar: &Devres<pci::Bar>, offset: usize) ->
Result<u32> {
> > 		let bar = bar.try_access()?;
> > 		bar.read32(offset);
> > 	}
> > 
> > Which limits the RCU read side critical section to my_write32().
> > 
> > Similarly you can have custom functions for short sequences of I/O
ops, or use
> > closures. I don't understand the concern.
> 
> Yeah, this is certainly possible. I think one concern is similar to what
you
> raised on the other thread you shared [1]:
> "Maybe we even want to replace it with SRCU entirely to ensure that
drivers
> can't stall the RCU grace period for too long by accident."
Yeah, I was just thinking out loud, but I think it wasn't a good idea -- we
really do want to keep the critical sections short, so RCU is fine. Prohibit
drivers to use RCU, just because they could mess up, wasn't a good reason.
> 
> [1] https://lore.kernel.org/nouveau/Z7XVfnnrRKrtQbB6 at phenom.ffwll.local/
> 
> thanks,
> 
>  - Joel
> 
>

Jason Gunthorpe

2025-Feb-25 22:57 UTC

head link

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

On Tue, Feb 25, 2025 at 04:02:28PM -0500, Joel Fernandes
wrote:> > Besides that I don't really see why we can't just re-acquire
it after we sleep?
> > Rust provides good options to implement it ergonimcally I think.
> > 
> > > 
> > > Another colleague told me RDMA also uses SRCU for a similar
purpose as well.
> > 
> > See the reasoning against SRCU from Sima [1], what's the reasoning
of RDMA?
> > 
> > [1] https://lore.kernel.org/nouveau/Z7XVfnnrRKrtQbB6 at
phenom.ffwll.local/
> For RDMA, I will ask Jason Gunthorpe to chime in, I CC'd him. Jason,
correct
> me if I'm wrong about the RDMA user but this is what I recollect
discussing
> with you.
In RDMA SRCU is not unbounded. It is limited to a system call
duration, and we don't have system calls that become time unbounded
inside drivers.

The motivation for RDMA was not really hotplug, but to support kernel
module upgrade. Some data center HA users were very interested in
this. To achieve it the driver module itself cannot have an elevated
module refcount. This requires swapping the module refcount for a
sleepable RW lock like SRCU or rwsem protecting all driver
callbacks. [1]

To be very clear, in RDMA you can open /dev/infiniband/uverbsX, run a
ioctl on the FD and then successfully rmmod the driver module while
the FD is open and while the ioctl is running. Any driver op will
complete, future ioctls will fail, and the module will complete.

So, from my perspective, this revocable idea would completely destroy
the actual production purpose we built the fast hot-plug machinery
for. It does not guarentee that driver threads are fenced prior to
completing remove. Intead it must rely on the FD itself to hold the
module refcount on the driver to keep the .text alive while driver
callbacks continue to be called. Making the module refcount linked to
userspace closing a FD renders the module unloadable in practice.

The common driver shutdown process in the kernel, that is well tested
and copied, makes the driver single threaded during the remove()
callback. Effectively instead of trying to micro-revoke individual
resources we revoke all concurrency threads and then it is completely
safe to destroy all the resources. This also guarentees that after
completing remove there is no Execute After Free risk to the driver
code.

SRCU/rwsem across all driver ops function pointer calls is part of
this scheme, but also things like cancling/flushing work queues,
blocking new work submission, preventing interrupts, removing syfs
files (they block concurrent threads internally), synchronizing any
possibly outstanding RCU callbacks, and more.

So, I'd suggest that if you have system calls that wait, the typical
expected solution would be to shoot down the waits during a remove
event so they can become time bounded.
> > > I have heard some concern around whether Rust is changing the
driver model when
> > > it comes to driver detach / driver remove.  Can you elaborate may
be a bit about
> > > how Rust changes that mechanism versus C, when it comes to that?
> > 
> > I think that one is simple, Rust does *not* change the driver model.
I think this resource-revoke idea is deviating from the normal
expected driver model by allowing driver code to continue to run in
other threads once remove completes. That is definitely abnormal at
least.

It is not necessarily *wrong*, but it sure is weird and as I explained
above it has bad system level properties.

Further, it seems to me there is a very unique DRM specific issue at
work "time unbounded driver callbacks". A weird solution to this
should not be baked into the common core kernel rust bindings and
break the working model of all other subsystems that don't have that
problem.
> > Similarly you can have custom functions for short sequences of I/O
ops, or use
> > closures. I don't understand the concern.
> 
> Yeah, this is certainly possible. I think one concern is similar to what
you
> raised on the other thread you shared [1]:
> "Maybe we even want to replace it with SRCU entirely to ensure that
drivers
> can't stall the RCU grace period for too long by accident."
I'd be worried about introducing a whole bunch more untestable failure
paths in drivers. Especially in areas like work queue submit that are
designed not to fail [2]. Non-failing work queues is a critical property
that I've relied on countless times. I'm not sure you even *can* recover
from this correctly in all cases.

Then in the other email did you say that even some memory allocations
go into this scheme? Yikes!

Further, hiding a synchronize_rcu in a devm destructor [3], once per
revocable object is awful. If you imagine having a rcu around each of
your revocable objects, how many synchronize_rcu()s is devm going to
call post-remove()?

On a busy server it is known to take a long time. So it is easy to
imagine driver remove times going into the many 10's of seconds for no
good reason. Maybe even multiple minutes if the driver ends up with
many of these objects.

[1] - Module .text is not unplugged from the kernel until all probed
drivers affiliated with that module have completed their remove
operations.

[2] - It is important that drivers shut down all their concurrency in
workqueues during remove because work queues do not hold the module
refcount. The only way .text lifecyle works is for drivers using work
queues is to rely on [1] to protect against Execute after Free.

[3] - Personally I agree with Laurent's points and I strongly dislike
devm. I'm really surprised to see Rust using it, I imagined Rust has
sufficiently strong object lifecycle management that it would not be
needed :(

Jason

Nouveau - Feb 2025 - [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation