Danilo Krummrich
2025-Feb-25 15:06 UTC
[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
On Tue, Feb 25, 2025 at 11:11:07PM +0900, Alexandre Courbot wrote:
> On Mon Feb 24, 2025 at 9:07 PM JST, Danilo Krummrich wrote:
> > CC: Gary
> >
> > On Mon, Feb 24, 2025 at 10:40:00AM +0900, Alexandre Courbot wrote:
> >> This inability to sleep while we are accessing registers seems very
> >> constraining to me, if not dangerous. It is pretty common to have
> >> functions intermingle hardware accesses with other operations that
> >> might sleep, and this constraint means that in such cases the caller
> >> would need to perform guard lifetime management manually:
> >>
> >> let bar_guard = bar.try_access()?;
> >> /* do something non-sleeping with bar_guard */
> >> drop(bar_guard);
> >>
> >> /* do something that might sleep */
> >>
> >> let bar_guard = bar.try_access()?;
> >> /* do something non-sleeping with bar_guard */
> >> drop(bar_guard);
> >>
> >> ...
> >>
> >> Failure to drop the guard potentially introduces a race condition,
> >> which will receive no compile-time warning and potentially not even
> >> a runtime one unless lockdep is enabled. This problem does not exist
> >> with the equivalent C code AFAICT, which makes the Rust version
> >> actually more error-prone and dangerous, the opposite of what we are
> >> trying to achieve with Rust. Or am I missing something?
> >
> > Generally you are right, but you have to see it from a different
> > perspective.
> >
> > What you describe is not an issue that comes from the design of the
> > API, but a limitation of Rust in the kernel. People are aware of the
> > issue, and with klint [1] there are solutions for it in the pipeline;
> > see also [2] and [3].
> >
> > [1] https://rust-for-linux.com/klint
> > [2] https://github.com/Rust-for-Linux/klint/blob/trunk/doc/atomic_context.md
> > [3] https://www.memorysafety.org/blog/gary-guo-klint-rust-tools/
>
> Thanks, I wasn't aware of klint; it indeed looks cool, even if not
> perfect by its own admission. But even if we ignore the safety issue,
> the other one (ergonomics) is still there.
>
> Basically, this way of accessing registers imposes quite a mental
> burden on its users. It requires a very different (and harsher)
> discipline than when writing the same code in C

We need similar solutions in C too, see drm_dev_enter() / drm_dev_exit()
and drm_dev_unplug().

> and the correct granularity to use is unclear to me.
>
> For instance, if I want to do the equivalent of Nouveau's nvkm_usec()
> to poll a particular register in a busy loop, should I call
> try_access() once before the loop? Or every time before accessing the
> register?

I think we should re-acquire the guard in each iteration and drop it
before the delay. I think a simple closure would work very well for
this pattern?

> I'm afraid having to check that the resource is still alive before
> accessing any register is going to become tedious very quickly.
>
> I understand that we want to protect against accessing the IO region
> of an unplugged device; but still, there is no guarantee that the
> device won't be unplugged in the middle of a critical section, however
> short. Thus the driver code should be able to recognize that the
> device has fallen off the bus when it e.g. gets a bunch of 0xff
> instead of a valid value. So do we really need the extra protection
> that AFAICT isn't used in C?

As mentioned above, we already do similar things in C.

Also, think about what the alternative is. If we remove the Devres
wrapper of pci::Bar, we lose control over the lifetime of the bar
mapping, and it can easily out-live the device / driver binding.

This makes the API unsound. With this, drivers would be able to keep
hold of acquired resources. What if, after a hotplug, the physical
address region is re-used and mapped by another driver?
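
To make the closure suggestion concrete, here is a minimal sketch of
what such a polling helper could look like (poll_bar, its signature,
and the caller-supplied delay closure are all illustrative, not an
existing kernel API; Devres::try_access() is the accessor discussed in
this thread):

    use kernel::{devres::Devres, pci, prelude::*};

    /// Illustrative only: poll `op` until it succeeds or `tries`
    /// attempts are exhausted, re-acquiring the access guard on every
    /// iteration and dropping it *before* the delay, so the guard is
    /// never held across a potential sleep.
    fn poll_bar(
        bar: &Devres<pci::Bar>,
        tries: usize,
        op: impl Fn(&pci::Bar) -> bool,
        delay: impl Fn(), // caller supplies the delay, e.g. a usleep wrapper
    ) -> Result {
        for _ in 0..tries {
            {
                // Fails with ENXIO if the device has been unplugged.
                let guard = bar.try_access().ok_or(ENXIO)?;
                if op(&guard) {
                    return Ok(());
                }
            } // guard dropped here, before we potentially sleep
            delay();
        }
        Err(ETIMEDOUT)
    }

The inner scope is what guarantees the guard is released before delay()
runs; the drop() that could be forgotten in the hand-written version
quoted above becomes impossible to miss by construction.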
Alexandre Courbot
2025-Feb-25 15:23 UTC
[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
On Wed Feb 26, 2025 at 12:06 AM JST, Danilo Krummrich wrote:
> On Tue, Feb 25, 2025 at 11:11:07PM +0900, Alexandre Courbot wrote:
>> On Mon Feb 24, 2025 at 9:07 PM JST, Danilo Krummrich wrote:
>> > CC: Gary
>> >
>> > On Mon, Feb 24, 2025 at 10:40:00AM +0900, Alexandre Courbot wrote:
>> >> This inability to sleep while we are accessing registers seems very
>> >> constraining to me, if not dangerous. It is pretty common to have
>> >> functions intermingle hardware accesses with other operations that
>> >> might sleep, and this constraint means that in such cases the
>> >> caller would need to perform guard lifetime management manually:
>> >>
>> >> let bar_guard = bar.try_access()?;
>> >> /* do something non-sleeping with bar_guard */
>> >> drop(bar_guard);
>> >>
>> >> /* do something that might sleep */
>> >>
>> >> let bar_guard = bar.try_access()?;
>> >> /* do something non-sleeping with bar_guard */
>> >> drop(bar_guard);
>> >>
>> >> ...
>> >>
>> >> Failure to drop the guard potentially introduces a race condition,
>> >> which will receive no compile-time warning and potentially not even
>> >> a runtime one unless lockdep is enabled. This problem does not
>> >> exist with the equivalent C code AFAICT, which makes the Rust
>> >> version actually more error-prone and dangerous, the opposite of
>> >> what we are trying to achieve with Rust. Or am I missing something?
>> >
>> > Generally you are right, but you have to see it from a different
>> > perspective.
>> >
>> > What you describe is not an issue that comes from the design of the
>> > API, but a limitation of Rust in the kernel. People are aware of the
>> > issue, and with klint [1] there are solutions for it in the
>> > pipeline; see also [2] and [3].
>> >
>> > [1] https://rust-for-linux.com/klint
>> > [2] https://github.com/Rust-for-Linux/klint/blob/trunk/doc/atomic_context.md
>> > [3] https://www.memorysafety.org/blog/gary-guo-klint-rust-tools/
>>
>> Thanks, I wasn't aware of klint; it indeed looks cool, even if not
>> perfect by its own admission. But even if we ignore the safety issue,
>> the other one (ergonomics) is still there.
>>
>> Basically, this way of accessing registers imposes quite a mental
>> burden on its users. It requires a very different (and harsher)
>> discipline than when writing the same code in C
>
> We need similar solutions in C too, see drm_dev_enter() /
> drm_dev_exit() and drm_dev_unplug().

Granted, but the use of these is much more coarse-grained than what is
expected for IO resources, right?

>> and the correct granularity to use is unclear to me.
>>
>> For instance, if I want to do the equivalent of Nouveau's nvkm_usec()
>> to poll a particular register in a busy loop, should I call
>> try_access() once before the loop? Or every time before accessing the
>> register?
>
> I think we should re-acquire the guard in each iteration and drop it
> before the delay. I think a simple closure would work very well for
> this pattern?
>
>> I'm afraid having to check that the resource is still alive before
>> accessing any register is going to become tedious very quickly.
>>
>> I understand that we want to protect against accessing the IO region
>> of an unplugged device; but still, there is no guarantee that the
>> device won't be unplugged in the middle of a critical section, however
>> short. Thus the driver code should be able to recognize that the
>> device has fallen off the bus when it e.g. gets a bunch of 0xff
>> instead of a valid value. So do we really need the extra protection
>> that AFAICT isn't used in C?
>
> As mentioned above, we already do similar things in C.
>
> Also, think about what the alternative is. If we remove the Devres
> wrapper of pci::Bar, we lose control over the lifetime of the bar
> mapping, and it can easily out-live the device / driver binding. This
> makes the API unsound.

Oh, my issue is not with the Devres wrapper, which I think makes sense;
it's more the use of RCU to control access to the resource that I find
too constraining. And I'm pretty sure more users will be of the same
opinion as more drivers using it get written.

> With this, drivers would be able to keep hold of acquired resources.
> What if, after a hotplug, the physical address region is re-used and
> mapped by another driver?

Actually - wouldn't that issue also be addressed by a PCI equivalent to
drm_dev_enter() and friends that ensures the device (and thus its
devres resources) stays in place? Using Rust, I can imagine (though I
cannot picture it precisely yet) some method of the device that returns
a reference to an inner structure containing its resources, available
for immediate access. Since it would be coarser-grained, it could rely
on something less constraining than RCU without a noticeable
performance penalty.
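
To sketch that last idea in a self-contained way: the following is a
userspace model of the semantics, not kernel code, and Device,
Resources, enter() and unplug() are all hypothetical names. enter()
gives the closure direct access to the resources for its whole
duration, and unplug() blocks until every such section has exited
before tearing the resources down, the same shape as drm_dev_enter() /
drm_dev_exit() / drm_dev_unplug(), just coarser-grained than a
per-register-access guard:

    use std::sync::RwLock;

    struct Resources {
        mmio: u32, // stands in for the BAR mapping
    }

    struct Device {
        res: RwLock<Option<Resources>>,
    }

    impl Device {
        /// Runs `f` with direct access to the resources, or returns
        /// None if the device has already been unplugged. The read
        /// lock plays the role SRCU plays for drm_dev_enter() /
        /// drm_dev_exit().
        fn enter<R>(&self, f: impl FnOnce(&Resources) -> R) -> Option<R> {
            self.res.read().unwrap().as_ref().map(f)
        }

        /// Revokes the resources: blocks until every enter() section
        /// has finished, then drops the resources, so no mapping can
        /// out-live the device / driver binding.
        fn unplug(&self) {
            *self.res.write().unwrap() = None;
        }
    }

Whether something like this can be made sound in the kernel (the unplug
path must not be blocked indefinitely by a misbehaving section) is
exactly the trade-off between this coarser-grained model and the
per-access RCU guard.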