thr3ads.net - Nouveau - Implicit panics (was: [PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header) [Sep 2025]

If this information is useful, please help other people find it:
Share via:

Alexandre Courbot

2025-Aug-27 08:47 UTC

[PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header

On Wed Aug 27, 2025 at 10:34 AM JST, John Hubbard wrote:
<snip>>> +    /// Returns the data payload of the firmware, or `None` if the
data range is out of bounds of
>> +    /// the firmware image.
>> +    fn data(&self) -> Option<&[u8]> {
>> +        let fw_start = self.hdr.data_offset as usize;
>> +        let fw_size = self.hdr.data_size as usize;
>> +
>> +        self.fw.get(fw_start..fw_start + fw_size)
>
> This worries me a bit, because we never checked that these bounds
> are reasonable: within the range of the firmware, and not overflowing
> (.checked_add() for example), that sort of thing.
>
> Thoughts?
`get` returns `None` if the requested slice is out of bounds, so there
should be no risk of panicking here.

However, `fw_start + fw_size` can panic in debug configuration if it
overflows. In a release build I believe it will just happily wrap, and
`get` should consequently return `None` at the invalid range... Although
we can also get unlucky and produce a valid, yet incorrect, one.

This is actually something I've been thinking about while writing this
series and could not really decide upon: how to deal with operands and
functions in Rust that can potentially panic. Using `checked` operands
everywhere is a bit tedious, and even with great care there is no way to
guarantee that no panic occurs in a given function.

Panics are a big no-no in the kernel, yet I don't feel like we have the
proper tools to ensure they do not happen.

User-space has some crates like `no_panic`, but even these feel more
like hacks than anything else. Something at the compiler level would be
nice.

Maybe that would be a good discussion topic for the Plumber
Microconference?

John Hubbard

2025-Aug-27 21:50 UTC

head link

[PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header

On 8/27/25 1:47 AM, Alexandre Courbot wrote:> On Wed Aug 27, 2025 at 10:34 AM JST, John Hubbard wrote:
> <snip>
>>> +    /// Returns the data payload of the firmware, or `None` if the
data range is out of bounds of
>>> +    /// the firmware image.
>>> +    fn data(&self) -> Option<&[u8]> {
>>> +        let fw_start = self.hdr.data_offset as usize;
>>> +        let fw_size = self.hdr.data_size as usize;
>>> +
>>> +        self.fw.get(fw_start..fw_start + fw_size)
>>
>> This worries me a bit, because we never checked that these bounds
>> are reasonable: within the range of the firmware, and not overflowing
>> (.checked_add() for example), that sort of thing.
>>
>> Thoughts?
> 
> `get` returns `None` if the requested slice is out of bounds, so there
> should be no risk of panicking here.
I was wondering about the bounds themselves, though. Couldn't they
be wrong? (Do we care?)
> 
> However, `fw_start + fw_size` can panic in debug configuration if it
> overflows. In a release build I believe it will just happily wrap, and
> `get` should consequently return `None` at the invalid range... Although
> we can also get unlucky and produce a valid, yet incorrect, one.
> 
> This is actually something I've been thinking about while writing this
> series and could not really decide upon: how to deal with operands and
> functions in Rust that can potentially panic. Using `checked` operands
> everywhere is a bit tedious, and even with great care there is no way to
> guarantee that no panic occurs in a given function.
Yes, .checked_add() all over the place is just awful, would like to 
avoid that for sure.
> 
> Panics are a big no-no in the kernel, yet I don't feel like we have the
> proper tools to ensure they do not happen.
> 
> User-space has some crates like `no_panic`, but even these feel more
> like hacks than anything else. Something at the compiler level would be
> nice.
> 
> Maybe that would be a good discussion topic for the Plumber
> Microconference?
Yes. And maybe even for Kangrejos.

thanks,
-- 
John Hubbard

Miguel Ojeda

2025-Aug-28 11:26 UTC

head link

[PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header

On Wed, Aug 27, 2025 at 10:47?AM Alexandre Courbot <acourbot at
nvidia.com> wrote:>
> However, `fw_start + fw_size` can panic in debug configuration if it
> overflows. In a release build I believe it will just happily wrap, and
In the kernel, it is a panic in the default configuration, not just a debug one.

We have debug assertions too -- and those are disabled by default, but
they are separate from the overflow checking, which is the one enabled
by default.

So, any use of those operators is limited to cases where one knows,
somehow, that it will not overflow. And e.g. user-controlled inputs
cannot use them at all.

So, conceptually, something like this:

  - Static assert if the compiler knows it cannot fail.
  - Build assert if the optimizer knows it cannot fail.
  - Unfallible (like the possibly panicking operators) if the
developer knows it cannot fail.
  - Fallible/wrapping/saturating/... if the developer isn't sure or it
simply cannot be known until runtime. User-derived inputs must use
this option (or rarely the unsafe one).
  - Unsafe if the developer knows it cannot fail and the other options
are not acceptable for some reason. Ideally paired with a debug
assertion (the compiler adds these already for many unsafe
preconditions).

In the past I requested upstream Rust a way to have a "third mode"
("report and continue") for the operators so that it would wrap (like
the non-panicking mode) but allowing us to add a customization point
so that we can e.g. `WARN_ON_ONCE`.

That would be ideal, because it is a nice middle ground that is not as
punishing if a developer gets it wrong, especially for "normal users",
where panics typically mean lost reports etc. And other kernel users,
like cloud operators, can keep the panicking mode.

As for discussing no-panic, sure!

Cheers,
Miguel

Alexandre Courbot

2025-Sep-10 05:44 UTC

head link

Implicit panics (was: [PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header)

Hi Miguel, sorry for the delay in replying!

On Thu Aug 28, 2025 at 8:26 PM JST, Miguel Ojeda wrote:> On Wed, Aug 27, 2025 at 10:47?AM Alexandre Courbot <acourbot at
nvidia.com> wrote:
>>
>> However, `fw_start + fw_size` can panic in debug configuration if it
>> overflows. In a release build I believe it will just happily wrap, and
>
> In the kernel, it is a panic in the default configuration, not just a debug
one.
>
> We have debug assertions too -- and those are disabled by default, but
> they are separate from the overflow checking, which is the one enabled
> by default.
>
> So, any use of those operators is limited to cases where one knows,
> somehow, that it will not overflow. And e.g. user-controlled inputs
> cannot use them at all.
>
> So, conceptually, something like this:
>
>   - Static assert if the compiler knows it cannot fail.
>   - Build assert if the optimizer knows it cannot fail.
>   - Unfallible (like the possibly panicking operators) if the
> developer knows it cannot fail.
>   - Fallible/wrapping/saturating/... if the developer isn't sure or it
> simply cannot be known until runtime. User-derived inputs must use
> this option (or rarely the unsafe one).
>   - Unsafe if the developer knows it cannot fail and the other options
> are not acceptable for some reason. Ideally paired with a debug
> assertion (the compiler adds these already for many unsafe
> preconditions).
>
> In the past I requested upstream Rust a way to have a "third
mode"
> ("report and continue") for the operators so that it would wrap
(like
> the non-panicking mode) but allowing us to add a customization point
> so that we can e.g. `WARN_ON_ONCE`.
That would be nice, but also wouldn't cover all the cases where implicit
panics can happen, like out-of-bounds slice accesses - we can't have a
"report-and-continue" mode for these.

And that's really the elephant in the room IMHO: such panic sites can be
introduced implicitly, without the programmer realizing it, potentially
resulting in more runtime panics for Rust modules than one might expect
from a language whose main selling point is safety. I understand that
the previous sentence is a bit fallacious, since such panics indicate
bugs in the code that would likely go unnoticed in C (which is arguably
worse). But perception matters, and such crashes can be damaging to the
reputation of the project.

In user-space, crates like `no_panic` can provide a compile-time
guarantee that a given function cannot panic. I don't know how that
would translate to the kernel, but ideally we could have some support
from tooling (compiler and/or LSP?) to warn us of sites introduced in
the code. After all, since the compiler inserts these panic sites, it
should also be able to tell us where they are, allowing us to evaluate
(and hopefully remove) them before the code ships to users. Most of them
could then be eliminated by constraining inputs or using checked
variants.

I am not suggesting we should mandate that ALL Rust kernel code be
proven panic-free at compile time, however since I started writing
kernel code in Rust, I've often wished I had a simple way to check
whether my carefully-crafted function processing user-space data really
*is* panic-free.
> As for discussing no-panic, sure!
Writing a uC topic proposal for Plumbers right now. :)

Nouveau - Sep 2025 - Implicit panics (was: [PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header)

[PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header

[PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header

[PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header

Implicit panics (was: [PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header)