thr3ads.net - Nouveau - Implicit panics (was: [PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header) [Sep 2025]

Alexandre Courbot

2025-Sep-10 05:44 UTC

Implicit panics (was: [PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header)

Hi Miguel, sorry for the delay in replying!

On Thu Aug 28, 2025 at 8:26 PM JST, Miguel Ojeda wrote:
> On Wed, Aug 27, 2025 at 10:47 AM Alexandre Courbot <acourbot at nvidia.com> wrote:
>>
>> However, `fw_start + fw_size` can panic in debug configuration if it
>> overflows. In a release build I believe it will just happily wrap, and
>
> In the kernel, it is a panic in the default configuration, not just a
> debug one.
>
> We have debug assertions too -- and those are disabled by default, but
> they are separate from the overflow checking, which is the one enabled
> by default.
>
> So, any use of those operators is limited to cases where one knows,
> somehow, that it will not overflow. And e.g. user-controlled inputs
> cannot use them at all.
>
> So, conceptually, something like this:
>
>   - Static assert if the compiler knows it cannot fail.
>   - Build assert if the optimizer knows it cannot fail.
>   - Infallible (like the possibly panicking operators) if the
>     developer knows it cannot fail.
>   - Fallible/wrapping/saturating/... if the developer isn't sure or it
>     simply cannot be known until runtime. User-derived inputs must use
>     this option (or rarely the unsafe one).
>   - Unsafe if the developer knows it cannot fail and the other options
>     are not acceptable for some reason. Ideally paired with a debug
>     assertion (the compiler adds these already for many unsafe
>     preconditions).
>
> In the past I asked upstream Rust for a way to have a "third mode"
> ("report and continue") for the operators so that it would wrap (like
> the non-panicking mode) but allowing us to add a customization point
> so that we can e.g. `WARN_ON_ONCE`.

That would be nice, but also wouldn't cover all the cases where implicit
panics can happen, like out-of-bounds slice accesses - we can't have a
"report-and-continue" mode for these.

And that's really the elephant in the room IMHO: such panic sites can be
introduced implicitly, without the programmer realizing it, potentially
resulting in more runtime panics for Rust modules than one might expect
from a language whose main selling point is safety. I understand that
the previous sentence is a bit fallacious, since such panics indicate
bugs in the code that would likely go unnoticed in C (which is arguably
worse). But perception matters, and such crashes can be damaging to the
reputation of the project.

In user-space, crates like `no_panic` can provide a compile-time
guarantee that a given function cannot panic. I don't know how that
would translate to the kernel, but ideally we could have some support
from tooling (compiler and/or LSP?) to warn us of panic sites introduced
in the code. After all, since the compiler inserts these panic sites, it
should also be able to tell us where they are, allowing us to evaluate
(and hopefully remove) them before the code ships to users. Most of them
could then be eliminated by constraining inputs or using checked
variants.
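
As an illustration of "constraining inputs or using checked variants", here is a minimal sketch for the `fw_start + fw_size` case from the original patch discussion. The `FwError` type, the `fw_end` function and the `blob_len` parameter are hypothetical names made up for this example and are not taken from nova-core or the kernel crate:

```rust
/// Hypothetical error type for this sketch; not a kernel type.
#[derive(Debug, PartialEq)]
enum FwError {
    Overflow,
    OutOfRange,
}

/// Computes the end offset of a firmware region without introducing a
/// panic site. All names here are illustrative assumptions.
fn fw_end(fw_start: u32, fw_size: u32, blob_len: u32) -> Result<u32, FwError> {
    // `checked_add` returns `None` on overflow instead of panicking
    // (or silently wrapping).
    let end = fw_start.checked_add(fw_size).ok_or(FwError::Overflow)?;

    // Constrain the input: the region must fit inside the blob.
    if end > blob_len {
        return Err(FwError::OutOfRange);
    }

    Ok(end)
}
```

With plain `fw_start + fw_size`, the overflow case would be a panic when overflow checks are enabled; here it becomes an error the caller has to handle.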

I am not suggesting we should mandate that ALL Rust kernel code be
proven panic-free at compile time. However, since I started writing
kernel code in Rust, I've often wished I had a simple way to check
whether my carefully-crafted function processing user-space data really
*is* panic-free.

> As for discussing no-panic, sure!

Writing a uC topic proposal for Plumbers right now. :)

Miguel Ojeda

2025-Sep-10 10:01 UTC

On Wed, Sep 10, 2025 at 7:45 AM Alexandre Courbot <acourbot at nvidia.com> wrote:
>
> That would be nice, but also wouldn't cover all the cases where implicit
> panics can happen, like out-of-bounds slice accesses - we can't have a
> "report-and-continue" mode for these.

In principle, it could cover OOBs (even if it is a bad idea).

> But perception matters, and such crashes can be damaging to the reputation
> of the project.

Yes, we are well aware -- we have had it in our wish list for upstream
Rust for a long time.

We are tackling these things as we go -- e.g. we solved the `alloc`
panics and the ball on the report-and-continue mode for overflows
started moving.

Part of Rust for Linux is about making Rust the best language for
kernel development it can be, after all, and so far upstream Rust has
been quite helpful on giving us the features we need -- we meet with
them every two weeks, please join if you have time!

(Side note: the "safety" that Rust "sells" isn't really about avoiding
panics, although obviously it would be a nice feature to have.)

> Writing a uC topic proposal for Plumbers right now. :)

I see it there, thanks! I can briefly mention the topic in Kangrejos,
since we will have Rust representation, including from the language
team.

I don't think the discussion should focus much on "Do we need this?"
but rather more on "What exactly do we want? Would we be OK with a
local solution? Do we need/want a global one? Would we be OK with LSP?
Would we be OK with no panics after optimizations, e.g. a link time
check? Or do we want full support in the language for guaranteed
non-panicking functions? Do we need exceptional carve-outs on such
checking for particular language constructs?" and so on. And, of
course, "Who has time to write an RFC and implement an experiment
upstream if an approach is decided".

Getting data on "in practice, how much of an issue it is on the Rust
side" would help too -- those with actual users running Rust kernel
code probably can tell us something.

What I would personally expect to happen is that, over time, we
understand better what are the worst cases we must tackle.

Cheers,
Miguel

Alexandre Courbot

2025-Sep-10 13:54 UTC

On Wed Sep 10, 2025 at 7:00 PM JST, Miguel Ojeda wrote:
> On Wed, Sep 10, 2025 at 7:45 AM Alexandre Courbot <acourbot at nvidia.com> wrote:
>>
>> That would be nice, but also wouldn't cover all the cases where implicit
>> panics can happen, like out-of-bounds slice accesses - we can't have a
>> "report-and-continue" mode for these.
>
> In principle, it could cover OOBs (even if it is a bad idea).
>
>> But perception matters, and such crashes can be damaging to the
>> reputation of the project.
>
> Yes, we are well aware -- we have had it in our wish list for upstream
> Rust for a long time.
>
> We are tackling these things as we go -- e.g. we solved the `alloc`
> panics and the ball on the report-and-continue mode for overflows
> started moving.
>
> Part of Rust for Linux is about making Rust the best language for
> kernel development it can be, after all, and so far upstream Rust has
> been quite helpful on giving us the features we need -- we meet with
> them every two weeks, please join if you have time!
>
> (Side note: the "safety" that Rust "sells" isn't really about avoiding
> panics, although obviously it would be a nice feature to have.)

That's right, these panics are actually the last line of safety to
prevent a program from doing something damaging. It is just that the
consequences in a regular program are not as heavy as in the kernel.

The only two options are either allowing user-space to crash the kernel
through a module with a missing bound check, or letting it tamper with
data it is not supposed to access. While the first option is terrible,
the second one is unacceptable - so at the end of the day what we likely
want is to keep the panic behavior and limit these occurrences as much
as possible through information to the programmer.
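
To make the bounds-check case concrete, here is a small sketch of a parser for user-supplied bytes that has no implicit panic site. The function name `read_u32_le` and its signature are assumptions for illustration only. Writing `data[offset..offset + 4]` instead would contain two potential panic sites, the addition and the slice index:

```rust
use std::convert::TryInto;

/// Reads a little-endian `u32` at `offset`, or returns `None` if the
/// four bytes do not fit inside `data`. Hypothetical helper for this sketch.
fn read_u32_le(data: &[u8], offset: usize) -> Option<u32> {
    // Overflow of `offset + 4` becomes `None` rather than a panic.
    let end = offset.checked_add(4)?;

    // `get` returns `None` for an out-of-bounds range, where the
    // indexing operator would panic.
    let bytes = data.get(offset..end)?;

    // The slice is exactly four bytes long here, so the conversion
    // succeeds; `ok()?` keeps the function free of `unwrap`.
    let arr: [u8; 4] = bytes.try_into().ok()?;
    Some(u32::from_le_bytes(arr))
}
```

The caller can then map `None` to an error code, so a malformed input results in a failed request rather than a crash or an out-of-bounds access.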

Build errors on such panic site insertions, with the option to relax
the rule locally if a justifying SAFETY comment is provided? And as you
said, what do we do if a panic can be removed through a particular
optimization - does that optimization become mandatory to build the
kernel? Is it applicable to all architectures and (in the future) all
supported compilers?

I suspect it will take more than Plumbers to get to the bottom of this. :)

>
>> Writing a uC topic proposal for Plumbers right now. :)
>
> I see it there, thanks! I can briefly mention the topic in Kangrejos,
> since we will have Rust representation, including from the language
> team.
>
> I don't think the discussion should focus much on "Do we need this?"
> but rather more on "What exactly do we want? Would we be OK with a
> local solution? Do we need/want a global one? Would we be OK with LSP?
> Would we be OK with no panics after optimizations, e.g. a link time
> check? Or do we want full support in the language for guaranteed
> non-panicking functions? Do we need exceptional carve-outs on such
> checking for particular language constructs?" and so on. And, of
> course, "Who has time to write an RFC and implement an experiment
> upstream if an approach is decided".
>
> Getting data on "in practice, how much of an issue it is on the Rust
> side" would help too -- those with actual users running Rust kernel
> code probably can tell us something.
>
> What I would personally expect to happen is that, over time, we
> understand better what are the worst cases we must tackle.

Thanks, these are great directions to explore. I see that some thinking
has already been done on this. Do we have a bug or tracking issue so I
can catch up with the discussions that have already taken place?
