Alexandre Courbot
2025-Sep-10 05:44 UTC
Implicit panics (was: [PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header)
Hi Miguel, sorry for the delay in replying!

On Thu Aug 28, 2025 at 8:26 PM JST, Miguel Ojeda wrote:
> On Wed, Aug 27, 2025 at 10:47 AM Alexandre Courbot <acourbot at nvidia.com> wrote:
>>
>> However, `fw_start + fw_size` can panic in debug configuration if it
>> overflows. In a release build I believe it will just happily wrap, and
>
> In the kernel, it is a panic in the default configuration, not just a debug one.
>
> We have debug assertions too -- and those are disabled by default, but
> they are separate from the overflow checking, which is the one enabled
> by default.
>
> So, any use of those operators is limited to cases where one knows,
> somehow, that it will not overflow. And e.g. user-controlled inputs
> cannot use them at all.
>
> So, conceptually, something like this:
>
> - Static assert if the compiler knows it cannot fail.
> - Build assert if the optimizer knows it cannot fail.
> - Infallible (like the possibly panicking operators) if the
>   developer knows it cannot fail.
> - Fallible/wrapping/saturating/... if the developer isn't sure or it
>   simply cannot be known until runtime. User-derived inputs must use
>   this option (or rarely the unsafe one).
> - Unsafe if the developer knows it cannot fail and the other options
>   are not acceptable for some reason. Ideally paired with a debug
>   assertion (the compiler adds these already for many unsafe
>   preconditions).
>
> In the past I requested upstream Rust a way to have a "third mode"
> ("report and continue") for the operators so that it would wrap (like
> the non-panicking mode) but allowing us to add a customization point
> so that we can e.g. `WARN_ON_ONCE`.

That would be nice, but also wouldn't cover all the cases where implicit panics can happen, like out-of-bounds slice accesses - we can't have a "report-and-continue" mode for these.
And that's really the elephant in the room IMHO: such panic sites can be introduced implicitly, without the programmer realizing it, potentially resulting in more runtime panics for Rust modules than one might expect from a language whose main selling point is safety.

I understand that the previous sentence is a bit fallacious, since such panics indicate bugs in the code that would likely go unnoticed in C (which is arguably worse). But perception matters, and such crashes can be damaging to the reputation of the project.

In user-space, crates like `no_panic` can provide a compile-time guarantee that a given function cannot panic. I don't know how that would translate to the kernel, but ideally we could have some support from tooling (compiler and/or LSP?) to warn us of panic sites introduced in the code. After all, since the compiler inserts these panic sites, it should also be able to tell us where they are, allowing us to evaluate (and hopefully remove) them before the code ships to users. Most of them could then be eliminated by constraining inputs or using checked variants.

I am not suggesting we should mandate that ALL Rust kernel code be proven panic-free at compile time, but since I started writing kernel code in Rust, I've often wished I had a simple way to check whether my carefully crafted function processing user-space data really *is* panic-free.

> As for discussing no-panic, sure!

Writing a uC topic proposal for Plumbers right now. :)
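To make "constraining inputs or using checked variants" concrete, here is a minimal user-space sketch in plain Rust. It is not nova-core code: the function `fw_payload`, the constant `HEADER_SIZE` and its limit of 64 are hypothetical names and values chosen for illustration. It replaces the two implicit panic sites discussed in this thread, the `+` operator and slice indexing, with their fallible counterparts, and adds a constant assertion that is evaluated at compile time.

```rust
/// Hypothetical constant, checked at compile time: the assertion below is
/// evaluated by the compiler, so it leaves no panic site in the binary.
const HEADER_SIZE: usize = 16;
const _: () = assert!(HEADER_SIZE <= 64);

/// Returns `data[fw_start..fw_start + fw_size]`, or `None` if the range
/// overflows `usize` or falls outside `data`.
///
/// `fw_start` and `fw_size` are treated as untrusted input, so the function
/// avoids both `+` and `[]`.
fn fw_payload(data: &[u8], fw_start: usize, fw_size: usize) -> Option<&[u8]> {
    // Fallible addition instead of `fw_start + fw_size`.
    let fw_end = fw_start.checked_add(fw_size)?;
    // Fallible slicing instead of `&data[fw_start..fw_end]`.
    data.get(fw_start..fw_end)
}
```

With an 8-byte buffer, `fw_payload(&buf, 2, 3)` yields the three bytes at offsets 2 to 4, while an out-of-range request such as `fw_payload(&buf, 6, 4)` or an overflowing one such as `fw_payload(&buf, usize::MAX, 2)` yields `None` instead of panicking. The caller is then forced to handle the error path explicitly.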
Miguel Ojeda
2025-Sep-10 10:01 UTC
Implicit panics (was: [PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header)
On Wed, Sep 10, 2025 at 7:45 AM Alexandre Courbot <acourbot at nvidia.com> wrote:
>
> That would be nice, but also wouldn't cover all the cases where implicit
> panics can happen, like out-of-bounds slice accesses - we can't have a
> "report-and-continue" mode for these.

In principle, it could cover OOBs (even if it is a bad idea).

> But perception matters, and such crashes can be damaging to the reputation of the project.

Yes, we are well aware -- we have had it in our wish list for upstream Rust for a long time.

We are tackling these things as we go -- e.g. we solved the `alloc` panics and the ball on the report-and-continue mode for overflows started moving.

Part of Rust for Linux is about making Rust the best language for kernel development it can be, after all, and so far upstream Rust has been quite helpful on giving us the features we need -- we meet with them every two weeks, please join if you have time!

(Side note: the "safety" that Rust "sells" isn't really about avoiding panics, although obviously it would be a nice feature to have.)

> Writing a uC topic proposal for Plumbers right now. :)

I see it there, thanks! I can briefly mention the topic in Kangrejos, since we will have Rust representation, including from the language team.

I don't think the discussion should focus much on "Do we need this?" but rather more on "What exactly do we want? Would we be OK with a local solution? Do we need/want a global one? Would we be OK with LSP? Would we be OK with no panics after optimizations, e.g. a link time check? Or do we want full support in the language for guaranteed non-panicking functions? Do we need exceptional carve-outs on such checking for particular language constructs?" and so on. And, of course, "Who has time to write an RFC and implement an experiment upstream if an approach is decided".
Getting data on "in practice, how much of an issue it is on the Rust side" would help too -- those with actual users running Rust kernel code probably can tell us something.

What I would personally expect to happen is that, over time, we understand better what are the worst cases we must tackle.

Cheers,
Miguel
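For readers following the thread, the "report and continue" mode for overflows mentioned in this message can be approximated in user space with today's standard library. The sketch below is only an illustration of the idea and not the upstream proposal: the function `add_report_and_continue` and the flag `OVERFLOW_REPORTED` are hypothetical, and `eprintln!` stands in for a kernel facility such as `WARN_ON_ONCE`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Set the first time an overflow is reported, so that later overflows stay
/// silent (a rough stand-in for the "once" part of `WARN_ON_ONCE`).
static OVERFLOW_REPORTED: AtomicBool = AtomicBool::new(false);

/// Adds `a` and `b`, wrapping on overflow instead of panicking.
/// The first overflow seen by the process is reported on stderr.
fn add_report_and_continue(a: u32, b: u32) -> u32 {
    // `overflowing_add` returns the wrapped sum and an overflow flag.
    let (sum, overflowed) = a.overflowing_add(b);
    // `swap` returns the previous value, so only the first overflow reports.
    if overflowed && !OVERFLOW_REPORTED.swap(true, Ordering::Relaxed) {
        eprintln!("warning: u32 addition overflowed: {a} + {b}");
    }
    sum
}
```

Unlike the built-in `+`, this has to be called explicitly at every site, which is exactly why a customization point in the operators themselves was requested upstream.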
Alexandre Courbot
2025-Sep-10 13:54 UTC
Implicit panics (was: [PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header)
On Wed Sep 10, 2025 at 7:00 PM JST, Miguel Ojeda wrote:
> On Wed, Sep 10, 2025 at 7:45 AM Alexandre Courbot <acourbot at nvidia.com> wrote:
>>
>> That would be nice, but also wouldn't cover all the cases where implicit
>> panics can happen, like out-of-bounds slice accesses - we can't have a
>> "report-and-continue" mode for these.
>
> In principle, it could cover OOBs (even if it is a bad idea).
>
>> But perception matters, and such crashes can be damaging to the reputation of the project.
>
> Yes, we are well aware -- we have had it in our wish list for upstream
> Rust for a long time.
>
> We are tackling these things as we go -- e.g. we solved the `alloc`
> panics and the ball on the report-and-continue mode for overflows
> started moving.
>
> Part of Rust for Linux is about making Rust the best language for
> kernel development it can be, after all, and so far upstream Rust has
> been quite helpful on giving us the features we need -- we meet with
> them every two weeks, please join if you have time!
>
> (Side note: the "safety" that Rust "sells" isn't really about avoiding
> panics, although obviously it would be a nice feature to have.)

That's right, these panics are actually the last line of safety to prevent a program from doing something damaging. It is just that the consequences in a regular program are not as heavy as in the kernel.

The only two options are either allowing user-space to crash the kernel through a module with a missing bound check, or letting it tamper with data it is not supposed to access. While the first option is terrible, the second one is unacceptable - so at the end of the day what we likely want is to keep the panic behavior and limit these occurrences as much as possible through information to the programmer. Build errors on such panic site insertions, with the option to relax the rule locally if a justifying SAFETY comment is provided?
And as you said, what do we do if a panic can be removed through a particular optimization - does that optimization become mandatory to build the kernel? Is it applicable to all architectures and (in the future) all supported compilers? I suspect it will take more than Plumbers to get to the bottom of this. :)

>
>> Writing a uC topic proposal for Plumbers right now. :)
>
> I see it there, thanks! I can briefly mention the topic in Kangrejos,
> since we will have Rust representation, including from the language
> team.
>
> I don't think the discussion should focus much on "Do we need this?"
> but rather more on "What exactly do we want? Would we be OK with a
> local solution? Do we need/want a global one? Would we be OK with LSP?
> Would we be OK with no panics after optimizations, e.g. a link time
> check? Or do we want full support in the language for guaranteed
> non-panicking functions? Do we need exceptional carve-outs on such
> checking for particular language constructs?" and so on. And, of
> course, "Who has time to write an RFC and implement an experiment
> upstream if an approach is decided".
>
> Getting data on "in practice, how much of an issue it is on the Rust
> side" would help too -- those with actual users running Rust kernel
> code probably can tell us something.
>
> What I would personally expect to happen is that, over time, we
> understand better what are the worst cases we must tackle.

Thanks, these are great directions to explore. I see that some thinking has already been done on this; do we have a bug or tracking issue so I can catch up with the discussions that have already taken place?