Alexandre Courbot
2025-Aug-27 08:47 UTC
[PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header
On Wed Aug 27, 2025 at 10:34 AM JST, John Hubbard wrote: <snip>>> + /// Returns the data payload of the firmware, or `None` if the data range is out of bounds of >> + /// the firmware image. >> + fn data(&self) -> Option<&[u8]> { >> + let fw_start = self.hdr.data_offset as usize; >> + let fw_size = self.hdr.data_size as usize; >> + >> + self.fw.get(fw_start..fw_start + fw_size) > > This worries me a bit, because we never checked that these bounds > are reasonable: within the range of the firmware, and not overflowing > (.checked_add() for example), that sort of thing. > > Thoughts?`get` returns `None` if the requested slice is out of bounds, so there should be no risk of panicking here. However, `fw_start + fw_size` can panic in debug configuration if it overflows. In a release build I believe it will just happily wrap, and `get` should consequently return `None` at the invalid range... Although we can also get unlucky and produce a valid, yet incorrect, one. This is actually something I've been thinking about while writing this series and could not really decide upon: how to deal with operands and functions in Rust that can potentially panic. Using `checked` operands everywhere is a bit tedious, and even with great care there is no way to guarantee that no panic occurs in a given function. Panics are a big no-no in the kernel, yet I don't feel like we have the proper tools to ensure they do not happen. User-space has some crates like `no_panic`, but even these feel more like hacks than anything else. Something at the compiler level would be nice. Maybe that would be a good discussion topic for the Plumber Microconference?
John Hubbard
2025-Aug-27 21:50 UTC
[PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header
On 8/27/25 1:47 AM, Alexandre Courbot wrote:> On Wed Aug 27, 2025 at 10:34 AM JST, John Hubbard wrote: > <snip> >>> + /// Returns the data payload of the firmware, or `None` if the data range is out of bounds of >>> + /// the firmware image. >>> + fn data(&self) -> Option<&[u8]> { >>> + let fw_start = self.hdr.data_offset as usize; >>> + let fw_size = self.hdr.data_size as usize; >>> + >>> + self.fw.get(fw_start..fw_start + fw_size) >> >> This worries me a bit, because we never checked that these bounds >> are reasonable: within the range of the firmware, and not overflowing >> (.checked_add() for example), that sort of thing. >> >> Thoughts? > > `get` returns `None` if the requested slice is out of bounds, so there > should be no risk of panicking here.I was wondering about the bounds themselves, though. Couldn't they be wrong? (Do we care?)> > However, `fw_start + fw_size` can panic in debug configuration if it > overflows. In a release build I believe it will just happily wrap, and > `get` should consequently return `None` at the invalid range... Although > we can also get unlucky and produce a valid, yet incorrect, one. > > This is actually something I've been thinking about while writing this > series and could not really decide upon: how to deal with operands and > functions in Rust that can potentially panic. Using `checked` operands > everywhere is a bit tedious, and even with great care there is no way to > guarantee that no panic occurs in a given function.Yes, .checked_add() all over the place is just awful, would like to avoid that for sure.> > Panics are a big no-no in the kernel, yet I don't feel like we have the > proper tools to ensure they do not happen. > > User-space has some crates like `no_panic`, but even these feel more > like hacks than anything else. Something at the compiler level would be > nice. > > Maybe that would be a good discussion topic for the Plumber > Microconference?Yes. And maybe even for Kangrejos. thanks, -- John Hubbard
Miguel Ojeda
2025-Aug-28 11:26 UTC
[PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header
On Wed, Aug 27, 2025 at 10:47?AM Alexandre Courbot <acourbot at nvidia.com> wrote:> > However, `fw_start + fw_size` can panic in debug configuration if it > overflows. In a release build I believe it will just happily wrap, andIn the kernel, it is a panic in the default configuration, not just a debug one. We have debug assertions too -- and those are disabled by default, but they are separate from the overflow checking, which is the one enabled by default. So, any use of those operators is limited to cases where one knows, somehow, that it will not overflow. And e.g. user-controlled inputs cannot use them at all. So, conceptually, something like this: - Static assert if the compiler knows it cannot fail. - Build assert if the optimizer knows it cannot fail. - Unfallible (like the possibly panicking operators) if the developer knows it cannot fail. - Fallible/wrapping/saturating/... if the developer isn't sure or it simply cannot be known until runtime. User-derived inputs must use this option (or rarely the unsafe one). - Unsafe if the developer knows it cannot fail and the other options are not acceptable for some reason. Ideally paired with a debug assertion (the compiler adds these already for many unsafe preconditions). In the past I requested upstream Rust a way to have a "third mode" ("report and continue") for the operators so that it would wrap (like the non-panicking mode) but allowing us to add a customization point so that we can e.g. `WARN_ON_ONCE`. That would be ideal, because it is a nice middle ground that is not as punishing if a developer gets it wrong, especially for "normal users", where panics typically mean lost reports etc. And other kernel users, like cloud operators, can keep the panicking mode. As for discussing no-panic, sure! Cheers, Miguel
Alexandre Courbot
2025-Sep-10 05:44 UTC
Implicit panics (was: [PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header)
Hi Miguel, sorry for the delay in replying! On Thu Aug 28, 2025 at 8:26 PM JST, Miguel Ojeda wrote:> On Wed, Aug 27, 2025 at 10:47?AM Alexandre Courbot <acourbot at nvidia.com> wrote: >> >> However, `fw_start + fw_size` can panic in debug configuration if it >> overflows. In a release build I believe it will just happily wrap, and > > In the kernel, it is a panic in the default configuration, not just a debug one. > > We have debug assertions too -- and those are disabled by default, but > they are separate from the overflow checking, which is the one enabled > by default. > > So, any use of those operators is limited to cases where one knows, > somehow, that it will not overflow. And e.g. user-controlled inputs > cannot use them at all. > > So, conceptually, something like this: > > - Static assert if the compiler knows it cannot fail. > - Build assert if the optimizer knows it cannot fail. > - Unfallible (like the possibly panicking operators) if the > developer knows it cannot fail. > - Fallible/wrapping/saturating/... if the developer isn't sure or it > simply cannot be known until runtime. User-derived inputs must use > this option (or rarely the unsafe one). > - Unsafe if the developer knows it cannot fail and the other options > are not acceptable for some reason. Ideally paired with a debug > assertion (the compiler adds these already for many unsafe > preconditions). > > In the past I requested upstream Rust a way to have a "third mode" > ("report and continue") for the operators so that it would wrap (like > the non-panicking mode) but allowing us to add a customization point > so that we can e.g. `WARN_ON_ONCE`.That would be nice, but also wouldn't cover all the cases where implicit panics can happen, like out-of-bounds slice accesses - we can't have a "report-and-continue" mode for these. And that's really the elephant in the room IMHO: such panic sites can be introduced implicitly, without the programmer realizing it, potentially resulting in more runtime panics for Rust modules than one might expect from a language whose main selling point is safety. I understand that the previous sentence is a bit fallacious, since such panics indicate bugs in the code that would likely go unnoticed in C (which is arguably worse). But perception matters, and such crashes can be damaging to the reputation of the project. In user-space, crates like `no_panic` can provide a compile-time guarantee that a given function cannot panic. I don't know how that would translate to the kernel, but ideally we could have some support from tooling (compiler and/or LSP?) to warn us of sites introduced in the code. After all, since the compiler inserts these panic sites, it should also be able to tell us where they are, allowing us to evaluate (and hopefully remove) them before the code ships to users. Most of them could then be eliminated by constraining inputs or using checked variants. I am not suggesting we should mandate that ALL Rust kernel code be proven panic-free at compile time, however since I started writing kernel code in Rust, I've often wished I had a simple way to check whether my carefully-crafted function processing user-space data really *is* panic-free.> As for discussing no-panic, sure!Writing a uC topic proposal for Plumbers right now. :)