Alexandre Courbot
2025-Dec-03 11:54 UTC
[PATCH 10/11] gpu: nova-core: LibosMemoryRegionInitArgument size must be page aligned
On Tue Dec 2, 2025 at 8:25 AM JST, Timur Tabi wrote:> On Wed, 2025-11-19 at 12:36 +0900, Alexandre Courbot wrote: >> You can use the `Alignment` type here, as the rest of the code does: >> >> ??? let size = num::usize_as_u64(obj.size()) >> ??????? .align_up(Alignment::new::<GSP_PAGE_SIZE>())?; >> >> Now `align_up` returns an error in case of overflow, that we will need >> to pass down to the caller by changing the return type of `new`. It is a >> bit annoying, but better than the behavior of `next_mutiple_of` in such >> a case, which is to panic. :) > > I see your point, but these are u64s that we're talking about. The only way next_mutiple_of() can > panic is if obj.size() is greater than 0xFFFFFFFFFFFFF000, which is not possible. > > I would say in this case, a panic is preferable to a convoluted error return that will never be > exercised, because failure here indicates a coding error, not an input error.The input data is a usize, so technically we could get an input that triggers that error. I know it's a very edge case, and clearly indicates a bug, but the general rule is: don't panic the kernel. And in Rust, if possible, don't even let me compiler insert panic-handling code. If you don't want to change the return type of the method, then maybe use `unwrap_or` and `inspect_err` to print an error before returning e.g. `0`. But others have already thought "naah, that's never gonna happen" and got burnt very publicly [1], so let's learn from that. :P [1] https://blog.cloudflare.com/18-november-2025-outage/
Alice Ryhl
2025-Dec-03 12:03 UTC
[PATCH 10/11] gpu: nova-core: LibosMemoryRegionInitArgument size must be page aligned
On Wed, Dec 03, 2025 at 08:54:31PM +0900, Alexandre Courbot wrote:> On Tue Dec 2, 2025 at 8:25 AM JST, Timur Tabi wrote: > > On Wed, 2025-11-19 at 12:36 +0900, Alexandre Courbot wrote: > >> You can use the `Alignment` type here, as the rest of the code does: > >> > >> ??? let size = num::usize_as_u64(obj.size()) > >> ??????? .align_up(Alignment::new::<GSP_PAGE_SIZE>())?; > >> > >> Now `align_up` returns an error in case of overflow, that we will need > >> to pass down to the caller by changing the return type of `new`. It is a > >> bit annoying, but better than the behavior of `next_mutiple_of` in such > >> a case, which is to panic. :) > > > > I see your point, but these are u64s that we're talking about. The only way next_mutiple_of() can > > panic is if obj.size() is greater than 0xFFFFFFFFFFFFF000, which is not possible. > > > > I would say in this case, a panic is preferable to a convoluted error return that will never be > > exercised, because failure here indicates a coding error, not an input error. > > The input data is a usize, so technically we could get an input that > triggers that error. > > I know it's a very edge case, and clearly indicates a bug, but the > general rule is: don't panic the kernel. And in Rust, if possible, don't > even let me compiler insert panic-handling code. If you don't want to > change the return type of the method, then maybe use `unwrap_or` and > `inspect_err` to print an error before returning e.g. `0`. > > But others have already thought "naah, that's never gonna happen" and > got burnt very publicly [1], so let's learn from that. :P > > [1] https://blog.cloudflare.com/18-november-2025-outage/For what it's worth, my friend at Cloudflare tells me they would have failed in the same way had they used ? instead of unwrap. Alice
Timur Tabi
2025-Dec-03 18:31 UTC
[PATCH 10/11] gpu: nova-core: LibosMemoryRegionInitArgument size must be page aligned
On Wed, 2025-12-03 at 20:54 +0900, Alexandre Courbot wrote:> On Tue Dec 2, 2025 at 8:25 AM JST, Timur Tabi wrote: > > On Wed, 2025-11-19 at 12:36 +0900, Alexandre Courbot wrote: > > > You can use the `Alignment` type here, as the rest of the code does: > > > > > > ??? let size = num::usize_as_u64(obj.size()) > > > ??????? .align_up(Alignment::new::<GSP_PAGE_SIZE>())?; > > > > > > Now `align_up` returns an error in case of overflow, that we will need > > > to pass down to the caller by changing the return type of `new`. It is a > > > bit annoying, but better than the behavior of `next_mutiple_of` in such > > > a case, which is to panic. :) > > > > I see your point, but these are u64s that we're talking about.? The only way next_mutiple_of() > > can > > panic is if obj.size() is greater than 0xFFFFFFFFFFFFF000, which is not possible.? > > > > I would say in this case, a panic is preferable to a convoluted error return that will never be > > exercised, because failure here indicates a coding error, not an input error. > > The input data is a usize, so technically we could get an input that > triggers that error.Actually, I still say it's not possible. Say I change the code to this, so that .next_multiple_of is called on a u64 instead of a usize: let size = num::usize_as_u64(obj.size()).next_multiple_of(GSP_PAGE_SIZE); Again, the only way this can fail is if the allocated object being passed in is almost 16 exabytes in size, which is physically impossible.> I know it's a very edge case, and clearly indicates a bug, but the > general rule is: don't panic the kernel. And in Rust, if possible, don't > even let me compiler insert panic-handling code. If you don't want to > change the return type of the method, then maybe use `unwrap_or` and > `inspect_err` to print an error before returning e.g. `0`.How about this: if .next_multiple_of(GSP_PAGE_SIZE) does return an error, I'll just assign size to obj.size() as-is? After all, at about 16GB/second for DMA, it will take about 31 years to DMA all that memory, so I will have long since retired before that bug shows up.
Miguel Ojeda
2025-Dec-03 18:34 UTC
[PATCH 10/11] gpu: nova-core: LibosMemoryRegionInitArgument size must be page aligned
On Wed, Dec 3, 2025 at 12:54?PM Alexandre Courbot <acourbot at nvidia.com> wrote:> > I know it's a very edge case, and clearly indicates a bug, but the > general rule is: don't panic the kernel. And in Rust, if possible, don't > even let me compiler insert panic-handling code. If you don't want to > change the return type of the method, then maybe use `unwrap_or` and > `inspect_err` to print an error before returning e.g. `0`.In general, indeed, but if it is truly something that cannot ever happen (as in one can prove it due to how the local code looks like), then adding error paths for it isn't good, especially if they change a signature. In any case, if it is decided to continue execution (either with an error or with a "default" value etc.), then please use EB, i.e. adding `debug_assert!` and possibly `pr_warn!` (or perhaps `pr_warn_once!` once available). Thanks!> But others have already thought "naah, that's never gonna happen" and > got burnt very publicly [1], so let's learn from that. :PThat sounded to me like a higher-level design issue, not so much about `unwrap()`. Cheers, Miguel
Timur Tabi
2025-Dec-03 19:17 UTC
[PATCH 10/11] gpu: nova-core: LibosMemoryRegionInitArgument size must be page aligned
On Wed, 2025-12-03 at 19:34 +0100, Miguel Ojeda wrote:> In any case, if it is decided to continue execution (either with an > error or with a "default" value etc.), then please use EB, i.e. adding > `debug_assert!` and possibly `pr_warn!` (or perhaps `pr_warn_once!` > once available).So this: let size = match num::usize_as_u64(obj.size()).align_up(GSP_PAGE_ALIGNMENT) { Some(v) => v, None => { debug_assert!(false, "Invalid size {}", obj.size()); pr_warn!("Invalid size {}", obj.size()); num::usize_as_u64(obj.size()) } }; Isn't this absurdly excessive? We cannot ever test this error path, because it's physically impossible for obj.size() to return a value that will cause an error.