Alistair Popple
2025-Sep-29 06:19 UTC
[PATCH v2 05/10] gpu: nova-core: gsp: Add GSP command queue handling
On 2025-09-26 at 14:37 +1000, Alexandre Courbot <acourbot at nvidia.com> wrote...> Hi Alistair, > > On Mon Sep 22, 2025 at 8:30 PM JST, Alistair Popple wrote: > <snip> > > diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs > > new file mode 100644 > > index 000000000000..a9ba1a4c73d8 > > --- /dev/null > > +++ b/drivers/gpu/nova-core/gsp/cmdq.rs > > @@ -0,0 +1,423 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > + > > +use core::mem::offset_of; > > +use core::sync::atomic::fence; > > +use core::sync::atomic::Ordering; > > + > > +use kernel::alloc::flags::GFP_KERNEL; > > +use kernel::device; > > +use kernel::dma::{CoherentAllocation, DmaAddress}; > > +use kernel::prelude::*; > > +use kernel::sync::aref::ARef; > > +use kernel::time::Delta; > > +use kernel::transmute::{AsBytes, FromBytes}; > > +use kernel::{dma_read, dma_write}; > > + > > +use super::fw::{ > > + NV_VGPU_MSG_EVENT_GSP_INIT_DONE, NV_VGPU_MSG_EVENT_GSP_LOCKDOWN_NOTICE, > > + NV_VGPU_MSG_EVENT_GSP_POST_NOCAT_RECORD, NV_VGPU_MSG_EVENT_GSP_RUN_CPU_SEQUENCER, > > + NV_VGPU_MSG_EVENT_MMU_FAULT_QUEUED, NV_VGPU_MSG_EVENT_OS_ERROR_LOG, > > + NV_VGPU_MSG_EVENT_POST_EVENT, NV_VGPU_MSG_EVENT_RC_TRIGGERED, > > + NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT, NV_VGPU_MSG_FUNCTION_ALLOC_CHANNEL_DMA, > > + NV_VGPU_MSG_FUNCTION_ALLOC_CTX_DMA, NV_VGPU_MSG_FUNCTION_ALLOC_DEVICE, > > + NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY, NV_VGPU_MSG_FUNCTION_ALLOC_OBJECT, > > + NV_VGPU_MSG_FUNCTION_ALLOC_ROOT, NV_VGPU_MSG_FUNCTION_BIND_CTX_DMA, NV_VGPU_MSG_FUNCTION_FREE, > > + NV_VGPU_MSG_FUNCTION_GET_GSP_STATIC_INFO, NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO, > > + NV_VGPU_MSG_FUNCTION_GSP_INIT_POST_OBJGPU, NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL, > > + NV_VGPU_MSG_FUNCTION_GSP_SET_SYSTEM_INFO, NV_VGPU_MSG_FUNCTION_LOG, > > + NV_VGPU_MSG_FUNCTION_MAP_MEMORY, NV_VGPU_MSG_FUNCTION_NOP, > > + NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO, NV_VGPU_MSG_FUNCTION_SET_REGISTRY, > > +}; > > As I mentioned in v1, let's turn these into two 
enums to avoid this big > import and make sure we can never mix up the values. > > This can be something like this in `fw.rs`: > > #[repr(u32)] > pub(crate) enum VgpuMsgEvent { > GspInitDone = bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE, > GspLockDownNotice = bindings::NV_VGPU_MSG_EVENT_GSP_LOCKDOWN_NOTICE, > ... > } > > Then you just need to import `VgpuMsgEvent`, and can use that type where > appropriate, e.g. for the `FUNCTION` associated type of > `GspMessageFromGsp`.Yep, makes sense. I'd just overlooked this one.> > +use crate::driver::Bar0; > > +use crate::gsp::create_pte_array; > > +use crate::gsp::fw::{GspMsgElement, MsgqRxHeader, MsgqTxHeader}; > > +use crate::gsp::{GSP_PAGE_SHIFT, GSP_PAGE_SIZE}; > > +use crate::regs::NV_PGSP_QUEUE_HEAD; > > +use crate::sbuffer::SBuffer; > > +use crate::util::wait_on; > > + > > +pub(crate) trait GspCommandToGsp: Sized + FromBytes + AsBytes { > > + const FUNCTION: u32; > > +} > > + > > +pub(crate) trait GspMessageFromGsp: Sized + FromBytes + AsBytes { > > + const FUNCTION: u32; > > +} > > Do we need to repeat `Gsp` in these trait names? `CommandToGsp` and > `MessageFromGsp` should be fine.Nah, and actually they're quite confusing really. Gsp sending a message from the Gsp. Will fix them up.> (I am also guilty of prefixing type names to avoid name collisions - a > habit inherited from years of C programming, but since we are already in > the `gsp` module we can forgo this habit as users can just import the > module and refer to the type as `gsp::MessageFromGsp` if there is any > ambiguity).Indeed. Old habits die hard and all that.> > + > > +/// Number of GSP pages making the Msgq. > > +pub(crate) const MSGQ_NUM_PAGES: u32 = 0x3f; > > + > > +#[repr(C, align(0x1000))] > > +#[derive(Debug)] > > +struct MsgqData { > > + data: [[u8; GSP_PAGE_SIZE]; MSGQ_NUM_PAGES as usize], > > +} > > + > > +// Annoyingly there is no real equivalent of #define so we're forced to use a > > +// literal to specify the alignment above. 
So check that against the actual GSP > > +// page size here. > > +static_assert!(align_of::<MsgqData>() == GSP_PAGE_SIZE); > > + > > +// There is no struct defined for this in the open-gpu-kernel-source headers. > > +// Instead it is defined by code in GspMsgQueuesInit(). > > +#[repr(C)] > > +struct Msgq { > > + tx: MsgqTxHeader, > > + rx: MsgqRxHeader, > > + msgq: MsgqData, > > +} > > + > > +#[repr(C)] > > +struct GspMem { > > + ptes: [u8; GSP_PAGE_SIZE], > > + cpuq: Msgq, > > + gspq: Msgq, > > +} > > + > > +// SAFETY: These structs don't meet the no-padding requirements of AsBytes but > > +// that is not a problem because they are not used outside the kernel. > > +unsafe impl AsBytes for GspMem {} > > + > > +// SAFETY: These structs don't meet the no-padding requirements of FromBytes but > > +// that is not a problem because they are not used outside the kernel. > > +unsafe impl FromBytes for GspMem {} > > These ARE used outside the kernel, since they are shared with the GSP. > :) I'd say the reason these are safe is just because we satisfy the > requirements of AsBytes and FromBytes:Yes, and they're actually old safety comments that I missed updating to be the same as the rest.> - No uninitialized bytes, > - No interior mutability, > - All byte patterns are valid (for some generous definition of > "valid" limited to not triggering UB).I was under the impression that that's all `unsafe` is really marking - code the compiler can't prove won't trigger UB. So I'd assume that's all we'd have to prove in safety comments anyway.> > + > > +/// `GspMem` struct that is shared with the GSP. > > +struct DmaGspMem(CoherentAllocation<GspMem>); > > + > > +impl DmaGspMem { > > + fn new(dev: &device::Device<device::Bound>) -> Result<Self> { > > + const MSGQ_SIZE: u32 = size_of::<Msgq>() as u32; > > + const RX_HDR_OFF: u32 = offset_of!(Msgq, rx) as u32; > > + > > + let mut gsp_mem = > > + CoherentAllocation::<GspMem>::alloc_coherent(dev, 1, GFP_KERNEL | __GFP_ZERO)?; > > + create_pte_array(&mut gsp_mem, 0); > > + dma_write!(gsp_mem[0].cpuq.tx = MsgqTxHeader::new(MSGQ_SIZE, RX_HDR_OFF))?; > > + dma_write!(gsp_mem[0].cpuq.rx = MsgqRxHeader::new())?; > > + > > + Ok(Self(gsp_mem)) > > + } > > + > > + #[expect(unused)] > > + fn dma_handle(&self) -> DmaAddress { > > + self.0.dma_handle() > > + } > > + > > + /// # Safety > > + /// > > + /// The caller must ensure that the device doesn't access the parts of the [`GspMem`] it works > > + /// with. > > + unsafe fn access_mut(&mut self) -> &mut GspMem { > > + // SAFETY: > > + // - The [`CoherentAllocation`] contains exactly one object. > > + // - Per the safety statement of the function, no concurrent access will be performed. > > + &mut unsafe { self.0.as_slice_mut(0, 1) }.unwrap()[0] > > + } > > + > > + /// # Safety > > + /// > > + /// The caller must ensure that the device doesn't access the parts of the [`GspMem`] it works > > + /// with. > > + unsafe fn access(&self) -> &GspMem { > > + // SAFETY: > > + // - The [`CoherentAllocation`] contains exactly one object. > > + // - Per the safety statement of the function, no concurrent access will be performed.
> > + &unsafe { self.0.as_slice(0, 1) }.unwrap()[0] > > + } > > Since these two methods are only called once each from > `driver_write/read_area`, let's inline them there and reduce our > `unsafe` keyword counter.Sure.> > + > > + fn driver_write_area(&mut self) -> (&mut [[u8; GSP_PAGE_SIZE]], &mut [[u8; GSP_PAGE_SIZE]]) { > > + let tx = self.cpu_write_ptr() as usize; > > + let rx = self.gsp_read_ptr() as usize; > > + > > + // SAFETY: we will only access the driver-owned part of the shared memory. > > + let gsp_mem = unsafe { self.access_mut() }; > > + let (before_tx, after_tx) = gsp_mem.cpuq.msgq.data.split_at_mut(tx); > > + > > + if rx <= tx { > > + // The area from `tx` up to the end of the ring, and from the beginning of the ring up > > + // to `rx`, minus one unit, belongs to the driver. > > + if rx == 0 { > > + let last = after_tx.len() - 1; > > + (&mut after_tx[..last], &mut before_tx[0..0]) > > + } else { > > + (after_tx, &mut before_tx[..rx]) > > + } > > + } else { > > + // The area from `tx` to `rx`, minus one unit, belongs to the driver. > > + (after_tx.split_at_mut(rx - tx).0, &mut before_tx[0..0]) > > + } > > + } > > + > > + fn driver_read_area(&self) -> (&[[u8; GSP_PAGE_SIZE]], &[[u8; GSP_PAGE_SIZE]]) { > > + let tx = self.gsp_write_ptr() as usize; > > + let rx = self.cpu_read_ptr() as usize; > > + > > + // SAFETY: we will only access the driver-owned part of the shared memory. > > + let gsp_mem = unsafe { self.access() }; > > + let (before_rx, after_rx) = gsp_mem.gspq.msgq.data.split_at(rx); > > + > > + if tx <= rx { > > + // The area from `rx` up to the end of the ring, and from the beginning of the ring up > > + // to `tx`, minus one unit, belongs to the driver. > > + if tx == 0 { > > + let last = after_rx.len() - 1; > > + (&after_rx[..last], &before_rx[0..0]) > > + } else { > > + (after_rx, &before_rx[..tx]) > > + } > > + } else { > > + // The area from `rx` to `tx`, minus one unit, belongs to the driver. 
> > + (after_rx.split_at(tx - rx).0, &before_rx[0..0]) > > + } > > + } > > As we discussed offline, this method is incorrect (amongst other things, > it returns the whole ring buffer when it should be empty). Posting my > suggested diff for the record:Yep. I'm not sure how it was working on my setup but not yours but clearly wrong.> --- a/drivers/gpu/nova-core/gsp/cmdq.rs > +++ b/drivers/gpu/nova-core/gsp/cmdq.rs > @@ -152,17 +152,12 @@ unsafe fn access(&self) -> &GspMem { > let gsp_mem = unsafe { self.access() }; > let (before_rx, after_rx) = gsp_mem.gspq.msgq.data.split_at(rx); > > - if tx <= rx { > + if tx < rx { > // The area from `rx` up to the end of the ring, and from the beginning of the ring up > - // to `tx`, minus one unit, belongs to the driver. > - if tx == 0 { > - let last = after_rx.len() - 1; > - (&after_rx[..last], &before_rx[0..0]) > - } else { > - (after_rx, &before_rx[..tx]) > - } > + // to `tx` belongs to the driver. > + (after_rx, &before_rx[0..tx]) > } else { > - // The area from `rx` to `tx`, minus one unit, belongs to the driver. > + // The area from `rx` to `tx` belongs to the driver. > (after_rx.split_at(tx - rx).0, &before_rx[0..0]) > } > }Thanks, similar to what I had. 
Although I also had this to avoid the panic later when no message is present: @@ -321,6 +315,10 @@ pub(crate) fn receive_msg_from_gsp<M: GspMessageFromGsp, R>( ) -> Result<R> { let (driver_area, msg_header, slice_1) = wait_on(timeout, || { let driver_area = self.gsp_mem.driver_read_area(); + if driver_area.0.is_empty() { + return None; + } + // TODO: find an alternative to as_flattened() #[allow(clippy::incompatible_msrv)] let (msg_header_slice, slice_1) = driver_area> > + > > + fn gsp_write_ptr(&self) -> u32 { > > + let gsp_mem = &self.0; > > + dma_read!(gsp_mem[0].gspq.tx.writePtr).unwrap() % MSGQ_NUM_PAGES > > + } > > + > > + fn gsp_read_ptr(&self) -> u32 { > > + let gsp_mem = &self.0; > > + dma_read!(gsp_mem[0].gspq.rx.readPtr).unwrap() % MSGQ_NUM_PAGES > > + } > > + > > + fn cpu_read_ptr(&self) -> u32 { > > + let gsp_mem = &self.0; > > + dma_read!(gsp_mem[0].cpuq.rx.readPtr).unwrap() % MSGQ_NUM_PAGES > > + } > > Maybe add one line of documentation for these. Generally I think we want > a bit more high-level documentation explaining how the ring buffers are > operating.Yeah, good idea. Something I've been meaning to do (document the ring buffer a bit better).> > + > > + /// Inform the GSP that it can send `elem_count` new pages into the message queue. > > + fn advance_cpu_read_ptr(&mut self, elem_count: u32) { > > + let gsp_mem = &self.0; > > + let rptr = self.cpu_read_ptr().wrapping_add(elem_count) % MSGQ_NUM_PAGES; > > + > > + // Ensure read pointer is properly ordered > > + fence(Ordering::SeqCst); > > + > > + dma_write!(gsp_mem[0].cpuq.rx.readPtr = rptr).unwrap(); > > + } > > + > > + fn cpu_write_ptr(&self) -> u32 { > > + let gsp_mem = &self.0; > > + dma_read!(gsp_mem[0].cpuq.tx.writePtr).unwrap() % MSGQ_NUM_PAGES > > + } > > + > > + /// Inform the GSP that it can process `elem_count` new pages from the command queue. 
> > + fn advance_cpu_write_ptr(&mut self, elem_count: u32) { > > + let gsp_mem = &self.0; > > + let wptr = self.cpu_write_ptr().wrapping_add(elem_count) & MSGQ_NUM_PAGES; > > + dma_write!(gsp_mem[0].cpuq.tx.writePtr = wptr).unwrap(); > > + > > + // Ensure all command data is visible before triggering the GSP read > > + fence(Ordering::SeqCst); > > + } > > +} > > + > > +pub(crate) struct GspCmdq { > > Similar to my previous comment, we can just name this `Cmdq` since we > are already in the `gsp` module. > > As a general comment, let's also document our types/methods a bit more, > explaining at least what they are for.Ack.> > + dev: ARef<device::Device>, > > + seq: u32, > > + gsp_mem: DmaGspMem, > > + pub _nr_ptes: u32, > > +} > > + > > +impl GspCmdq { > > + pub(crate) fn new(dev: &device::Device<device::Bound>) -> Result<GspCmdq> { > > + let gsp_mem = DmaGspMem::new(dev)?; > > + let nr_ptes = size_of::<GspMem>() >> GSP_PAGE_SHIFT; > > + build_assert!(nr_ptes * size_of::<u64>() <= GSP_PAGE_SIZE); > > + > > + Ok(GspCmdq { > > + dev: dev.into(), > > + seq: 0, > > + gsp_mem, > > + _nr_ptes: nr_ptes as u32, > > + }) > > + } > > + > > + fn calculate_checksum<T: Iterator<Item = u8>>(it: T) -> u32 { > > + let sum64 = it > > + .enumerate() > > + .map(|(idx, byte)| (((idx % 8) * 8) as u32, byte)) > > + .fold(0, |acc, (rol, byte)| acc ^ u64::from(byte).rotate_left(rol)); > > + > > + ((sum64 >> 32) as u32) ^ (sum64 as u32) > > + } > > + > > + #[expect(unused)] > > + pub(crate) fn send_gsp_command<M: GspCommandToGsp>( > > + &mut self, > > + bar: &Bar0, > > + payload_size: usize, > > + init: impl FnOnce(&mut M, SBuffer<core::array::IntoIter<&mut [u8], 2>>) -> Result, > > This works pretty well for in-place initialization. 
> > There are a couple of small drawbacks though: `M` must implement `FromBytes` > even though we only send it, and (more serious) there is no guaranteeYes, that detail annoyed me slightly too.> that `init` will fully initialize the command - a forgetful caller can > omit some of its fields, or even the whole structure, and in that case > we will send a command with what happened to be at that position of the > queue at that time.Good timing as I was just looking to see if there wasn't some way of ensuring this happened, or at least was much more explicit than what we currently do.> I think this is a good case for using the `Init` trait: it's like > `PinInit`, but without the `Pin`, and it ensures that the whole argument > is initialized. So this method would change into something like: > > pub(crate) fn send_gsp_command<M: GspCommandToGsp>( > &mut self, > bar: &Bar0, > payload_size: usize, > init: impl Init<M, Error>, > init_sbuf: impl FnOnce(SBuffer<core::array::IntoIter<&mut [u8], 2>>) -> Result, > > This also allows us to drop the `FromBytes` requirement on > `GspCommandToGsp`! But also requires us to use `unsafe` to call > `Init::__init` on the pointer to the command. I think it's worth it > though, as this removes the risk of sending partially-uninitialized > commands.Agree on it being worth the unsafe call, especially because it is unsafe if the caller doesn't do what it's supposed to regardless. But how does this contrast with `MaybeUninit`? I was wondering if something like this would work: pub(crate) fn send_gsp_command<M: GspCommandToGsp>( &mut self, bar: &Bar0, payload_size: usize, init: impl FnOnce(MaybeUninit<&mut M>, SBuffer<core::array::IntoIter<&mut [u8], 2>>) -> Result<&M>, Note I'm not sure we could actually make that work as I haven't tried it yet. I suspect I've missed something with lifetimes there :-)> Then there is what to do with the `SBuffer`. 
I'd like to think a bit > more about this, as not all commands require it, and those that do > typically send arrays of particular types. I think we should be able to > use the type system to have more control over this, but let's keep that > for the next revision.Sure. You are correct that many (most?) commands don't need it, and it's a shame we don't have proper bindings for them anyway. Although in some cases that still wouldn't work anyway (eg. for registry keys) where it really is just a packed variable number of variable length strings. Not sure what a Rust native representation of that would be.> > + ) -> Result { > > + // TODO: a method that extracts the regions for a given command? > > + // ... and another that reduces the region to a given number of bytes! > > + let driver_area = self.gsp_mem.driver_write_area(); > > + let free_tx_pages = driver_area.0.len() + driver_area.1.len(); > > + > > + // Total size of the message, including the headers, command, and optional payload. > > + let msg_size = size_of::<GspMsgElement>() + size_of::<M>() + payload_size; > > + if free_tx_pages < msg_size.div_ceil(GSP_PAGE_SIZE) { > > + return Err(EAGAIN); > > + } > > + > > + let (msg_header, cmd, payload_1, payload_2) = { > > + // TODO: find an alternative to as_flattened_mut() > > I think we can use it if we enable the "slice_flatten" feature (stable > since 1.80, introduced in 1.67).Ok, will try that. 
Hopefully explicitly using the feature also makes clippy happy but if not I guess we can leave the over-rides below.> > + #[allow(clippy::incompatible_msrv)] > > + let (msg_header_slice, slice_1) = driver_area > > + .0 > > + .as_flattened_mut() > > + .split_at_mut(size_of::<GspMsgElement>()); > > + let msg_header = GspMsgElement::from_bytes_mut(msg_header_slice).ok_or(EINVAL)?; > > + let (cmd_slice, payload_1) = slice_1.split_at_mut(size_of::<M>()); > > + let cmd = M::from_bytes_mut(cmd_slice).ok_or(EINVAL)?; > > + // TODO: find an alternative to as_flattened_mut() > > + #[allow(clippy::incompatible_msrv)] > > + let payload_2 = driver_area.1.as_flattened_mut(); > > + // TODO: Replace this workaround to cut the payload size. > > + let (payload_1, payload_2) = match payload_size.checked_sub(payload_1.len()) { > > + // The payload is longer than `payload_1`, set `payload_2` size to the difference. > > + Some(payload_2_len) => (payload_1, &mut payload_2[..payload_2_len]), > > + // `payload_1` is longer than the payload, we need to reduce its size. 
> > + None => (&mut payload_1[..payload_size], payload_2), > > + }; > > We will want (either you or I) to address these TODOs for the next > revision.Yep, I'm happy to (along with any others that are still floating about).> > + > > + (msg_header, cmd, payload_1, payload_2) > > + }; > > + > > + let sbuffer = SBuffer::new_writer([&mut payload_1[..], &mut payload_2[..]]); > > + init(cmd, sbuffer)?; > > + > > + *msg_header = GspMsgElement::new(self.seq, size_of::<M>() + payload_size, M::FUNCTION); > > + msg_header.checkSum = GspCmdq::calculate_checksum(SBuffer::new_reader([ > > + msg_header.as_bytes(), > > + cmd.as_bytes(), > > + payload_1, > > + payload_2, > > + ])); > > + > > + let rpc_header = &msg_header.rpc; > > + dev_info!( > > + &self.dev, > > + "GSP RPC: send: seq# {}, function=0x{:x} ({}), length=0x{:x}\n", > > + self.seq, > > + rpc_header.function, > > + decode_gsp_function(rpc_header.function), > > + rpc_header.length, > > + ); > > + > > + let elem_count = msg_header.elemCount; > > + self.seq += 1; > > + self.gsp_mem.advance_cpu_write_ptr(elem_count); > > + NV_PGSP_QUEUE_HEAD::default().set_address(0).write(bar); > > I'm a bit surprised that we always write `0` here, can we document this > behavior, maybe in the definition of `NV_PGSP_QUEUE_HEAD`?I was also surprised the value doesn't matter but this is what both Nouveau and OpenRM do. 
I guess it just triggers some kind of notification on the GSP and the actual queue head is read from the shared memory pointers.> > + > > + Ok(()) > > + } > > + > > + #[expect(unused)] > > + pub(crate) fn receive_msg_from_gsp<M: GspMessageFromGsp, R>( > > + &mut self, > > + timeout: Delta, > > + init: impl FnOnce(&M, SBuffer<core::array::IntoIter<&[u8], 2>>) -> Result<R>, > > + ) -> Result<R> { > > + let (driver_area, msg_header, slice_1) = wait_on(timeout, || { > > + let driver_area = self.gsp_mem.driver_read_area(); > > + // TODO: find an alternative to as_flattened() > > + #[allow(clippy::incompatible_msrv)] > > + let (msg_header_slice, slice_1) = driver_area > > + .0 > > + .as_flattened() > > + .split_at(size_of::<GspMsgElement>()); > > Beware that `split_at` will panic if the slice is shorter than the > passed argument. So we must check here that the driver area is larger > than `GspMsgElement`.Yep, see the diff I had above.> > + > > + // Can't fail because msg_slice will always be > > + // size_of::<GspMsgElement>() bytes long by the above split. > > + let msg_header = GspMsgElement::from_bytes(msg_header_slice).unwrap(); > > + if msg_header.rpc.length < size_of::<M>() as u32 { > > + return None; > > + } > > If we have a message in the queue and this condition doesn't hold, I > don't think we can hope that it will in a further iteration - this > should be an error.I was thinking messages may get partially written and we'd see this state, but you're right - the shared pointers shouldn't be updated until the entire message is written so this should be an error. That will require the `wait_on_result()` function, because `wait_on()` doesn't allow the closure to return a failure. 
We could just return Option<Result<_>> from the closure but that's a bit gross, because wait_on() would then return Result<Option<Result<_>>>.> > + > > + Some((driver_area, msg_header, slice_1)) > > + })?; > > + > > + let (cmd_slice, payload_1) = slice_1.split_at(size_of::<M>()); > > + let cmd = M::from_bytes(cmd_slice).ok_or(EINVAL)?; > > + // TODO: find an alternative to as_flattened() > > + #[allow(clippy::incompatible_msrv)] > > + let payload_2 = driver_area.1.as_flattened(); > > There is an issue here - payload_1 and payload_2 cover the *whole* area > that is readable by the driver, not just the first message in the queue.Oh good catch.> If there is more than one message pending when this method is called, we > will get a wrong checksum and skip all the following messages. We need > to truncate payload_1/payload_2 to cover the exact area of the first > message only. > > > + > > + // Log RPC receive with message type decoding > > + dev_info!( > > + self.dev, > > + "GSP RPC: receive: seq# {}, function=0x{:x} ({}), length=0x{:x}\n", > > + msg_header.rpc.sequence, > > + msg_header.rpc.function, > > + decode_gsp_function(msg_header.rpc.function), > > + msg_header.rpc.length, > > + ); > > + > > + if GspCmdq::calculate_checksum(SBuffer::new_reader([ > > + msg_header.as_bytes(), > > + cmd.as_bytes(), > > + payload_1, > > + payload_2, > > + ])) != 0 > > + { > > + dev_err!( > > + self.dev, > > + "GSP RPC: receive: Call {} - bad checksum", > > + msg_header.rpc.sequence > > + ); > > + return Err(EIO); > > + } > > + > > + let result = if msg_header.rpc.function == M::FUNCTION { > > This should be done way earlier. At this point we have already converted > the command bytes slices into M, which is invalid if it happens that > this condition doesn't hold. 
> > (on a related note, the checksum verification should also be done before > we interpret the message, as a corrupted message could make us cast > `cmd` into something that it is not)Sounds good.> > + let sbuffer = SBuffer::new_reader([payload_1, payload_2]); > > + init(cmd, sbuffer) > > + } else { > > + Err(ERANGE) > > + }; > > + > > + self.gsp_mem > > + .advance_cpu_read_ptr(msg_header.rpc.length.div_ceil(GSP_PAGE_SIZE as u32)); > > There is a landmine here. `msg_header.rpc.length` contains the length of > the payload, INCLUDING the RPC header itself, but NOT INCLUDING the > remainder of `msg_header`. Therefore the correct statement should be: > > self.gsp_mem.advance_cpu_read_ptr( > (size_of_val(msg_header) as u32 - size_of_val(&msg_header.rpc) as u32 > + msg_header.rpc.length) > .div_ceil(GSP_PAGE_SIZE as u32), > ); > > Of course, it looks horrible, so we want to hide this member altogether > and provide a nice, well-documented method that returns something that > is immediately useful for us. More on that in `fw.rs`. > > (the previous use of `length` in this method is also incorrect). 
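For the record, the "nice, well-documented method" suggested here can be sketched in plain userspace Rust. The type and field names below are illustrative stand-ins for the firmware bindings (in particular, `bytes_before_rpc` models `size_of::<GspMsgElement>() - size_of::<GspRpcHeader>()`), not the real layout:

```rust
const GSP_PAGE_SIZE: u32 = 0x1000;

// Stand-ins for the firmware types; only what the size calculation needs is
// modelled here.
struct RpcHeader {
    /// As discussed above: the size of the RPC header itself PLUS the
    /// payload that follows it - not just the payload.
    length: u32,
}

struct MsgElement {
    /// Bytes of the queue element header that precede the embedded RPC
    /// header.
    bytes_before_rpc: u32,
    rpc: RpcHeader,
}

impl MsgElement {
    /// Number of whole queue pages covered by this message - the value to
    /// feed to something like `advance_cpu_read_ptr()`, with the header
    /// accounting done in exactly one place.
    fn elem_count(&self) -> u32 {
        (self.bytes_before_rpc + self.rpc.length).div_ceil(GSP_PAGE_SIZE)
    }
}
```

Hiding the raw `length` member behind a method like this keeps the "includes the RPC header but not the rest of the element" subtlety out of `cmdq.rs` entirely.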
> > > + result > > + } > > +} > > + > > +fn decode_gsp_function(function: u32) -> &'static str { > > Once we have proper enums for the function and events, this can be an > `as_str` method.Yep.> > + match function { > > + // Common function codes > > + NV_VGPU_MSG_FUNCTION_NOP => "NOP", > > + NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO => "SET_GUEST_SYSTEM_INFO", > > + NV_VGPU_MSG_FUNCTION_ALLOC_ROOT => "ALLOC_ROOT", > > + NV_VGPU_MSG_FUNCTION_ALLOC_DEVICE => "ALLOC_DEVICE", > > + NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY => "ALLOC_MEMORY", > > + NV_VGPU_MSG_FUNCTION_ALLOC_CTX_DMA => "ALLOC_CTX_DMA", > > + NV_VGPU_MSG_FUNCTION_ALLOC_CHANNEL_DMA => "ALLOC_CHANNEL_DMA", > > + NV_VGPU_MSG_FUNCTION_MAP_MEMORY => "MAP_MEMORY", > > + NV_VGPU_MSG_FUNCTION_BIND_CTX_DMA => "BIND_CTX_DMA", > > + NV_VGPU_MSG_FUNCTION_ALLOC_OBJECT => "ALLOC_OBJECT", > > + NV_VGPU_MSG_FUNCTION_FREE => "FREE", > > + NV_VGPU_MSG_FUNCTION_LOG => "LOG", > > + NV_VGPU_MSG_FUNCTION_GET_GSP_STATIC_INFO => "GET_GSP_STATIC_INFO", > > + NV_VGPU_MSG_FUNCTION_SET_REGISTRY => "SET_REGISTRY", > > + NV_VGPU_MSG_FUNCTION_GSP_SET_SYSTEM_INFO => "GSP_SET_SYSTEM_INFO", > > + NV_VGPU_MSG_FUNCTION_GSP_INIT_POST_OBJGPU => "GSP_INIT_POST_OBJGPU", > > + NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL => "GSP_RM_CONTROL", > > + NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO => "GET_STATIC_INFO", > > + > > + // Event codes > > + NV_VGPU_MSG_EVENT_GSP_INIT_DONE => "INIT_DONE", > > + NV_VGPU_MSG_EVENT_GSP_RUN_CPU_SEQUENCER => "RUN_CPU_SEQUENCER", > > + NV_VGPU_MSG_EVENT_POST_EVENT => "POST_EVENT", > > + NV_VGPU_MSG_EVENT_RC_TRIGGERED => "RC_TRIGGERED", > > + NV_VGPU_MSG_EVENT_MMU_FAULT_QUEUED => "MMU_FAULT_QUEUED", > > + NV_VGPU_MSG_EVENT_OS_ERROR_LOG => "OS_ERROR_LOG", > > + NV_VGPU_MSG_EVENT_GSP_POST_NOCAT_RECORD => "NOCAT", > > + NV_VGPU_MSG_EVENT_GSP_LOCKDOWN_NOTICE => "LOCKDOWN_NOTICE", > > + NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT => "LIBOS_PRINT", > > + > > + // Default for unknown codes > > + _ => "UNKNOWN", > > + } > > +} > > diff --git 
a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs > > index 7f4fe684ddaf..2e4255301e58 100644 > > --- a/drivers/gpu/nova-core/gsp/fw.rs > > +++ b/drivers/gpu/nova-core/gsp/fw.rs > > @@ -15,7 +15,9 @@ > > use crate::firmware::gsp::GspFirmware; > > use crate::gpu::Chipset; > > use crate::gsp; > > +use crate::gsp::cmdq::MSGQ_NUM_PAGES; > > I guess the number of pages in the message queue is firmware-dependent, > so would it make sense to move its declaration to this module?It's not firmware dependent - it appears we can set it to whatever we reasonably want (for some value of "reasonable) so long as we allocate the right buffer sizes, etc. I noticed and tested this a while back when removing some magic numbers. That said 0x3f is what I chose because that's what Nouveau and I think OpenRM use.> > use crate::gsp::FbLayout; > > +use crate::gsp::GSP_PAGE_SIZE; > > > > /// Dummy type to group methods related to heap parameters for running the GSP firmware. > > pub(crate) struct GspFwHeapParams(()); > > @@ -159,6 +161,37 @@ pub(crate) fn new(gsp_firmware: &GspFirmware, fb_layout: &FbLayout) -> Self { > > // GSP firmware constants > > GSP_FW_WPR_META_MAGIC, > > GSP_FW_WPR_META_REVISION, > > + > > + // GSP events > > + NV_VGPU_MSG_EVENT_GSP_INIT_DONE, > > + NV_VGPU_MSG_EVENT_GSP_LOCKDOWN_NOTICE, > > + NV_VGPU_MSG_EVENT_GSP_POST_NOCAT_RECORD, > > + NV_VGPU_MSG_EVENT_GSP_RUN_CPU_SEQUENCER, > > + NV_VGPU_MSG_EVENT_MMU_FAULT_QUEUED, > > + NV_VGPU_MSG_EVENT_OS_ERROR_LOG, > > + NV_VGPU_MSG_EVENT_POST_EVENT, > > + NV_VGPU_MSG_EVENT_RC_TRIGGERED, > > + NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT, > > + > > + // GSP function calls > > + NV_VGPU_MSG_FUNCTION_ALLOC_CHANNEL_DMA, > > + NV_VGPU_MSG_FUNCTION_ALLOC_CTX_DMA, > > + NV_VGPU_MSG_FUNCTION_ALLOC_DEVICE, > > + NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY, > > + NV_VGPU_MSG_FUNCTION_ALLOC_OBJECT, > > + NV_VGPU_MSG_FUNCTION_ALLOC_ROOT, > > + NV_VGPU_MSG_FUNCTION_BIND_CTX_DMA, > > + NV_VGPU_MSG_FUNCTION_FREE, > > + 
NV_VGPU_MSG_FUNCTION_GET_GSP_STATIC_INFO, > > + NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO, > > + NV_VGPU_MSG_FUNCTION_GSP_INIT_POST_OBJGPU, > > + NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL, > > + NV_VGPU_MSG_FUNCTION_GSP_SET_SYSTEM_INFO, > > + NV_VGPU_MSG_FUNCTION_LOG, > > + NV_VGPU_MSG_FUNCTION_MAP_MEMORY, > > + NV_VGPU_MSG_FUNCTION_NOP, > > + NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO, > > + NV_VGPU_MSG_FUNCTION_SET_REGISTRY, > > }; > > > > #[repr(transparent)] > > @@ -197,3 +230,86 @@ fn id8(name: &str) -> u64 { > > }) > > } > > } > > + > > +pub(crate) type MsgqTxHeader = bindings::msgqTxHeader; > > This should be wrapped into a newtype that provides the exact set of > features required by the gsp module, like what is done for > `LibosMemoryRegionInitArgument`. For this type we just need two things: > return the `writePtr`, and advance it by a given value. That's all > the parent module needs to see.Except that doesn't actually work, because we need to use DMA access methods on bindings::msgqTxHeader which we can't do from an accessor.> By just aliasing this type, we make all its members visible to the `gsp` > module. This increases its dependency on a given firmware version, > carries a risk that the GSP module will mess with what it is not > supposed to, and introduces inconsistency in how we abstract the > firmware types - some are wrapped, others are not. Let's be consistent > and make all bindings completely opaque outside of `fw.rs`.I can't see how accessors in `fw.rs` for every firmware binding we use is useful though[1]? In what way does that reduce dependency on a given firmware version? I can't see how it isn't just adding boiler plate for every struct field we want to access. We already have higher level abstractions (eg. 
DmaGspMem) to ensure the GSP won't mess with what it isn't supposed to, and these higher level abstractions also provide more options to reduce dependency on a given firmware version as the logic isn't as tightly coupled to the binding representation. It also allows us to access the fields using the proper DMA access methods. [1] - Except for maybe removing some nonRustySnakeyNames> > + > > +// SAFETY: Padding is explicit and will not contain uninitialized data. > > +unsafe impl AsBytes for MsgqTxHeader {} > > + > > +impl MsgqTxHeader { > > + pub(crate) fn new(msgq_size: u32, rx_hdr_offset: u32) -> Self { > > + Self { > > + version: 0, > > + size: msgq_size, > > + msgSize: GSP_PAGE_SIZE as u32, > > + msgCount: MSGQ_NUM_PAGES, > > + writePtr: 0, > > + flags: 1, > > + rxHdrOff: rx_hdr_offset, > > + entryOff: GSP_PAGE_SIZE as u32, > > + } > > + } > > +} > > + > > +/// RX header for setting up a message queue with the GSP. > > +/// > > +/// # Invariants > > +/// > > +/// [`Self::read_ptr`] is guaranteed to return a value in the range `0..NUM_PAGES`. > > +pub(crate) type MsgqRxHeader = bindings::msgqRxHeader; > > + > > +// SAFETY: Padding is explicit and will not contain uninitialized data. > > +unsafe impl AsBytes for MsgqRxHeader {} > > + > > +impl MsgqRxHeader { > > + pub(crate) fn new() -> Self { > > + Default::default() > > + } > > +} > > + > > +pub(crate) type GspRpcHeader = bindings::rpc_message_header_v; > > This type too is another good illustration of why we should make our > bindings opaque. In `cmdq.rs` we access `GspRpcHeader::length` multiple > times, ignoring the fact that it also includes the size of the RPC > header itself - not just what comes after it! Since it doesn't come with > any documentation, we can be forgiven for assuming the obvious - that it > is just the size of what follows, but it is not.This has nothing to do with the bindings being opaque or not though. 
I can almost guarantee that after writing ten brain dead accessor functions the eleventh would have looked pretty similar and included the same bug!> What we actually want is a method on `GspMsgElement` that returns what > we actually want (and is documented as such): the actual size of the > payload following the whole header. That way there can be no room for > mistake.The length calculation can just be a method on GspRpcHeader without requiring the rest of the fields to need accessor functions as well though. Take for example a structure that we will need in future, GspStaticConfigInfo_t. This currently has 53 fields, although to be fair maybe we only care about half of them. My proposal[2] is to convert them into an abstract nova-core specific type, GspConfigInfo, that contains the specific details Nova users care about. Following the implied rule that no field accessors shall leak from fw.rs would require accessors for all fields. That's at least another 78 lines (assuming a minimal 3 line accessor function) of boiler plate for one FW structure. Multiply that by many structures and you now have a lot of boiler plate for no real gain and a very large fw.rs (note my example[2] only contains one field, but I expect we will need a lot more if not most of them). [2] - https://github.com/apopple-nvidia/linux/blob/nova-core-unstable/drivers/gpu/nova-core/gsp/commands.rs#L32> > + > > +// SAFETY: Padding is explicit and will not contain uninitialized data. > > +unsafe impl AsBytes for GspRpcHeader {} > > + > > +// SAFETY: This struct only contains integer types for which all bit patterns > > +// are valid. > > +unsafe impl FromBytes for GspRpcHeader {} > > + > > +impl GspRpcHeader { > > + pub(crate) fn new(cmd_size: u32, function: u32) -> Self { > > + Self { > > + // TODO: magic number > > + header_version: 0x03000000, > > + signature: bindings::NV_VGPU_MSG_SIGNATURE_VALID, > > + function, > > + // TODO: overflow check? 
> > +            length: size_of::<Self>() as u32 + cmd_size,
> > +            rpc_result: 0xffffffff,
> > +            rpc_result_private: 0xffffffff,
> > +            ..Default::default()
> > +        }
> > +    }
> > +}
> > +
> > +pub(crate) type GspMsgElement = bindings::GSP_MSG_QUEUE_ELEMENT;
>
> Hammering my previous argument a bit more: in `cmdq.rs`, we do the
> following:
>
>     let msg_header = GspMsgElement::from_bytes(msg_header_slice).unwrap();
>
> Even though we explicitly used `GspMsgElement`, `msg_header` appears as
> being of type `GSP_MSG_QUEUE_ELEMENT` in LSP. That's super confusing and
> can be avoided by using the newtype pattern.

That's surprising. I would have hoped the LSP would be more helpful than that.

> Lastly, the bindings generated from C headers are supposed to be temporary, and
> we eventually want to replace them with direct IDL-to-Rust bindings. Not
> leaking the C types lets us design `fw.rs` as a blueprint for the ideal
> interface we would want to generate - so the limited extra labor that
> comes with wrapping these types would also pay off in that respect.

I agree with this - `fw.rs` should be the blueprint for the ideal bindings that we could conceivably generate directly from IDL. But this is also why I'm pushing back against the idea of putting any kind of logic (in the form of accessors/constructors) into `fw.rs` - I doubt you will ever be able to generate those from IDL. For example I doubt it would be possible to generate the length logic, and I'm even more doubtful we could figure out how to generate the constructors based on the Rust type.

So to summarise where I think we're at, we have two proposals for how to treat/use the generated bindings:

1) Make them all a newtype[3] using a transparent wrapper to make the fields opaque and accessor functions to implement the logic. This would all be in fw.rs to stop the C types leaking out.

2) Build a completely new type based on the underlying bindings plus whatever other logic is needed.
The C types could be used within the `gsp` module to implement the new types but would not leak further than that.

Using GspStaticConfigInfo_t as a concrete example I think option (1) looks like this:

In fw.rs:

    #[repr(transparent)]
    pub(crate) struct GspStaticConfigInfo(bindings::GspStaticConfigInfo_t);

    impl GspStaticConfigInfo {
        pub(crate) fn new(gsp: &mut Gsp) -> Result<Self> {
            // Create GspStaticConfigInfo sending whatever messages needed to the Gsp.
        }

        pub(crate) fn gpu_name(self) -> String {
            ...
        }

        pub(crate) fn sriov_max_gfid(self) -> u32 {
            self.0.sriovMaxGfid
        }

        pub(crate) fn fb_length(self) -> u64 {
            self.0.fb_length
        }

        pub(crate) fn ats_supported(self) -> bool {
            self.0.bAtsSupported != 0
        }

        // More for every field we need
        ...
    }

And Option (2) could look more like this:

In commands.rs:

    pub(crate) struct GspStaticConfigInfo {
        pub gpu_name: String,
        pub sriov_max_gfid: u32,
        pub fb_length: u64,
        pub ats_supported: bool,
    }

    impl GspStaticConfigInfo {
        pub(crate) fn new(gsp: &mut Gsp) -> Result<Self> {
            // Create GspStaticConfigInfo sending whatever messages needed to the Gsp.
        }
    }

Personally I favour option (2) because it is more concise and allows us to split the code up more. It is also more resilient to FW changes as it doesn't depend on the underlying representation at all. However based on your feedback I think you favour option (1) more, so I want to make sure I'm understanding it correctly.

[3] - https://doc.rust-lang.org/rust-by-example/generics/new_types.html
Alexandre Courbot
2025-Sep-29 14:34 UTC
[PATCH v2 05/10] gpu: nova-core: gsp: Add GSP command queue handling
On Mon Sep 29, 2025 at 3:19 PM JST, Alistair Popple wrote: <snip>>> > + >> > +/// Number of GSP pages making the Msgq. >> > +pub(crate) const MSGQ_NUM_PAGES: u32 = 0x3f; >> > + >> > +#[repr(C, align(0x1000))] >> > +#[derive(Debug)] >> > +struct MsgqData { >> > + data: [[u8; GSP_PAGE_SIZE]; MSGQ_NUM_PAGES as usize], >> > +} >> > + >> > +// Annoyingly there is no real equivalent of #define so we're forced to use a >> > +// literal to specify the alignment above. So check that against the actual GSP >> > +// page size here. >> > +static_assert!(align_of::<MsgqData>() == GSP_PAGE_SIZE); >> > + >> > +// There is no struct defined for this in the open-gpu-kernel-source headers. >> > +// Instead it is defined by code in GspMsgQueuesInit(). >> > +#[repr(C)] >> > +struct Msgq { >> > + tx: MsgqTxHeader, >> > + rx: MsgqRxHeader, >> > + msgq: MsgqData, >> > +} >> > + >> > +#[repr(C)] >> > +struct GspMem { >> > + ptes: [u8; GSP_PAGE_SIZE], >> > + cpuq: Msgq, >> > + gspq: Msgq, >> > +} >> > + >> > +// SAFETY: These structs don't meet the no-padding requirements of AsBytes but >> > +// that is not a problem because they are not used outside the kernel. >> > +unsafe impl AsBytes for GspMem {} >> > + >> > +// SAFETY: These structs don't meet the no-padding requirements of FromBytes but >> > +// that is not a problem because they are not used outside the kernel. >> > +unsafe impl FromBytes for GspMem {} >> >> These ARE used outside the kernel, since they are shared with the GSP. >> :) I'd say the reason these are safe is just because we satisfy the >> requirements of AsBytes and FromBytes: > > Yes, and they're actually old safety comments that I missed updating to be the > same as the rest. > >> - No initialized bytes, >> - No interior mutability, >> - All bytes patterns are valid (for some generous definition of >> "valid" limited to not triggering UB). > > I was under the impression that that's all `unsafe` is really marking - code > the compiler can't prove won't trigger UB. 
So I'd assume that's all we'd have to > prove in safety comments anyway.A good rule of thumb for writing `SAFETY` statements is to look at the `Safety` section of the unsafe code we call (here `FromBytes`), and justify why our calling code satisfies each of these.>> > + dev: ARef<device::Device>, >> > + seq: u32, >> > + gsp_mem: DmaGspMem, >> > + pub _nr_ptes: u32, >> > +} >> > + >> > +impl GspCmdq { >> > + pub(crate) fn new(dev: &device::Device<device::Bound>) -> Result<GspCmdq> { >> > + let gsp_mem = DmaGspMem::new(dev)?; >> > + let nr_ptes = size_of::<GspMem>() >> GSP_PAGE_SHIFT; >> > + build_assert!(nr_ptes * size_of::<u64>() <= GSP_PAGE_SIZE); >> > + >> > + Ok(GspCmdq { >> > + dev: dev.into(), >> > + seq: 0, >> > + gsp_mem, >> > + _nr_ptes: nr_ptes as u32, >> > + }) >> > + } >> > + >> > + fn calculate_checksum<T: Iterator<Item = u8>>(it: T) -> u32 { >> > + let sum64 = it >> > + .enumerate() >> > + .map(|(idx, byte)| (((idx % 8) * 8) as u32, byte)) >> > + .fold(0, |acc, (rol, byte)| acc ^ u64::from(byte).rotate_left(rol)); >> > + >> > + ((sum64 >> 32) as u32) ^ (sum64 as u32) >> > + } >> > + >> > + #[expect(unused)] >> > + pub(crate) fn send_gsp_command<M: GspCommandToGsp>( >> > + &mut self, >> > + bar: &Bar0, >> > + payload_size: usize, >> > + init: impl FnOnce(&mut M, SBuffer<core::array::IntoIter<&mut [u8], 2>>) -> Result, >> >> This works pretty well for in-place initialization. >> >> There a couple of small drawbacks though: `M` must implement `FromBytes` >> even though we only send it, and (more serious) there is no guarantee > > Yes, that detail annoyed me slightly too. > >> that `init` will fully initialize the command - a forgetful caller can >> omit some of its fields, or even the whole structure, and in that case >> we will send a command with what happened to be at that position of the >> queue at that time. 
> > Good timing as I was just looking to see if there wasn't some way of ensuring > this happened, or at least was much more explicit than what we currently do. > >> I think this is a good case for using the `Init` trait: it's like >> `PinInit`, but without the `Pin`, and it ensures that the whole argument >> is initialized. So this method would change into something like: >> >> pub(crate) fn send_gsp_command<M: GspCommandToGsp>( >> &mut self, >> bar: &Bar0, >> payload_size: usize, >> init: impl Init<M, Error>, >> init_sbuf: impl FnOnce(SBuffer<core::array::IntoIter<&mut [u8], 2>>) -> Result, >> >> This also allows us to drop the `FromBytes` requirement on >> `GspCommandToGsp`! But also requires us to use `unsafe` to call >> `Init::__init` on the pointer to the command. I think it's worth it >> though, as this removes the risk of sending partially-uninitialized >> commands. > > Agree on it being worth the unsafe call, especially because it is unsafe if the > caller doesn't do what it's supposed to regardless. But how does this contrast > with `MaybeUninit`? I was wondering if something like this would work: > > pub(crate) fn send_gsp_command<M: GspCommandToGsp>( > &mut self, > bar: &Bar0, > payload_size: usize, > init: impl FnOnce(MaybeUninit<&mut M>, SBuffer<core::array::IntoIter<&mut [u8], 2>>) -> Result<&M>, > > Note I'm not sure we could actually make that work as I haven't tried it yet. I > suspect I've missed something with lifetimes there :-)The difference with using `MaybeUninit` is that the unsafe code would need to be *within* each `init` argument - IOW, within each caller. It also does not provide any guarantee that the whole `M` is initialized, we have to trust each caller for doing that properly. I've tried implementing using `Init` and it works even better than I thought. 
Each type actually automatically implements `Init` on itself, which means that instead of explicitly calling init!() as in

    cmdq.send_gsp_command(bar, init!(MyCommand { foo: ..., }), ...);

you can just pass an object as a stack argument and it will also work:

    cmdq.send_gsp_command(bar, MyCommand { foo: ..., }, ...);

Of course in this case the `MyCommand` instance is first passed on the stack before being copied into the command-queue, but for small commands this is pretty convenient. Larger ones can use the `init!` syntax for full in-place initialization.

>
>> Then there is what to do with the `SBuffer`. I'd like to think a bit
>> more about this, as not all commands require it, and those that do
>> typically send arrays of particular types. I think we should be able to
>> use the type system to have more control over this, but let's keep that
>> for the next revision.
>
> Sure. You are correct that many (most?) commands don't need it, and it's a
> shame we don't have proper bindings for them anyway. Although in some cases that
> still wouldn't work anyway (eg. for registry keys) where it really is just a
> packed variable number of variable-length strings. Not sure what a Rust-native
> representation of that would be.

For now the best I could think of is to have not one but two traits for GSP commands: one for commands with a payload and another one for commands without a payload, and two variants of `send_gsp_command` for each case. That way we can at least guarantee that we won't pass a payload where we shouldn't, or forget one where we should. I got a prototype of this working and along with the `Init` thing it feels pretty right.

>
>> > +    ) -> Result {
>> > +        // TODO: a method that extracts the regions for a given command?
>> > +        // ... and another that reduces the region to a given number of bytes!
>> > + let driver_area = self.gsp_mem.driver_write_area(); >> > + let free_tx_pages = driver_area.0.len() + driver_area.1.len(); >> > + >> > + // Total size of the message, including the headers, command, and optional payload. >> > + let msg_size = size_of::<GspMsgElement>() + size_of::<M>() + payload_size; >> > + if free_tx_pages < msg_size.div_ceil(GSP_PAGE_SIZE) { >> > + return Err(EAGAIN); >> > + } >> > + >> > + let (msg_header, cmd, payload_1, payload_2) = { >> > + // TODO: find an alternative to as_flattened_mut() >> >> I think we can use it if we enable the "slice_flatten" feature (stable >> since 1.80, introduced in 1.67). > > Ok, will try that. Hopefully explicitly using the feature also makes clippy > happy but if not I guess we can leave the over-rides below.I think you will also need to explicitly enable the feature somewhere - for the kernel crate it is in `rust/kernel/lib.rs`, but Nova being a different crate I am not sure where we are supposed to do it... <snip>>> > + >> > + (msg_header, cmd, payload_1, payload_2) >> > + }; >> > + >> > + let sbuffer = SBuffer::new_writer([&mut payload_1[..], &mut payload_2[..]]); >> > + init(cmd, sbuffer)?; >> > + >> > + *msg_header = GspMsgElement::new(self.seq, size_of::<M>() + payload_size, M::FUNCTION); >> > + msg_header.checkSum = GspCmdq::calculate_checksum(SBuffer::new_reader([ >> > + msg_header.as_bytes(), >> > + cmd.as_bytes(), >> > + payload_1, >> > + payload_2, >> > + ])); >> > + >> > + let rpc_header = &msg_header.rpc; >> > + dev_info!( >> > + &self.dev, >> > + "GSP RPC: send: seq# {}, function=0x{:x} ({}), length=0x{:x}\n", >> > + self.seq, >> > + rpc_header.function, >> > + decode_gsp_function(rpc_header.function), >> > + rpc_header.length, >> > + ); >> > + >> > + let elem_count = msg_header.elemCount; >> > + self.seq += 1; >> > + self.gsp_mem.advance_cpu_write_ptr(elem_count); >> > + NV_PGSP_QUEUE_HEAD::default().set_address(0).write(bar); >> >> I'm a bit surprised that we always write `0` here, can we 
document this
>> behavior, maybe in the definition of `NV_PGSP_QUEUE_HEAD`?
>
> I was also surprised the value doesn't matter but this is what both Nouveau and
> OpenRM do. I guess it just triggers some kind of notification on the GSP and the
> actual queue head is read from the shared memory pointers.

In that case I think we should definitely document this - or even better, if it never makes sense to write anything other than 0 to `NV_PGSP_QUEUE_HEAD`, to add an explicit method (e.g. `kick`?) that does this for us. Can you try and get to the bottom of this?

>> > +
>> > +        // Can't fail because msg_slice will always be
>> > +        // size_of::<GspMsgElement>() bytes long by the above split.
>> > +        let msg_header = GspMsgElement::from_bytes(msg_header_slice).unwrap();
>> > +        if msg_header.rpc.length < size_of::<M>() as u32 {
>> > +            return None;
>> > +        }
>>
>> If we have a message in the queue and this condition doesn't hold, I
>> don't think we can hope that it will in a further iteration - this
>> should be an error.
>
> I was thinking messages may get partially written and we'd see this state, but
> you're right - the shared pointers shouldn't be updated until the entire message
> is written so this should be an error.
>
> That will require the `wait_on_result()` function, because `wait_on()` doesn't
> allow the closure to return a failure. We could just return Option<Result<_>>
> from the closure but that's a bit gross, because wait_on() would then return
> Result<Option<Result<_>>>.

We can now use `read_poll_timeout` (from `kernel::io::poll`) which separates the acquisition of the state and the test of the condition into two separate closures - the first one returns a `Result`, and its error is propagated as-is, which is what we want in this case.

Actually this reminded me that I should send a patch to remove `wait_on` altogether now that we have a better alternative!
:)

>> > diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
>> > index 7f4fe684ddaf..2e4255301e58 100644
>> > --- a/drivers/gpu/nova-core/gsp/fw.rs
>> > +++ b/drivers/gpu/nova-core/gsp/fw.rs
>> > @@ -15,7 +15,9 @@
>> >  use crate::firmware::gsp::GspFirmware;
>> >  use crate::gpu::Chipset;
>> >  use crate::gsp;
>> > +use crate::gsp::cmdq::MSGQ_NUM_PAGES;
>>
>> I guess the number of pages in the message queue is firmware-dependent,
>> so would it make sense to move its declaration to this module?
>
> It's not firmware dependent - it appears we can set it to whatever we reasonably
> want (for some value of "reasonable") so long as we allocate the right buffer
> sizes, etc. I noticed and tested this a while back when removing some magic
> numbers.
>
> That said 0x3f is what I chose because that's what Nouveau and I think OpenRM
> use.

Ah, that's great and makes perfect sense! So I suppose the `pageTableEntryCount` member of the `MESSAGE_QUEUE_INIT_ARGUMENT` is to pass the total size of the message queue. I'm starting to think about a version of the command queue where its size would be a generic argument, but let's keep that for a later patch, if we ever want it at all. :)

>
>> >  use crate::gsp::FbLayout;
>> > +use crate::gsp::GSP_PAGE_SIZE;
>> >
>> >  /// Dummy type to group methods related to heap parameters for running the GSP firmware.
>> > pub(crate) struct GspFwHeapParams(()); >> > @@ -159,6 +161,37 @@ pub(crate) fn new(gsp_firmware: &GspFirmware, fb_layout: &FbLayout) -> Self { >> > // GSP firmware constants >> > GSP_FW_WPR_META_MAGIC, >> > GSP_FW_WPR_META_REVISION, >> > + >> > + // GSP events >> > + NV_VGPU_MSG_EVENT_GSP_INIT_DONE, >> > + NV_VGPU_MSG_EVENT_GSP_LOCKDOWN_NOTICE, >> > + NV_VGPU_MSG_EVENT_GSP_POST_NOCAT_RECORD, >> > + NV_VGPU_MSG_EVENT_GSP_RUN_CPU_SEQUENCER, >> > + NV_VGPU_MSG_EVENT_MMU_FAULT_QUEUED, >> > + NV_VGPU_MSG_EVENT_OS_ERROR_LOG, >> > + NV_VGPU_MSG_EVENT_POST_EVENT, >> > + NV_VGPU_MSG_EVENT_RC_TRIGGERED, >> > + NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT, >> > + >> > + // GSP function calls >> > + NV_VGPU_MSG_FUNCTION_ALLOC_CHANNEL_DMA, >> > + NV_VGPU_MSG_FUNCTION_ALLOC_CTX_DMA, >> > + NV_VGPU_MSG_FUNCTION_ALLOC_DEVICE, >> > + NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY, >> > + NV_VGPU_MSG_FUNCTION_ALLOC_OBJECT, >> > + NV_VGPU_MSG_FUNCTION_ALLOC_ROOT, >> > + NV_VGPU_MSG_FUNCTION_BIND_CTX_DMA, >> > + NV_VGPU_MSG_FUNCTION_FREE, >> > + NV_VGPU_MSG_FUNCTION_GET_GSP_STATIC_INFO, >> > + NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO, >> > + NV_VGPU_MSG_FUNCTION_GSP_INIT_POST_OBJGPU, >> > + NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL, >> > + NV_VGPU_MSG_FUNCTION_GSP_SET_SYSTEM_INFO, >> > + NV_VGPU_MSG_FUNCTION_LOG, >> > + NV_VGPU_MSG_FUNCTION_MAP_MEMORY, >> > + NV_VGPU_MSG_FUNCTION_NOP, >> > + NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO, >> > + NV_VGPU_MSG_FUNCTION_SET_REGISTRY, >> > }; >> > >> > #[repr(transparent)] >> > @@ -197,3 +230,86 @@ fn id8(name: &str) -> u64 { >> > }) >> > } >> > } >> > + >> > +pub(crate) type MsgqTxHeader = bindings::msgqTxHeader; >> >> This should be wrapped into a newtype that provides the exact set of >> features required by the gsp module, like what is done for >> `LibosMemoryRegionInitArgument`. For this type we just need two things: >> return the `writePtr`, and advance it by a given value. That's all >> the parent module needs to see. 
> > Except that doesn't actually work, because we need to use DMA access methods on > bindings::msgqTxHeader which we can't do from an accessor.If the accessor uses `read_volatile` and `write_volatile` then I guess it should work? These are unsafe though, so this is not ideal, but at least they would be contained into the `fw` module and limited to one getter and one setter.> >> By just aliasing this type, we make all its members visible to the `gsp` >> module. This increases its dependency on a given firmware version, >> carries a risk that the GSP module will mess with what it is not >> supposed to, and introduces inconsistency in how we abstract the >> firmware types - some are wrapped, others are not. Let's be consistent >> and make all bindings completely opaque outside of `fw.rs`. > > I can't see how accessors in `fw.rs` for every firmware binding we use is useful > though[1]? In what way does that reduce dependency on a given firmware version? > I can't see how it isn't just adding boiler plate for every struct field we want > to access.While the main point of this remains hiding stuff other modules don't need to see, it also allows us to choose stable method names that are independent from the firmware.> > We already have higher level abstractions (eg. DmaGspMem) to ensure the GSP > won't mess with what it isn't supposed to, and these higher level abstractions > also provide more options to reduce dependency on a given firmware version as > the logic isn't as tightly coupled to the binding representation. It also allows > us to access the fields using the proper DMA access methods.`DmaGspMem` is a special case - it is in shared memory and requires volatile reads and writes at runtime. And yes it also provides a higher-level abstraction to access the driver's readable and writable areas of the ring buffer - I think these are at the right place in `cmdq.rs`. 
But we also have other types that we build on the stack, even if we move them to a DmaObject afterwards. `fw.rs` should provide constructors and simple primitives for these, on top of which the command queue logic can be built. Right now we are initializing their fields in `cmdq` and I think we can agree this is not very portable. To clarify further, our sole business with half of the bindings types is initializing them to some value that makes sense and put it somewhere. For this we only need a constructor. We don't need to know the names of their fields, and we don't want to know. Other bindings types are message-related, and we need to access their contents. You discuss these in more detail below so let's address them there. Then the last kind are the types related to the command queue runtime operation - namely the Rx/Tx headers and their read/write pointers, and the message headers that we need to read and write. For the former we only need to poke at their pointers using volatile accesses, which is two methods that I propose to have for that purpose. For the latter we only need to access a handful of fields, some of which can be tricky (like `length`). In any case, we are talking about a handful of methods here to keep the abstraction opaque. More on messages below as you make an interesting point on them.> [1] - Except for maybe removing some nonRustySnakeyNames > >> > + >> > +// SAFETY: Padding is explicit and will not contain uninitialized data. >> > +unsafe impl AsBytes for MsgqTxHeader {} >> > + >> > +impl MsgqTxHeader { >> > + pub(crate) fn new(msgq_size: u32, rx_hdr_offset: u32) -> Self { >> > + Self { >> > + version: 0, >> > + size: msgq_size, >> > + msgSize: GSP_PAGE_SIZE as u32, >> > + msgCount: MSGQ_NUM_PAGES, >> > + writePtr: 0, >> > + flags: 1, >> > + rxHdrOff: rx_hdr_offset, >> > + entryOff: GSP_PAGE_SIZE as u32, >> > + } >> > + } >> > +} >> > + >> > +/// RX header for setting up a message queue with the GSP. 
>> > +/// >> > +/// # Invariants >> > +/// >> > +/// [`Self::read_ptr`] is guaranteed to return a value in the range `0..NUM_PAGES`. >> > +pub(crate) type MsgqRxHeader = bindings::msgqRxHeader; >> > + >> > +// SAFETY: Padding is explicit and will not contain uninitialized data. >> > +unsafe impl AsBytes for MsgqRxHeader {} >> > + >> > +impl MsgqRxHeader { >> > + pub(crate) fn new() -> Self { >> > + Default::default() >> > + } >> > +} >> > + >> > +pub(crate) type GspRpcHeader = bindings::rpc_message_header_v; >> >> This type too is another good illustration of why we should make our >> bindings opaque. In `cmdq.rs` we access `GspRpcHeader::length` multiple >> times, ignoring the fact that it also includes the size of the RPC >> header itself - not just what comes after it! Since it doesn't come with >> any documentation, we can be forgiven for assuming the obvious - that it >> is just the size of what follows, but it is not. > > This has nothing to do with the bindings being opaque or not though. I can > almost gurantee that after writing ten brain dead accessor functions that the > eleventh would have looked pretty similar and included the same bug!It does very much have to do with making the binding opaque. If it is not opaque, and the original `length` is accessible, what is preventing code from using it and repeating the same mistake again? I don't quite get where the "ten brain dead accessor functions" would come from. I propose we write one, that returns a value that makes sense (contrary to the original member), and most importantly is documented so users can know exactly what it does. Tell me what is brain-dead about that? This very patch misused the `length` member on at least two different occasions, isn't the benefit clear?> >> What we actually want is a method on `GspMsgElement` that returns what >> we actually want (and is documented as such): the actual size of the >> payload following the whole header. That way there can be no room for >> mistake. 
> The length calculation can just be a method on GspRpcHeader without requiring
> the rest of the fields to need accessor functions as well though.
>
> Take for example a structure that we will need in the future, GspStaticConfigInfo_t.
> This currently has 53 fields, although to be fair maybe we only care about half
> of them.
>
> My proposal[2] is to convert them into an abstract nova-core specific type,
> GspConfigInfo, that contains the specific details Nova users care about.
> Following the implied rule that no field accessors shall leak from fw.rs would
> require accessors for all fields. That's at least another 78 lines (assuming a
> minimal 3-line accessor function) of boilerplate for one FW structure. Multiply
> that by many structures and you now have a lot of boilerplate for no real gain
> and a very large fw.rs (note my example[2] only contains one field, but I expect
> we will need a lot more if not most of them).
>
> [2] - https://github.com/apopple-nvidia/linux/blob/nova-core-unstable/drivers/gpu/nova-core/gsp/commands.rs#L32

I am not opposed to having a nova-core specific type, and in the case of GspStaticConfigInfo_t it certainly seems to make sense. Actually it even does what we want here, since with that design this type should not be visible outside of the `fw` module, right?

Now to be fair, one needs to also consider that this new GspStaticConfigInfo type will also need its own boilerplate-y code to initialize itself from the original GspStaticConfigInfo_t, field by field. But in the end we would have a type that provides us with the abstraction that we need for the GSP module, while making the bindings type invisible - this fits the bill, and looks arguably nicer than having too many accessors, so why not!

One can even argue that the wrapping pattern is just a simpler version of this - we create a new type to hide the bindings' internals and provide us with the right interface.
It is just that for many commands/messages the payload is so small that it is actually less code to just wrap it and provide a few utility methods. GspStaticConfigInfo_t is certainly an exception.> >> > + >> > +// SAFETY: Padding is explicit and will not contain uninitialized data. >> > +unsafe impl AsBytes for GspRpcHeader {} >> > + >> > +// SAFETY: This struct only contains integer types for which all bit patterns >> > +// are valid. >> > +unsafe impl FromBytes for GspRpcHeader {} >> > + >> > +impl GspRpcHeader { >> > + pub(crate) fn new(cmd_size: u32, function: u32) -> Self { >> > + Self { >> > + // TODO: magic number >> > + header_version: 0x03000000, >> > + signature: bindings::NV_VGPU_MSG_SIGNATURE_VALID, >> > + function, >> > + // TODO: overflow check? >> > + length: size_of::<Self>() as u32 + cmd_size, >> > + rpc_result: 0xffffffff, >> > + rpc_result_private: 0xffffffff, >> > + ..Default::default() >> > + } >> > + } >> > +} >> > + >> > +pub(crate) type GspMsgElement = bindings::GSP_MSG_QUEUE_ELEMENT; >> >> Hammering my previous argument a bit more: in `cmdq.rs`, we do the >> following: >> >> let msg_header = GspMsgElement::from_bytes(msg_header_slice).unwrap(); >> >> Even though we explicitly used `GspMsgElement`, `msg_header` appears as >> being of type `GSP_MSG_QUEUE_ELEMENT` in LSP. That's super confusing and >> can be avoided by using the newtype pattern. > > That's surprising. I would have hoped the LSP would be more helpful than that. > >> Lastly, the bindings generated from C headers are supposed to be temporary, and >> we eventually want to replace them with direct IDL-to-Rust bindings. Not >> leaking the C types let us design `fw.rs` as a blueprint for the ideal >> interface we would want to generate - so the limited extra labor that >> comes with wrapping these types would also pay off in that respect. > > I agree with this - `fw.rs` should be the blueprint for the ideal bindings > that we could conceivably generate directly from IDL. 
But this is also why
> I'm pushing back against the idea of putting any kind of logic (in the form of
> accessors/constructors) into `fw.rs` - I doubt you will ever be able to generate
> those from IDL.

`fw.rs` should contain the minimum required to build the logic on top of it, agreed. Doing so might require having another type on top of the one defined in `fw` to implement said logic - if you remember I did something like that for MsgqTxHeader and you thought that was too convoluted. :)

>
> For example I doubt it would be possible to generate the length logic, and I'm
> even more doubtful we could figure out how to generate the constructors based on
> the Rust type.

Where to put the length logic depends on how our firmware interface is defined. If we agree that `length` is "the size of the RPC header, the following command, and its payload - but oh wait, we don't include the rest of the message element which includes the RPC header, so don't forget to add it" then we can just return or use the member as-is. But it's a bad and error-prone interface. We can document it, and that in itself would be an improvement, but it would still require some mental gymnastics to use and is just a giant footgun.

What is safe and useful is the length of the entire message, header, command and payload included, because that's how we use it in the code. It's also trivial to produce, and if we do it here we only do that operation in one place instead of at each calling site. So yes, it belongs here, as part of the firmware abstraction, and if we have to write it by hand then so be it.

The same goes for the constructor. Of course there will be some manual work needed to include a new incompatible firmware version. But if you don't do it here, you will have to do that work somewhere else anyway - probably `gsp` or `cmdq`, except without the elegance of a method containing that code.
How is that better?

> So to summarise where I think we're at we have two proposals for how to
> treat/use the generated bindings:
>
> 1) Make them all a newtype[3] using a transparent wrapper to make the fields
> opaque and accessor functions to implement the logic. This would all be in fw.rs
> to stop the C types leaking out.
>
> 2) Build a completely new type based on the underlying bindings plus whatever
> other logic is needed. The C types could be used within the `gsp` module to
> implement the new types but would not leak further than that.
>
> Using GspStaticConfigInfo_t as a concrete example I think option (1) looks like
> this:
>
> In fw.rs:
>
>     #[repr(transparent)]
>     pub(crate) struct GspStaticConfigInfo(bindings::GspStaticConfigInfo_t);
>
>     impl GspStaticConfigInfo {
>         pub(crate) fn new(gsp: &mut Gsp) -> Result<Self> {
>             // Create GspStaticConfigInfo sending whatever messages needed to the Gsp.
>         }
>
>         pub(crate) fn gpu_name(self) -> String {
>             ...
>         }
>
>         pub(crate) fn sriov_max_gfid(self) -> u32 {
>             self.0.sriovMaxGfid
>         }
>
>         pub(crate) fn fb_length(self) -> u64 {
>             self.0.fb_length
>         }
>
>         pub(crate) fn ats_supported(self) -> bool {
>             self.0.bAtsSupported != 0
>         }
>
>         // More for every field we need
>         ...
>     }
>
> And Option (2) could look more like this:
>
> In commands.rs:
>
>     pub(crate) struct GspStaticConfigInfo {
>         pub gpu_name: String,
>         pub sriov_max_gfid: u32,
>         pub fb_length: u64,
>         pub ats_supported: bool,
>     }
>
>     impl GspStaticConfigInfo {
>         pub(crate) fn new(gsp: &mut Gsp) -> Result<Self> {
>             // Create GspStaticConfigInfo sending whatever messages needed to the Gsp.
>         }
>     }

To make a fair comparison your Option (2) should include this:

    impl TryFrom<GspStaticConfigInfo_t> for GspStaticConfigInfo {
        type Error = kernel::Error;

        fn try_from(value: GspStaticConfigInfo_t) -> Result<Self, Self::Error> {
            let gpu_name = // extract gpu string from `value`.
            ...

            Ok(Self {
                gpu_name,
                sriov_max_gfid: value.sriovMaxGfid,
                fb_length: value.fb_length,
                ats_supported: value.bAtsSupported != 0,
                // More for every field we need
            })
        }
    }

Because you are not only going to create this type from the `Gsp`, but also from a message you have received as per the code.

> Personally I favour option (2) because it is more concise and allows us to
> split the code up more. It is also more resilient to FW changes as it doesn't
> depend on the underlying representation at all. However based on your feedback
> I think you favour option (1) more, so I want to make sure I'm understanding
> it correctly.

My constraints are that I want the binding types to be opaque and abstracted in a way that is portable and makes sense for the higher levels. If Option (2) satisfies these constraints (and in the case of GspStaticConfigInfo_t I believe it does), then I'm fine with it where it makes sense, and for large types with many fields I think we can make the case that it does.

I also think this really only applies to messages received from the GSP, as commands NEED to be laid out in-place in the format that the firmware expects, so there is little room for using a temporary type where we can have an in-place constructor that is just as convenient and will use less stack space. Still, in theory we could also use that technique for e.g. very complex commands that would benefit from the constructor pattern. But for messages from the GSP, why not.

There is an additional cost to consider related to the conversion from one type to the other, and that will require extra complexity like an associated type in the MessageFromGsp trait, so for very simple messages a wrapper might be a better fit, but otherwise, I think it's fair game.