thr3ads.net - Nouveau - [PATCH v2 5/8] gpu: nova-core: firmware: process and prepare the GSP firmware [Sep 2025]

If this information is useful, please help other people find it:
Share via:

Alexandre Courbot

2025-Aug-29 11:16 UTC

[PATCH v2 5/8] gpu: nova-core: firmware: process and prepare the GSP firmware

On Thu Aug 28, 2025 at 8:27 PM JST, Danilo Krummrich
wrote:> On 8/26/25 6:07 AM, Alexandre Courbot wrote:
>>   /// Structure encapsulating the firmware blobs required for the GPU
to operate.
>>   #[expect(dead_code)]
>>   pub(crate) struct Firmware {
>> @@ -36,7 +123,10 @@ pub(crate) struct Firmware {
>>       booter_unloader: BooterFirmware,
>>       /// GSP bootloader, verifies the GSP firmware before loading and
running it.
>>       gsp_bootloader: RiscvFirmware,
>> -    gsp: firmware::Firmware,
>> +    /// GSP firmware.
>> +    gsp: Pin<KBox<GspFirmware>>,
>
> Is there a reason why we don't just propagate it through struct Gpu,
which uses
> pin-init already?
>
> You can make Firmware pin_data too and then everything is within the single
> allocation of struct Gpu.
I tried doing that at first, and hit the problem that the `impl PinInit`
returned by `GspFirmware::new` borrows a reference to the GSP firmware
binary loaded by `Firmware::new` - when `Firmware::new` returns, the
firmware gets freed, and the borrow checker complains.

We could move the GSP firmware loading code into the `pin_init!` of
`Firmware::new`, but now we hit another problem: in `Gpu::new` the
following code is executed:

    FbLayout::new(chipset, bar, &fw.gsp_bootloader, &fw.gsp)?

which requires the `Firmware` instance, which doesn't exist yet as the
`Gpu` object isn't initialized until the end of the method.

So we could move `FbLayout`, and everything else created by `Gpu::new`
to become members of the `Gpu` instance. It does make sense actually:
this `new` method is doing a lot of stuff, such as running FRTS, and
with Alistair's series it even runs Booter, the sequencer and so on.
Maybe we should move all firmware execution to a separate method that is
called by `probe` after the `Gpu` is constructed, as right now the `Gpu`
constructor looks like it does a bit more than it should.

... but even when doing that, `Firmware::new` and `FbLayout::new` still
require a reference to the `Bar`, and... you get the idea. :)

So I don't think the current design allows us to do that easily or at
all, and even if it does, it will be at a significant cost in code
clarity. There is also the fact that I am considering making the
firmware member of `Gpu` a trait object: the boot sequence is so
different between pre and post-Hopper that I don't think it makes sense
to share the same `Firmware` structure between the two. I would rather
see `Firmware` as an opaque trait object, which provides high-level
methods such as "start GSP" behind which the specifics of each GPU
family are hidden. If we go with this design, `Firmware` will become a
trait object and so cannot be pinned into `Gpu`.

This doesn't change my observation that `Gpu::new` should not IMHO do
steps like booting the GSP - it should just acquire the resources it
needs, return the pinned GPU object, and then `probe` can continue the
boot sequence. Having the GPU object pinned and constructed early
simplifies things quite a bit as the more we progress with boot, the
harder it becomes to construct everything in place (and the `PinInit`
closure also becomes more and more complex).

I'm still laying down the general design, but I'm pretty convinced that
having `Firmware` as a trait object is the right way to abstract the
differences between GPU families.

Danilo Krummrich

2025-Aug-30 12:56 UTC

head link

[PATCH v2 5/8] gpu: nova-core: firmware: process and prepare the GSP firmware

On 8/29/25 1:16 PM, Alexandre Courbot wrote:> On Thu Aug 28, 2025 at 8:27 PM JST, Danilo Krummrich wrote:
>> On 8/26/25 6:07 AM, Alexandre Courbot wrote:
>>>    /// Structure encapsulating the firmware blobs required for the
GPU to operate.
>>>    #[expect(dead_code)]
>>>    pub(crate) struct Firmware {
>>> @@ -36,7 +123,10 @@ pub(crate) struct Firmware {
>>>        booter_unloader: BooterFirmware,
>>>        /// GSP bootloader, verifies the GSP firmware before loading
and running it.
>>>        gsp_bootloader: RiscvFirmware,
>>> -    gsp: firmware::Firmware,
>>> +    /// GSP firmware.
>>> +    gsp: Pin<KBox<GspFirmware>>,
>>
>> Is there a reason why we don't just propagate it through struct
Gpu, which uses
>> pin-init already?
>>
>> You can make Firmware pin_data too and then everything is within the
single
>> allocation of struct Gpu.
Thanks a lot for the write-up below!
> I tried doing that at first, and hit the problem that the `impl PinInit`
> returned by `GspFirmware::new` borrows a reference to the GSP firmware
> binary loaded by `Firmware::new` - when `Firmware::new` returns, the
> firmware gets freed, and the borrow checker complains.
> 
> We could move the GSP firmware loading code into the `pin_init!` of
> `Firmware::new`, but now we hit another problem: in `Gpu::new` the
> following code is executed:
> 
>      FbLayout::new(chipset, bar, &fw.gsp_bootloader, &fw.gsp)?
> 
> which requires the `Firmware` instance, which doesn't exist yet as the
> `Gpu` object isn't initialized until the end of the method.
> 
> So we could move `FbLayout`, and everything else created by `Gpu::new`
> to become members of the `Gpu` instance. It does make sense actually:
> this `new` method is doing a lot of stuff, such as running FRTS, and
> with Alistair's series it even runs Booter, the sequencer and so on.
> Maybe we should move all firmware execution to a separate method that is
> called by `probe` after the `Gpu` is constructed, as right now the `Gpu`
> constructor looks like it does a bit more than it should.
Absolutely, executing the firmware should be a separate method. Having it in the
constructor makes things more difficult.> ... but even when doing that, `Firmware::new` and `FbLayout::new` still
> require a reference to the `Bar`, and... you get the idea. :)
Lifetime wise this should be fine, the &Bar out-lives the constructor, since
it's lifetime is bound to the &Device<Bound> which lives for the
entire duration
of probe().> So I don't think the current design allows us to do that easily or at
> all, and even if it does, it will be at a significant cost in code
> clarity. There is also the fact that I am considering making the
> firmware member of `Gpu` a trait object: the boot sequence is so
> different between pre and post-Hopper that I don't think it makes sense
> to share the same `Firmware` structure between the two. I would rather
> see `Firmware` as an opaque trait object, which provides high-level
> methods such as "start GSP" behind which the specifics of each
GPU
> family are hidden. If we go with this design, `Firmware` will become a
> trait object and so cannot be pinned into `Gpu`.
> 
> This doesn't change my observation that `Gpu::new` should not IMHO do
> steps like booting the GSP - it should just acquire the resources it
> needs, return the pinned GPU object, and then `probe` can continue the
> boot sequence. Having the GPU object pinned and constructed early
> simplifies things quite a bit as the more we progress with boot, the
> harder it becomes to construct everything in place (and the `PinInit`
> closure also becomes more and more complex).
> 
> I'm still laying down the general design, but I'm pretty convinced
that
> having `Firmware` as a trait object is the right way to abstract the
> differences between GPU families.
Makes sense, it's fine with me to keep this in its separate allocation for
the
purpose of making Firmware an opaque trait object, which sounds reasonable.

But we should really properly separate construction of the GPU structure from
firmware boot code execution as you say. And actually move the construction of
the GPU object into try_pin_init!().

Alexandre Courbot

2025-Sep-01 07:11 UTC

head link

[PATCH v2 5/8] gpu: nova-core: firmware: process and prepare the GSP firmware

On Sat Aug 30, 2025 at 9:56 PM JST, Danilo Krummrich
wrote:> On 8/29/25 1:16 PM, Alexandre Courbot wrote:
>> On Thu Aug 28, 2025 at 8:27 PM JST, Danilo Krummrich wrote:
>>> On 8/26/25 6:07 AM, Alexandre Courbot wrote:
>>>>    /// Structure encapsulating the firmware blobs required for
the GPU to operate.
>>>>    #[expect(dead_code)]
>>>>    pub(crate) struct Firmware {
>>>> @@ -36,7 +123,10 @@ pub(crate) struct Firmware {
>>>>        booter_unloader: BooterFirmware,
>>>>        /// GSP bootloader, verifies the GSP firmware before
loading and running it.
>>>>        gsp_bootloader: RiscvFirmware,
>>>> -    gsp: firmware::Firmware,
>>>> +    /// GSP firmware.
>>>> +    gsp: Pin<KBox<GspFirmware>>,
>>>
>>> Is there a reason why we don't just propagate it through struct
Gpu, which uses
>>> pin-init already?
>>>
>>> You can make Firmware pin_data too and then everything is within
the single
>>> allocation of struct Gpu.
>
> Thanks a lot for the write-up below!
>
>> I tried doing that at first, and hit the problem that the `impl
PinInit`
>> returned by `GspFirmware::new` borrows a reference to the GSP firmware
>> binary loaded by `Firmware::new` - when `Firmware::new` returns, the
>> firmware gets freed, and the borrow checker complains.
>> 
>> We could move the GSP firmware loading code into the `pin_init!` of
>> `Firmware::new`, but now we hit another problem: in `Gpu::new` the
>> following code is executed:
>> 
>>      FbLayout::new(chipset, bar, &fw.gsp_bootloader, &fw.gsp)?
>> 
>> which requires the `Firmware` instance, which doesn't exist yet as
the
>> `Gpu` object isn't initialized until the end of the method.
>> 
>> So we could move `FbLayout`, and everything else created by `Gpu::new`
>> to become members of the `Gpu` instance. It does make sense actually:
>> this `new` method is doing a lot of stuff, such as running FRTS, and
>> with Alistair's series it even runs Booter, the sequencer and so
on.
>> Maybe we should move all firmware execution to a separate method that
is
>> called by `probe` after the `Gpu` is constructed, as right now the
`Gpu`
>> constructor looks like it does a bit more than it should.
>
> Absolutely, executing the firmware should be a separate method. Having it
in the
> constructor makes things more difficult.
>> ... but even when doing that, `Firmware::new` and `FbLayout::new` still
>> require a reference to the `Bar`, and... you get the idea. :)
>
> Lifetime wise this should be fine, the &Bar out-lives the constructor,
since
> it's lifetime is bound to the &Device<Bound> which lives for
the entire duration
> of probe().
The &Bar is actually obtained inside the constructor (which is passed
the `Arc<Devres<Bar0>>`), so I don't think that would even work
without
more clever tricks. :)
>> So I don't think the current design allows us to do that easily or
at
>> all, and even if it does, it will be at a significant cost in code
>> clarity. There is also the fact that I am considering making the
>> firmware member of `Gpu` a trait object: the boot sequence is so
>> different between pre and post-Hopper that I don't think it makes
sense
>> to share the same `Firmware` structure between the two. I would rather
>> see `Firmware` as an opaque trait object, which provides high-level
>> methods such as "start GSP" behind which the specifics of
each GPU
>> family are hidden. If we go with this design, `Firmware` will become a
>> trait object and so cannot be pinned into `Gpu`.
>> 
>> This doesn't change my observation that `Gpu::new` should not IMHO
do
>> steps like booting the GSP - it should just acquire the resources it
>> needs, return the pinned GPU object, and then `probe` can continue the
>> boot sequence. Having the GPU object pinned and constructed early
>> simplifies things quite a bit as the more we progress with boot, the
>> harder it becomes to construct everything in place (and the `PinInit`
>> closure also becomes more and more complex).
>> 
>> I'm still laying down the general design, but I'm pretty
convinced that
>> having `Firmware` as a trait object is the right way to abstract the
>> differences between GPU families.
>
> Makes sense, it's fine with me to keep this in its separate allocation
for the
> purpose of making Firmware an opaque trait object, which sounds reasonable.
>
> But we should really properly separate construction of the GPU structure
from
> firmware boot code execution as you say. And actually move the construction
of
> the GPU object into try_pin_init!().
Yes, and I'm glad you agree with this idea as the current design of
putting everything inside the GPU is a bit limiting.

Regarding the firmware, I also think this should undergo a redesign -
right now we are putting unrelated stuff inside the `Firmware`
structure, and this won't scale well when we start supporting Hopper+
which use a very different way to start the GSP.

I'll give more details in my review of Alistair's series, and hopefully
send a v3 with some of these ideas implemented soon.

Nouveau - Sep 2025 - [PATCH v2 5/8] gpu: nova-core: firmware: process and prepare the GSP firmware

[PATCH v2 5/8] gpu: nova-core: firmware: process and prepare the GSP firmware

[PATCH v2 5/8] gpu: nova-core: firmware: process and prepare the GSP firmware

[PATCH v2 5/8] gpu: nova-core: firmware: process and prepare the GSP firmware