Alexandre Courbot
2025-Aug-29 11:16 UTC
[PATCH v2 5/8] gpu: nova-core: firmware: process and prepare the GSP firmware
On Thu Aug 28, 2025 at 8:27 PM JST, Danilo Krummrich wrote:> On 8/26/25 6:07 AM, Alexandre Courbot wrote: >> /// Structure encapsulating the firmware blobs required for the GPU to operate. >> #[expect(dead_code)] >> pub(crate) struct Firmware { >> @@ -36,7 +123,10 @@ pub(crate) struct Firmware { >> booter_unloader: BooterFirmware, >> /// GSP bootloader, verifies the GSP firmware before loading and running it. >> gsp_bootloader: RiscvFirmware, >> - gsp: firmware::Firmware, >> + /// GSP firmware. >> + gsp: Pin<KBox<GspFirmware>>, > > Is there a reason why we don't just propagate it through struct Gpu, which uses > pin-init already? > > You can make Firmware pin_data too and then everything is within the single > allocation of struct Gpu.I tried doing that at first, and hit the problem that the `impl PinInit` returned by `GspFirmware::new` borrows a reference to the GSP firmware binary loaded by `Firmware::new` - when `Firmware::new` returns, the firmware gets freed, and the borrow checker complains. We could move the GSP firmware loading code into the `pin_init!` of `Firmware::new`, but now we hit another problem: in `Gpu::new` the following code is executed: FbLayout::new(chipset, bar, &fw.gsp_bootloader, &fw.gsp)? which requires the `Firmware` instance, which doesn't exist yet as the `Gpu` object isn't initialized until the end of the method. So we could move `FbLayout`, and everything else created by `Gpu::new` to become members of the `Gpu` instance. It does make sense actually: this `new` method is doing a lot of stuff, such as running FRTS, and with Alistair's series it even runs Booter, the sequencer and so on. Maybe we should move all firmware execution to a separate method that is called by `probe` after the `Gpu` is constructed, as right now the `Gpu` constructor looks like it does a bit more than it should. ... but even when doing that, `Firmware::new` and `FbLayout::new` still require a reference to the `Bar`, and... you get the idea. :) So I don't think the current design allows us to do that easily or at all, and even if it does, it will be at a significant cost in code clarity. There is also the fact that I am considering making the firmware member of `Gpu` a trait object: the boot sequence is so different between pre and post-Hopper that I don't think it makes sense to share the same `Firmware` structure between the two. I would rather see `Firmware` as an opaque trait object, which provides high-level methods such as "start GSP" behind which the specifics of each GPU family are hidden. If we go with this design, `Firmware` will become a trait object and so cannot be pinned into `Gpu`. This doesn't change my observation that `Gpu::new` should not IMHO do steps like booting the GSP - it should just acquire the resources it needs, return the pinned GPU object, and then `probe` can continue the boot sequence. Having the GPU object pinned and constructed early simplifies things quite a bit as the more we progress with boot, the harder it becomes to construct everything in place (and the `PinInit` closure also becomes more and more complex). I'm still laying down the general design, but I'm pretty convinced that having `Firmware` as a trait object is the right way to abstract the differences between GPU families.
Danilo Krummrich
2025-Aug-30 12:56 UTC
[PATCH v2 5/8] gpu: nova-core: firmware: process and prepare the GSP firmware
On 8/29/25 1:16 PM, Alexandre Courbot wrote:> On Thu Aug 28, 2025 at 8:27 PM JST, Danilo Krummrich wrote: >> On 8/26/25 6:07 AM, Alexandre Courbot wrote: >>> /// Structure encapsulating the firmware blobs required for the GPU to operate. >>> #[expect(dead_code)] >>> pub(crate) struct Firmware { >>> @@ -36,7 +123,10 @@ pub(crate) struct Firmware { >>> booter_unloader: BooterFirmware, >>> /// GSP bootloader, verifies the GSP firmware before loading and running it. >>> gsp_bootloader: RiscvFirmware, >>> - gsp: firmware::Firmware, >>> + /// GSP firmware. >>> + gsp: Pin<KBox<GspFirmware>>, >> >> Is there a reason why we don't just propagate it through struct Gpu, which uses >> pin-init already? >> >> You can make Firmware pin_data too and then everything is within the single >> allocation of struct Gpu.Thanks a lot for the write-up below!> I tried doing that at first, and hit the problem that the `impl PinInit` > returned by `GspFirmware::new` borrows a reference to the GSP firmware > binary loaded by `Firmware::new` - when `Firmware::new` returns, the > firmware gets freed, and the borrow checker complains. > > We could move the GSP firmware loading code into the `pin_init!` of > `Firmware::new`, but now we hit another problem: in `Gpu::new` the > following code is executed: > > FbLayout::new(chipset, bar, &fw.gsp_bootloader, &fw.gsp)? > > which requires the `Firmware` instance, which doesn't exist yet as the > `Gpu` object isn't initialized until the end of the method. > > So we could move `FbLayout`, and everything else created by `Gpu::new` > to become members of the `Gpu` instance. It does make sense actually: > this `new` method is doing a lot of stuff, such as running FRTS, and > with Alistair's series it even runs Booter, the sequencer and so on. > Maybe we should move all firmware execution to a separate method that is > called by `probe` after the `Gpu` is constructed, as right now the `Gpu` > constructor looks like it does a bit more than it should.Absolutely, executing the firmware should be a separate method. Having it in the constructor makes things more difficult.> ... but even when doing that, `Firmware::new` and `FbLayout::new` still > require a reference to the `Bar`, and... you get the idea. :)Lifetime wise this should be fine, the &Bar out-lives the constructor, since it's lifetime is bound to the &Device<Bound> which lives for the entire duration of probe().> So I don't think the current design allows us to do that easily or at > all, and even if it does, it will be at a significant cost in code > clarity. There is also the fact that I am considering making the > firmware member of `Gpu` a trait object: the boot sequence is so > different between pre and post-Hopper that I don't think it makes sense > to share the same `Firmware` structure between the two. I would rather > see `Firmware` as an opaque trait object, which provides high-level > methods such as "start GSP" behind which the specifics of each GPU > family are hidden. If we go with this design, `Firmware` will become a > trait object and so cannot be pinned into `Gpu`. > > This doesn't change my observation that `Gpu::new` should not IMHO do > steps like booting the GSP - it should just acquire the resources it > needs, return the pinned GPU object, and then `probe` can continue the > boot sequence. Having the GPU object pinned and constructed early > simplifies things quite a bit as the more we progress with boot, the > harder it becomes to construct everything in place (and the `PinInit` > closure also becomes more and more complex). > > I'm still laying down the general design, but I'm pretty convinced that > having `Firmware` as a trait object is the right way to abstract the > differences between GPU families.Makes sense, it's fine with me to keep this in its separate allocation for the purpose of making Firmware an opaque trait object, which sounds reasonable. But we should really properly separate construction of the GPU structure from firmware boot code execution as you say. And actually move the construction of the GPU object into try_pin_init!().
Alexandre Courbot
2025-Sep-01 07:11 UTC
[PATCH v2 5/8] gpu: nova-core: firmware: process and prepare the GSP firmware
On Sat Aug 30, 2025 at 9:56 PM JST, Danilo Krummrich wrote:> On 8/29/25 1:16 PM, Alexandre Courbot wrote: >> On Thu Aug 28, 2025 at 8:27 PM JST, Danilo Krummrich wrote: >>> On 8/26/25 6:07 AM, Alexandre Courbot wrote: >>>> /// Structure encapsulating the firmware blobs required for the GPU to operate. >>>> #[expect(dead_code)] >>>> pub(crate) struct Firmware { >>>> @@ -36,7 +123,10 @@ pub(crate) struct Firmware { >>>> booter_unloader: BooterFirmware, >>>> /// GSP bootloader, verifies the GSP firmware before loading and running it. >>>> gsp_bootloader: RiscvFirmware, >>>> - gsp: firmware::Firmware, >>>> + /// GSP firmware. >>>> + gsp: Pin<KBox<GspFirmware>>, >>> >>> Is there a reason why we don't just propagate it through struct Gpu, which uses >>> pin-init already? >>> >>> You can make Firmware pin_data too and then everything is within the single >>> allocation of struct Gpu. > > Thanks a lot for the write-up below! > >> I tried doing that at first, and hit the problem that the `impl PinInit` >> returned by `GspFirmware::new` borrows a reference to the GSP firmware >> binary loaded by `Firmware::new` - when `Firmware::new` returns, the >> firmware gets freed, and the borrow checker complains. >> >> We could move the GSP firmware loading code into the `pin_init!` of >> `Firmware::new`, but now we hit another problem: in `Gpu::new` the >> following code is executed: >> >> FbLayout::new(chipset, bar, &fw.gsp_bootloader, &fw.gsp)? >> >> which requires the `Firmware` instance, which doesn't exist yet as the >> `Gpu` object isn't initialized until the end of the method. >> >> So we could move `FbLayout`, and everything else created by `Gpu::new` >> to become members of the `Gpu` instance. It does make sense actually: >> this `new` method is doing a lot of stuff, such as running FRTS, and >> with Alistair's series it even runs Booter, the sequencer and so on. >> Maybe we should move all firmware execution to a separate method that is >> called by `probe` after the `Gpu` is constructed, as right now the `Gpu` >> constructor looks like it does a bit more than it should. > > Absolutely, executing the firmware should be a separate method. Having it in the > constructor makes things more difficult. >> ... but even when doing that, `Firmware::new` and `FbLayout::new` still >> require a reference to the `Bar`, and... you get the idea. :) > > Lifetime wise this should be fine, the &Bar out-lives the constructor, since > it's lifetime is bound to the &Device<Bound> which lives for the entire duration > of probe().The &Bar is actually obtained inside the constructor (which is passed the `Arc<Devres<Bar0>>`), so I don't think that would even work without more clever tricks. :)>> So I don't think the current design allows us to do that easily or at >> all, and even if it does, it will be at a significant cost in code >> clarity. There is also the fact that I am considering making the >> firmware member of `Gpu` a trait object: the boot sequence is so >> different between pre and post-Hopper that I don't think it makes sense >> to share the same `Firmware` structure between the two. I would rather >> see `Firmware` as an opaque trait object, which provides high-level >> methods such as "start GSP" behind which the specifics of each GPU >> family are hidden. If we go with this design, `Firmware` will become a >> trait object and so cannot be pinned into `Gpu`. >> >> This doesn't change my observation that `Gpu::new` should not IMHO do >> steps like booting the GSP - it should just acquire the resources it >> needs, return the pinned GPU object, and then `probe` can continue the >> boot sequence. Having the GPU object pinned and constructed early >> simplifies things quite a bit as the more we progress with boot, the >> harder it becomes to construct everything in place (and the `PinInit` >> closure also becomes more and more complex). >> >> I'm still laying down the general design, but I'm pretty convinced that >> having `Firmware` as a trait object is the right way to abstract the >> differences between GPU families. > > Makes sense, it's fine with me to keep this in its separate allocation for the > purpose of making Firmware an opaque trait object, which sounds reasonable. > > But we should really properly separate construction of the GPU structure from > firmware boot code execution as you say. And actually move the construction of > the GPU object into try_pin_init!().Yes, and I'm glad you agree with this idea as the current design of putting everything inside the GPU is a bit limiting. Regarding the firmware, I also think this should undergo a redesign - right now we are putting unrelated stuff inside the `Firmware` structure, and this won't scale well when we start supporting Hopper+ which use a very different way to start the GSP. I'll give more details in my review of Alistair's series, and hopefully send a v3 with some of these ideas implemented soon.