Alexandre Courbot
2025-Sep-03 12:29 UTC
[PATCH v3 02/11] gpu: nova-core: move GSP boot code out of `Gpu` constructor
On Wed Sep 3, 2025 at 8:05 PM JST, Danilo Krummrich wrote:
> On Wed Sep 3, 2025 at 12:44 PM CEST, Alexandre Courbot wrote:
>> On Wed Sep 3, 2025 at 5:26 PM JST, Danilo Krummrich wrote:
>>> On Wed Sep 3, 2025 at 9:08 AM CEST, Alexandre Courbot wrote:
>>>> On Wed Sep 3, 2025 at 4:53 AM JST, Danilo Krummrich wrote:
>>>>> On Tue Sep 2, 2025 at 4:31 PM CEST, Alexandre Courbot wrote:
>>>>>> diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
>>>>>> index 274989ea1fb4a5e3e6678a08920ddc76d2809ab2..1062014c0a488e959379f009c2e8029ffaa1e2f8 100644
>>>>>> --- a/drivers/gpu/nova-core/driver.rs
>>>>>> +++ b/drivers/gpu/nova-core/driver.rs
>>>>>> @@ -6,6 +6,8 @@
>>>>>>
>>>>>>  #[pin_data]
>>>>>>  pub(crate) struct NovaCore {
>>>>>> +    // Placeholder for the real `Gsp` object once it is built.
>>>>>> +    pub(crate) gsp: (),
>>>>>>      #[pin]
>>>>>>      pub(crate) gpu: Gpu,
>>>>>>      _reg: auxiliary::Registration,
>>>>>> @@ -40,8 +42,14 @@ fn probe(pdev: &pci::Device<Core>, _info: &Self::IdInfo) -> Result<Pin<KBox<Self
>>>>>>          )?;
>>>>>>
>>>>>>          let this = KBox::pin_init(
>>>>>> -            try_pin_init!(Self {
>>>>>> +            try_pin_init!(&this in Self {
>>>>>>                  gpu <- Gpu::new(pdev, bar)?,
>>>>>> +                gsp <- {
>>>>>> +                    // SAFETY: `this.gpu` is initialized to a valid value.
>>>>>> +                    let gpu = unsafe { &(*this.as_ptr()).gpu };
>>>>>> +
>>>>>> +                    gpu.start_gsp(pdev)?
>>>>>> +                },
>>>>>
>>>>> Please use pin_chain() [1] for this.
>>>>
>>>> Sorry, but I couldn't figure out how I can use pin_chain here (and
>>>> couldn't find any relevant example in the kernel code either). Can you
>>>> elaborate a bit?
>>>
>>> I thought of just doing the following, which I think should be equivalent (diff
>>> against current nova-next).
>>>
>>> diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
>>> index 274989ea1fb4..6d62867f7503 100644
>>> --- a/drivers/gpu/nova-core/driver.rs
>>> +++ b/drivers/gpu/nova-core/driver.rs
>>> @@ -41,7 +41,9 @@ fn probe(pdev: &pci::Device<Core>, _info: &Self::IdInfo) -> Result<Pin<KBox<Self
>>>
>>>          let this = KBox::pin_init(
>>>              try_pin_init!(Self {
>>> -                gpu <- Gpu::new(pdev, bar)?,
>>> +                gpu <- Gpu::new(pdev, bar)?.pin_chain(|gpu| {
>>> +                    gpu.start_gsp(pdev)
>>> +                }),
>>>                  _reg: auxiliary::Registration::new(
>>>                      pdev.as_ref(),
>>>                      c_str!("nova-drm"),
>>> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
>>> index 8caecaf7dfb4..211bc1a5a5b3 100644
>>> --- a/drivers/gpu/nova-core/gpu.rs
>>> +++ b/drivers/gpu/nova-core/gpu.rs
>>> @@ -266,7 +266,7 @@ fn run_fwsec_frts(
>>>      pub(crate) fn new(
>>>          pdev: &pci::Device<device::Bound>,
>>>          devres_bar: Arc<Devres<Bar0>>,
>>> -    ) -> Result<impl PinInit<Self>> {
>>> +    ) -> Result<impl PinInit<Self, Error>> {
>>>          let bar = devres_bar.access(pdev.as_ref())?;
>>>          let spec = Spec::new(bar)?;
>>>          let fw = Firmware::new(pdev.as_ref(), spec.chipset, FIRMWARE_VERSION)?;
>>> @@ -302,11 +302,16 @@ pub(crate) fn new(
>>>
>>>          Self::run_fwsec_frts(pdev.as_ref(), &gsp_falcon, bar, &bios, &fb_layout)?;
>>>
>>> -        Ok(pin_init!(Self {
>>> +        Ok(try_pin_init!(Self {
>>>              spec,
>>>              bar: devres_bar,
>>>              fw,
>>>              sysmem_flush,
>>>          }))
>>>      }
>>> +
>>> +    pub(crate) fn start_gsp(&self, _pdev: &pci::Device<device::Core>) -> Result {
>>> +        // noop
>>> +        Ok(())
>>> +    }
>>>  }
>>>
>>> But maybe it doesn't capture your intent?
>>
>> The issue is that `start_gsp` returns a value (currently a placeholder
>> `()`, but it will change into a real type) that needs to be stored into
>> the newly-introduced `gsp` member of `NovaCore`. I could not figure out
>> how `pin_chain` could help with this (and this is the same problem for
>> the other `unsafe` statements in `firmware/gsp.rs`).
>
> Ok, I see, I think Benno is already working on a solution to access previously
> initialized fields from subsequent initializers.
>
> @Benno: What's the status of this? I haven't seen an issue for that in the
> pin-init GitHub repo, should we create one?
>
> However, in this case I'm a bit confused why we want Gsp next to Gpu? Why not
> just make Gsp a member of Gpu then?

To be honest I am not completely sure about the best layout yet and will
need more visibility to understand whether this is optimal. But
considering that we want to run the GSP boot process over a built `Gpu`
instance, we cannot store the result of said process inside `Gpu` unless
we put it inside e.g. an `Option`. But then the variant will always be
`Some` after `probe` returns, and yet we will have to perform a match
every time we want to access it.

The current separation sounds reasonable to me for the time being, with
`Gpu` containing purely hardware resources obtained without help from
user-space, while `Gsp` is the result of running a bunch of firmwares.
An alternative design would be to store `Gpu` inside `Gsp`, but `Gsp`
inside `Gpu` is trickier due to the build order. No matter what we do,
switching the layout later should be trivial if we don't choose the
best one now.

There is also an easy workaround to the sibling initialization issue,
which is to store `Gpu` and `Gsp` behind `Pin<KBox>` - that way we can
initialize both outside `try_pin_init!`, at the cost of two more heap
allocations over the whole lifetime of the device. If we don't have a
proper solution to the problem now, this might be better than using
`unsafe` as a temporary solution.

The same workaround could also be used for `GspFirmware` and its page
tables - since `GspFirmware` is temporary and can apparently be
discarded after the GSP is booted, this shouldn't be a big issue. This
will allow the driver to probe, and we can add TODO items to fix that
later if a solution is in sight.

>
> I thought the intent was to keep temporary values local to start_gsp() and not
> store them next to Gpu in the same allocation?

It is not visible in the current patchset, but `start_gsp` will
eventually return the runtime data of the GSP - notably its log buffers
and command queue, which are needed to operate it. All the rest (notably
the loaded firmwares) will be local to `start_gsp` and discarded upon
its return.
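
For readers following the layout discussion, the two options weighed in
this message can be sketched in plain userspace Rust. This is only an
illustration under stated assumptions, not nova-core code: `Box::pin`
stands in for `Pin<KBox<_>>`, and the `cmdq_len` and `chipset` fields,
the `GpuWithOption` type and the `probe_*` helpers are hypothetical
names invented for the example.

```rust
use std::pin::Pin;

/// Hypothetical stand-in for the GSP runtime state returned by `start_gsp`.
#[derive(Debug, PartialEq)]
pub struct Gsp {
    pub cmdq_len: usize,
}

/// Layout A: `Gsp` stored inside `Gpu`. Since it can only be produced
/// once a `Gpu` exists, the field has to be an `Option`.
pub struct GpuWithOption {
    pub chipset: u32,
    pub gsp: Option<Gsp>,
}

impl GpuWithOption {
    /// Every access has to handle `None`, even though the value is
    /// always `Some` once probing has succeeded.
    pub fn gsp(&self) -> Result<&Gsp, &'static str> {
        self.gsp.as_ref().ok_or("GSP not started")
    }
}

pub fn probe_option(chipset: u32) -> Result<GpuWithOption, &'static str> {
    let mut gpu = GpuWithOption { chipset, gsp: None };
    // The GSP state is filled in after the `Gpu` value has been built.
    gpu.gsp = Some(Gsp { cmdq_len: 64 });
    Ok(gpu)
}

/// Layout B: `Gpu` and `Gsp` as siblings, each in its own pinned heap
/// allocation, built one after the other outside of any initializer macro.
pub struct Gpu {
    pub chipset: u32,
}

impl Gpu {
    /// Placeholder boot routine; the returned value is made up.
    pub fn start_gsp(&self) -> Result<Gsp, &'static str> {
        Ok(Gsp { cmdq_len: 64 })
    }
}

pub struct NovaCore {
    pub gpu: Pin<Box<Gpu>>,
    pub gsp: Pin<Box<Gsp>>,
}

pub fn probe_siblings(chipset: u32) -> Result<NovaCore, &'static str> {
    // First allocation: the fully built `Gpu`.
    let gpu = Box::pin(Gpu { chipset });
    // Second allocation: the GSP state, produced from the built `Gpu`.
    let gsp = Box::pin(gpu.start_gsp()?);
    Ok(NovaCore { gpu, gsp })
}
```

In layout B no `unsafe` and no `Option` are needed, at the price of the
two extra allocations mentioned above; in layout A the accessor keeps an
error path that can never trigger after a successful probe.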
Danilo Krummrich
2025-Sep-03 14:53 UTC
[PATCH v3 02/11] gpu: nova-core: move GSP boot code out of `Gpu` constructor
On Wed Sep 3, 2025 at 2:29 PM CEST, Alexandre Courbot wrote:
> To be honest I am not completely sure about the best layout yet and will
> need more visibility to understand whether this is optimal. But
> considering that we want to run the GSP boot process over a built `Gpu`
> instance, we cannot store the result of said process inside `Gpu` unless
> we put it inside e.g. an `Option`. But then the variant will always be
> `Some` after `probe` returns, and yet we will have to perform a match
> every time we want to access it.
>
> The current separation sounds reasonable to me for the time being, with
> `Gpu` containing purely hardware resources obtained without help from
> user-space, while `Gsp` is the result of running a bunch of firmwares.
> An alternative design would be to store `Gpu` inside `Gsp`, but `Gsp`
> inside `Gpu` is trickier due to the build order. No matter what we do,
> switching the layout later should be trivial if we don't choose the
> best one now.

Gsp should be part of the Gpu object. The Gpu object represents the entire
instance of the Gpu, including hardware resources, firmware runtime state, etc.

The initialization of the Gsp structure doesn't really need a Gpu structure to
be constructed, it needs certain members of the Gpu structure, i.e. order of
initialization of the members does matter.

If it makes things more obvious we can always create new types and increase the
hierarchy within the Gpu struct itself.

The technical limitation you're facing is always the same, no matter the layout
we choose: we need pin-init to provide us references to already initialized
members.

I will check with Benno in today's Rust call what's the best way to address
this.

> There is also an easy workaround to the sibling initialization issue,
> which is to store `Gpu` and `Gsp` behind `Pin<KBox>` - that way we can
> initialize both outside `try_pin_init!`, at the cost of two more heap
> allocations over the whole lifetime of the device. If we don't have a
> proper solution to the problem now, this might be better than using
> `unsafe` as a temporary solution.

Yeah, this workaround is much easier to implement when they're siblings (fewer
allocations temporarily), but let's not design things this way because of that.

As mentioned above, I will check with Benno today.

> The same workaround could also be used for `GspFirmware` and its page
> tables - since `GspFirmware` is temporary and can apparently be
> discarded after the GSP is booted, this shouldn't be a big issue. This
> will allow the driver to probe, and we can add TODO items to fix that
> later if a solution is in sight.
>
>>
>> I thought the intent was to keep temporary values local to start_gsp() and not
>> store them next to Gpu in the same allocation?
>
> It is not visible in the current patchset, but `start_gsp` will
> eventually return the runtime data of the GSP - notably its log buffers
> and command queue, which are needed to operate it. All the rest (notably
> the loaded firmwares) will be local to `start_gsp` and discarded upon
> its return.

Ok, that makes sense, but it should really be part of the Gpu structure.
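
The point about member initialization order can be illustrated with a
small userspace sketch. It is not the driver's code and it sidesteps
pinning entirely: the members are built as ordinary locals, which is
exactly what an in-place pinned initializer cannot do without access to
already initialized fields. The `Falcon` type, the `Gsp::boot` function,
the `base`, `id`, `cmdq_base` and `falcon_id` fields and the `0x1000`
offset are all assumptions made up for the example.

```rust
/// Hypothetical hardware resources owned by `Gpu`.
pub struct Bar0 {
    pub base: usize,
}

pub struct Falcon {
    pub id: u32,
}

/// Hypothetical GSP runtime state.
#[derive(Debug)]
pub struct Gsp {
    pub cmdq_base: usize,
    pub falcon_id: u32,
}

impl Gsp {
    /// Booting only needs the members it actually uses, not a whole `Gpu`.
    pub fn boot(bar: &Bar0, falcon: &Falcon) -> Result<Self, &'static str> {
        if bar.base == 0 {
            return Err("BAR0 not mapped");
        }
        Ok(Gsp {
            cmdq_base: bar.base + 0x1000,
            falcon_id: falcon.id,
        })
    }
}

/// `Gsp` is a plain member of `Gpu`, declared after the members it depends on.
pub struct Gpu {
    pub bar: Bar0,
    pub gsp_falcon: Falcon,
    pub gsp: Gsp,
}

impl Gpu {
    pub fn new(base: usize) -> Result<Self, &'static str> {
        // Members are created in dependency order...
        let bar = Bar0 { base };
        let gsp_falcon = Falcon { id: 1 };
        // ...so the later one can borrow the earlier ones.
        let gsp = Gsp::boot(&bar, &gsp_falcon)?;
        Ok(Gpu { bar, gsp_falcon, gsp })
    }
}
```

With this ordering `Gpu` needs neither an `Option` nor a sibling
allocation. The open question in the thread is how to express the same
dependency when the members are initialized in place through pin-init.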