Alexandre Courbot
2025-May-01 00:09 UTC
[PATCH 11/16] gpu: nova-core: add falcon register definitions and base code
On Thu May 1, 2025 at 3:16 AM JST, Danilo Krummrich wrote:> On Wed, Apr 30, 2025 at 10:38:11AM -0400, Joel Fernandes wrote: >> On 4/30/2025 9:25 AM, Alexandre Courbot wrote: >> > On Tue Apr 22, 2025 at 11:44 PM JST, Danilo Krummrich wrote: >> >> >>> +/// Returns a boxed falcon HAL adequate for the passed `chipset`. >> >>> +/// >> >>> +/// We use this function and a heap-allocated trait object instead of statically defined trait >> >>> +/// objects because of the two-dimensional (Chipset, Engine) lookup required to return the >> >>> +/// requested HAL. >> >> >> >> Do we really need the dynamic dispatch? AFAICS, there's only E::BASE that is >> >> relevant to FalconHal impls? >> >> >> >> Can't we do something like I do in the following example [1]? >> >> >> >> [1] https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=bf7035a07e79a4047fb6834eac03a9f2 >> > >> > So are you have noticed there are two dimensions from which the falcons >> > can be instantiated: >> > >> > - The engine, which determines its register BASE, >> > - The HAL, which is determined by the chipset. >> > >> > For the engine, I want to keep things static for the main reason that if >> > BASE was dynamic, we would have to do all our IO using >> > try_read()/try_write() and check for an out-of-bounds error at each >> > register access. The cost of monomorphization is limited as there are >> > only a handful of engines. >> > >> > But the HAL introduces a second dimension to this, and if we support N >> > engines then the amount of monomorphized code would then increase by N >> > for each new HAL we add. Chipsets are released at a good cadence, so >> > this is the dimension that risks growing the most. > > I agree, avoiding the dynamic dispatch is probably not worth in this case > considering the long term. However, I wanted to point out an alternative with > [2]. > >> > It is also the one that makes use of methods to abstract things (vs. >> > fixed parameters), so it is a natural candidate for using virtual >> > methods. I am not a fan of having ever-growing boilerplate match >> > statements for each method that needs to be abstracted, especially since >> > this is that virtual methods do without requiring extra code, and for a >> > runtime penalty that is completely negligible in our context and IMHO >> > completely balanced by the smaller binary size that results from their >> > use. >> >> Adding to what Alex said, note that the runtime cost is still there even without >> using dyn. Because at runtime, the match conditionals need to route function >> calls to the right place. > > Honestly, I don't know how dynamic dispatch scales compared to static dispatch > with conditionals. > > OOC, I briefly looked for a benchmark and found [3], which doesn't look > unreasonable at a first glance. > > I modified it real quick to have more than 2 actions. [4] > > 2 Actions > --------- > Dynamic Dispatch: time: [2.0679 ns 2.0825 ns 2.0945 ns] > Static Dispatch: time: [850.29 ps 851.05 ps 852.36 ps] > > 20 Actions > ---------- > Dynamic Dispatch: time: [21.368 ns 21.827 ns 22.284 ns] > Static Dispatch: time: [1.3623 ns 1.3703 ns 1.3793 ns] > > 100 Actions > ----------- > Dynamic Dispatch: time: [103.72 ns 104.33 ns 105.13 ns] > Static Dispatch: time: [4.5905 ns 4.6311 ns 4.6775 ns] > > Absolutely take it with a grain of salt, I neither spend a lot of brain power > nor time on this, which usually is not a great combination with benchmarking > things. :) > > However, I think it's probably not too important here. Hence, feel free to go > with dynamic dispatch for this.Indeed, it looks like the cost of dispatch will be completely shadowed by the IO behind it anyway. And these HAL calls are like a few here and there anyway, it's not like they are on a critical path.> >> I am just not seeing the benefits of not using dyn for >> this use case and only drawbacks. IMHO, we should try to not be doing the >> compiler's job. >> >> Maybe the only benefit is you don't need an Arc or Kbox wrapper? > > That's not a huge concern for me, it's only one single allocation per Engine, > correct?Correct. Note that for other engines we will be able to store the HALs as static singletons instead of building them on the heap like I am currently doing. The reason for doing this on falcon is that the dual-dimension of the instances makes it more complex to build and look them up. ... or maybe I could just use a macro? Let me try that and see whether it works.
Joel Fernandes
2025-May-01 00:22 UTC
[PATCH 11/16] gpu: nova-core: add falcon register definitions and base code
Hi Alex, On 4/30/2025 8:09 PM, Alexandre Courbot wrote:>>> I am just not seeing the benefits of not using dyn for >>> this use case and only drawbacks. IMHO, we should try to not be doing the >>> compiler's job. >>> >>> Maybe the only benefit is you don't need an Arc or Kbox wrapper? >> That's not a huge concern for me, it's only one single allocation per Engine, >> correct? > Correct. Note that for other engines we will be able to store the HALs as > static singletons instead of building them on the heap like I am > currently doing. The reason for doing this on falcon is that the > dual-dimension of the instances makes it more complex to build and look > them up. > > ... or maybe I could just use a macro? Let me try that and see whether > it works.Do you mean a macro for create_falcon_hal which adds an entry to this? let hal = match chipset { Chipset::GA102 | Chipset::GA103 | Chipset::GA104 | Chipset::GA106 |Chipset::GA107 => { .. } Actually it would be nice if a single macro defined both a chipset and created the hal together in the above list, that way the definition of a "chipset" is in a singe place. Kind of like what I did in the vbios patch for various BiosImage. But not sure how easy it is to do for Falcon. Or perhaps you meant a macro that statically allocates the Engine + HAL combination, and avoids need for Arc/KBox and their corresponding allocations? thanks, - Joel
Alexandre Courbot
2025-May-01 14:07 UTC
[PATCH 11/16] gpu: nova-core: add falcon register definitions and base code
On Thu May 1, 2025 at 9:22 AM JST, Joel Fernandes wrote:> Hi Alex, > > On 4/30/2025 8:09 PM, Alexandre Courbot wrote: >>>> I am just not seeing the benefits of not using dyn for >>>> this use case and only drawbacks. IMHO, we should try to not be doing the >>>> compiler's job. >>>> >>>> Maybe the only benefit is you don't need an Arc or Kbox wrapper? >>> That's not a huge concern for me, it's only one single allocation per Engine, >>> correct? >> Correct. Note that for other engines we will be able to store the HALs as >> static singletons instead of building them on the heap like I am >> currently doing. The reason for doing this on falcon is that the >> dual-dimension of the instances makes it more complex to build and look >> them up. >> >> ... or maybe I could just use a macro? Let me try that and see whether >> it works. > > Do you mean a macro for create_falcon_hal which adds an entry to this? > > let hal = match chipset { > Chipset::GA102 | Chipset::GA103 | Chipset::GA104 | Chipset::GA106 > |Chipset::GA107 => { .. } > > > Actually it would be nice if a single macro defined both a chipset and created > the hal together in the above list, that way the definition of a "chipset" is in > a singe place. Kind of like what I did in the vbios patch for various BiosImage. > But not sure how easy it is to do for Falcon. > > Or perhaps you meant a macro that statically allocates the Engine + HAL > combination, and avoids need for Arc/KBox and their corresponding allocations?I was thinking of a macro to create all the Chipset * Engine static instances of HALs, and generate the body of a lookup function to return the right one for a given Chipset at runtime. But trying to write it, I realized it wasn't as easy as I thought since generics cannot be used as macro parameters - i.e. if you have <E: Engine> as a generic and pass `E` to the macro, it will see... `E` and not whatever was bound to `E` when the code is monomorphized (macros are expanded before generics it seems). A solution to that would involve new traits and a bunch of boilerplate, which I have decided is not worth the trouble to save one 8-bytes object on the heap per falcon instance. :) I'll keep things as they currently are for now.