Timur Tabi
2025-Nov-02 18:14 UTC
[PATCH v4 3/3] gpu: nova-core: add boot42 support for next-gen GPUs
On Sat, 2025-11-01 at 18:36 -0700, John Hubbard wrote:
> NVIDIA GPUs are moving away from using NV_PMC_BOOT_0 to contain
> architecture and revision details, and will instead use NV_PMC_BOOT_42
> in the future. NV_PMC_BOOT_0 will be zeroed out.

You missed this one. Boot0 will not be completely zeroed out.

>
> +impl TryFrom<regs::NV_PMC_BOOT_42> for Spec {
> +    type Error = Error;
> +
> +    fn try_from(boot42: regs::NV_PMC_BOOT_42) -> Result<Self> {
> +        Ok(Self {
> +            chipset: boot42.chipset()?,
> +            revision: boot42.revision(),
> +        })
> +    }
> +}
> +
>  impl fmt::Display for Revision {
>      fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
>          write!(f, "{:x}.{:x}", self.major, self.minor)
> @@ -169,9 +180,34 @@ pub(crate) struct Spec {
>
>  impl Spec {
>      fn new(bar: &Bar0) -> Result<Spec> {
> +        // Some brief notes about boot0 and boot42, in chronological order:
> +        //
> +        // NV04 through Volta:
> +        //
> +        //    Not supported by Nova. boot0 is necessary and sufficient to identify these GPUs.
> +        //    boot42 may not even exist on some of these GPUs.boot42

Did you intend to write more than just "boot42" at the end here?

> +        //
> +        // Turing through Blackwell:
> +        //
> +        //     Supported by both Nouveau and Nova. boot0 is still necessary and sufficient to
> +        //     identify these GPUs. boot42 exists on these GPUs but we don't need to use it.
> +        //
> +        // Rubin:
> +        //
> +        //     Only supported by Nova. Need to use boot42 to fully identify these GPUs.
> +        //
> +        // "Future" (after Rubin) GPUs:
> +        //
> +        //    Only supported by Nova. NV_PMC_BOOT's ARCH_0 (bits 28:24) will be zeroed out, and
> +        //    ARCH_1 (bit 8:8) will be set to 1, which will mean, "refer to NV_PMC_BOOT_42".
> +
>          let boot0 = regs::NV_PMC_BOOT_0::read(bar);
>
> -        Spec::try_from(boot0)
> +        if boot0.use_boot42_instead() {
> +            Spec::try_from(regs::NV_PMC_BOOT_42::read(bar))
> +        } else {
> +            Spec::try_from(boot0)
> +        }
>      }
>  }
>
> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
> index 207b865335af..8b5ff3858210 100644
> --- a/drivers/gpu/nova-core/regs.rs
> +++ b/drivers/gpu/nova-core/regs.rs
> @@ -25,6 +25,13 @@
>  });
>
>  impl NV_PMC_BOOT_0 {
> +    pub(crate) fn use_boot42_instead(self) -> bool {
> +        // "Future" GPUs (some time after Rubin) will set `architecture_0`
> +        // to 0, and `architecture_1` to 1, and put the architecture details in
> +        // boot42 instead.
> +        self.architecture_0() == 0 && self.architecture_1() == 1
> +    }

So this was the crux of my initial objection, and I just don't think this is truly "forward
looking". The code is using boot42 only if boot0 is "zeroed out". So sometimes Nova will use
boot0 and sometimes it will use boot42, depending on the GPU. It's this inconsistency that
bothers me.

Instead, I think Nova should use only boot42, so that we have consistent information across all
GPUs. boot0 should only be used to avoid accidentally reading boot42 when it doesn't exist.

Previously, Danilo said this:

> I think you're indeed talking about the same thing, but thinking differently
> about the implementation details.
>
> A standalone is_ancient_gpu() function called from probe() like
>
>     if is_ancient_gpu(bar) {
>         return Err(ENODEV);
>     }
>
> is what we would probably do in C, but in Rust we should just call
>
>     let spec = Spec::new()?;
>
> from probe() and Spec::new() will return Err(ENODEV) when it run into an ancient
> GPU spec internally.

This I agree with. The first thing that Spec::new() should do is check whether we're on an
ancient GPU that does not even have boot42. If so, return Err(ENODEV). Otherwise, from that
point onward, no code will ever look at boot0 again. boot0 should never be used to return the
actual architecture/gpu information.
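
For illustration only, here is a minimal, standalone sketch of the "boot0 as a guard, boot42 for everything else" flow argued for above. It is not nova-core code: `MockBar`, `Enodev`, `identify()` and the threshold constant are hypothetical stand-ins, and the rule used to decide that a GPU is "ancient" is an assumption, since the thread does not specify how that check would be implemented. Only the ARCH_0 (bits 28:24) and ARCH_1 (bit 8) positions come from the quoted patch comment.

```rust
// Illustrative stand-ins only; none of these types are the nova-core API.

/// Hypothetical error type standing in for the kernel's ENODEV.
#[derive(Debug, PartialEq)]
struct Enodev;

/// Hypothetical register snapshot standing in for reads through Bar0.
struct MockBar {
    boot0: u32,
    boot42: u32,
}

/// ASSUMPTION: placeholder for the lowest boot0 architecture value whose
/// GPUs also implement boot42. The real cut-off is not given in the thread.
const ASSUMED_FIRST_ARCH_WITH_BOOT42: u32 = 0x16;

/// Returns true when boot0 describes a GPU assumed to be too old for boot42.
fn is_ancient_gpu(boot0: u32) -> bool {
    let arch_0 = (boot0 >> 24) & 0x1f; // ARCH_0, bits 28:24
    let arch_1 = (boot0 >> 8) & 0x1; // ARCH_1, bit 8
    arch_1 == 0 && arch_0 < ASSUMED_FIRST_ARCH_WITH_BOOT42
}

/// boot0 is consulted once, as a guard; the identification value itself
/// always comes from boot42.
fn identify(bar: &MockBar) -> Result<u32, Enodev> {
    if is_ancient_gpu(bar.boot0) {
        return Err(Enodev);
    }
    Ok(bar.boot42)
}
```

With this shape, a caller sees either an error for an unsupported GPU or a value that was read from boot42, never a value decoded from boot0.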
John Hubbard
2025-Nov-03 01:04 UTC
[PATCH v4 3/3] gpu: nova-core: add boot42 support for next-gen GPUs
On 11/2/25 10:14 AM, Timur Tabi wrote:
> On Sat, 2025-11-01 at 18:36 -0700, John Hubbard wrote:
>> NVIDIA GPUs are moving away from using NV_PMC_BOOT_0 to contain
>> architecture and revision details, and will instead use NV_PMC_BOOT_42
>> in the future. NV_PMC_BOOT_0 will be zeroed out.
>
> You missed this one. Boot0 will not be completely zeroed out.
>

Thanks for catching that, I'll write it like the other case.

>>
>>
>> +impl TryFrom<regs::NV_PMC_BOOT_42> for Spec {
>> +    type Error = Error;
>> +
>> +    fn try_from(boot42: regs::NV_PMC_BOOT_42) -> Result<Self> {
>> +        Ok(Self {
>> +            chipset: boot42.chipset()?,
>> +            revision: boot42.revision(),
>> +        })
>> +    }
>> +}
>> +
>>  impl fmt::Display for Revision {
>>      fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
>>          write!(f, "{:x}.{:x}", self.major, self.minor)
>> @@ -169,9 +180,34 @@ pub(crate) struct Spec {
>>
>>  impl Spec {
>>      fn new(bar: &Bar0) -> Result<Spec> {
>> +        // Some brief notes about boot0 and boot42, in chronological order:
>> +        //
>> +        // NV04 through Volta:
>> +        //
>> +        //    Not supported by Nova. boot0 is necessary and sufficient to identify these GPUs.
>> +        //    boot42 may not even exist on some of these GPUs.boot42
>
> Did you intend to write more than just "boot42" at the end here?

Nope, that's just an odd typo fragment that I need to delete, thanks for
spotting it.

...

>>          let boot0 = regs::NV_PMC_BOOT_0::read(bar);
>>
>> -        Spec::try_from(boot0)
>> +        if boot0.use_boot42_instead() {
>> +            Spec::try_from(regs::NV_PMC_BOOT_42::read(bar))
>> +        } else {
>> +            Spec::try_from(boot0)
>> +        }
>>      }
>>  }
>>
>> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
>> index 207b865335af..8b5ff3858210 100644
>> --- a/drivers/gpu/nova-core/regs.rs
>> +++ b/drivers/gpu/nova-core/regs.rs
>> @@ -25,6 +25,13 @@
>>  });
>>
>>  impl NV_PMC_BOOT_0 {
>> +    pub(crate) fn use_boot42_instead(self) -> bool {
>> +        // "Future" GPUs (some time after Rubin) will set `architecture_0`
>> +        // to 0, and `architecture_1` to 1, and put the architecture details in
>> +        // boot42 instead.
>> +        self.architecture_0() == 0 && self.architecture_1() == 1
>> +    }
>
> So this was the crux of my initial objection, and I just don't think this is truly "forward
> looking". The code is using boot42 only if boot0 is "zeroed out". So sometimes Nova will use

To put it another way: the code is only using boot42 if boot0 is encoded,
by the HW team, to go read boot42. As you know, the future ref manual
literally says "go read boot42."

> boot0 and sometimes it will use boot42, depending on the GPU. It's this inconsistency that
> bothers me.
>
> Instead, I think Nova should use only boot42, so that we have consistent information across all
> GPUs. boot0 should only be used to avoid accidentally reading boot42 when it doesn't exist.

I am convinced that the most appropriate thing for a device driver to do
is to match what the HW configuration says. We should draw the dividing
line at the changeover point, which is in an upcoming ref manual.

Once boot0 has the encoding set to "go read boot42", the driver does that.
Until then, HW promises that boot0 is correct.

It may look all nice and neat to use "Nova is a new driver" to pick the
point to change, but again, it's more accurate and appropriate for a
device driver to follow HW's lead, and use what boot0 says to do.

>
> Previously, Danilo said this:
>
>> I think you're indeed talking about the same thing, but thinking differently
>> about the implementation details.
>>
>> A standalone is_ancient_gpu() function called from probe() like
>>
>>     if is_ancient_gpu(bar) {
>>         return Err(ENODEV);
>>     }
>>
>> is what we would probably do in C, but in Rust we should just call
>>
>>     let spec = Spec::new()?;
>>
>> from probe() and Spec::new() will return Err(ENODEV) when it run into an ancient
>> GPU spec internally.
>
> This I agree with. The first thing that Spec::new() should do is check whether we're on an
> ancient GPU that does not even have boot42. If so, return Err(ENODEV). Otherwise, from that
> point onward, no code will ever look at boot0 again. boot0 should never be used to return the
> actual architecture/gpu information.
>

I don't think we have a conflict on this point, if you read through how
the code works. The only difference is the point I wrote about above.
I'm hoping you'll allow me to proceed with that.

thanks,
-- 
John Hubbard
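
For illustration only, the "follow what boot0 says" selection described in this reply can be sketched outside the kernel as below. The bit positions (ARCH_0 in bits 28:24, ARCH_1 in bit 8) and the `use_boot42_instead()` condition are taken from the quoted patch; the `Boot0` newtype, the `IdSource` enum and `id_source()` are hypothetical stand-ins for the driver's `register!`-generated types and should not be read as the actual nova-core implementation.

```rust
// Illustrative stand-ins only; the real driver generates its accessors
// with a register macro and reads the value through Bar0.

/// Hypothetical wrapper around the raw NV_PMC_BOOT_0 value.
#[derive(Clone, Copy)]
struct Boot0(u32);

impl Boot0 {
    /// ARCH_0: bits 28:24, per the patch comment.
    fn architecture_0(self) -> u32 {
        (self.0 >> 24) & 0x1f
    }

    /// ARCH_1: bit 8, per the patch comment.
    fn architecture_1(self) -> u32 {
        (self.0 >> 8) & 0x1
    }

    /// Same condition as in the patch: ARCH_0 == 0 and ARCH_1 == 1 means
    /// "refer to NV_PMC_BOOT_42".
    fn use_boot42_instead(self) -> bool {
        self.architecture_0() == 0 && self.architecture_1() == 1
    }
}

/// Hypothetical marker for which register supplies the GPU identity.
#[derive(Debug, PartialEq)]
enum IdSource {
    Boot0,
    Boot42,
}

/// The hardware encoding in boot0 decides which register is authoritative.
fn id_source(boot0: Boot0) -> IdSource {
    if boot0.use_boot42_instead() {
        IdSource::Boot42
    } else {
        IdSource::Boot0
    }
}
```

In this sketch the driver never decides on its own when to switch registers; the changeover happens only when boot0 carries the redirect encoding.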