Alexandre Courbot
2025-Mar-04 13:53 UTC
[RFC PATCH v2 0/5] gpu: nova-core: register definitions and basic timer and falcon devices
Hi everyone,

This RFC is based on top of Danilo's initial driver stub series v4 [1] and adds very basic support for the Timer and Falcon devices. The goal is to see and "feel" the proposed register access abstractions and discuss them before moving forward with GSP initialization. It is kept simple and short on purpose, to avoid hitting a wall after writing much more device code if my assumptions turn out to be incorrect.

The main addition is the nv_reg!() register definition macro, which aims at providing safe and convenient access to all useful registers and their fields. I elaborate on its definition in the patch that introduces it; it is also probably better to look at all the register definitions to understand how it can be used and the services it provides. Right now it provides accessors and builders for all the fields of a register. It will probably need to be extended with more operations as we deem them useful.

The timer device has not changed much from v1, except that it now has its own Timestamp type to easily obtain Durations between two samples.

The falcon implementation is still very incomplete, and is just designed to illustrate how the register macros can be used. I have more progress in a private branch, but want to keep the focus on the nv_reg!() macro for this review since the rest will ultimately depend on it.

It would be charitable to say that my Rust macro skills are lacking, so please point out any deficiency in its definition. I am also not entirely sure about the syntax for register definitions; I would like to keep things simple and close to OpenRM (notably for the mask definitions) to make it easier to port definitions from it into Nova.

[1] https://lore.kernel.org/nouveau/20250226175552.29381-1-dakr@kernel.org/T/

Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
Changes in v2:
- Don't hold the Bar guard in methods that can sleep.
- Added a Timestamp type for Timer to safely and easily get durations between two measurements.
- Added a macro to make register definitions easier.
- Added a very basic falcon implementation to define more registers and exercise the register definition macro.
- Link to v1: https://lore.kernel.org/r/20250217-nova_timer-v1-0-78c5ace2d987@nvidia.com

---
Alexandre Courbot (5):
      rust: add useful ops for u64
      rust: make ETIMEDOUT error available
      gpu: nova-core: add register definition macro
      gpu: nova-core: add basic timer device
      gpu: nova-core: add falcon register definitions and probe code

 drivers/gpu/nova-core/driver.rs    |   4 +-
 drivers/gpu/nova-core/falcon.rs    | 124 +++++++++++++++
 drivers/gpu/nova-core/gpu.rs       |  70 ++++++++-
 drivers/gpu/nova-core/nova_core.rs |   2 +
 drivers/gpu/nova-core/regs.rs      | 311 ++++++++++++++++++++++++++++++++-----
 drivers/gpu/nova-core/timer.rs     | 124 +++++++++++++++
 rust/kernel/error.rs               |   1 +
 rust/kernel/lib.rs                 |   1 +
 rust/kernel/num.rs                 |  43 +++++
 9 files changed, 639 insertions(+), 41 deletions(-)
---
base-commit: 3ac10b625b709d59556cd2c1bf8a009c2bfdbefc
change-id: 20250216-nova_timer-c69430184f54

Best regards,
-- 
Alexandre Courbot <acourbot at nvidia.com>
Alexandre Courbot
2025-Mar-04 13:53 UTC
[PATCH RFC v2 1/5] rust: add useful ops for u64
It is common to build a u64 from its high and low parts obtained from two 32-bit registers. Conversely, it is also common to split a u64 into two u32s to write them into registers. Add an extension trait for u64 that implements these methods in a new `num` module.

It is expected that this trait will be extended with other useful operations, and that similar extension traits will be implemented for other types.

Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
 rust/kernel/lib.rs |  1 +
 rust/kernel/num.rs | 43 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 8e76ef9b4346956009a936b1317f7474a83c8dbd..caee059249cf56993d5db698a876f040eda33dd5 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -61,6 +61,7 @@
 pub mod miscdevice;
 #[cfg(CONFIG_NET)]
 pub mod net;
+pub mod num;
 pub mod of;
 pub mod page;
 #[cfg(CONFIG_PCI)]
diff --git a/rust/kernel/num.rs b/rust/kernel/num.rs
new file mode 100644
index 0000000000000000000000000000000000000000..f03c82f13643412cc13b0b841dfdf3b06490926d
--- /dev/null
+++ b/rust/kernel/num.rs
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Numerical and binary utilities for primitive types.
+
+/// Useful operations for `u64`.
+pub trait U64Ext {
+    /// Build a `u64` by combining its `high` and `low` parts.
+    ///
+    /// ```
+    /// use kernel::num::U64Ext;
+    /// assert_eq!(u64::from_u32s(0x01234567, 0x89abcdef), 0x01234567_89abcdef);
+    /// ```
+    fn from_u32s(high: u32, low: u32) -> Self;
+
+    fn upper_32_bits(self) -> u32;
+    fn lower_32_bits(self) -> u32;
+}
+
+impl U64Ext for u64 {
+    fn from_u32s(high: u32, low: u32) -> Self {
+        ((high as u64) << u32::BITS) | low as u64
+    }
+
+    fn upper_32_bits(self) -> u32 {
+        (self >> u32::BITS) as u32
+    }
+
+    fn lower_32_bits(self) -> u32 {
+        self as u32
+    }
+}
+
+pub const fn upper_32_bits(v: u64) -> u32 {
+    (v >> u32::BITS) as u32
+}
+
+pub const fn lower_32_bits(v: u64) -> u32 {
+    v as u32
+}
+
+pub const fn u32s_to_u64(high: u32, low: u32) -> u64 {
+    ((high as u64) << u32::BITS) | low as u64
+}
-- 
2.48.1
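The split/combine round trip provided by `U64Ext` can be checked outside the kernel. Below is a minimal userspace sketch. `U64Split` is a hypothetical trait that only mirrors the three `U64Ext` methods so the code builds with a plain `rustc`; it is not part of `kernel::num`.

```rust
/// Hypothetical standalone mirror of the `U64Ext` helpers, for illustration only.
pub trait U64Split {
    /// Combine a high and a low 32-bit half into one 64-bit value.
    fn from_u32s(high: u32, low: u32) -> Self;
    /// Return bits 63:32.
    fn upper_32_bits(self) -> u32;
    /// Return bits 31:0.
    fn lower_32_bits(self) -> u32;
}

impl U64Split for u64 {
    fn from_u32s(high: u32, low: u32) -> Self {
        // `u64::from` is a lossless widening, equivalent to `as u64` here.
        (u64::from(high) << u32::BITS) | u64::from(low)
    }

    fn upper_32_bits(self) -> u32 {
        (self >> u32::BITS) as u32
    }

    fn lower_32_bits(self) -> u32 {
        // Truncating cast keeps only the low 32 bits.
        self as u32
    }
}

fn main() {
    let v = u64::from_u32s(0x0123_4567, 0x89ab_cdef);
    assert_eq!(v, 0x0123_4567_89ab_cdef);

    // Splitting and recombining is lossless.
    assert_eq!(v.upper_32_bits(), 0x0123_4567);
    assert_eq!(v.lower_32_bits(), 0x89ab_cdef);
    assert_eq!(u64::from_u32s(v.upper_32_bits(), v.lower_32_bits()), v);
    println!("{v:#018x}");
}
```

Using `u64::from` rather than `as u64` for the widening is only a stylistic choice. Both are lossless for `u32` inputs.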
Alexandre Courbot
2025-Mar-04 13:53 UTC
[PATCH RFC v2 2/5] rust: make ETIMEDOUT error available
Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
 rust/kernel/error.rs | 1 +
 1 file changed, 1 insertion(+)

diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
index 1e510181432cceae46219f7ed3597a88b85ebe0a..475d14a4830774aa7717d3b5e70c7ff9de203dc2 100644
--- a/rust/kernel/error.rs
+++ b/rust/kernel/error.rs
@@ -65,6 +65,7 @@ macro_rules! declare_err {
     declare_err!(EDOM, "Math argument out of domain of func.");
     declare_err!(ERANGE, "Math result not representable.");
     declare_err!(EOVERFLOW, "Value too large for defined data type.");
+    declare_err!(ETIMEDOUT, "Connection timed out.");
     declare_err!(ERESTARTSYS, "Restart the system call.");
     declare_err!(ERESTARTNOINTR, "System call was interrupted by a signal and will be restarted.");
     declare_err!(ERESTARTNOHAND, "Restart if no handler.");
-- 
2.48.1
Alexandre Courbot
2025-Mar-04 13:53 UTC
[PATCH RFC v2 3/5] gpu: nova-core: add register definition macro
Register data manipulation is one of the error-prone areas of a kernel driver. It is particularly easy to mix up register addresses or field masks and shifts, and to proceed with invalid values.

This patch introduces the nv_reg!() macro, which creates a safe type definition for a given register, along with field accessors and a value builder. The macro is designed to use the same hi:lo field ranges as the NVIDIA OpenRM project, to facilitate porting its register definitions to Nova.

Here is, for instance, the definition of the Boot0 register:

  nv_reg!(Boot0@0x00000000, "Basic revision information about the GPU";
      3:0 minor_rev as (u8), "minor revision of the chip";
      7:4 major_rev as (u8), "major revision of the chip";
      25:20 chipset try_into (Chipset), "chipset model"
  );

This definition creates a Boot0 type with read() and write() methods that automatically use the correct register offset (0x0 in this case).

Creating a type for each register lets us leverage the type system to make sure register values don't get mixed up.

It also allows us to create register-specific field extractor methods (here minor_rev(), major_rev(), and chipset()) that present each field in a convenient way and validate its data if relevant. The chipset() accessor, in particular, uses the TryFrom<u32> implementation of Chipset to build a Chipset instance. If the conversion fails because of an invalid value, it returns the associated error type.

The string at the end of each line is optional. It expands to doc comments for the type itself or for each of the field accessors.
Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs  |   2 +-
 drivers/gpu/nova-core/regs.rs | 195 ++++++++++++++++++++++++++++++++++--------
 2 files changed, 158 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 7693a5df0dc11f208513dc043d8c99f85c902119..58b97c7f0b2ab1edacada8346b139f6336b68272 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -164,7 +164,7 @@ fn new(bar: &Devres<Bar0>) -> Result<Spec> {
         let boot0 = regs::Boot0::read(&bar);
 
         Ok(Self {
-            chipset: boot0.chipset().try_into()?,
+            chipset: boot0.chipset()?,
             revision: Revision::from_boot0(boot0),
         })
     }
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 50aefb150b0b1c9b73f07fca3b7a070885785485..a874cb2fa5bedee258a60e5c3b471f52e5f82469 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -1,55 +1,174 @@
 // SPDX-License-Identifier: GPL-2.0
 
+use core::{fmt::Debug, marker::PhantomData, ops::Deref};
+
 use crate::driver::Bar0;
+use crate::gpu::Chipset;
 
-// TODO
-//
-// Create register definitions via generic macros. See task "Generic register
-// abstraction" in Documentation/gpu/nova/core/todo.rst.
+pub(crate) struct Builder<T>(T, PhantomData<T>);
 
-const BOOT0_OFFSET: usize = 0x00000000;
+impl<T> From<T> for Builder<T> {
+    fn from(value: T) -> Self {
+        Builder(value, PhantomData)
+    }
+}
 
-// 3:0 - chipset minor revision
-const BOOT0_MINOR_REV_SHIFT: u8 = 0;
-const BOOT0_MINOR_REV_MASK: u32 = 0x0000000f;
+impl<T: Default> Default for Builder<T> {
+    fn default() -> Self {
+        Self(Default::default(), PhantomData)
+    }
+}
 
-// 7:4 - chipset major revision
-const BOOT0_MAJOR_REV_SHIFT: u8 = 4;
-const BOOT0_MAJOR_REV_MASK: u32 = 0x000000f0;
+impl<T> Deref for Builder<T> {
+    type Target = T;
 
-// 23:20 - chipset implementation Identifier (depends on architecture)
-const BOOT0_IMPL_SHIFT: u8 = 20;
-const BOOT0_IMPL_MASK: u32 = 0x00f00000;
+    fn deref(&self) -> &Self::Target {
+        &self.0
+    }
+}
 
-// 28:24 - chipset architecture identifier
-const BOOT0_ARCH_MASK: u32 = 0x1f000000;
+macro_rules! nv_reg_common {
+    ($name:ident $(, $type_comment:expr)?) => {
+        $(
+            #[doc=concat!($type_comment)]
+        )?
+        #[derive(Clone, Copy, Default)]
+        pub(crate) struct $name(u32);
 
-// 28:20 - chipset identifier (virtual register field combining BOOT0_IMPL and
-// BOOT0_ARCH)
-const BOOT0_CHIPSET_SHIFT: u8 = BOOT0_IMPL_SHIFT;
-const BOOT0_CHIPSET_MASK: u32 = BOOT0_IMPL_MASK | BOOT0_ARCH_MASK;
+        // TODO: should we display the raw hex value, then the value of all its fields?
+        impl Debug for $name {
+            fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
+                f.debug_tuple(stringify!($name))
+                    .field(&format_args!("0x{0:x}", &self.0))
+                    .finish()
+            }
+        }
 
-#[derive(Copy, Clone)]
-pub(crate) struct Boot0(u32);
+        impl core::ops::BitOr for $name {
+            type Output = Self;
 
-impl Boot0 {
-    #[inline]
-    pub(crate) fn read(bar: &Bar0) -> Self {
-        Self(bar.readl(BOOT0_OFFSET))
-    }
+            fn bitor(self, rhs: Self) -> Self::Output {
+                Self(self.0 | rhs.0)
+            }
+        }
 
-    #[inline]
-    pub(crate) fn chipset(&self) -> u32 {
-        (self.0 & BOOT0_CHIPSET_MASK) >> BOOT0_CHIPSET_SHIFT
-    }
+        #[allow(dead_code)]
+        impl $name {
+            /// Returns a new builder for the register. Individual fields can be set by the methods
+            /// of the builder, and the current value obtained by dereferencing it.
+            #[inline]
+            pub(crate) fn new() -> Builder<Self> {
+                Default::default()
+            }
+        }
+    };
+}
 
-    #[inline]
-    pub(crate) fn minor_rev(&self) -> u8 {
-        ((self.0 & BOOT0_MINOR_REV_MASK) >> BOOT0_MINOR_REV_SHIFT) as u8
-    }
+macro_rules! nv_reg_field_accessor {
+    ($hi:tt:$lo:tt $field:ident $(as ($as_type:ty))? $(as_bit ($bit_type:ty))? $(into ($type:ty))? $(try_into ($try_type:ty))? $(, $comment:expr)?) => {
+        $(
+            #[doc=concat!("Returns the ", $comment)]
+        )?
+        #[inline]
+        pub(crate) fn $field(self) -> $( $as_type )? $( $bit_type )? $( $type )? $( core::result::Result<$try_type, <$try_type as TryFrom<u32>>::Error> )? {
+            const MASK: u32 = ((((1 << $hi) - 1) << 1) + 1) - ((1 << $lo) - 1);
+            const SHIFT: u32 = MASK.trailing_zeros();
+            let field = (self.0 & MASK) >> SHIFT;
 
-    #[inline]
-    pub(crate) fn major_rev(&self) -> u8 {
-        ((self.0 & BOOT0_MAJOR_REV_MASK) >> BOOT0_MAJOR_REV_SHIFT) as u8
+            $( field as $as_type )?
+            $(
+                // TODO: it would be nice to throw a compile-time error if $hi != $lo as this means we
+                // are considering more than one bit but returning a bool...
+                (if field != 0 { true } else { false }) as $bit_type
+            )?
+            $( <$type>::from(field) )?
+            $( <$try_type>::try_from(field) )?
+        }
     }
 }
+
+macro_rules! nv_reg_field_builder {
+    ($hi:tt:$lo:tt $field:ident $(as ($as_type:ty))? $(as_bit ($bit_type:ty))? $(into ($type:ty))? $(try_into ($try_type:ty))? $(, $comment:expr)?) => {
+        $(
+            #[doc=concat!("Sets the ", $comment)]
+        )?
+        #[inline]
+        pub(crate) fn $field(mut self, value: $( $as_type)? $( $bit_type )? $( $type )? $( $try_type)? ) -> Self {
+            const MASK: u32 = ((((1 << $hi) - 1) << 1) + 1) - ((1 << $lo) - 1);
+            const SHIFT: u32 = MASK.trailing_zeros();
+
+            let value = ((value as u32) << SHIFT) & MASK;
+            self.0.0 = self.0.0 | value;
+            self
+        }
+    };
+}
+
+macro_rules! nv_reg {
+    (
+        $name:ident@$offset:expr $(, $type_comment:expr)?;
+        $($hi:tt:$lo:tt $field:ident $(as ($as_type:ty))? $(as_bit ($bit_type:ty))? $(into ($type:ty))? $(try_into ($try_type:ty))? $(, $field_comment:expr)?);* $(;)?
+    ) => {
+        nv_reg_common!($name $(, $type_comment)?);
+
+        #[allow(dead_code)]
+        impl $name {
+            #[inline]
+            pub(crate) fn read(bar: &Bar0) -> Self {
+                Self(bar.readl($offset))
+            }
+
+            #[inline]
+            pub(crate) fn write(self, bar: &Bar0) {
+                bar.writel(self.0, $offset)
+            }
+
+            $(
+                nv_reg_field_accessor!($hi:$lo $field $(as ($as_type))? $(as_bit ($bit_type))? $(into ($type))? $(try_into ($try_type))? $(, $field_comment)?);
+            )*
+        }
+
+        #[allow(dead_code)]
+        impl Builder<$name> {
+            $(
+                nv_reg_field_builder!($hi:$lo $field $(as ($as_type))? $(as_bit ($bit_type))? $(into ($type))? $(try_into ($try_type))? $(, $field_comment)?);
+            )*
+        }
+    };
+    (
+        $name:ident@+$offset:expr $(, $type_comment:expr)?;
+        $($hi:tt:$lo:tt $field:ident $(as ($as_type:ty))? $(as_bit ($bit_type:ty))? $(into ($type:ty))? $(try_into ($try_type:ty))? $(, $field_comment:expr)?);* $(;)?
+    ) => {
+        nv_reg_common!($name $(, $type_comment)?);
+
+        #[allow(dead_code)]
+        impl $name {
+            #[inline]
+            pub(crate) fn read(bar: &Bar0, base: usize) -> Self {
+                Self(bar.readl(base + $offset))
+            }
+
+            #[inline]
+            pub(crate) fn write(self, bar: &Bar0, base: usize) {
+                bar.writel(self.0, base + $offset)
+            }
+
+            $(
+                nv_reg_field_accessor!($hi:$lo $field $(as ($as_type))? $(as_bit ($bit_type))? $(into ($type))? $(try_into ($try_type))? $(, $field_comment)?);
+            )*
+        }
+
+        #[allow(dead_code)]
+        impl Builder<$name> {
+            $(
+                nv_reg_field_builder!($hi:$lo $field $(as ($as_type))? $(as_bit ($bit_type))? $(into ($type))? $(try_into ($try_type))? $(, $field_comment)?);
+            )*
+        }
+    };
+}
+
+nv_reg!(Boot0@0x00000000, "Basic revision information about the GPU";
+    3:0 minor_rev as (u8), "minor revision of the chip";
+    7:4 major_rev as (u8), "major revision of the chip";
+    25:20 chipset try_into (Chipset), "chipset model"
+);
-- 
2.48.1
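The core of what `nv_reg_field_accessor!` generates can be reproduced in a small userspace sketch. This may help reviewers less used to `macro_rules!`. The `reg!` macro, the `field_mask` helper and the `Boot0Like` type are hypothetical simplifications: plain `as` conversions only, no builder, no BAR access. The mask formula is copied from the patch. The chipset field uses the 28:20 range of the `BOOT0_CHIPSET_*` constants that this patch removes.

```rust
/// Bits `hi..=lo` set, computed with the same expression as nv_reg_field_accessor!.
const fn field_mask(hi: u32, lo: u32) -> u32 {
    ((((1u32 << hi) - 1) << 1) + 1) - ((1u32 << lo) - 1)
}

// Hypothetical, simplified stand-in for nv_reg!(): one newtype per register and
// one accessor per `hi:lo` field. Only `as <type>` conversions are supported.
macro_rules! reg {
    ($name:ident; $($hi:tt : $lo:tt $field:ident as $ty:ty);* $(;)?) => {
        #[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
        pub struct $name(pub u32);

        impl $name {
            $(
                pub fn $field(self) -> $ty {
                    // Evaluated at compile time, as in the patch.
                    const MASK: u32 = field_mask($hi, $lo);
                    const SHIFT: u32 = MASK.trailing_zeros();
                    ((self.0 & MASK) >> SHIFT) as $ty
                }
            )*
        }
    };
}

reg!(Boot0Like;
    3:0 minor_rev as u8;
    7:4 major_rev as u8;
    28:20 chipset as u16
);

fn main() {
    assert_eq!(field_mask(3, 0), 0x0000_000f);
    assert_eq!(field_mask(7, 4), 0x0000_00f0);
    assert_eq!(field_mask(28, 20), 0x1ff0_0000);
    assert_eq!(field_mask(31, 0), u32::MAX);

    // chipset = 0x172 in bits 28:20, major = 0xa in bits 7:4, minor = 0x1 in bits 3:0.
    let boot0 = Boot0Like(0x1720_00a1);
    assert_eq!(boot0.minor_rev(), 0x1);
    assert_eq!(boot0.major_rev(), 0xa);
    assert_eq!(boot0.chipset(), 0x172);
    println!("{boot0:?}");
}
```

Because `MASK` and `SHIFT` are associated with each generated method as constants, a malformed range such as `32:0` is rejected at compile time, with no runtime check.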
Alexandre Courbot
2025-Mar-04 13:54 UTC
[PATCH RFC v2 4/5] gpu: nova-core: add basic timer device
Add a basic timer device and exercise it during device probing. This first draft is probably very questionable.

One point in particular should, IMHO, receive attention. The generic wait_on() method aims at providing functionality similar to Nouveau's nvkm_[num]sec() macros. Since this method will be heavily used with different conditions to test, I'd like to avoid monomorphizing it entirely for each instance, something nvkm_xsec() achieves by having the macros invoke functions.

I have tried to achieve the same result in Rust using closures (kept as-is in the current code), but they seem to be monomorphized as well. Calling extra functions could work better, but also looks less elegant to me, so I am really open to suggestions here.

Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
 drivers/gpu/nova-core/driver.rs    |   4 +-
 drivers/gpu/nova-core/gpu.rs       |  58 ++++++++++++++++-
 drivers/gpu/nova-core/nova_core.rs |   1 +
 drivers/gpu/nova-core/regs.rs      |   8 +++
 drivers/gpu/nova-core/timer.rs     | 124 +++++++++++++++++++++++++++++++++++++
 5 files changed, 193 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index 63c19f140fbdd65d8fccf81669ac590807cc120f..0cd23aa306e4082405f480afc0530a41131485e7 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -10,7 +10,7 @@ pub(crate) struct NovaCore {
     pub(crate) gpu: Gpu,
 }
 
-const BAR0_SIZE: usize = 8;
+const BAR0_SIZE: usize = 0x9500;
 pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
 
 kernel::pci_device_table!(
@@ -42,6 +42,8 @@ fn probe(pdev: &mut pci::Device, _info: &Self::IdInfo) -> Result<Pin<KBox<Self>>
             GFP_KERNEL,
         )?;
 
+        let _ = this.gpu.test_timer();
+
         Ok(this)
     }
 }
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 58b97c7f0b2ab1edacada8346b139f6336b68272..8fa8616c0deccc7297b090fcbe74f3cda5cc9741 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -1,12 +1,16 @@
 // SPDX-License-Identifier: GPL-2.0
 
+use kernel::device::Device;
+use kernel::types::ARef;
 use kernel::{
     device, devres::Devres, error::code::*, firmware, fmt, pci, prelude::*, str::BStr, str::CString,
 };
 
 use crate::driver::Bar0;
 use crate::regs;
+use crate::timer::Timer;
 use core::fmt;
+use core::time::Duration;
 
 const fn to_lowercase_bytes<const N: usize>(s: &str) -> [u8; N] {
     let src = s.as_bytes();
@@ -201,10 +205,12 @@ fn new(dev: &device::Device, spec: &Spec, ver: &str) -> Result<Firmware> {
 /// Structure holding the resources required to operate the GPU.
 #[pin_data]
 pub(crate) struct Gpu {
+    dev: ARef<Device>,
     spec: Spec,
     /// MMIO mapping of PCI BAR 0
     bar: Devres<Bar0>,
     fw: Firmware,
+    timer: Timer,
 }
 
 impl Gpu {
@@ -220,6 +226,56 @@ pub(crate) fn new(pdev: &pci::Device, bar: Devres<Bar0>) -> Result<impl PinInit<
             spec.revision
         );
 
-        Ok(pin_init!(Self { spec, bar, fw }))
+        let dev = pdev.as_ref().into();
+        let timer = Timer::new();
+
+        Ok(pin_init!(Self {
+            dev,
+            spec,
+            bar,
+            fw,
+            timer,
+        }))
+    }
+
+    pub(crate) fn test_timer(&self) -> Result<()> {
+        let bar = self.bar.try_access().ok_or(ENXIO)?;
+        dev_info!(&self.dev, "testing timer subdev\n");
+        dev_info!(&self.dev, "current timestamp: {}\n", self.timer.read(&bar));
+        drop(bar);
+
+        assert!(matches!(
+            self.timer
+                .wait_on(&self.bar, Duration::from_millis(10), || Some(())),
+            Ok(())
+        ));
+
+        let bar = self.bar.try_access().ok_or(ENXIO)?;
+        dev_info!(
+            &self.dev,
+            "timestamp after immediate exit: {}\n",
+            self.timer.read(&bar)
+        );
+        let t1 = self.timer.read(&bar);
+        drop(bar);
+
+        assert_eq!(
+            self.timer
+                .wait_on(&self.bar, Duration::from_millis(10), || Option::<()>::None),
+            Err(ETIMEDOUT)
+        );
+
+        let bar = self.bar.try_access().ok_or(ENXIO)?;
+        let t2 = self.timer.read(&bar);
+        assert!(t2 - t1 >= Duration::from_millis(10));
+        dev_info!(
+            &self.dev,
+            "timestamp after timeout: {} ({:?})\n",
+            self.timer.read(&bar),
+            t2 - t1
+        );
+        drop(bar);
+
+        Ok(())
     }
 }
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 8479be2a3f31798e887228863f223d42a63bd8ca..891a93ba7656d2aa5e1fa4357d1d84ee3a054942 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -6,6 +6,7 @@
 mod firmware;
 mod gpu;
 mod regs;
+mod timer;
 
 kernel::module_pci_driver! {
     type: driver::NovaCore,
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index a874cb2fa5bedee258a60e5c3b471f52e5f82469..35bbd3c0b58972de3a2478ef20f93f31c69940e7 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -172,3 +172,11 @@ impl Builder<$name> {
     7:4 major_rev as (u8), "major revision of the chip";
     25:20 chipset try_into (Chipset), "chipset model"
 );
+
+nv_reg!(PtimerTime0@0x00009400;
+    31:0 lo as (u32), "low 32 bits of the timer"
+);
+
+nv_reg!(PtimerTime1@0x00009410;
+    31:0 hi as (u32), "high 32 bits of the timer"
+);
diff --git a/drivers/gpu/nova-core/timer.rs b/drivers/gpu/nova-core/timer.rs
new file mode 100644
index 0000000000000000000000000000000000000000..919995bf32141c568206fda165dcac6f4d4ce8b8
--- /dev/null
+++ b/drivers/gpu/nova-core/timer.rs
@@ -0,0 +1,124 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Nova Core Timer subdevice
+
+use core::fmt::Display;
+use core::ops::{Add, Sub};
+use core::time::Duration;
+
+use kernel::devres::Devres;
+use kernel::num::U64Ext;
+use kernel::prelude::*;
+
+use crate::driver::Bar0;
+use crate::regs;
+
+/// A timestamp with nanosecond granularity obtained from the GPU timer.
+///
+/// A timestamp can also be subtracted from another in order to obtain a [`Duration`].
+///
+/// TODO: add Kunit tests!
+#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
+pub(crate) struct Timestamp(u64);
+
+impl Display for Timestamp {
+    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
+        write!(f, "{}", self.0)
+    }
+}
+
+impl Add<u64> for Timestamp {
+    type Output = Self;
+
+    fn add(self, rhs: u64) -> Self::Output {
+        Timestamp(self.0.wrapping_add(rhs))
+    }
+}
+
+impl Sub for Timestamp {
+    type Output = Duration;
+
+    fn sub(self, rhs: Self) -> Self::Output {
+        Duration::from_nanos(self.0.wrapping_sub(rhs.0))
+    }
+}
+
+pub(crate) struct Timer {}
+
+impl Timer {
+    pub(crate) fn new() -> Self {
+        Self {}
+    }
+
+    /// Read the current timer timestamp.
+    pub(crate) fn read(&self, bar: &Bar0) -> Timestamp {
+        loop {
+            let hi = regs::PtimerTime1::read(bar);
+            let lo = regs::PtimerTime0::read(bar);
+
+            if hi.hi() == regs::PtimerTime1::read(bar).hi() {
+                return Timestamp(u64::from_u32s(hi.hi(), lo.lo()));
+            }
+        }
+    }
+
+    #[allow(dead_code)]
+    pub(crate) fn time(bar: &Bar0, time: u64) {
+        regs::PtimerTime1::new().hi(time.upper_32_bits()).write(bar);
+        regs::PtimerTime0::new().lo(time.lower_32_bits()).write(bar);
+    }
+
+    /// Wait until `cond` is true or `timeout` elapsed, based on GPU time.
+    ///
+    /// When `cond` evaluates to `Some`, its return value is returned.
+    ///
+    /// `Err(ETIMEDOUT)` is returned if `timeout` has been reached without `cond` evaluating to
+    /// `Some`, or if the timer device is stuck for some reason.
+    pub(crate) fn wait_on<R, F: Fn() -> Option<R>>(
+        &self,
+        dev_bar: &Devres<Bar0>,
+        timeout: Duration,
+        cond: F,
+    ) -> Result<R> {
+        // Number of consecutive time reads after which we consider the timer frozen if it hasn't
+        // moved forward.
+        const MAX_STALLED_READS: usize = 16;
+
+        let (mut cur_time, mut prev_time, deadline) = {
+            let bar = dev_bar.try_access().ok_or(ENXIO)?;
+            let cur_time = self.read(&bar);
+            let deadline = cur_time + u64::try_from(timeout.as_nanos()).unwrap_or(u64::MAX);
+
+            (cur_time, cur_time, deadline)
+        };
+        let mut num_reads = 0;
+
+        loop {
+            if let Some(ret) = cond() {
+                return Ok(ret);
+            }
+
+            (|| {
+                let bar = dev_bar.try_access().ok_or(ENXIO)?;
+                cur_time = self.read(&bar);
+
+                /* Check if the timer is frozen for some reason. */
+                if cur_time == prev_time {
+                    if num_reads >= MAX_STALLED_READS {
+                        return Err(ETIMEDOUT);
+                    }
+                    num_reads += 1;
+                } else {
+                    if cur_time >= deadline {
+                        return Err(ETIMEDOUT);
+                    }
+
+                    num_reads = 0;
+                    prev_time = cur_time;
+                }
+
+                Ok(())
+            })()?;
+        }
+    }
+}
-- 
2.48.1
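A common way to address the monomorphization question above is a "generic shim, non-generic core" split. The generic `wait_on` only wraps the caller's closure into a `&mut dyn FnMut() -> bool` and forwards to a function that is compiled once. The sketch below is standalone userspace Rust under stated assumptions, not nova-core code:

- `WaitError`, `wait_on_inner` and the injected `now` clock are hypothetical stand-ins for `ETIMEDOUT` and `Timer::read()`.
- `cond` is taken as `FnMut` instead of `Fn`.
- The stall and deadline logic follows `Timer::wait_on()`, except that the deadline saturates instead of wrapping.

```rust
use std::convert::TryFrom;
use std::time::Duration;

/// Hypothetical error type standing in for the kernel's `ETIMEDOUT`.
#[derive(Debug, PartialEq, Eq)]
pub enum WaitError {
    TimedOut,
}

/// Consecutive identical clock reads after which the clock is considered stuck
/// (the patch uses the same value for `MAX_STALLED_READS`).
const MAX_STALLED_READS: usize = 16;

/// Non-generic polling core: compiled once, whatever closures the callers use.
/// `now` returns a nanosecond timestamp and `poll` returns `true` once done.
fn wait_on_inner(
    now: &mut dyn FnMut() -> u64,
    timeout: Duration,
    poll: &mut dyn FnMut() -> bool,
) -> Result<(), WaitError> {
    let start = now();
    // Saturate so that a huge timeout cannot wrap the deadline around.
    let deadline = start.saturating_add(u64::try_from(timeout.as_nanos()).unwrap_or(u64::MAX));
    let mut prev = start;
    let mut stalled = 0;

    loop {
        if poll() {
            return Ok(());
        }

        let cur = now();
        if cur == prev {
            // The clock did not move: give up after too many identical reads.
            if stalled >= MAX_STALLED_READS {
                return Err(WaitError::TimedOut);
            }
            stalled += 1;
        } else {
            if cur >= deadline {
                return Err(WaitError::TimedOut);
            }
            stalled = 0;
            prev = cur;
        }
    }
}

/// Thin generic shim: only this small function is instantiated per `R`/`F`.
pub fn wait_on<R, F: FnMut() -> Option<R>>(
    now: &mut dyn FnMut() -> u64,
    timeout: Duration,
    mut cond: F,
) -> Result<R, WaitError> {
    let mut result = None;
    wait_on_inner(now, timeout, &mut || {
        result = cond();
        result.is_some()
    })?;
    // `wait_on_inner` only returns `Ok` right after `cond` produced `Some`.
    Ok(result.expect("wait_on_inner succeeded without a value"))
}

fn main() {
    // Fake clock that advances by 1 ms on every read.
    let mut t = 0u64;
    let mut clock = || {
        t += 1_000_000;
        t
    };

    // The condition becomes true on the third poll.
    let mut polls = 0;
    let r = wait_on(&mut clock, Duration::from_millis(10), || {
        polls += 1;
        if polls == 3 { Some(polls) } else { None }
    });
    assert_eq!(r, Ok(3));

    // The condition never becomes true: times out after 10 ms of fake time.
    let r = wait_on(&mut clock, Duration::from_millis(10), || None::<()>);
    assert_eq!(r, Err(WaitError::TimedOut));

    // A frozen clock is detected after MAX_STALLED_READS identical reads.
    let mut frozen = || 42u64;
    let r = wait_on(&mut frozen, Duration::from_secs(1), || None::<()>);
    assert_eq!(r, Err(WaitError::TimedOut));
}
```

The compiler may still inline `wait_on_inner` back into each caller. Marking it `#[inline(never)]` is one way to keep a single copy, at the cost of an indirect call per poll.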
Alexandre Courbot
2025-Mar-04 13:54 UTC
[PATCH RFC v2 5/5] gpu: nova-core: add falcon register definitions and probe code
This is still very preliminary work, and is mostly designed to show how register fields can be turned into safe types that force us to handle invalid values.

Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
 drivers/gpu/nova-core/driver.rs    |   2 +-
 drivers/gpu/nova-core/falcon.rs    | 124 +++++++++++++++++++++++++++++++++++++
 drivers/gpu/nova-core/gpu.rs       |  10 +++
 drivers/gpu/nova-core/nova_core.rs |   1 +
 drivers/gpu/nova-core/regs.rs      | 108 ++++++++++++++++++++++++++++++++
 5 files changed, 244 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index 0cd23aa306e4082405f480afc0530a41131485e7..dee5fd22eecb2ce1f4ea765338b0c1b68853b2d3 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -10,7 +10,7 @@ pub(crate) struct NovaCore {
     pub(crate) gpu: Gpu,
 }
 
-const BAR0_SIZE: usize = 0x9500;
+const BAR0_SIZE: usize = 0x1000000;
 pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
 
 kernel::pci_device_table!(
diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
new file mode 100644
index 0000000000000000000000000000000000000000..5f8496ed1f91ccd19c0c7716440cbc795a7a025f
--- /dev/null
+++ b/drivers/gpu/nova-core/falcon.rs
@@ -0,0 +1,124 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Falcon microprocessor base support
+
+use core::hint::unreachable_unchecked;
+use kernel::devres::Devres;
+use kernel::{pci, prelude::*};
+
+use crate::driver::Bar0;
+use crate::regs::{FalconCpuCtl, FalconHwCfg1};
+
+#[repr(u8)]
+#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
+pub(crate) enum FalconCoreRev {
+    Rev1 = 1,
+    Rev2 = 2,
+    Rev3 = 3,
+    Rev4 = 4,
+    Rev5 = 5,
+    Rev6 = 6,
+    Rev7 = 7,
+}
+
+impl TryFrom<u32> for FalconCoreRev {
+    type Error = Error;
+
+    fn try_from(value: u32) -> core::result::Result<Self, Self::Error> {
+        use FalconCoreRev::*;
+
+        let rev = match value {
+            1 => Rev1,
+            2 => Rev2,
+            3 => Rev3,
+            4 => Rev4,
+            5 => Rev5,
+            6 => Rev6,
+            7 => Rev7,
+            _ => return Err(EINVAL),
+        };
+
+        Ok(rev)
+    }
+}
+
+#[repr(u8)]
+#[derive(Debug, Copy, Clone)]
+pub(crate) enum FalconSecurityModel {
+    None = 0,
+    Light = 2,
+    Heavy = 3,
+}
+
+impl TryFrom<u32> for FalconSecurityModel {
+    type Error = Error;
+
+    fn try_from(value: u32) -> core::result::Result<Self, Self::Error> {
+        use FalconSecurityModel::*;
+
+        let sec_model = match value {
+            0 => None,
+            2 => Light,
+            3 => Heavy,
+            _ => return Err(EINVAL),
+        };
+
+        Ok(sec_model)
+    }
+}
+
+#[repr(u8)]
+#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
+pub(crate) enum FalconCoreRevSubversion {
+    Subversion0 = 0,
+    Subversion1 = 1,
+    Subversion2 = 2,
+    Subversion3 = 3,
+}
+
+impl From<u32> for FalconCoreRevSubversion {
+    fn from(value: u32) -> Self {
+        use FalconCoreRevSubversion::*;
+
+        match value & 0b11 {
+            0 => Subversion0,
+            1 => Subversion1,
+            2 => Subversion2,
+            3 => Subversion3,
+            // SAFETY: the `0b11` mask limits the possible values to `0..=3`.
+            4..=u32::MAX => unsafe { unreachable_unchecked() },
+        }
+    }
+}
+
+/// Contains the base parameters common to all Falcon instances.
+#[derive(Debug)]
+pub(crate) struct Falcon {
+    /// Base IO address.
+    base: usize,
+}
+
+impl Falcon {
+    pub(crate) fn new(pdev: &pci::Device, bar: &Devres<Bar0>, base: usize) -> Result<Self> {
+        let b = bar.try_access().ok_or(ENXIO)?;
+
+        let hwcfg1 = FalconHwCfg1::read(&b, base);
+        let rev = hwcfg1.core_rev()?;
+        let subver = hwcfg1.core_rev_subversion();
+        let sec_model = hwcfg1.security_model()?;
+
+        dev_info!(
+            pdev.as_ref(),
+            "new falcon: {:?} {:?} {:?}",
+            rev,
+            subver,
+            sec_model
+        );
+
+        Ok(Self { base })
+    }
+
+    pub(crate) fn cpu_ctl(&self, bar: &Bar0) -> FalconCpuCtl {
+        FalconCpuCtl::read(bar, self.base)
+    }
+}
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 8fa8616c0deccc7297b090fcbe74f3cda5cc9741..8d8b5ee5c9444c4722d1025d4008fc5a8841a247 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -7,6 +7,7 @@
 };
 
 use crate::driver::Bar0;
+use crate::falcon::Falcon;
 use crate::regs;
 use crate::timer::Timer;
 use core::fmt;
@@ -228,6 +229,15 @@ pub(crate) fn new(pdev: &pci::Device, bar: Devres<Bar0>) -> Result<impl PinInit<
 
         let dev = pdev.as_ref().into();
         let timer = Timer::new();
+        let gsp_falcon = Falcon::new(pdev, &bar, regs::FALCON_GSP_BASE)?;
+        let sec2 = Falcon::new(pdev, &bar, regs::FALCON_SEC2_BASE)?;
+        let b = bar.try_access().ok_or(ENXIO)?;
+        dev_info!(
+            pdev.as_ref(),
+            "GSP Falcon CpuCtl: {:?}",
+            gsp_falcon.cpu_ctl(&b)
+        );
+        dev_info!(pdev.as_ref(), "SEC2 Falcon CpuCtl: {:?}", sec2.cpu_ctl(&b));
 
         Ok(pin_init!(Self {
             dev,
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 891a93ba7656d2aa5e1fa4357d1d84ee3a054942..a5817bda30185d4ec7021f3d3e881cd99230ca94 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -3,6 +3,7 @@
 //! Nova Core GPU Driver
 
 mod driver;
+mod falcon;
 mod firmware;
 mod gpu;
 mod regs;
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 35bbd3c0b58972de3a2478ef20f93f31c69940e7..12a889a785e0713c6041d50284c211352a39303b 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -3,6 +3,7 @@
 use core::{fmt::Debug, marker::PhantomData, ops::Deref};
 
 use crate::driver::Bar0;
+use crate::falcon::{FalconCoreRev, FalconCoreRevSubversion, FalconSecurityModel};
 use crate::gpu::Chipset;
 
 pub(crate) struct Builder<T>(T, PhantomData<T>);
@@ -180,3 +181,110 @@ impl Builder<$name> {
 nv_reg!(PtimerTime1@0x00009410;
     31:0 hi as (u32), "high 32 bits of the timer"
 );
+
+pub(crate) const FALCON_GSP_BASE: usize = 0x00110000;
+pub(crate) const FALCON_SEC2_BASE: usize = 0x00840000;
+
+nv_reg!(FalconIrqsClr@+0x00000004;
+    4:4 halt as_bit (bool);
+    6:6 swgen0 as_bit (bool);
+);
+
+nv_reg!(FalconMailbox0@+0x00000040;
+    31:0 mailbox0 as (u32)
+);
+nv_reg!(FalconMailbox1@+0x00000044;
+    31:0 mailbox1 as (u32)
+);
+
+nv_reg!(FalconCpuCtl@+0x00000100;
+    1:1 start_cpu as_bit (bool);
+    4:4 halted as_bit (bool);
+    6:6 alias_en as_bit (bool);
+);
+nv_reg!(FalconBootVec@+0x00000104;
+    31:0 boot_vec as (u32)
+);
+
+nv_reg!(FalconHwCfg@+0x00000108;
+    8:0 imem_size as (u32);
+    17:9 dmem_size as (u32);
+);
+
+nv_reg!(FalconDmaCtl@+0x0000010c;
+    0:0 require_ctx as_bit (bool);
+    1:1 dmem_scrubbing as_bit (bool);
+    2:2 imem_scrubbing as_bit (bool);
+    6:3 dmaq_num as (u8);
+    7:7 secure_stat as_bit (bool);
+);
+
+nv_reg!(FalconDmaTrfBase@+0x00000110;
+    31:0 base as (u32);
+);
+
+nv_reg!(FalconDmaTrfMOffs@+0x00000114;
+    23:0 offs as (u32);
+);
+
+nv_reg!(FalconDmaTrfCmd@+0x00000118;
+    0:0 full as_bit (bool);
+    1:1 idle as_bit (bool);
+    3:2 sec as (u8);
+    4:4 imem as_bit (bool);
+    5:5 is_write as_bit (bool);
+    10:8 size as (u8);
+    14:12 ctxdma as (u8);
+    16:16 set_dmtag as (u8);
+);
+
+nv_reg!(FalconDmaTrfBOffs@+0x0000011c;
+    31:0 offs as (u32);
+);
+
+nv_reg!(FalconDmaTrfBase1@+0x00000128;
+    8:0 base as (u16);
+);
+
+nv_reg!(FalconHwCfg1@+0x0000012c;
+    3:0 core_rev try_into (FalconCoreRev), "core revision of the falcon";
+    5:4 security_model try_into (FalconSecurityModel), "security model of the falcon";
+    7:6 core_rev_subversion into (FalconCoreRevSubversion);
+    11:8 imem_ports as (u8);
+    15:12 dmem_ports as (u8);
+);
+
+// TODO: This should be able to take an index, like +0x180[16; 8]? Then the constructor or read
+// method take the port we want to address as argument.
+nv_reg!(FalconImemC@+0x00000180;
+    7:2 offs as (u8);
+    23:8 blk as (u16);
+    24:24 aincw as_bit (bool);
+    25:25 aincr as_bit (bool);
+    28:28 secure as_bit (bool);
+    29:29 sec_atomic as_bit (bool);
+);
+
+nv_reg!(FalconImemD@+0x00000184;
+    31:0 data as (u32);
+);
+
+nv_reg!(FalconImemT@+0x00000188;
+    15:0 data as (u16);
+);
+
+nv_reg!(FalconDmemC@+0x000001c0;
+    23:0 addr as (u32);
+    7:2 offs as (u8);
+    23:8 blk as (u16);
+    24:24 aincw as_bit (bool);
+    25:25 aincr as_bit (bool);
+    26:26 settag as_bit (bool);
+    27:27 setlvl as_bit (bool);
+    28:28 va as_bit (bool);
+    29:29 miss as_bit (bool);
+);
+
+nv_reg!(FalconDmemD@+0x000001c4;
+    31:0 data as (u32);
+);
-- 
2.48.1
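The `FalconCoreRevSubversion` conversion above relies on `unreachable_unchecked()` to cover the impossible `4..=u32::MAX` arm. One possible alternative avoids the `unsafe` block entirely.

The sketch below is standalone Rust with hypothetical `Subversion`, `SecurityModel` and `InvalidValue` types mirroring the falcon enums:

- The infallible conversion indexes a four-entry table with the masked value. This needs no `unsafe`, and since the index is provably below 4, the compiler can typically drop the bounds check.
- The fallible conversion rejects the undefined value 1, the same way `FalconSecurityModel::try_from()` does.

```rust
use std::convert::TryFrom;

#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Subversion {
    Subversion0 = 0,
    Subversion1 = 1,
    Subversion2 = 2,
    Subversion3 = 3,
}

impl From<u32> for Subversion {
    fn from(value: u32) -> Self {
        use Subversion::*;
        // Every 2-bit value maps to a variant, so the lookup is total and needs no `unsafe`.
        const TABLE: [Subversion; 4] = [Subversion0, Subversion1, Subversion2, Subversion3];
        TABLE[(value & 0b11) as usize]
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SecurityModel {
    None = 0,
    Light = 2,
    Heavy = 3,
}

/// Hypothetical error carrying the rejected raw field value.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidValue(pub u32);

impl TryFrom<u32> for SecurityModel {
    type Error = InvalidValue;

    fn try_from(value: u32) -> Result<Self, Self::Error> {
        match value {
            0 => Ok(SecurityModel::None),
            2 => Ok(SecurityModel::Light),
            3 => Ok(SecurityModel::Heavy),
            // Value 1 has no variant in the patch's enum, so it is rejected.
            other => Err(InvalidValue(other)),
        }
    }
}

fn main() {
    assert_eq!(Subversion::from(2u32), Subversion::Subversion2);
    // Upper bits are ignored, as with the `0b11` mask in the patch.
    assert_eq!(Subversion::from(0xffff_fffdu32), Subversion::Subversion1);

    assert_eq!(SecurityModel::try_from(3u32), Ok(SecurityModel::Heavy));
    assert_eq!(SecurityModel::try_from(1u32), Err(InvalidValue(1)));
}
```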