Alexandre Courbot
2025-Mar-04 13:53 UTC
[RFC PATCH v2 0/5] gpu: nova-core: register definitions and basic timer and falcon devices
Hi everyone,
This RFC is based on top of Danilo's initial driver stub series
v4 [1] and adds very basic support for the Timer and Falcon devices, in
order to see and "feel" the proposed register access abstractions and
discuss them before moving forward with GSP initialization.
It is kept simple and short on purpose, to avoid bumping into a wall
with much more device code because my assumptions were incorrect.
The main addition is the nv_reg!() register definition macro, which aims
at providing safe and convenient access to all useful registers and
their fields. I elaborate on its definition in the patch that introduces
it; it is also probably best to look at all the register definitions
to understand how it can be used and the services it provides. Right
now it provides accessors and builders for all the fields of a register.
It will probably need to be extended with more operations as we deem
them useful.
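To give a feel for what "accessors and builders" could look like once expanded, here is a hand-written userspace sketch for a single register. `DemoCtl`, its two fields, `DemoCtlBuilder` and `field_mask()` are invented for this illustration and are not part of the series; the setter also clears the field before writing it, which is a choice made for this sketch only.

```rust
use std::ops::Deref;

/// Mask covering bits `hi..=lo` of a 32-bit word (requires `lo <= hi <= 31`).
const fn field_mask(hi: u32, lo: u32) -> u32 {
    (u32::MAX >> (31 - hi)) & (u32::MAX << lo)
}

/// Hypothetical register value; the name and fields are invented for this sketch.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct DemoCtl(u32);

impl DemoCtl {
    const ENABLE: u32 = field_mask(0, 0);
    const MODE: u32 = field_mask(7, 4);

    /// Returns a builder with every field cleared.
    pub fn new() -> DemoCtlBuilder {
        DemoCtlBuilder(DemoCtl(0))
    }

    /// Accessor for the single-bit `enable` field.
    pub fn enable(self) -> bool {
        (self.0 & Self::ENABLE) != 0
    }

    /// Accessor for the 4-bit `mode` field.
    pub fn mode(self) -> u8 {
        ((self.0 & Self::MODE) >> Self::MODE.trailing_zeros()) as u8
    }

    /// Raw 32-bit value, as it would be written to the hardware.
    pub fn raw(self) -> u32 {
        self.0
    }
}

/// Builder wrapping a register value under construction.
pub struct DemoCtlBuilder(DemoCtl);

impl DemoCtlBuilder {
    /// Replaces the bits selected by `mask` with `value`, dropping bits that do not fit.
    fn set(mut self, mask: u32, value: u32) -> Self {
        let shifted = (value << mask.trailing_zeros()) & mask;
        (self.0).0 = ((self.0).0 & !mask) | shifted;
        self
    }

    /// Sets the `enable` bit.
    pub fn enable(self, value: bool) -> Self {
        self.set(DemoCtl::ENABLE, value as u32)
    }

    /// Sets the `mode` field.
    pub fn mode(self, value: u8) -> Self {
        self.set(DemoCtl::MODE, value as u32)
    }
}

/// Dereferencing the builder yields the value built so far.
impl Deref for DemoCtlBuilder {
    type Target = DemoCtl;

    fn deref(&self) -> &DemoCtl {
        &self.0
    }
}
```

With this, `*DemoCtl::new().enable(true).mode(0xa)` yields a `DemoCtl` whose fields can be read back through `enable()` and `mode()`.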
The timer device has not changed much from v1, with the exception of
having its own Timestamp type to easily obtain Durations between two
samples.
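The idea behind the Timestamp type can be sketched in userspace Rust as follows. This is only an illustration using `std` instead of the kernel crate, and the public tuple field is an assumption made to keep the example short.

```rust
use std::ops::Sub;
use std::time::Duration;

/// Nanosecond timestamp, standing in for a value read from a free-running
/// 64-bit counter.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Timestamp(pub u64);

impl Sub for Timestamp {
    type Output = Duration;

    /// Subtracting two samples gives the elapsed time; wrapping arithmetic
    /// keeps the result meaningful if the counter rolled over in between.
    fn sub(self, rhs: Self) -> Duration {
        Duration::from_nanos(self.0.wrapping_sub(rhs.0))
    }
}
```

Because the subtraction returns a `Duration` rather than a raw integer, callers can compare it directly against timeouts such as `Duration::from_millis(10)`.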
The falcon implementation is still super incomplete, and just designed
to illustrate how the register macros can be used. I have more progress
in a private branch, but want to keep the focus on the nv_reg!() macro
for this review since the rest will ultimately depend on it.
It would be charitable to say that my Rust macro skills are lacking, so
please point out any deficiency in its definition. I am also not
entirely sure about the syntax for register definitions. I would like to
keep things simple and close to OpenRM (notably for the mask
definitions) to make it easier to port definitions from it into Nova.
[1] https://lore.kernel.org/nouveau/20250226175552.29381-1-dakr@kernel.org/T/
Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
Changes in v2:
- Don't hold the Bar guard in methods that can sleep.
- Added a Timestamp type for Timer to safely and easily get durations
between two measurements.
- Added a macro to make register definitions easier.
- Added a very basic falcon implementation to define more registers and
exercise the register definition macro.
- Link to v1: https://lore.kernel.org/r/20250217-nova_timer-v1-0-78c5ace2d987@nvidia.com
---
Alexandre Courbot (5):
rust: add useful ops for u64
rust: make ETIMEDOUT error available
gpu: nova-core: add register definition macro
gpu: nova-core: add basic timer device
gpu: nova-core: add falcon register definitions and probe code
drivers/gpu/nova-core/driver.rs | 4 +-
drivers/gpu/nova-core/falcon.rs | 124 +++++++++++++++
drivers/gpu/nova-core/gpu.rs | 70 ++++++++-
drivers/gpu/nova-core/nova_core.rs | 2 +
drivers/gpu/nova-core/regs.rs | 311 ++++++++++++++++++++++++++++++++-----
drivers/gpu/nova-core/timer.rs | 124 +++++++++++++++
rust/kernel/error.rs | 1 +
rust/kernel/lib.rs | 1 +
rust/kernel/num.rs | 43 +++++
9 files changed, 639 insertions(+), 41 deletions(-)
---
base-commit: 3ac10b625b709d59556cd2c1bf8a009c2bfdbefc
change-id: 20250216-nova_timer-c69430184f54
Best regards,
--
Alexandre Courbot <acourbot at nvidia.com>
[PATCH RFC v2 1/5] rust: add useful ops for u64
It is common to build a u64 from its high and low parts obtained from
two 32-bit registers. Conversely, it is also common to split a u64 into
two u32s to write them into registers. Add an extension trait for u64
that implements these methods in a new `num` module.
It is expected that this trait will be extended with other useful
operations, and similar extension traits implemented for other types.
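For readers who want to try the operations outside the kernel tree, the trait can be reproduced in plain userspace Rust. The sketch below mirrors the trait added by this patch but is not built against the `kernel` crate.

```rust
/// Userspace copy of the extension trait, for experimentation only.
pub trait U64Ext {
    /// Build a `u64` by combining its `high` and `low` parts.
    fn from_u32s(high: u32, low: u32) -> Self;
    /// Return the upper 32 bits.
    fn upper_32_bits(self) -> u32;
    /// Return the lower 32 bits.
    fn lower_32_bits(self) -> u32;
}

impl U64Ext for u64 {
    fn from_u32s(high: u32, low: u32) -> Self {
        // Widen before shifting so that no bits are lost.
        ((high as u64) << u32::BITS) | low as u64
    }

    fn upper_32_bits(self) -> u32 {
        (self >> u32::BITS) as u32
    }

    fn lower_32_bits(self) -> u32 {
        // Truncating cast keeps only the low half.
        self as u32
    }
}
```

Splitting a value and recombining the halves with `from_u32s()` gives back the original `u64`.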
Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
rust/kernel/lib.rs | 1 +
rust/kernel/num.rs | 43 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 44 insertions(+)
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 8e76ef9b4346956009a936b1317f7474a83c8dbd..caee059249cf56993d5db698a876f040eda33dd5 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -61,6 +61,7 @@
pub mod miscdevice;
#[cfg(CONFIG_NET)]
pub mod net;
+pub mod num;
pub mod of;
pub mod page;
#[cfg(CONFIG_PCI)]
diff --git a/rust/kernel/num.rs b/rust/kernel/num.rs
new file mode 100644
index 0000000000000000000000000000000000000000..f03c82f13643412cc13b0b841dfdf3b06490926d
--- /dev/null
+++ b/rust/kernel/num.rs
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Numerical and binary utilities for primitive types.
+
+/// Useful operations for `u64`.
+pub trait U64Ext {
+ /// Build a `u64` by combining its `high` and `low` parts.
+ ///
+ /// ```
+ /// use kernel::num::U64Ext;
+ /// assert_eq!(u64::from_u32s(0x01234567, 0x89abcdef), 0x01234567_89abcdef);
+ /// ```
+ fn from_u32s(high: u32, low: u32) -> Self;
+
+ fn upper_32_bits(self) -> u32;
+ fn lower_32_bits(self) -> u32;
+}
+
+impl U64Ext for u64 {
+ fn from_u32s(high: u32, low: u32) -> Self {
+ ((high as u64) << u32::BITS) | low as u64
+ }
+
+ fn upper_32_bits(self) -> u32 {
+ (self >> u32::BITS) as u32
+ }
+
+ fn lower_32_bits(self) -> u32 {
+ self as u32
+ }
+}
+
+pub const fn upper_32_bits(v: u64) -> u32 {
+ (v >> u32::BITS) as u32
+}
+
+pub const fn lower_32_bits(v: u64) -> u32 {
+ v as u32
+}
+
+pub const fn u32s_to_u64(high: u32, low: u32) -> u64 {
+ ((high as u64) << u32::BITS) | low as u64
+}
--
2.48.1
Alexandre Courbot
2025-Mar-04 13:53 UTC
[PATCH RFC v2 2/5] rust: make ETIMEDOUT error available
Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
rust/kernel/error.rs | 1 +
1 file changed, 1 insertion(+)
diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
index 1e510181432cceae46219f7ed3597a88b85ebe0a..475d14a4830774aa7717d3b5e70c7ff9de203dc2 100644
--- a/rust/kernel/error.rs
+++ b/rust/kernel/error.rs
@@ -65,6 +65,7 @@ macro_rules! declare_err {
declare_err!(EDOM, "Math argument out of domain of func.");
declare_err!(ERANGE, "Math result not representable.");
declare_err!(EOVERFLOW, "Value too large for defined data type.");
+ declare_err!(ETIMEDOUT, "Connection timed out.");
declare_err!(ERESTARTSYS, "Restart the system call.");
declare_err!(ERESTARTNOINTR, "System call was interrupted by a signal and will be restarted.");
declare_err!(ERESTARTNOHAND, "Restart if no handler.");
--
2.48.1
Alexandre Courbot
2025-Mar-04 13:53 UTC
[PATCH RFC v2 3/5] gpu: nova-core: add register definition macro
Register data manipulation is one of the error-prone areas of a kernel
driver. It is particularly easy to mix addresses of registers, masks and
shifts of fields, and to proceed with invalid values.
This patch introduces the nv_reg!() macro, which creates a safe type
definition for a given register, along with field accessors and a
value builder. The macro accepts field ranges in the same notation as
the NVIDIA OpenRM project, to facilitate porting its register
definitions to Nova.
Here is, for instance, the definition of the Boot0 register:

  nv_reg!(Boot0@0x00000000, "Basic revision information about the GPU";
      3:0     minor_rev as (u8), "minor revision of the chip";
      7:4     major_rev as (u8), "major revision of the chip";
      25:20   chipset try_into (Chipset), "chipset model"
  );

This definition creates a Boot0 type that includes read() and write()
methods that will automatically use the correct register offset (0x0 in
this case).
Creating a type for each register lets us leverage the type system to
make sure register values don't get mixed up.
It also allows us to create register-specific field extractor methods
(here minor_rev(), major_rev(), and chipset()) that present each field
in a convenient way and validate its data if relevant. The chipset()
accessor, in particular, uses the TryFrom<u32> implementation of Chipset
to build a Chipset instance and returns its associated error type if the
conversion has failed because of an invalid value.
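As a rough picture of what such accessors boil down to, here is a hand-expanded userspace sketch. The mask expression is the one used by the macro in this patch; everything else (`DemoId`, its field layout, `DemoKind` and its values, `InvalidValue`) is invented for the example and does not describe real hardware.

```rust
use std::convert::TryFrom;

/// Error carrying the raw field value that had no matching variant.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidValue(pub u32);

/// Hypothetical enumerated field; the variants and values are made up.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DemoKind {
    Alpha = 0x01,
    Beta = 0x02,
}

impl TryFrom<u32> for DemoKind {
    type Error = InvalidValue;

    fn try_from(value: u32) -> Result<Self, Self::Error> {
        match value {
            0x01 => Ok(DemoKind::Alpha),
            0x02 => Ok(DemoKind::Beta),
            other => Err(InvalidValue(other)),
        }
    }
}

/// Same mask expression as in the macro: all bits from `lo` to `hi` set.
const fn field_mask(hi: u32, lo: u32) -> u32 {
    ((((1u32 << hi) - 1) << 1) + 1) - ((1u32 << lo) - 1)
}

/// Hypothetical register with two plain fields and one validated field.
#[derive(Clone, Copy, Debug)]
pub struct DemoId(pub u32);

impl DemoId {
    /// Bits 3:0, returned with a plain `as` cast.
    pub fn minor_rev(self) -> u8 {
        const MASK: u32 = field_mask(3, 0);
        ((self.0 & MASK) >> MASK.trailing_zeros()) as u8
    }

    /// Bits 7:4, returned with a plain `as` cast.
    pub fn major_rev(self) -> u8 {
        const MASK: u32 = field_mask(7, 4);
        ((self.0 & MASK) >> MASK.trailing_zeros()) as u8
    }

    /// Bits 13:8, validated through `TryFrom<u32>`.
    pub fn kind(self) -> Result<DemoKind, InvalidValue> {
        const MASK: u32 = field_mask(13, 8);
        DemoKind::try_from((self.0 & MASK) >> MASK.trailing_zeros())
    }
}
```

A caller of `kind()` has to deal with the `Err` case before it can use the value, which is the property the `try_into` field kind is meant to provide.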
The string at the end of each line is optional and expands to a doc
comment for the type itself or for the corresponding field accessor.
Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
drivers/gpu/nova-core/gpu.rs | 2 +-
drivers/gpu/nova-core/regs.rs | 195 ++++++++++++++++++++++++++++++++++--------
2 files changed, 158 insertions(+), 39 deletions(-)
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 7693a5df0dc11f208513dc043d8c99f85c902119..58b97c7f0b2ab1edacada8346b139f6336b68272 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -164,7 +164,7 @@ fn new(bar: &Devres<Bar0>) -> Result<Spec> {
let boot0 = regs::Boot0::read(&bar);
Ok(Self {
- chipset: boot0.chipset().try_into()?,
+ chipset: boot0.chipset()?,
revision: Revision::from_boot0(boot0),
})
}
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 50aefb150b0b1c9b73f07fca3b7a070885785485..a874cb2fa5bedee258a60e5c3b471f52e5f82469 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -1,55 +1,174 @@
// SPDX-License-Identifier: GPL-2.0
+use core::{fmt::Debug, marker::PhantomData, ops::Deref};
+
use crate::driver::Bar0;
+use crate::gpu::Chipset;
-// TODO
-//
-// Create register definitions via generic macros. See task "Generic register
-// abstraction" in Documentation/gpu/nova/core/todo.rst.
+pub(crate) struct Builder<T>(T, PhantomData<T>);
-const BOOT0_OFFSET: usize = 0x00000000;
+impl<T> From<T> for Builder<T> {
+ fn from(value: T) -> Self {
+ Builder(value, PhantomData)
+ }
+}
-// 3:0 - chipset minor revision
-const BOOT0_MINOR_REV_SHIFT: u8 = 0;
-const BOOT0_MINOR_REV_MASK: u32 = 0x0000000f;
+impl<T: Default> Default for Builder<T> {
+ fn default() -> Self {
+ Self(Default::default(), PhantomData)
+ }
+}
-// 7:4 - chipset major revision
-const BOOT0_MAJOR_REV_SHIFT: u8 = 4;
-const BOOT0_MAJOR_REV_MASK: u32 = 0x000000f0;
+impl<T> Deref for Builder<T> {
+ type Target = T;
-// 23:20 - chipset implementation Identifier (depends on architecture)
-const BOOT0_IMPL_SHIFT: u8 = 20;
-const BOOT0_IMPL_MASK: u32 = 0x00f00000;
+ fn deref(&self) -> &Self::Target {
+ &self.0
+ }
+}
-// 28:24 - chipset architecture identifier
-const BOOT0_ARCH_MASK: u32 = 0x1f000000;
+macro_rules! nv_reg_common {
+ ($name:ident $(, $type_comment:expr)?) => {
+ $(
+ #[doc=concat!($type_comment)]
+ )?
+ #[derive(Clone, Copy, Default)]
+ pub(crate) struct $name(u32);
-// 28:20 - chipset identifier (virtual register field combining BOOT0_IMPL and
-// BOOT0_ARCH)
-const BOOT0_CHIPSET_SHIFT: u8 = BOOT0_IMPL_SHIFT;
-const BOOT0_CHIPSET_MASK: u32 = BOOT0_IMPL_MASK | BOOT0_ARCH_MASK;
+ // TODO: should we display the raw hex value, then the value of all its fields?
+ impl Debug for $name {
+ fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
+ f.debug_tuple(stringify!($name))
+ .field(&format_args!("0x{0:x}", &self.0))
+ .finish()
+ }
+ }
-#[derive(Copy, Clone)]
-pub(crate) struct Boot0(u32);
+ impl core::ops::BitOr for $name {
+ type Output = Self;
-impl Boot0 {
- #[inline]
- pub(crate) fn read(bar: &Bar0) -> Self {
- Self(bar.readl(BOOT0_OFFSET))
- }
+ fn bitor(self, rhs: Self) -> Self::Output {
+ Self(self.0 | rhs.0)
+ }
+ }
- #[inline]
- pub(crate) fn chipset(&self) -> u32 {
- (self.0 & BOOT0_CHIPSET_MASK) >> BOOT0_CHIPSET_SHIFT
- }
+ #[allow(dead_code)]
+ impl $name {
+ /// Returns a new builder for the register. Individual fields can be set by the methods
+ /// of the builder, and the current value obtained by dereferencing it.
+ #[inline]
+ pub(crate) fn new() -> Builder<Self> {
+ Default::default()
+ }
+ }
+ };
+}
- #[inline]
- pub(crate) fn minor_rev(&self) -> u8 {
- ((self.0 & BOOT0_MINOR_REV_MASK) >> BOOT0_MINOR_REV_SHIFT) as u8
- }
+macro_rules! nv_reg_field_accessor {
+ ($hi:tt:$lo:tt $field:ident $(as ($as_type:ty))? $(as_bit ($bit_type:ty))? $(into ($type:ty))? $(try_into ($try_type:ty))? $(, $comment:expr)?) => {
+ $(
+ #[doc=concat!("Returns the ", $comment)]
+ )?
+ #[inline]
+ pub(crate) fn $field(self) -> $( $as_type )? $( $bit_type )? $( $type )? $( core::result::Result<$try_type, <$try_type as TryFrom<u32>>::Error> )? {
+ const MASK: u32 = ((((1 << $hi) - 1) << 1) + 1) - ((1 << $lo) - 1);
+ const SHIFT: u32 = MASK.trailing_zeros();
+ let field = (self.0 & MASK) >> SHIFT;
- #[inline]
- pub(crate) fn major_rev(&self) -> u8 {
- ((self.0 & BOOT0_MAJOR_REV_MASK) >> BOOT0_MAJOR_REV_SHIFT) as u8
+ $( field as $as_type )?
+ $(
+ // TODO: it would be nice to throw a compile-time error if $hi != $lo as this means we
+ // are considering more than one bit but returning a bool...
+ (if field != 0 { true } else { false }) as $bit_type
+ )?
+ $( <$type>::from(field) )?
+ $( <$try_type>::try_from(field) )?
+ }
}
}
+
+macro_rules! nv_reg_field_builder {
+ ($hi:tt:$lo:tt $field:ident $(as ($as_type:ty))? $(as_bit ($bit_type:ty))? $(into ($type:ty))? $(try_into ($try_type:ty))? $(, $comment:expr)?) => {
+ $(
+ #[doc=concat!("Sets the ", $comment)]
+ )?
+ #[inline]
+ pub(crate) fn $field(mut self, value: $( $as_type)? $( $bit_type )? $( $type )? $( $try_type)? ) -> Self {
+ const MASK: u32 = ((((1 << $hi) - 1) << 1) + 1) - ((1 << $lo) - 1);
+ const SHIFT: u32 = MASK.trailing_zeros();
+
+ let value = ((value as u32) << SHIFT) & MASK;
+ self.0.0 = self.0.0 | value;
+ self
+ }
+ };
+}
+
+macro_rules! nv_reg {
+ (
+ $name:ident@$offset:expr $(, $type_comment:expr)?;
+ $($hi:tt:$lo:tt $field:ident $(as ($as_type:ty))? $(as_bit ($bit_type:ty))? $(into ($type:ty))? $(try_into ($try_type:ty))? $(, $field_comment:expr)?);* $(;)?
+ ) => {
+ nv_reg_common!($name);
+
+ #[allow(dead_code)]
+ impl $name {
+ #[inline]
+ pub(crate) fn read(bar: &Bar0) -> Self {
+ Self(bar.readl($offset))
+ }
+
+ #[inline]
+ pub(crate) fn write(self, bar: &Bar0) {
+ bar.writel(self.0, $offset)
+ }
+
+ $(
+ nv_reg_field_accessor!($hi:$lo $field $(as ($as_type))? $(as_bit ($bit_type))? $(into ($type))? $(try_into ($try_type))? $(, $field_comment)?);
+ )*
+ }
+
+ #[allow(dead_code)]
+ impl Builder<$name> {
+ $(
+ nv_reg_field_builder!($hi:$lo $field $(as ($as_type))? $(as_bit ($bit_type))? $(into ($type))? $(try_into ($try_type))? $(, $field_comment)?);
+ )*
+ }
+ };
+ (
+ $name:ident@+$offset:expr $(, $type_comment:expr)?;
+ $($hi:tt:$lo:tt $field:ident $(as ($as_type:ty))? $(as_bit ($bit_type:ty))? $(into ($type:ty))? $(try_into ($try_type:ty))? $(, $field_comment:expr)?);* $(;)?
+ ) => {
+ nv_reg_common!($name);
+
+ #[allow(dead_code)]
+ impl $name {
+ #[inline]
+ pub(crate) fn read(bar: &Bar0, base: usize) -> Self {
+ Self(bar.readl(base + $offset))
+ }
+
+ #[inline]
+ pub(crate) fn write(self, bar: &Bar0, base: usize) {
+ bar.writel(self.0, base + $offset)
+ }
+
+ $(
+ nv_reg_field_accessor!($hi:$lo $field $(as ($as_type))? $(as_bit ($bit_type))? $(into ($type))? $(try_into ($try_type))? $(, $field_comment)?);
+ )*
+ }
+
+ #[allow(dead_code)]
+ impl Builder<$name> {
+ $(
+ nv_reg_field_builder!($hi:$lo $field $(as ($as_type))? $(as_bit ($bit_type))? $(into ($type))? $(try_into ($try_type))? $(, $field_comment)?);
+ )*
+ }
+ };
+}
+
+nv_reg!(Boot0@0x00000000, "Basic revision information about the GPU";
+ 3:0 minor_rev as (u8), "minor revision of the chip";
+ 7:4 major_rev as (u8), "major revision of the chip";
+ 25:20 chipset try_into (Chipset), "chipset model"
+);
--
2.48.1
Alexandre Courbot
2025-Mar-04 13:54 UTC
[PATCH RFC v2 4/5] gpu: nova-core: add basic timer device
Add a basic timer device and exercise it during device probing. This
first draft is probably very questionable.
One point in particular which should IMHO receive attention: the generic
wait_on() method aims at providing similar functionality to Nouveau's
nvkm_[num]sec() macros. Since this method will be heavily used with
different conditions to test, I'd like to avoid monomorphizing it
entirely for each instance; nvkm_xsec() achieves this by having the
macros invoke functions.
I have tried achieving the same result in Rust using closures (kept
as-is in the current code), but they seem to be monomorphized as well.
Calling extra functions could work better, but also looks less elegant
to me, so I am really open to suggestions here.
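One commonly used way to limit monomorphization, sketched here in userspace Rust purely as a discussion aid, is to keep the polling loop in a non-generic function that takes a `dyn` closure and to leave only a thin generic wrapper around it. This is not what the patch does: `wait_on_dyn()`, `TimedOut` and the use of `std::time::Instant` instead of the GPU timer are assumptions of the sketch, and whether it reduces code size in a kernel build would have to be measured.

```rust
use std::time::{Duration, Instant};

/// Error type standing in for `ETIMEDOUT` in this userspace sketch.
#[derive(Debug, PartialEq, Eq)]
pub struct TimedOut;

/// Non-generic polling loop: it exists once in the binary, whatever the
/// number of distinct conditions passed by callers.
#[inline(never)]
fn wait_on_dyn(timeout: Duration, poll: &mut dyn FnMut() -> bool) -> Result<(), TimedOut> {
    let deadline = Instant::now() + timeout;

    loop {
        if poll() {
            return Ok(());
        }
        if Instant::now() >= deadline {
            return Err(TimedOut);
        }
        std::hint::spin_loop();
    }
}

/// Thin generic wrapper: only this small function is instantiated for each
/// closure and result type.
pub fn wait_on<R, F>(timeout: Duration, mut cond: F) -> Result<R, TimedOut>
where
    F: FnMut() -> Option<R>,
{
    let mut result = None;

    // The type-erased closure stores the value and only reports success.
    wait_on_dyn(timeout, &mut || {
        result = cond();
        result.is_some()
    })?;

    result.ok_or(TimedOut)
}
```

The call site keeps the closure-based shape, for example `wait_on(Duration::from_millis(10), || Some(()))`, while the loop itself is shared between all callers.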
Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
drivers/gpu/nova-core/driver.rs | 4 +-
drivers/gpu/nova-core/gpu.rs | 58 ++++++++++++++++-
drivers/gpu/nova-core/nova_core.rs | 1 +
drivers/gpu/nova-core/regs.rs | 8 +++
drivers/gpu/nova-core/timer.rs | 124 +++++++++++++++++++++++++++++++++++++
5 files changed, 193 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index 63c19f140fbdd65d8fccf81669ac590807cc120f..0cd23aa306e4082405f480afc0530a41131485e7 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -10,7 +10,7 @@ pub(crate) struct NovaCore {
pub(crate) gpu: Gpu,
}
-const BAR0_SIZE: usize = 8;
+const BAR0_SIZE: usize = 0x9500;
pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
kernel::pci_device_table!(
@@ -42,6 +42,8 @@ fn probe(pdev: &mut pci::Device, _info: &Self::IdInfo) -> Result<Pin<KBox<Self>>
GFP_KERNEL,
)?;
+ let _ = this.gpu.test_timer();
+
Ok(this)
}
}
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 58b97c7f0b2ab1edacada8346b139f6336b68272..8fa8616c0deccc7297b090fcbe74f3cda5cc9741 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -1,12 +1,16 @@
// SPDX-License-Identifier: GPL-2.0
+use kernel::device::Device;
+use kernel::types::ARef;
use kernel::{
device, devres::Devres, error::code::*, firmware, fmt, pci, prelude::*,
str::BStr, str::CString,
};
use crate::driver::Bar0;
use crate::regs;
+use crate::timer::Timer;
use core::fmt;
+use core::time::Duration;
const fn to_lowercase_bytes<const N: usize>(s: &str) -> [u8; N] {
let src = s.as_bytes();
@@ -201,10 +205,12 @@ fn new(dev: &device::Device, spec: &Spec, ver: &str) -> Result<Firmware> {
/// Structure holding the resources required to operate the GPU.
#[pin_data]
pub(crate) struct Gpu {
+ dev: ARef<Device>,
spec: Spec,
/// MMIO mapping of PCI BAR 0
bar: Devres<Bar0>,
fw: Firmware,
+ timer: Timer,
}
impl Gpu {
@@ -220,6 +226,56 @@ pub(crate) fn new(pdev: &pci::Device, bar: Devres<Bar0>) -> Result<impl PinInit<
spec.revision
);
- Ok(pin_init!(Self { spec, bar, fw }))
+ let dev = pdev.as_ref().into();
+ let timer = Timer::new();
+
+ Ok(pin_init!(Self {
+ dev,
+ spec,
+ bar,
+ fw,
+ timer,
+ }))
+ }
+
+ pub(crate) fn test_timer(&self) -> Result<()> {
+ let bar = self.bar.try_access().ok_or(ENXIO)?;
+ dev_info!(&self.dev, "testing timer subdev\n");
+ dev_info!(&self.dev, "current timestamp: {}\n", self.timer.read(&bar));
+ drop(bar);
+
+ assert!(matches!(
+ self.timer
+ .wait_on(&self.bar, Duration::from_millis(10), || Some(())),
+ Ok(())
+ ));
+
+ let bar = self.bar.try_access().ok_or(ENXIO)?;
+ dev_info!(
+ &self.dev,
+ "timestamp after immediate exit: {}\n",
+ self.timer.read(&bar)
+ );
+ let t1 = self.timer.read(&bar);
+ drop(bar);
+
+ assert_eq!(
+ self.timer
+ .wait_on(&self.bar, Duration::from_millis(10), || Option::<()>::None),
+ Err(ETIMEDOUT)
+ );
+
+ let bar = self.bar.try_access().ok_or(ENXIO)?;
+ let t2 = self.timer.read(&bar);
+ assert!(t2 - t1 >= Duration::from_millis(10));
+ dev_info!(
+ &self.dev,
+ "timestamp after timeout: {} ({:?})\n",
+ self.timer.read(&bar),
+ t2 - t1
+ );
+ drop(bar);
+
+ Ok(())
}
}
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 8479be2a3f31798e887228863f223d42a63bd8ca..891a93ba7656d2aa5e1fa4357d1d84ee3a054942 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -6,6 +6,7 @@
mod firmware;
mod gpu;
mod regs;
+mod timer;
kernel::module_pci_driver! {
type: driver::NovaCore,
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index a874cb2fa5bedee258a60e5c3b471f52e5f82469..35bbd3c0b58972de3a2478ef20f93f31c69940e7 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -172,3 +172,11 @@ impl Builder<$name> {
7:4 major_rev as (u8), "major revision of the chip";
25:20 chipset try_into (Chipset), "chipset model"
);
+
+nv_reg!(PtimerTime0@0x00009400;
+ 31:0 lo as (u32), "low 32-bits of the timer"
+);
+
+nv_reg!(PtimerTime1@0x00009410;
+ 31:0 hi as (u32), "high 32 bits of the timer"
+);
diff --git a/drivers/gpu/nova-core/timer.rs b/drivers/gpu/nova-core/timer.rs
new file mode 100644
index 0000000000000000000000000000000000000000..919995bf32141c568206fda165dcac6f4d4ce8b8
--- /dev/null
+++ b/drivers/gpu/nova-core/timer.rs
@@ -0,0 +1,124 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Nova Core Timer subdevice
+
+use core::fmt::Display;
+use core::ops::{Add, Sub};
+use core::time::Duration;
+
+use kernel::devres::Devres;
+use kernel::num::U64Ext;
+use kernel::prelude::*;
+
+use crate::driver::Bar0;
+use crate::regs;
+
+/// A timestamp with nanosecond granularity obtained from the GPU timer.
+///
+/// A timestamp can also be subtracted from another in order to obtain a [`Duration`].
+///
+/// TODO: add Kunit tests!
+#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
+pub(crate) struct Timestamp(u64);
+
+impl Display for Timestamp {
+ fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
+ write!(f, "{}", self.0)
+ }
+}
+
+impl Add<u64> for Timestamp {
+ type Output = Self;
+
+ fn add(self, rhs: u64) -> Self::Output {
+ Timestamp(self.0.wrapping_add(rhs))
+ }
+}
+
+impl Sub for Timestamp {
+ type Output = Duration;
+
+ fn sub(self, rhs: Self) -> Self::Output {
+ Duration::from_nanos(self.0.wrapping_sub(rhs.0))
+ }
+}
+
+pub(crate) struct Timer {}
+
+impl Timer {
+ pub(crate) fn new() -> Self {
+ Self {}
+ }
+
+ /// Read the current timer timestamp.
+ pub(crate) fn read(&self, bar: &Bar0) -> Timestamp {
+ loop {
+ let hi = regs::PtimerTime1::read(bar);
+ let lo = regs::PtimerTime0::read(bar);
+
+ if hi.hi() == regs::PtimerTime1::read(bar).hi() {
+ return Timestamp(u64::from_u32s(hi.hi(), lo.lo()));
+ }
+ }
+ }
+
+ #[allow(dead_code)]
+ pub(crate) fn time(bar: &Bar0, time: u64) {
+ regs::PtimerTime1::new().hi(time.upper_32_bits()).write(bar);
+ regs::PtimerTime0::new().lo(time.lower_32_bits()).write(bar);
+ }
+
+ /// Wait until `cond` is true or `timeout` elapsed, based on GPU time.
+ ///
+ /// When `cond` evaluates to `Some`, its return value is returned.
+ ///
+ /// `Err(ETIMEDOUT)` is returned if `timeout` has been reached without `cond` evaluating to
+ /// `Some`, or if the timer device is stuck for some reason.
+ pub(crate) fn wait_on<R, F: Fn() -> Option<R>>(
+ &self,
+ dev_bar: &Devres<Bar0>,
+ timeout: Duration,
+ cond: F,
+ ) -> Result<R> {
+ // Number of consecutive time reads after which we consider the timer frozen if it hasn't
+ // moved forward.
+ const MAX_STALLED_READS: usize = 16;
+
+ let (mut cur_time, mut prev_time, deadline) = {
+ let bar = dev_bar.try_access().ok_or(ENXIO)?;
+ let cur_time = self.read(&bar);
+ let deadline = cur_time + u64::try_from(timeout.as_nanos()).unwrap_or(u64::MAX);
+
+ (cur_time, cur_time, deadline)
+ };
+ let mut num_reads = 0;
+
+ loop {
+ if let Some(ret) = cond() {
+ return Ok(ret);
+ }
+
+ (|| {
+ let bar = dev_bar.try_access().ok_or(ENXIO)?;
+ cur_time = self.read(&bar);
+
+ /* Check if the timer is frozen for some reason. */
+ if cur_time == prev_time {
+ if num_reads >= MAX_STALLED_READS {
+ return Err(ETIMEDOUT);
+ }
+ num_reads += 1;
+ } else {
+ if cur_time >= deadline {
+ return Err(ETIMEDOUT);
+ }
+
+ num_reads = 0;
+ prev_time = cur_time;
+ }
+
+ Ok(())
+ })()?;
+ }
+ }
+}
--
2.48.1
Alexandre Courbot
2025-Mar-04 13:54 UTC
[PATCH RFC v2 5/5] gpu: nova-core: add falcon register definitions and probe code
This is still very preliminary work, and is mostly designed to show how
register fields can be turned into safe types that force us to handle
invalid values.
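The pattern can be tried out in userspace with the sketch below. It reuses the security-model values and the 5:4 / 7:6 bit positions from this patch, but `InvalidField`, `decode()` and the shortened type names are invented for the example. The infallible conversion uses a wildcard arm instead of the `unreachable_unchecked()` call used in the patch, simply to keep the sketch free of `unsafe`.

```rust
use std::convert::TryFrom;

/// Error carrying the raw value that matched no variant (sketch-only type).
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidField(pub u32);

/// Security model: value 1 is not defined, so the conversion must be fallible.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SecurityModel {
    None = 0,
    Light = 2,
    Heavy = 3,
}

impl TryFrom<u32> for SecurityModel {
    type Error = InvalidField;

    fn try_from(value: u32) -> Result<Self, Self::Error> {
        match value {
            0 => Ok(SecurityModel::None),
            2 => Ok(SecurityModel::Light),
            3 => Ok(SecurityModel::Heavy),
            other => Err(InvalidField(other)),
        }
    }
}

/// Subversion: every 2-bit value is valid, so `From` is enough.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Subversion {
    V0,
    V1,
    V2,
    V3,
}

impl From<u32> for Subversion {
    fn from(value: u32) -> Self {
        match value & 0b11 {
            0 => Subversion::V0,
            1 => Subversion::V1,
            2 => Subversion::V2,
            // The mask leaves 3 as the only remaining possibility.
            _ => Subversion::V3,
        }
    }
}

/// Decodes bits 5:4 (security model) and 7:6 (subversion) of a raw value.
pub fn decode(raw: u32) -> Result<(SecurityModel, Subversion), InvalidField> {
    // `?` forces the caller to propagate or handle an invalid security model.
    let model = SecurityModel::try_from((raw >> 4) & 0b11)?;
    let subversion = Subversion::from((raw >> 6) & 0b11);

    Ok((model, subversion))
}
```

A raw value whose security-model bits read 1 makes `decode()` return an error instead of silently producing a bogus variant.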
Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
drivers/gpu/nova-core/driver.rs | 2 +-
drivers/gpu/nova-core/falcon.rs | 124 +++++++++++++++++++++++++++++++++++++
drivers/gpu/nova-core/gpu.rs | 10 +++
drivers/gpu/nova-core/nova_core.rs | 1 +
drivers/gpu/nova-core/regs.rs | 108 ++++++++++++++++++++++++++++++++
5 files changed, 244 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index 0cd23aa306e4082405f480afc0530a41131485e7..dee5fd22eecb2ce1f4ea765338b0c1b68853b2d3 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -10,7 +10,7 @@ pub(crate) struct NovaCore {
pub(crate) gpu: Gpu,
}
-const BAR0_SIZE: usize = 0x9500;
+const BAR0_SIZE: usize = 0x1000000;
pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
kernel::pci_device_table!(
diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
new file mode 100644
index 0000000000000000000000000000000000000000..5f8496ed1f91ccd19c0c7716440cbc795a7a025f
--- /dev/null
+++ b/drivers/gpu/nova-core/falcon.rs
@@ -0,0 +1,124 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Falcon microprocessor base support
+
+use core::hint::unreachable_unchecked;
+use kernel::devres::Devres;
+use kernel::{pci, prelude::*};
+
+use crate::driver::Bar0;
+use crate::regs::{FalconCpuCtl, FalconHwCfg1};
+
+#[repr(u8)]
+#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
+pub(crate) enum FalconCoreRev {
+ Rev1 = 1,
+ Rev2 = 2,
+ Rev3 = 3,
+ Rev4 = 4,
+ Rev5 = 5,
+ Rev6 = 6,
+ Rev7 = 7,
+}
+
+impl TryFrom<u32> for FalconCoreRev {
+ type Error = Error;
+
+ fn try_from(value: u32) -> core::result::Result<Self, Self::Error> {
+ use FalconCoreRev::*;
+
+ let rev = match value {
+ 1 => Rev1,
+ 2 => Rev2,
+ 3 => Rev3,
+ 4 => Rev4,
+ 5 => Rev5,
+ 6 => Rev6,
+ 7 => Rev7,
+ _ => return Err(EINVAL),
+ };
+
+ Ok(rev)
+ }
+}
+
+#[repr(u8)]
+#[derive(Debug, Copy, Clone)]
+pub(crate) enum FalconSecurityModel {
+ None = 0,
+ Light = 2,
+ Heavy = 3,
+}
+
+impl TryFrom<u32> for FalconSecurityModel {
+ type Error = Error;
+
+ fn try_from(value: u32) -> core::result::Result<Self, Self::Error> {
+ use FalconSecurityModel::*;
+
+ let sec_model = match value {
+ 0 => None,
+ 2 => Light,
+ 3 => Heavy,
+ _ => return Err(EINVAL),
+ };
+
+ Ok(sec_model)
+ }
+}
+
+#[repr(u8)]
+#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
+pub(crate) enum FalconCoreRevSubversion {
+ Subversion0 = 0,
+ Subversion1 = 1,
+ Subversion2 = 2,
+ Subversion3 = 3,
+}
+
+impl From<u32> for FalconCoreRevSubversion {
+ fn from(value: u32) -> Self {
+ use FalconCoreRevSubversion::*;
+
+ match value & 0b11 {
+ 0 => Subversion0,
+ 1 => Subversion1,
+ 2 => Subversion2,
+ 3 => Subversion3,
+ // SAFETY: the `0b11` mask limits the possible values to `0..=3`.
+ 4..=u32::MAX => unsafe { unreachable_unchecked() },
+ }
+ }
+}
+
+/// Contains the base parameters common to all Falcon instances.
+#[derive(Debug)]
+pub(crate) struct Falcon {
+ /// Base IO address.
+ base: usize,
+}
+
+impl Falcon {
+ pub(crate) fn new(pdev: &pci::Device, bar: &Devres<Bar0>, base: usize) -> Result<Self> {
+ let b = bar.try_access().ok_or(ENXIO)?;
+
+ let hwcfg1 = FalconHwCfg1::read(&b, base);
+ let rev = hwcfg1.core_rev()?;
+ let subver = hwcfg1.core_rev_subversion();
+ let sec_model = hwcfg1.security_model()?;
+
+ dev_info!(
+ pdev.as_ref(),
+ "new falcon: {:?} {:?} {:?}",
+ rev,
+ subver,
+ sec_model
+ );
+
+ Ok(Self { base })
+ }
+
+ pub(crate) fn cpu_ctl(&self, bar: &Bar0) -> FalconCpuCtl {
+ FalconCpuCtl::read(bar, self.base)
+ }
+}
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 8fa8616c0deccc7297b090fcbe74f3cda5cc9741..8d8b5ee5c9444c4722d1025d4008fc5a8841a247 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -7,6 +7,7 @@
};
use crate::driver::Bar0;
+use crate::falcon::Falcon;
use crate::regs;
use crate::timer::Timer;
use core::fmt;
@@ -228,6 +229,15 @@ pub(crate) fn new(pdev: &pci::Device, bar: Devres<Bar0>) -> Result<impl PinInit<
let dev = pdev.as_ref().into();
let timer = Timer::new();
+ let gsp_falcon = Falcon::new(pdev, &bar, regs::FALCON_GSP_BASE)?;
+ let sec2 = Falcon::new(pdev, &bar, regs::FALCON_SEC2_BASE)?;
+ let b = bar.try_access().ok_or(ENXIO)?;
+ dev_info!(
+ pdev.as_ref(),
+ "GSP Falcon CpuCtl: {:?}",
+ gsp_falcon.cpu_ctl(&b)
+ );
+ dev_info!(pdev.as_ref(), "SEC2 Falcon CpuCtl: {:?}", sec2.cpu_ctl(&b));
Ok(pin_init!(Self {
dev,
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 891a93ba7656d2aa5e1fa4357d1d84ee3a054942..a5817bda30185d4ec7021f3d3e881cd99230ca94 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -3,6 +3,7 @@
//! Nova Core GPU Driver
mod driver;
+mod falcon;
mod firmware;
mod gpu;
mod regs;
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 35bbd3c0b58972de3a2478ef20f93f31c69940e7..12a889a785e0713c6041d50284c211352a39303b 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -3,6 +3,7 @@
use core::{fmt::Debug, marker::PhantomData, ops::Deref};
use crate::driver::Bar0;
+use crate::falcon::{FalconCoreRev, FalconCoreRevSubversion, FalconSecurityModel};
use crate::gpu::Chipset;
pub(crate) struct Builder<T>(T, PhantomData<T>);
@@ -180,3 +181,110 @@ impl Builder<$name> {
nv_reg!(PtimerTime1@0x00009410;
31:0 hi as (u32), "high 32 bits of the timer"
);
+
+pub(crate) const FALCON_GSP_BASE: usize = 0x00110000;
+pub(crate) const FALCON_SEC2_BASE: usize = 0x00840000;
+
+nv_reg!(FalconIrqsClr@+0x00000004;
+ 4:4 halt as_bit (bool);
+ 6:6 swgen0 as_bit (bool);
+);
+
+nv_reg!(FalconMailbox0@+0x00000040;
+ 31:0 mailbox0 as (u32)
+);
+nv_reg!(FalconMailbox1@+0x00000044;
+ 31:0 mailbox1 as (u32)
+);
+
+nv_reg!(FalconCpuCtl@+0x00000100;
+ 1:1 start_cpu as_bit (bool);
+ 4:4 halted as_bit (bool);
+ 6:6 alias_en as_bit (bool);
+);
+nv_reg!(FalconBootVec@+0x00000104;
+ 31:0 boot_vec as (u32)
+);
+
+nv_reg!(FalconHwCfg@+0x00000108;
+ 8:0 imem_size as (u32);
+ 17:9 dmem_size as (u32);
+);
+
+nv_reg!(FalconDmaCtl@+0x0000010c;
+ 0:0 require_ctx as_bit (bool);
+ 1:1 dmem_scrubbing as_bit (bool);
+ 2:2 imem_scrubbing as_bit (bool);
+ 6:3 dmaq_num as_bit (u8);
+ 7:7 secure_stat as_bit (bool);
+);
+
+nv_reg!(FalconDmaTrfBase@+0x00000110;
+ 31:0 base as (u32);
+);
+
+nv_reg!(FalconDmaTrfMOffs@+0x00000114;
+ 23:0 offs as (u32);
+);
+
+nv_reg!(FalconDmaTrfCmd@+0x00000118;
+ 0:0 full as_bit (bool);
+ 1:1 idle as_bit (bool);
+ 3:2 sec as_bit (u8);
+ 4:4 imem as_bit (bool);
+ 5:5 is_write as_bit (bool);
+ 10:8 size as (u8);
+ 14:12 ctxdma as (u8);
+ 16:16 set_dmtag as (u8);
+);
+
+nv_reg!(FalconDmaTrfBOffs@+0x0000011c;
+ 31:0 offs as (u32);
+);
+
+nv_reg!(FalconDmaTrfBase1@+0x00000128;
+ 8:0 base as (u16);
+);
+
+nv_reg!(FalconHwCfg1@+0x0000012c;
+ 3:0 core_rev try_into (FalconCoreRev), "core revision of the falcon";
+ 5:4 security_model try_into (FalconSecurityModel), "security model of the falcon";
+ 7:6 core_rev_subversion into (FalconCoreRevSubversion);
+ 11:8 imem_ports as (u8);
+ 15:12 dmem_ports as (u8);
+);
+
+// TODO: This should be able to take an index, like +0x180[16; 8]? Then the constructor or read
+// method take the port we want to address as argument.
+nv_reg!(FalconImemC@+0x00000180;
+ 7:2 offs as (u8);
+ 23:8 blk as (u8);
+ 24:24 aincw as_bit (bool);
+ 25:25 aincr as_bit (bool);
+ 28:28 secure as_bit (bool);
+ 29:29 sec_atomic as_bit (bool);
+);
+
+nv_reg!(FalconImemD@+0x00000184;
+ 31:0 data as (u32);
+);
+
+nv_reg!(FalconImemT@+0x00000188;
+ 15:0 data as (u16);
+);
+
+nv_reg!(FalconDmemC@+0x000001c0;
+ 23:0 addr as (u32);
+ 7:2 offs as (u8);
+ 23:8 blk as (u8);
+ 24:24 aincw as_bit (bool);
+ 25:25 aincr as_bit (bool);
+ 26:26 settag as_bit (bool);
+ 27:27 setlvl as_bit (bool);
+ 28:28 va as_bit (bool);
+ 29:29 miss as_bit (bool);
+);
+
+nv_reg!(FalconDmemD@+0x000001c4;
+ 31:0 data as (u32);
+);
--
2.48.1