Alexandre Courbot
2025-Feb-17 14:04 UTC
[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
Hi everyone,

This short RFC is based on top of Danilo's initial driver stub series [1]
and aims to initiate discussions and hopefully some design decisions,
using the simplest subdevice of the GPU (the timer) as an example, before
implementing more devices that allow the GPU initialization sequence to
progress (Falcon being the logical next step so we can get the GSP
rolling).

It is kept simple and short for that purpose, and to avoid bumping into a
wall with much more device code because my assumptions were incorrect.

This is my first time trying to write Rust kernel code, and some of my
questions below are probably due to me not yet understanding how to use
the core kernel interfaces. So before going further I thought it would
make sense to raise the most obvious questions that came to my mind while
writing this draft:

- Where and how to store subdevices. The timer device is currently a
  direct member of the GPU structure. It might work for GSP devices,
  which IIUC are supposed to have at least a few fixed devices required
  to bring the GSP up; but as a general rule this probably won't scale,
  as not all subdevices are present on all GPU variants, or in the same
  numbers. So we will probably need to find an equivalent to the
  `subdev` linked list in Nouveau.

- BAR sharing between subdevices. Right now each subdevice gets access
  to the full BAR range. I am wondering whether we could not split it
  into the relevant slices for each subdevice, and transfer ownership of
  each slice to the device that is supposed to use it. That way each
  register would have a single owner, which is arguably safer - but
  maybe not as flexible as we will need down the road? (A rough sketch
  of this idea follows this mail.)

- On a related note, since the BAR is behind a Devres, its availability
  must first be secured before any hardware access, using try_access().
  Doing this on a per-register or per-operation basis looks like
  overkill, so all methods that access the BAR take a reference to it,
  allowing try_access() to be called from the highest-level caller and
  thus reducing the number of times this needs to be performed. Doing so
  comes at the cost of an extra argument to most subdevice methods; but
  also with the benefit that we don't need to put the BAR behind another
  Arc and share it across all subdevices. I don't know which design is
  better here, and input would be very welcome.

- We will probably need something like a `Subdevice` trait down the
  road, but I'll wait until we have more than one subdevice to think
  about it.

The first 2 patches are small additions to the core Rust modules, that
the following patches make use of and which might be useful for other
drivers as well. The last patch is the naive implementation of the timer
device. I don't expect it to stay this way at all, so please point out
all the deficiencies in this very early code! :)

[1] https://lore.kernel.org/nouveau/20250209173048.17398-1-dakr at kernel.org/

Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
Alexandre Courbot (3):
      rust: add useful ops for u64
      rust: make ETIMEDOUT error available
      gpu: nova-core: add basic timer device

 drivers/gpu/nova-core/driver.rs    |  4 +-
 drivers/gpu/nova-core/gpu.rs       | 35 ++++++++++++++-
 drivers/gpu/nova-core/nova_core.rs |  1 +
 drivers/gpu/nova-core/regs.rs      | 43 ++++++++++++++++++
 drivers/gpu/nova-core/timer.rs     | 91 ++++++++++++++++++++++++++++++++++++++
 rust/kernel/error.rs               |  1 +
 rust/kernel/lib.rs                 |  1 +
 rust/kernel/num.rs                 | 32 ++++++++++++++
 8 files changed, 206 insertions(+), 2 deletions(-)
---
base-commit: 6484e46f33eac8dd42aa36fa56b51d8daa5ae1c1
change-id: 20250216-nova_timer-c69430184f54

Best regards,
--
Alexandre Courbot <acourbot at nvidia.com>
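[Editor's note: to make the BAR-splitting idea in the second bullet
concrete, here is a minimal sketch. The slice() method and its range
argument are hypothetical; no such API exists in the kernel crate today,
and the register range is only illustrative.]

// Hypothetical: carve BAR0 into per-subdevice windows at probe time, so
// the timer can only ever touch its own registers.
let timer_io = bar.slice(0x9400..0x9500)?; // slice() does not exist yet
let timer = Timer::new(timer_io);          // Timer now owns its window

// Any access outside 0x9400..0x9500 through `timer_io` would fail at
// the slice boundary instead of silently hitting another unit's
// registers.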
Alexandre Courbot
2025-Feb-17 14:04 UTC
[PATCH RFC 1/3] rust: add useful ops for u64
It is common to build a u64 from its high and low parts obtained from two
32-bit registers. Conversely, it is also common to split a u64 into two
u32s to write them into registers. Add an extension trait for u64 that
implements these methods in a new `num` module.

It is expected that this trait will be extended with other useful
operations, and similar extension traits implemented for other types.

Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
 rust/kernel/lib.rs |  1 +
 rust/kernel/num.rs | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 496ed32b0911a9fdbce5d26738b9cf7ef910b269..8c0c7c20a16aa96e3d3e444be3e03878650ddf77 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -59,6 +59,7 @@ pub mod miscdevice;
 #[cfg(CONFIG_NET)]
 pub mod net;
+pub mod num;
 pub mod of;
 pub mod page;
 #[cfg(CONFIG_PCI)]
diff --git a/rust/kernel/num.rs b/rust/kernel/num.rs
new file mode 100644
index 0000000000000000000000000000000000000000..5e714cbda4575b8d74f50660580dc4c5683f8c2b
--- /dev/null
+++ b/rust/kernel/num.rs
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Numerical and binary utilities for primitive types.
+
+/// Useful operations for `u64`.
+pub trait U64Ext {
+    /// Build a `u64` by combining its `high` and `low` parts.
+    ///
+    /// ```
+    /// use kernel::num::U64Ext;
+    /// assert_eq!(u64::from_u32s(0x01234567, 0x89abcdef), 0x01234567_89abcdef);
+    /// ```
+    fn from_u32s(high: u32, low: u32) -> Self;
+
+    /// Returns the `(high, low)` u32s that constitute `self`.
+    ///
+    /// ```
+    /// use kernel::num::U64Ext;
+    /// assert_eq!(u64::into_u32s(0x01234567_89abcdef), (0x1234567, 0x89abcdef));
+    /// ```
+    fn into_u32s(self) -> (u32, u32);
+}
+
+impl U64Ext for u64 {
+    fn from_u32s(high: u32, low: u32) -> Self {
+        ((high as u64) << u32::BITS) | low as u64
+    }
+
+    fn into_u32s(self) -> (u32, u32) {
+        ((self >> u32::BITS) as u32, self as u32)
+    }
+}

-- 
2.48.1
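[Editor's note: for context, this is how patch 3 of this series ends up
using the trait when reading the GPU timer, with PTIMER_TIME_1 holding
the high word and PTIMER_TIME_0 the low one.]

use kernel::num::U64Ext;

// Combine two 32-bit register reads into one 64-bit nanosecond
// timestamp...
let hi = regs::PtimerTime1::read(bar);
let lo = regs::PtimerTime0::read(bar);
let ns = u64::from_u32s(hi.hi(), lo.lo());

// ...and split a timestamp back into two words for a write.
let (hi, lo) = ns.into_u32s();
regs::PtimerTime1::write(bar, hi);
regs::PtimerTime0::write(bar, lo);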
Alexandre Courbot
2025-Feb-17 14:04 UTC
[PATCH RFC 2/3] rust: make ETIMEDOUT error available
Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
 rust/kernel/error.rs | 1 +
 1 file changed, 1 insertion(+)

diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
index f6ecf09cb65f4ebe9b88da68b3830ae79aa4f182..8858eb13b3df674b54572d2a371b8ec1303492dd 100644
--- a/rust/kernel/error.rs
+++ b/rust/kernel/error.rs
@@ -64,6 +64,7 @@ macro_rules! declare_err {
     declare_err!(EPIPE, "Broken pipe.");
     declare_err!(EDOM, "Math argument out of domain of func.");
     declare_err!(ERANGE, "Math result not representable.");
+    declare_err!(ETIMEDOUT, "Connection timed out.");
     declare_err!(ERESTARTSYS, "Restart the system call.");
     declare_err!(ERESTARTNOINTR, "System call was interrupted by a signal and will be restarted.");
     declare_err!(ERESTARTNOHAND, "Restart if no handler.");

-- 
2.48.1
Alexandre Courbot
2025-Feb-17 14:04 UTC
[PATCH RFC 3/3] gpu: nova-core: add basic timer device
Add a basic timer device and exercise it during device probing. This
first draft is probably very questionable.

One point in particular should IMHO receive attention: the generic
wait_on() method aims at providing similar functionality to Nouveau's
nvkm_[num]sec() macros. Since this method will be heavily used with
different conditions to test, I'd like to avoid monomorphizing it
entirely with each instance; that's something that is achieved in
nvkm_xsec() using functions that the macros invoke. I have tried
achieving the same result in Rust using closures (kept as-is in the
current code), but they seem to be monomorphized as well. Calling extra
functions could work better, but also looks less elegant to me, so I am
really open to suggestions here. (One possible shape is sketched after
the patch.)

Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
 drivers/gpu/nova-core/driver.rs    |  4 +-
 drivers/gpu/nova-core/gpu.rs       | 35 ++++++++++++++-
 drivers/gpu/nova-core/nova_core.rs |  1 +
 drivers/gpu/nova-core/regs.rs      | 43 ++++++++++++++++++
 drivers/gpu/nova-core/timer.rs     | 91 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 172 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index 63c19f140fbdd65d8fccf81669ac590807cc120f..0cd23aa306e4082405f480afc0530a41131485e7 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -10,7 +10,7 @@ pub(crate) struct NovaCore {
     pub(crate) gpu: Gpu,
 }
 
-const BAR0_SIZE: usize = 8;
+const BAR0_SIZE: usize = 0x9500;
 pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
 
 kernel::pci_device_table!(
@@ -42,6 +42,8 @@ fn probe(pdev: &mut pci::Device, _info: &Self::IdInfo) -> Result<Pin<KBox<Self>>
             GFP_KERNEL,
         )?;
 
+        let _ = this.gpu.test_timer();
+
         Ok(this)
     }
 }
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index e7da6a2fa29d82e9624ba8baa2c7281f38cb3133..2fbf4041f6d421583636c7bede449c3416272301 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -1,12 +1,16 @@
 // SPDX-License-Identifier: GPL-2.0
 
+use kernel::device::Device;
+use kernel::types::ARef;
 use kernel::{
     device, devres::Devres, error::code::*, firmware, fmt, pci, prelude::*, str::CString,
 };
 
 use crate::driver::Bar0;
 use crate::regs;
+use crate::timer::Timer;
 use core::fmt;
+use core::time::Duration;
 
 /// Enum representation of the GPU chipset.
 #[derive(fmt::Debug)]
@@ -165,10 +169,12 @@ fn new(dev: &device::Device, spec: &Spec, ver: &str) -> Result<Firmware> {
 
 /// Structure holding the resources required to operate the GPU.
 #[pin_data]
 pub(crate) struct Gpu {
+    dev: ARef<Device>,
     spec: Spec,
     /// MMIO mapping of PCI BAR 0
     bar: Devres<Bar0>,
     fw: Firmware,
+    timer: Timer,
 }
 
 impl Gpu {
@@ -184,6 +190,33 @@ pub(crate) fn new(pdev: &pci::Device, bar: Devres<Bar0>) -> Result<impl PinInit<
             spec.revision
         );
 
-        Ok(pin_init!(Self { spec, bar, fw }))
+        let dev = pdev.as_ref().into();
+        let timer = Timer::new();
+
+        Ok(pin_init!(Self {
+            dev,
+            spec,
+            bar,
+            fw,
+            timer,
+        }))
+    }
+
+    pub(crate) fn test_timer(&self) -> Result<()> {
+        let bar = self.bar.try_access().ok_or(ENXIO)?;
+
+        dev_info!(&self.dev, "testing timer subdev\n");
+        assert!(matches!(
+            self.timer
+                .wait_on(&bar, Duration::from_millis(10), || Some(())),
+            Ok(())
+        ));
+        assert_eq!(
+            self.timer
+                .wait_on(&bar, Duration::from_millis(10), || Option::<()>::None),
+            Err(ETIMEDOUT)
+        );
+
+        Ok(())
     }
 }
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 5d0230042793dae97026146e94f3cdb31ba20642..94b165a340ddfffd448f87cd82200391de075806 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -5,6 +5,7 @@
 mod driver;
 mod gpu;
 mod regs;
+mod timer;
 
 kernel::module_pci_driver! {
     type: driver::NovaCore,
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index f2766f95e9d1eeab6734b18525fe504e1e7ea587..5127cc3454c047d64b7aaf599d8bf5f63a08bdfe 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -54,3 +54,46 @@ pub(crate) fn major_rev(&self) -> u8 {
         ((self.0 & BOOT0_MAJOR_REV_MASK) >> BOOT0_MAJOR_REV_SHIFT) as u8
     }
 }
+
+const PTIMER_TIME_0: usize = 0x00009400;
+const PTIMER_TIME_1: usize = 0x00009410;
+
+#[derive(Copy, Clone, PartialEq, Eq)]
+pub(crate) struct PtimerTime0(u32);
+
+impl PtimerTime0 {
+    #[inline]
+    pub(crate) fn read(bar: &Bar0) -> Self {
+        Self(bar.readl(PTIMER_TIME_0))
+    }
+
+    #[inline]
+    pub(crate) fn write(bar: &Bar0, val: u32) {
+        bar.writel(val, PTIMER_TIME_0)
+    }
+
+    #[inline]
+    pub(crate) fn lo(&self) -> u32 {
+        self.0
+    }
+}
+
+#[derive(Copy, Clone, PartialEq, Eq)]
+pub(crate) struct PtimerTime1(u32);
+
+impl PtimerTime1 {
+    #[inline]
+    pub(crate) fn read(bar: &Bar0) -> Self {
+        Self(bar.readl(PTIMER_TIME_1))
+    }
+
+    #[inline]
+    pub(crate) fn write(bar: &Bar0, val: u32) {
+        bar.writel(val, PTIMER_TIME_1)
+    }
+
+    #[inline]
+    pub(crate) fn hi(&self) -> u32 {
+        self.0
+    }
+}
diff --git a/drivers/gpu/nova-core/timer.rs b/drivers/gpu/nova-core/timer.rs
new file mode 100644
index 0000000000000000000000000000000000000000..f6a787d4fbdb90b3dc13e322d50da1c7f64818df
--- /dev/null
+++ b/drivers/gpu/nova-core/timer.rs
@@ -0,0 +1,91 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Nova Core Timer subdevice
+
+use core::time::Duration;
+
+use kernel::num::U64Ext;
+use kernel::prelude::*;
+
+use crate::driver::Bar0;
+use crate::regs;
+
+pub(crate) struct Timer {}
+
+impl Timer {
+    pub(crate) fn new() -> Self {
+        Self {}
+    }
+
+    pub(crate) fn read(bar: &Bar0) -> u64 {
+        loop {
+            let hi = regs::PtimerTime1::read(bar);
+            let lo = regs::PtimerTime0::read(bar);
+
+            if hi == regs::PtimerTime1::read(bar) {
+                return u64::from_u32s(hi.hi(), lo.lo());
+            }
+        }
+    }
+
+    #[allow(dead_code)]
+    pub(crate) fn time(bar: &Bar0, time: u64) {
+        let (hi, lo) = time.into_u32s();
+
+        regs::PtimerTime1::write(bar, hi);
+        regs::PtimerTime0::write(bar, lo);
+    }
+
+    /// Wait until `cond` is true or `timeout` elapsed, based on GPU time.
+    ///
+    /// When `cond` evaluates to `Some`, its return value is returned.
+    ///
+    /// `Err(ETIMEDOUT)` is returned if `timeout` has been reached without `cond` evaluating to
+    /// `Some`, or if the timer device is stuck for some reason.
+    pub(crate) fn wait_on<R, F: Fn() -> Option<R>>(
+        &self,
+        bar: &Bar0,
+        timeout: Duration,
+        cond: F,
+    ) -> Result<R> {
+        // Number of consecutive time reads after which we consider the timer frozen if it
+        // hasn't moved forward.
+        const MAX_STALLED_READS: usize = 16;
+
+        let (mut cur_time, mut prev_time, deadline) = (|| {
+            let cur_time = Timer::read(bar);
+            let deadline =
+                cur_time.saturating_add(u64::try_from(timeout.as_nanos()).unwrap_or(u64::MAX));
+
+            (cur_time, cur_time, deadline)
+        })();
+        let mut num_reads = 0;
+
+        loop {
+            if let Some(ret) = cond() {
+                return Ok(ret);
+            }
+
+            (|| {
+                cur_time = Timer::read(bar);
+
+                // Check if the timer is frozen for some reason.
+                if cur_time == prev_time {
+                    if num_reads >= MAX_STALLED_READS {
+                        return Err(ETIMEDOUT);
+                    }
+                    num_reads += 1;
+                } else {
+                    if cur_time >= deadline {
+                        return Err(ETIMEDOUT);
+                    }
+
+                    num_reads = 0;
+                    prev_time = cur_time;
+                }
+
+                Ok(())
+            })()?;
+        }
+    }
+}

-- 
2.48.1
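[Editor's note: on the monomorphization question raised in the commit
message, one possible shape, sketched below under the assumption that
one copy of the polling loop per return type R (rather than one per
closure) is acceptable: the generic wait_on() immediately delegates to
an inner function that takes the condition as a &dyn Fn trait object.
The inner body is simplified here (no stalled-timer detection) and
reuses the imports and Timer::read() from the patch above.]

impl Timer {
    pub(crate) fn wait_on<R, F: Fn() -> Option<R>>(
        &self,
        bar: &Bar0,
        timeout: Duration,
        cond: F,
    ) -> Result<R> {
        // Thin generic shim: only this function is duplicated per
        // closure type; the loop below is shared.
        self.wait_on_dyn(bar, timeout, &cond)
    }

    fn wait_on_dyn<R>(
        &self,
        bar: &Bar0,
        timeout: Duration,
        cond: &dyn Fn() -> Option<R>,
    ) -> Result<R> {
        let start = Timer::read(bar);
        let deadline =
            start.saturating_add(u64::try_from(timeout.as_nanos()).unwrap_or(u64::MAX));

        loop {
            if let Some(ret) = cond() {
                return Ok(ret);
            }
            if Timer::read(bar) >= deadline {
                return Err(ETIMEDOUT);
            }
        }
    }
}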
Simona Vetter
2025-Feb-17 15:48 UTC
[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
On Mon, Feb 17, 2025 at 11:04:45PM +0900, Alexandre Courbot wrote:
> [...]
>
> - Where and how to store subdevices. The timer device is currently a
> direct member of the GPU structure. It might work for GSP devices,
> which IIUC are supposed to have at least a few fixed devices required
> to bring the GSP up; but as a general rule this probably won't scale,
> as not all subdevices are present on all GPU variants, or in the same
> numbers. So we will probably need to find an equivalent to the
> `subdev` linked list in Nouveau.
>
> - BAR sharing between subdevices. Right now each subdevice gets access
> to the full BAR range. I am wondering whether we could not split it
> into the relevant slices for each subdevice, and transfer ownership of
> each slice to the device that is supposed to use it. That way each
> register would have a single owner, which is arguably safer - but
> maybe not as flexible as we will need down the road?
>
> - On a related note, since the BAR is behind a Devres, its availability
> must first be secured before any hardware access, using try_access().
> Doing this on a per-register or per-operation basis looks like
> overkill, so all methods that access the BAR take a reference to it,
> allowing try_access() to be called from the highest-level caller and
> thus reducing the number of times this needs to be performed. Doing so
> comes at the cost of an extra argument to most subdevice methods; but
> also with the benefit that we don't need to put the BAR behind another
> Arc and share it across all subdevices. I don't know which design is
> better here, and input would be very welcome.
>
> - We will probably need something like a `Subdevice` trait down the
> road, but I'll wait until we have more than one subdevice to think
> about it.

It might make sense to go with a full-blown aux bus. Gives you a lot of
structures and answers to these questions, but also might be way too
much. So just throwing this in as a consideration here.
-Sima

> [...]

-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Danilo Krummrich
2025-Feb-17 21:33 UTC
[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
Hi Alex,

On Mon, Feb 17, 2025 at 11:04:45PM +0900, Alexandre Courbot wrote:
> Hi everyone,
>
> This short RFC is based on top of Danilo's initial driver stub series
> [1] and aims to initiate discussions and hopefully some design
> decisions, using the simplest subdevice of the GPU (the timer) as an
> example, before implementing more devices that allow the GPU
> initialization sequence to progress (Falcon being the logical next
> step so we can get the GSP rolling).
>
> It is kept simple and short for that purpose, and to avoid bumping
> into a wall with much more device code because my assumptions were
> incorrect.
>
> This is my first time trying to write Rust kernel code, and some of my
> questions below are probably due to me not yet understanding how to
> use the core kernel interfaces. So before going further I thought it
> would make sense to raise the most obvious questions that came to my
> mind while writing this draft:

Thanks for sending this RFC, that makes a lot of sense. It's great to
see you picking up work on Nova and Rust in the kernel in general!

One nit: for the future, please make sure to explicitly copy in the
folks listed under the RUST entry in the MAINTAINERS file.

> - Where and how to store subdevices. The timer device is currently a
> direct member of the GPU structure. It might work for GSP devices,
> which IIUC are supposed to have at least a few fixed devices required
> to bring the GSP up; but as a general rule this probably won't scale,
> as not all subdevices are present on all GPU variants, or in the same
> numbers. So we will probably need to find an equivalent to the
> `subdev` linked list in Nouveau.

Hm... I think a Vec should probably do the job for this. Once we know
the chipset, we know the exact topology of subdevices too.

> - BAR sharing between subdevices. Right now each subdevice gets access
> to the full BAR range. I am wondering whether we could not split it
> into the relevant slices for each subdevice, and transfer ownership of
> each slice to the device that is supposed to use it. That way each
> register would have a single owner, which is arguably safer - but
> maybe not as flexible as we will need down the road?

I think for self-contained subdevices we can easily add an abstraction
for pci_iomap_range() to pci::Device. I considered doing that from the
get-go, but then decided to wait until we have some actual use for it.

For cases where we have to share a mapping of the same set of registers
between multiple structures, I think we have to embed it into an Arc
(unfortunately, we can't re-use the inner Arc of Devres for that).

An alternative would be to request a whole new mapping, i.e. a separate
Devres<pci::Bar> instance, but that includes an inner Arc anyways and,
hence, is more costly.

> - On a related note, since the BAR is behind a Devres, its availability
> must first be secured before any hardware access, using try_access().
> Doing this on a per-register or per-operation basis looks like
> overkill, so all methods that access the BAR take a reference to it,
> allowing try_access() to be called from the highest-level caller and
> thus reducing the number of times this needs to be performed. Doing so
> comes at the cost of an extra argument to most subdevice methods; but
> also with the benefit that we don't need to put the BAR behind another
> Arc and share it across all subdevices. I don't know which design is
> better here, and input would be very welcome.

I'm not sure I understand you correctly, because what you describe here
seems to be two different things to me:

1. How to avoid unnecessary calls to try_access(). This is why I made
   Boot0::read() take a &RevocableGuard<'_, Bar0> as argument. I think
   we can just call try_access() once and then propagate the guard
   through the callchain, where necessary. (A minimal sketch of this
   pattern follows this mail.)

2. Sharing the MMIO mapping between subdevices. This is where I can't
   follow: how does 1. help with that? How are 1. and 2. related?

> - We will probably need something like a `Subdevice` trait down the
> road, but I'll wait until we have more than one subdevice to think
> about it.

Yeah, that sounds reasonable.

> [...]
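[Editor's note: a minimal sketch of the acquire-once pattern from
point 1. above, assuming try_access() yields a RevocableGuard<'_, Bar0>
that register accessors accept by reference, as Boot0::read() does in
the stub series; log_boot0() itself is an illustrative helper, and the
imports are those already used by gpu.rs.]

fn log_boot0(dev: &Device, bar: &Devres<Bar0>) -> Result<()> {
    // One revocation check for the whole sequence...
    let guard = bar.try_access().ok_or(ENXIO)?;

    // ...and every accessor below just borrows the guard instead of
    // calling try_access() itself.
    let boot0 = regs::Boot0::read(&guard);
    dev_info!(dev, "BOOT0 major revision: {}\n", boot0.major_rev());

    Ok(())
}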
Dave Airlie
2025-Feb-18 01:42 UTC
[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
On Tue, 18 Feb 2025 at 00:04, Alexandre Courbot <acourbot at nvidia.com> wrote:
> [...]
>
> - Where and how to store subdevices. The timer device is currently a
> direct member of the GPU structure. It might work for GSP devices,
> which IIUC are supposed to have at least a few fixed devices required
> to bring the GSP up; but as a general rule this probably won't scale,
> as not all subdevices are present on all GPU variants, or in the same
> numbers. So we will probably need to find an equivalent to the
> `subdev` linked list in Nouveau.

I deliberately avoided doing this. My reasoning is that this isn't like
nouveau, where we control a bunch of devices. We have one mission: bring
up GSP. If that entails a bunch of fixed-function blocks being set up in
a particular order, then let's just deal with that.

If things become optional later we can move to Option<> or just have a
completely new path. But in those cases I'd make it
Option<TuringGSPBootDevices> rather than Option<Sec2>, Option<NVDEC>,
etc., though I think we need to look at the boot requirements on other
GSP devices to know. (A rough sketch of this shape follows this mail.)

I just don't see any case where we need to iterate over the subdevices
in any form of list that makes sense and doesn't lead to the nouveau
design, which is a pain in the ass to tease any sense of ordering or
hierarchy out of. Just be explicit: boot the devices you need, in the
order you need, to get GSP up and running.

> - BAR sharing between subdevices. Right now each subdevice gets access
> to the full BAR range. I am wondering whether we could not split it
> into the relevant slices for each subdevice, and transfer ownership of
> each slice to the device that is supposed to use it. That way each
> register would have a single owner, which is arguably safer - but
> maybe not as flexible as we will need down the road?

This could be useful. Again, the mission is mostly not to be hitting
registers, since GSP will deal with it. The only case I know of where it
won't work is the GSP CPU sequencer: the code gets a script from the
device, and the script tells you to directly hit registers, with no real
bounds checking, so I don't know if this is practical.

> - On a related note, since the BAR is behind a Devres, its availability
> must first be secured before any hardware access, using try_access().
> Doing this on a per-register or per-operation basis looks like
> overkill, so all methods that access the BAR take a reference to it,
> allowing try_access() to be called from the highest-level caller and
> thus reducing the number of times this needs to be performed. Doing so
> comes at the cost of an extra argument to most subdevice methods; but
> also with the benefit that we don't need to put the BAR behind another
> Arc and share it across all subdevices. I don't know which design is
> better here, and input would be very welcome.

We can't pass down the bar, because it takes a devres lock and that
interferes with lockdep ordering; even hanging onto devres too long
caused me lockdep issues. We should only be doing try_access() on
registers that are runtime-sized. Is that a lot of them? Do we expect to
be hitting a lot of registers in an actual fast path?

> - We will probably need something like a `Subdevice` trait down the
> road, but I'll wait until we have more than one subdevice to think
> about it.

Again, I'm kinda opposed to this idea; I don't think it buys anything.
With GSP we just want to boot, and after that we never touch most of the
subdevices ever again.

Dave.

> [...]
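[Editor's note: a rough sketch of the explicit shape Dave describes: a
struct that hard-codes the boot ordering. Sec2 and Falcon are purely
illustrative placeholder types; neither exists in nova-core yet.]

// Illustrative only: the fields are the fixed-function blocks needed
// to bring up GSP on this generation.
struct TuringGSPBootDevices {
    falcon: Falcon,
    sec2: Sec2,
}

impl TuringGSPBootDevices {
    fn new(bar: &Bar0) -> Result<Self> {
        // The required bring-up order is spelled out here, in code,
        // instead of being recovered from a generic subdevice list.
        let falcon = Falcon::new(bar)?;
        let sec2 = Sec2::new(bar)?;
        Ok(Self { falcon, sec2 })
    }
}

// Per Dave's suggestion, optionality then lives at the generation
// level, e.g.: struct Gpu { boot: Option<TuringGSPBootDevices>, ... }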