Danilo Krummrich
2025-Feb-25 15:06 UTC
[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
On Tue, Feb 25, 2025 at 11:11:07PM +0900, Alexandre Courbot wrote:
> On Mon Feb 24, 2025 at 9:07 PM JST, Danilo Krummrich wrote:
> > CC: Gary
> >
> > On Mon, Feb 24, 2025 at 10:40:00AM +0900, Alexandre Courbot wrote:
> >> This inability to sleep while we are accessing registers seems very
> >> constraining to me, if not dangerous. It is pretty common to have
> >> functions intermingle hardware accesses with other operations that
> >> might sleep, and this constraint means that in such cases the caller
> >> would need to perform guard lifetime management manually:
> >>
> >> let bar_guard = bar.try_access()?;
> >> /* do something non-sleeping with bar_guard */
> >> drop(bar_guard);
> >>
> >> /* do something that might sleep */
> >>
> >> let bar_guard = bar.try_access()?;
> >> /* do something non-sleeping with bar_guard */
> >> drop(bar_guard);
> >>
> >> ...
> >>
> >> Failure to drop the guard potentially introduces a race condition,
> >> which will receive no compile-time warning and potentially not even
> >> a runtime one unless lockdep is enabled. This problem does not exist
> >> with the equivalent C code AFAICT, which makes the Rust version
> >> actually more error-prone and dangerous, the opposite of what we are
> >> trying to achieve with Rust. Or am I missing something?
> >
> > Generally you are right, but you have to see it from a different
> > perspective.
> >
> > What you describe is not an issue that comes from the design of the
> > API, but a limitation of Rust in the kernel. People are aware of the
> > issue, and with klint [1] there are solutions for it in the pipeline;
> > see also [2] and [3].
> >
> > [1] https://rust-for-linux.com/klint
> > [2] https://github.com/Rust-for-Linux/klint/blob/trunk/doc/atomic_context.md
> > [3] https://www.memorysafety.org/blog/gary-guo-klint-rust-tools/
>
> Thanks, I wasn't aware of klint; it indeed looks cool, even if not
> perfect by its own admission. But even if we ignore the safety issue,
> the other one (ergonomics) is still there.
>
> Basically, this way of accessing registers imposes quite a mental
> burden on its users. It requires a very different (and harsher)
> discipline than when writing the same code in C

We need similar solutions in C too, see drm_dev_enter() / drm_dev_exit()
and drm_dev_unplug().

> and the correct granularity to use is unclear to me.
>
> For instance, if I want to do the equivalent of Nouveau's nvkm_usec()
> to poll a particular register in a busy loop, should I call
> try_access() once before the loop? Or every time before accessing the
> register?

I think we should re-acquire the guard in each iteration and drop it
before the delay. I think a simple closure would work very well for
this pattern?

> I'm afraid having to check that the resource is still alive before
> accessing any register is going to become tedious very quickly.
>
> I understand that we want to protect against accessing the IO region
> of an unplugged device; but still, there is no guarantee that the
> device won't be unplugged in the middle of a critical section, however
> short. Thus the driver code should be able to recognize that the
> device has fallen off the bus when it e.g. gets a bunch of 0xff
> instead of a valid value. So do we really need the extra protection
> that AFAICT isn't used in C?

As mentioned above, we already do similar things in C.

Also, think about what the alternative is. If we remove the Devres
wrapper of pci::Bar, we lose control over the lifetime of the bar
mapping, and it can easily out-live the device / driver binding.

This makes the API unsound. With this, drivers would be able to keep
hold of acquired resources. What if, after a hotplug, the physical
address region is re-used and mapped by another driver?
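
To make the closure suggestion concrete, here is a minimal sketch of
what such a polling helper could look like (poll_bar, its signature,
and the caller-supplied delay closure are all illustrative, not an
existing kernel API; Devres::try_access() is the accessor discussed in
this thread):

    use kernel::{devres::Devres, pci, prelude::*};

    /// Illustrative only: poll `op` until it succeeds or `tries`
    /// attempts are exhausted, re-acquiring the access guard on every
    /// iteration and dropping it *before* the delay, so the guard is
    /// never held across a potential sleep.
    fn poll_bar(
        bar: &Devres<pci::Bar>,
        tries: usize,
        op: impl Fn(&pci::Bar) -> bool,
        delay: impl Fn(), // caller supplies the delay, e.g. a usleep wrapper
    ) -> Result {
        for _ in 0..tries {
            {
                // Fails with ENXIO if the device has been unplugged.
                let guard = bar.try_access().ok_or(ENXIO)?;
                if op(&guard) {
                    return Ok(());
                }
            } // guard dropped here, before we potentially sleep
            delay();
        }
        Err(ETIMEDOUT)
    }

The inner scope is what guarantees the guard is released before delay()
runs; the drop() that could be forgotten in the hand-written version
quoted above becomes impossible to miss by construction.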
Alexandre Courbot
2025-Feb-25 15:23 UTC
[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
On Wed Feb 26, 2025 at 12:06 AM JST, Danilo Krummrich wrote:
> On Tue, Feb 25, 2025 at 11:11:07PM +0900, Alexandre Courbot wrote:
>> On Mon Feb 24, 2025 at 9:07 PM JST, Danilo Krummrich wrote:
>> > CC: Gary
>> >
>> > On Mon, Feb 24, 2025 at 10:40:00AM +0900, Alexandre Courbot wrote:
>> >> This inability to sleep while we are accessing registers seems very
>> >> constraining to me, if not dangerous. It is pretty common to have
>> >> functions intermingle hardware accesses with other operations that
>> >> might sleep, and this constraint means that in such cases the
>> >> caller would need to perform guard lifetime management manually:
>> >>
>> >> let bar_guard = bar.try_access()?;
>> >> /* do something non-sleeping with bar_guard */
>> >> drop(bar_guard);
>> >>
>> >> /* do something that might sleep */
>> >>
>> >> let bar_guard = bar.try_access()?;
>> >> /* do something non-sleeping with bar_guard */
>> >> drop(bar_guard);
>> >>
>> >> ...
>> >>
>> >> Failure to drop the guard potentially introduces a race condition,
>> >> which will receive no compile-time warning and potentially not even
>> >> a runtime one unless lockdep is enabled. This problem does not
>> >> exist with the equivalent C code AFAICT, which makes the Rust
>> >> version actually more error-prone and dangerous, the opposite of
>> >> what we are trying to achieve with Rust. Or am I missing something?
>> >
>> > Generally you are right, but you have to see it from a different
>> > perspective.
>> >
>> > What you describe is not an issue that comes from the design of the
>> > API, but a limitation of Rust in the kernel. People are aware of the
>> > issue, and with klint [1] there are solutions for it in the
>> > pipeline; see also [2] and [3].
>> >
>> > [1] https://rust-for-linux.com/klint
>> > [2] https://github.com/Rust-for-Linux/klint/blob/trunk/doc/atomic_context.md
>> > [3] https://www.memorysafety.org/blog/gary-guo-klint-rust-tools/
>>
>> Thanks, I wasn't aware of klint; it indeed looks cool, even if not
>> perfect by its own admission. But even if we ignore the safety issue,
>> the other one (ergonomics) is still there.
>>
>> Basically, this way of accessing registers imposes quite a mental
>> burden on its users. It requires a very different (and harsher)
>> discipline than when writing the same code in C
>
> We need similar solutions in C too, see drm_dev_enter() /
> drm_dev_exit() and drm_dev_unplug().

Granted, but the use of these is much more coarse-grained than what is
expected for IO resources, right?

>> and the correct granularity to use is unclear to me.
>>
>> For instance, if I want to do the equivalent of Nouveau's nvkm_usec()
>> to poll a particular register in a busy loop, should I call
>> try_access() once before the loop? Or every time before accessing the
>> register?
>
> I think we should re-acquire the guard in each iteration and drop it
> before the delay. I think a simple closure would work very well for
> this pattern?
>
>> I'm afraid having to check that the resource is still alive before
>> accessing any register is going to become tedious very quickly.
>>
>> I understand that we want to protect against accessing the IO region
>> of an unplugged device; but still, there is no guarantee that the
>> device won't be unplugged in the middle of a critical section, however
>> short. Thus the driver code should be able to recognize that the
>> device has fallen off the bus when it e.g. gets a bunch of 0xff
>> instead of a valid value. So do we really need the extra protection
>> that AFAICT isn't used in C?
>
> As mentioned above, we already do similar things in C.
>
> Also, think about what the alternative is. If we remove the Devres
> wrapper of pci::Bar, we lose control over the lifetime of the bar
> mapping, and it can easily out-live the device / driver binding. This
> makes the API unsound.

Oh, my issue is not with the Devres wrapper, which I think makes sense;
it's more the use of RCU to control access to the resource that I find
too constraining. And I'm pretty sure more users will be of the same
opinion as more drivers using it get written.

> With this, drivers would be able to keep hold of acquired resources.
> What if, after a hotplug, the physical address region is re-used and
> mapped by another driver?

Actually - wouldn't that issue also be addressed by a PCI equivalent to
drm_dev_enter() and friends that ensures the device (and thus its
devres resources) stays in place? Using Rust, I can imagine (though I
cannot picture it precisely yet) some method of the device that returns
a reference to an inner structure containing its resources, available
for immediate access. Since it would be coarser-grained, it could rely
on something less constraining than RCU without a noticeable
performance penalty.
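
To sketch that last idea in a self-contained way: the following is a
userspace model of the semantics, not kernel code, and Device,
Resources, enter() and unplug() are all hypothetical names. enter()
gives the closure direct access to the resources for its whole
duration, and unplug() blocks until every such section has exited
before tearing the resources down, the same shape as drm_dev_enter() /
drm_dev_exit() / drm_dev_unplug(), just coarser-grained than a
per-register-access guard:

    use std::sync::RwLock;

    struct Resources {
        mmio: u32, // stands in for the BAR mapping
    }

    struct Device {
        res: RwLock<Option<Resources>>,
    }

    impl Device {
        /// Runs `f` with direct access to the resources, or returns
        /// None if the device has already been unplugged. The read
        /// lock plays the role SRCU plays for drm_dev_enter() /
        /// drm_dev_exit().
        fn enter<R>(&self, f: impl FnOnce(&Resources) -> R) -> Option<R> {
            self.res.read().unwrap().as_ref().map(f)
        }

        /// Revokes the resources: blocks until every enter() section
        /// has finished, then drops the resources, so no mapping can
        /// out-live the device / driver binding.
        fn unplug(&self) {
            *self.res.write().unwrap() = None;
        }
    }

Whether something like this can be made sound in the kernel (the unplug
path must not be blocked indefinitely by a misbehaving section) is
exactly the trade-off between this coarser-grained model and the
per-access RCU guard.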