On Fri, May 31, 2024 at 04:50:32PM +0900, FUJITA Tomonori
wrote:> On Thu, 30 May 2024 08:47:25 +0200
> Danilo Krummrich <dakr at redhat.com> wrote:
>
> >> >> >> For a Rust PHY driver, you know that you have a
valid pointer to C's
> >> >> >> device object of C's PHY device during the
probe callback. The driver
> >> >> >> creates a Rust device object to wrap the C
pointer to the C's device
> >> >> >> object and passes it to the firmware
abstractions. The firmware
> >> >> >> abstractions gets the C's pointer from the
Rust object and calls C's
> >> >> >> function to load firmware, returns the result.
> >> >> >>
> >> >> >> You have concerns about the simple code like the
following?
> >> >> >>
> >> >> >>
> >> >> >> diff --git a/rust/kernel/device.rs
b/rust/kernel/device.rs
> >> >> >> new file mode 100644
> >> >> >> index 000000000000..6144437984a9
> >> >> >> --- /dev/null
> >> >> >> +++ b/rust/kernel/device.rs
> >> >> >> @@ -0,0 +1,30 @@
> >> >> >> +// SPDX-License-Identifier: GPL-2.0
> >> >> >> +
> >> >> >> +//! Generic devices that are part of the
kernel's driver model.
> >> >> >> +//!
> >> >> >> +//! C header:
[`include/linux/device.h`](srctree/include/linux/device.h)
> >> >> >> +
> >> >> >> +use crate::types::Opaque;
> >> >> >> +
> >> >> >> +#[repr(transparent)]
> >> >> >> +pub struct
Device(Opaque<bindings::device>);
> >> >> >> +
> >> >> >> +impl Device {
> >> >> >> + /// Creates a new [`Device`] instance from
a raw pointer.
> >> >> >> + ///
> >> >> >> + /// # Safety
> >> >> >> + ///
> >> >> >> + /// For the duration of 'a, the pointer
must point at a valid `device`.
> >> >> >
> >> >> > If the following rust code does what this comment
says, then sure, I'm
> >> >> > ok with it for now if it helps you all out with
stuff like the firmware
> >> >> > interface for the phy rust code.
> >> >>
> >> >> Great, thanks a lot!
> >> >>
> >> >> Danilo and Wedson, are there any concerns about pushing
this patch [1]
> >> >> for the firmware abstractions?
> >> >
> >> > Well, if everyone is fine with this one I don't see why
we can't we go with [1]
> >> > directly? AFAICS, we'd only need the following fix:
> >> >
> >> > -//! C header:
[`include/linux/device.h`](../../../../include/linux/device.h)
> >> > +//! C header:
[`include/linux/device.h`](srctree/include/linux/device.h)
> >> >
> >> > [1]
https://lore.kernel.org/rust-for-linux/20240520172554.182094-2-dakr at
redhat.com/
> >>
> >> The difference is that your patch touches the reference count of a
> >> struct device. My patch doesn't.
> >>
> >> The following part in your patch allows the rust code to freely
play
> >> with the reference count of a struct device. Your Rust drm driver
> >> interact with the reference count in a different way than C's
drm
> >> drivers if I understand correctly. I'm not sure that Greg will
be
> >> convinenced with that approach.
> >
> > I don't see how this is different than what we do in C. If you
(for whatever
> > reason) want to keep a pointer to a struct device you should make sure
to
> > increase the reference count of this device, such that it can't
get freed for
> > the time being.
> >
> > This is a 1:1 representation of that and conceptually identical.
>
> A drm driver does such? If a drm driver allocates two types of
> driver-specific data and keep a pointer to the pci device, then the
> driver calls get_device() twice to increase the reference count of the
> pci's device?
Think about it more generically. If you store a pointer to a device in some
structure (driver private data, generic subsystem structure, etc.), can you
guarantee that without acquiring a reference that the device' reference
count
doesn't drop to zero while your structure is alive?
There are cases where you can't, e.g. with Arc<device::Data>. How do
you
guarantee (generically for every driver) the last reference of your device data
is dropped with the driver's remove callback? If we make it the driver's
responsibility to care about this the whole thing is unsafe as in C.
In some cases you can, mostly the ones where you free a structure in the
driver's remove callback. But again, then there is no advantage over what we
do
in C. What if you change the lifetime of your structure later on, then you may
introduce a bug.
>
> At the very least, network device drivers don't. a driver doesn't
play
> with the reference count. The network subsystem does. And, the network
A quick 'grep' shows 23 occurrences of get_device() in network drivers.
> subsystem doesn't care about how many pointers to a pci device a
> driver keeps.
If none of the drivers has structures storing a pointer to a pci device that out
live the driver's remove callback that's fine.
>
> With the patch [1], a driver plays with the reference count of a
> device and directly calls get/put_device. It's a different than C
> drivers for me (not sure about drm subsystem though).
It's not different. Again, when a C driver stores a device pointer in a
structure whose lifetime is not bound a special boundary, like the device remove
callback, it must increase the reference count.
It's just that most of the time we are moving within this boundary and in C
we
just rely on that. In Rust we can only get this safe by taking a reference for
every pointer we store. And making it safe to use seems to be the goal.
Generally, the whole point of a reference count is that everyone who owns a
pointer to this shared structure increases / decreases the reference count
accordingly.
>
> But I might be misunderstanding that Greg isn't convinced with this
> reference count thing. We need to wait for his response.
>
>
> >> +// SAFETY: Instances of `Device` are always ref-counted.
> >> +unsafe impl crate::types::AlwaysRefCounted for Device {
> >> + fn inc_ref(&self) {
> >> + // SAFETY: The existence of a shared reference guarantees
that the refcount is nonzero.
> >> + unsafe { bindings::get_device(self.as_raw()) };
> >> + }
> >> +
> >> + unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
> >> + // SAFETY: The safety requirements guarantee that the
refcount is nonzero.
> >> + unsafe { bindings::put_device(obj.cast().as_ptr()) }
> >> + }
> >> +}
> >>
> >> The following comments give the impression that Rust abstractions
> >> wrongly interact with the reference count; callers check out the
> >> reference counter. Nobody should do that.
> >
> > No, saying that the caller must ensure that the device "has a
non-zero reference
> > count" is a perfectly valid requirement.
> >
> > It doensn't mean that the calling code has to peek the actual
reference count,
> > but it means that it must be ensured that while we try to increase the
reference
> > count it can't drop to zero unexpectedly.
>
> Yeah, the same requirements but expressed differently.
Well, instead of
"ensure that `ptr` is valid, non-null, and has a non-zero reference
count"
you propose
"the pointer must point at a valid `device`".
When I ask you to specify what "pointer to valid device" means, what
would be
the answer?
Since we're still discussing this in the thread of the firmware abstraction,
if
you have any further concerns, let's discuss them in the thread for the
corresponding patch.
Once we get to a conclusion I can send a series with only the device and firmare
abstractions such that we can get them in outside of the scope of the reset of
both series to get your driver going.
- Danilo
>
> But again, I might be misunderstanding. Greg might have a different reason.
>
> > Your patch, as a subset of this one, has the same requirements as
listed below.
> >
> >>
> >> + /// Callers must ensure that `ptr` is valid, non-null, and
has a non-zero reference count.
> >> + pub unsafe fn from_raw(ptr: *mut bindings::device) ->
ARef<Self> {
> >>
> >> + /// Callers must ensure that `ptr` is valid, non-null, and
has a non-zero reference count for
> >> + /// the entire duration when the returned reference exists.
> >> + pub unsafe fn as_ref<'a>(ptr: *mut
bindings::device) -> &'a Self {
> >> + // SAFETY: Guaranteed by the safety requirements of the
function.
> >> + unsafe { &*ptr.cast() }
> >> + }
> >>
> >
> >
>