thr3ads.net - Nouveau - [PATCH 2/2] [v6] drm/nouveau: expose GSP-RM logging buffers via debugfs [Jul 2024]

If this information is useful, please help other people find it:
Share via:

Danilo Krummrich

2024-Jul-31 14:54 UTC

[PATCH 2/2] [v6] drm/nouveau: expose GSP-RM logging buffers via debugfs

On Tue, Jul 30, 2024 at 09:49:15PM +0000, Timur Tabi
wrote:> > "Driver exit" is a bit of an undefined term, the closest
thing is probably
> > "driver detach" (from a device). In this case I think you
actually mean
> > "module
> > exit" though.
> 
> Yes, I use driver and module interchangeably, but I guess that's not
> accurate.
Yes, it's not. They're entirely different things. Technically you
don't even
need a module to register a driver. And within a module you can register N
drivers, including zero.

It's just that in the typical application, you register a driver in
module_init() and unregister it in module_exit().
> > > + * backing resources, such as logging buffers.
> > > + */
> > > +struct nvif_log {
> > > +	struct list_head entry;
> > > +	void (*shutdown)(struct nvif_log *log);
> > > +};
> > > +
> > > +#define NVIF_LOGS_DECLARE(_log) \
> > > +	struct nvif_log _log = { LIST_HEAD_INIT(_log.entry) }
> > 
> > If you declare this as
> > 
> > 	#define NVIF_LOGS_DECLARE(_log) \
> > 		struct nvif_log _log = { LIST_HEAD_INIT(_log.entry),
nvif_log_shutdown }
> > 
> > and change the signature of nvif_log_shutdown() to
> > 
> > 	static inline void nvif_log_shutdown(struct nvif_log *logs)
> > 
> > you can just call
> > 
> > 	gsp_logs.shutdown(&gsp_logs);
> > 
> > in nouveau_drm_exit().
> > 
> > Admittedly, maybe a bit too sneaky though. :)
> 
> gsp_logs.shutdown(&gsp_logs) -- are you sure you want this? ?This is
some
> weird C++ wanna-be code, IMHO. ?I don't think this is an improvement. 
I'd
> rather keep it as-is.
That's why I wrote "maybe a bit too sneaky". :)

I think what I asked for originally in one of the last versions of this patch
is having both, struct nvif_log (exactly the way you have it) and a separate
struct nvif_logs:

	struct nvif_logs {
		struct list_head head;
	};

Then you use this in NVIF_LOGS_DECLARE() and nvif_log_shutdown()

	static inline void nvif_log_shutdown(struct nvif_logs *logs)

and in nouveau_drm_exit() you just pass &gsp_logs.

	nvif_log_shutdown(&gsp_logs);

This way things are more type safe, i.e. nvif_log_shutdown() can't be called
with a random list_head and struct nvif_log::shutdown can't be called with
the
"head instance" of struct struct nvif_log.
> 
> > > +/*
> > > + * GSP-RM uses a pseudo-class mechanism to define of a variety
of per-"engine"
> > > + * data structures, and each engine has a "class ID"
genererated by a
> > > + * pre-processor. This is the class ID for the PMU.
> > > + */
> > > +#define NV_GSP_MSG_EVENT_UCODE_LIBOS_CLASS_PMU		0xf3d722
> > > +
> > > +/**
> > > + * rpc_ucode_libos_print_v1E_08 - RPC payload for libos print
buffers
> > 
> > Is this structure versioned? If so, does it relate to a specific
GSP-RM version?
> 
> Yes, the structure is versioned, but it's a little weird.  When a new
RPC is
> added, the developer can choose to use any version number that matches the
> "current" VGPU version or earlier. 
src/nvidia/generated/g_rpc-structures.h
> is generated from a special "def" file, and the developer just
chooses
> whatever number he wants.  Typically, the developer will choose whatever
> version that VGPU is at when he starts working on the code.  But since this
> is the first and only version of rpc_ucode_libos_print, he could have
chosen
> v01_00.
> 
> So "previous RPC version (if it exists)" < "new RPC
version" <= "current
> VGPU version" at the time the RPC was created/updated.
Pretty interesting, thanks for clarifying! (No need to document it though.)
> > > +static void r535_gsp_retain_logging(struct nvkm_gsp *gsp)
> > > +{
> > > +	struct device *dev = gsp->subdev.device->dev;
> > > +	struct dentry *root, *dir;
> > > +	struct r535_gsp_log *log;
> > > +	int ret;
> > > +
> > > +	/* If we were told not to keep logs, then don't. */
> > > +	if (!keep_gsp_logging)
> > > +		return;
> > > +
> > > +	/* Check to make sure at least one buffer has data. */
> > > +	if (is_empty(&gsp->blob_init) &&
is_empty(&gsp->blob_intr) &&
> > > +	    is_empty(&gsp->blob_rm) &&
is_empty(&gsp->blob_rm)) {
> > > +		nvkm_warn(&gsp->subdev, "all logging buffers are
empty\n");
> > > +		return;
> > > +	}
> > > +
> > > +	/* Find the 'dri' root debugfs entry. Every GPU has a
dentry under it */
> > > +	root = debugfs_lookup("dri", NULL);
> > > +	if (IS_ERR(root)) {
> > 
> > I don't think this can ever happen. This entry is created in
drm_core_init().
Oh, I see how this reads. I think I didn't quite finish the reply. Sorry for
the
confusion.
> 
> So first, the check is wrong.  It should be "if (!root)".
That's what I meant, it can't be an ERR_PTR.
> 
> However, I think it can happen.  drm_core_init() does not fail if
> debugfs_create_dir() fails:
> 
> 	drm_debugfs_root = debugfs_create_dir("dri", NULL);
> 
> It doesn't even print a warning message.  It just keeps going.  So I
think
> there should be some error checking somewhere.
For DRM probably not, since the ERR_PTR is honored by other debugfs functions
as described in [1].
> 
> I tested this, and if drm_core_init() fails to create the dentry, then
> r535_gsp_retain_logging() will just keep going trying to create debugfs
> entries, because a root of NULL is actually valid, and the entries are
> created in /sys/kernel/debug/0000:65:00.0/ instead of
> /sys/kernel/debug/dri/0000:65:00.0/
This is because debugfs_lookup() doesn't return an ERR_PTR, but NULL.
It'd
probably better go along with what is documented in [1] if debugfs_lookup()
would return ERR_PTR(-ENOENT) if no entry was found.

(This is where I was heading to in my previous reply.)
> 
> In fact, I think I found a small bug in dput():
> 
> void dput(struct dentry *dentry)
> {
> 	if (!dentry)
> 		return;
> 
> This should probably be "if (IS_ERR_OR_NULL(dentry))".
Yes, I agree, good catch.

[1] https://elixir.bootlin.com/linux/v6.10/source/fs/debugfs/inode.c#L586

Timur Tabi

2024-Jul-31 16:22 UTC

head link

[PATCH 2/2] [v6] drm/nouveau: expose GSP-RM logging buffers via debugfs

On Wed, 2024-07-31 at 16:54 +0200, Danilo Krummrich
wrote:> > 
> > gsp_logs.shutdown(&gsp_logs) -- are you sure you want this? ?This
is some
> > weird C++ wanna-be code, IMHO. ?I don't think this is an
improvement.  I'd
> > rather keep it as-is.
> 
> That's why I wrote "maybe a bit too sneaky". :)
> 
> I think what I asked for originally in one of the last versions of this
patch
> is having both, struct nvif_log (exactly the way you have it) and a
separate
> struct nvif_logs:
> 
> 	struct nvif_logs {
> 		struct list_head head;
> 	};
> 
> Then you use this in NVIF_LOGS_DECLARE() and nvif_log_shutdown()
> 
> 	static inline void nvif_log_shutdown(struct nvif_logs *logs)
> 
> and in nouveau_drm_exit() you just pass &gsp_logs.
> 
> 	nvif_log_shutdown(&gsp_logs);
> 
> This way things are more type safe, i.e. nvif_log_shutdown() can't be
called
> with a random list_head and struct nvif_log::shutdown can't be called
with the
> "head instance" of struct struct nvif_log.
Ok, I think I got it this time.  I'll post a v7 soon.
> > However, I think it can happen.  drm_core_init() does not fail if
> > debugfs_create_dir() fails:
> > 
> > 	drm_debugfs_root = debugfs_create_dir("dri", NULL);
> > 
> > It doesn't even print a warning message.  It just keeps going.  So
I think
> > there should be some error checking somewhere.
> 
> For DRM probably not, since the ERR_PTR is honored by other debugfs
functions
> as described in [1].
From that comment:

 * Drivers should generally work fine even if debugfs fails to init anyway.

So technically you are correct, that Nouveau will still "work" if
debugfs
fails to init, but since this code is all about debugfs, and since I don't
want to blindly allocate buffers and linked lists that won't actually do
anything, I would prefer that the code bails early if the infrastructure is
not there.
> > I tested this, and if drm_core_init() fails to create the dentry, then
> > r535_gsp_retain_logging() will just keep going trying to create
debugfs
> > entries, because a root of NULL is actually valid, and the entries are
> > created in /sys/kernel/debug/0000:65:00.0/ instead of
> > /sys/kernel/debug/dri/0000:65:00.0/
> 
> This is because debugfs_lookup() doesn't return an ERR_PTR, but NULL.
It'd
> probably better go along with what is documented in [1] if debugfs_lookup()
> would return ERR_PTR(-ENOENT) if no entry was found.
> 
> (This is where I was heading to in my previous reply.)
So I'm not sure what you're asking now.  I definitely think that the
"if
(!root)" check is necessary, because we don't want to accidentally
create
these debugfs entries in /sys/kernel/debug/0000:65:00.0/.

So that leaves the error checks for debugfs_create_dir() and
debugfs_create_blob() in r535_gsp_copy_log().  Both of these functions could
fail. 

If I ignore the error from debugfs_create_dir(), then the code will allocate
buffers that are never used, and make false statements about the existence
of them.

Same thing is true with debugfs_create_blob().

	dir = debugfs_create_blob(name, 0444, parent, d);
	if (IS_ERR(dir)) {
		kfree(d->data);
		memset(d, 0, sizeof(*d));
		return PTR_ERR(dir);
	}

The one thing I could do is that is ignore errors from r535_gsp_copy_log(),
and just blindly try to create all logs even if some of them fail.  I can't
imagine a situation where create the loginit debugfs entry could fail, but
then creating logintr succeeds.> 
> > In fact, I think I found a small bug in dput():
> > 
> > void dput(struct dentry *dentry)
> > {
> > 	if (!dentry)
> > 		return;
> > 
> > This should probably be "if (IS_ERR_OR_NULL(dentry))".
> 
> Yes, I agree, good catch.
I will submit a separate patch for that.

Maybe Matching Threads

Search for more maybe matching threads

Nouveau - Jul 2024 - [PATCH 2/2] [v6] drm/nouveau: expose GSP-RM logging buffers via debugfs

[PATCH 2/2] [v6] drm/nouveau: expose GSP-RM logging buffers via debugfs

[PATCH 2/2] [v6] drm/nouveau: expose GSP-RM logging buffers via debugfs

Maybe Matching Threads