Danilo Krummrich
2024-Jul-31 14:54 UTC
[PATCH 2/2] [v6] drm/nouveau: expose GSP-RM logging buffers via debugfs
On Tue, Jul 30, 2024 at 09:49:15PM +0000, Timur Tabi wrote:> > "Driver exit" is a bit of an undefined term, the closest thing is probably > > "driver detach" (from a device). In this case I think you actually mean > > "module > > exit" though. > > Yes, I use driver and module interchangeably, but I guess that's not > accurate.Yes, it's not. They're entirely different things. Technically you don't even need a module to register a driver. And within a module you can register N drivers, including zero. It's just that in the typical application, you register a driver in module_init() and unregister it in module_exit().> > > + * backing resources, such as logging buffers. > > > + */ > > > +struct nvif_log { > > > + struct list_head entry; > > > + void (*shutdown)(struct nvif_log *log); > > > +}; > > > + > > > +#define NVIF_LOGS_DECLARE(_log) \ > > > + struct nvif_log _log = { LIST_HEAD_INIT(_log.entry) } > > > > If you declare this as > > > > #define NVIF_LOGS_DECLARE(_log) \ > > struct nvif_log _log = { LIST_HEAD_INIT(_log.entry), nvif_log_shutdown } > > > > and change the signature of nvif_log_shutdown() to > > > > static inline void nvif_log_shutdown(struct nvif_log *logs) > > > > you can just call > > > > gsp_logs.shutdown(&gsp_logs); > > > > in nouveau_drm_exit(). > > > > Admittedly, maybe a bit too sneaky though. :) > > gsp_logs.shutdown(&gsp_logs) -- are you sure you want this? ?This is some > weird C++ wanna-be code, IMHO. ?I don't think this is an improvement. I'd > rather keep it as-is.That's why I wrote "maybe a bit too sneaky". :) I think what I asked for originally in one of the last versions of this patch is having both, struct nvif_log (exactly the way you have it) and a separate struct nvif_logs: struct nvif_logs { struct list_head head; }; Then you use this in NVIF_LOGS_DECLARE() and nvif_log_shutdown() static inline void nvif_log_shutdown(struct nvif_logs *logs) and in nouveau_drm_exit() you just pass &gsp_logs. nvif_log_shutdown(&gsp_logs); This way things are more type safe, i.e. nvif_log_shutdown() can't be called with a random list_head and struct nvif_log::shutdown can't be called with the "head instance" of struct struct nvif_log.> > > > +/* > > > + * GSP-RM uses a pseudo-class mechanism to define of a variety of per-"engine" > > > + * data structures, and each engine has a "class ID" genererated by a > > > + * pre-processor. This is the class ID for the PMU. > > > + */ > > > +#define NV_GSP_MSG_EVENT_UCODE_LIBOS_CLASS_PMU 0xf3d722 > > > + > > > +/** > > > + * rpc_ucode_libos_print_v1E_08 - RPC payload for libos print buffers > > > > Is this structure versioned? If so, does it relate to a specific GSP-RM version? > > Yes, the structure is versioned, but it's a little weird. When a new RPC is > added, the developer can choose to use any version number that matches the > "current" VGPU version or earlier. src/nvidia/generated/g_rpc-structures.h > is generated from a special "def" file, and the developer just chooses > whatever number he wants. Typically, the developer will choose whatever > version that VGPU is at when he starts working on the code. But since this > is the first and only version of rpc_ucode_libos_print, he could have chosen > v01_00. > > So "previous RPC version (if it exists)" < "new RPC version" <= "current > VGPU version" at the time the RPC was created/updated.Pretty interesting, thanks for clarifying! (No need to document it though.)> > > +static void r535_gsp_retain_logging(struct nvkm_gsp *gsp) > > > +{ > > > + struct device *dev = gsp->subdev.device->dev; > > > + struct dentry *root, *dir; > > > + struct r535_gsp_log *log; > > > + int ret; > > > + > > > + /* If we were told not to keep logs, then don't. */ > > > + if (!keep_gsp_logging) > > > + return; > > > + > > > + /* Check to make sure at least one buffer has data. */ > > > + if (is_empty(&gsp->blob_init) && is_empty(&gsp->blob_intr) && > > > + is_empty(&gsp->blob_rm) && is_empty(&gsp->blob_rm)) { > > > + nvkm_warn(&gsp->subdev, "all logging buffers are empty\n"); > > > + return; > > > + } > > > + > > > + /* Find the 'dri' root debugfs entry. Every GPU has a dentry under it */ > > > + root = debugfs_lookup("dri", NULL); > > > + if (IS_ERR(root)) { > > > > I don't think this can ever happen. This entry is created in drm_core_init().Oh, I see how this reads. I think I didn't quite finish the reply. Sorry for the confusion.> > So first, the check is wrong. It should be "if (!root)".That's what I meant, it can't be an ERR_PTR.> > However, I think it can happen. drm_core_init() does not fail if > debugfs_create_dir() fails: > > drm_debugfs_root = debugfs_create_dir("dri", NULL); > > It doesn't even print a warning message. It just keeps going. So I think > there should be some error checking somewhere.For DRM probably not, since the ERR_PTR is honored by other debugfs functions as described in [1].> > I tested this, and if drm_core_init() fails to create the dentry, then > r535_gsp_retain_logging() will just keep going trying to create debugfs > entries, because a root of NULL is actually valid, and the entries are > created in /sys/kernel/debug/0000:65:00.0/ instead of > /sys/kernel/debug/dri/0000:65:00.0/This is because debugfs_lookup() doesn't return an ERR_PTR, but NULL. It'd probably better go along with what is documented in [1] if debugfs_lookup() would return ERR_PTR(-ENOENT) if no entry was found. (This is where I was heading to in my previous reply.)> > In fact, I think I found a small bug in dput(): > > void dput(struct dentry *dentry) > { > if (!dentry) > return; > > This should probably be "if (IS_ERR_OR_NULL(dentry))".Yes, I agree, good catch. [1] https://elixir.bootlin.com/linux/v6.10/source/fs/debugfs/inode.c#L586
Timur Tabi
2024-Jul-31 16:22 UTC
[PATCH 2/2] [v6] drm/nouveau: expose GSP-RM logging buffers via debugfs
On Wed, 2024-07-31 at 16:54 +0200, Danilo Krummrich wrote:> > > > gsp_logs.shutdown(&gsp_logs) -- are you sure you want this? ?This is some > > weird C++ wanna-be code, IMHO. ?I don't think this is an improvement. I'd > > rather keep it as-is. > > That's why I wrote "maybe a bit too sneaky". :) > > I think what I asked for originally in one of the last versions of this patch > is having both, struct nvif_log (exactly the way you have it) and a separate > struct nvif_logs: > > struct nvif_logs { > struct list_head head; > }; > > Then you use this in NVIF_LOGS_DECLARE() and nvif_log_shutdown() > > static inline void nvif_log_shutdown(struct nvif_logs *logs) > > and in nouveau_drm_exit() you just pass &gsp_logs. > > nvif_log_shutdown(&gsp_logs); > > This way things are more type safe, i.e. nvif_log_shutdown() can't be called > with a random list_head and struct nvif_log::shutdown can't be called with the > "head instance" of struct struct nvif_log.Ok, I think I got it this time. I'll post a v7 soon.> > However, I think it can happen. drm_core_init() does not fail if > > debugfs_create_dir() fails: > > > > drm_debugfs_root = debugfs_create_dir("dri", NULL); > > > > It doesn't even print a warning message. It just keeps going. So I think > > there should be some error checking somewhere. > > For DRM probably not, since the ERR_PTR is honored by other debugfs functions > as described in [1].From that comment: * Drivers should generally work fine even if debugfs fails to init anyway. So technically you are correct, that Nouveau will still "work" if debugfs fails to init, but since this code is all about debugfs, and since I don't want to blindly allocate buffers and linked lists that won't actually do anything, I would prefer that the code bails early if the infrastructure is not there.> > I tested this, and if drm_core_init() fails to create the dentry, then > > r535_gsp_retain_logging() will just keep going trying to create debugfs > > entries, because a root of NULL is actually valid, and the entries are > > created in /sys/kernel/debug/0000:65:00.0/ instead of > > /sys/kernel/debug/dri/0000:65:00.0/ > > This is because debugfs_lookup() doesn't return an ERR_PTR, but NULL. It'd > probably better go along with what is documented in [1] if debugfs_lookup() > would return ERR_PTR(-ENOENT) if no entry was found. > > (This is where I was heading to in my previous reply.)So I'm not sure what you're asking now. I definitely think that the "if (!root)" check is necessary, because we don't want to accidentally create these debugfs entries in /sys/kernel/debug/0000:65:00.0/. So that leaves the error checks for debugfs_create_dir() and debugfs_create_blob() in r535_gsp_copy_log(). Both of these functions could fail. If I ignore the error from debugfs_create_dir(), then the code will allocate buffers that are never used, and make false statements about the existence of them. Same thing is true with debugfs_create_blob(). dir = debugfs_create_blob(name, 0444, parent, d); if (IS_ERR(dir)) { kfree(d->data); memset(d, 0, sizeof(*d)); return PTR_ERR(dir); } The one thing I could do is that is ignore errors from r535_gsp_copy_log(), and just blindly try to create all logs even if some of them fail. I can't imagine a situation where create the loginit debugfs entry could fail, but then creating logintr succeeds.>> > In fact, I think I found a small bug in dput(): > > > > void dput(struct dentry *dentry) > > { > > if (!dentry) > > return; > > > > This should probably be "if (IS_ERR_OR_NULL(dentry))". > > Yes, I agree, good catch.I will submit a separate patch for that.
Apparently Analagous Threads
- [PATCH 2/2] [v6] drm/nouveau: expose GSP-RM logging buffers via debugfs
- [PATCH 2/2] [v6] drm/nouveau: expose GSP-RM logging buffers via debugfs
- [PATCH 2/2] [v6] drm/nouveau: expose GSP-RM logging buffers via debugfs
- [PATCH 2/2] [v6] drm/nouveau: expose GSP-RM logging buffers via debugfs
- [PATCH 2/2] [v8] drm/nouveau: expose GSP-RM logging buffers via debugfs