thr3ads.net - Nouveau - [PATCH] drm/nouveau: expose GSP-RM logging buffers via debugfs [Feb 2024]

If this information is useful, please help other people find it:
Share via:

David Airlie

2024-Feb-06 00:35 UTC

[PATCH] drm/nouveau: expose GSP-RM logging buffers via debugfs

On Tue, Feb 6, 2024 at 7:45?AM Timur Tabi <ttabi at nvidia.com>
wrote:>
> The LOGINIT, LOGINTR, LOGRM, and LOGPMU buffers are circular buffers
> that have printf-like logs from GSP-RM and PMU encoded in them.
>
> LOGINIT, LOGINTR, and LOGRM are allocated by Nouveau and their DMA
> addresses are passed to GSP-RM during initialization.  The buffers are
> required for GSP-RM to initialize properly.
>
> LOGPMU is also allocated by Nouveau, but its contents are updated
> when Nouveau receives an NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT RPC from
> GSP-RM.  Nouveau then copies the RPC to the buffer.
>
> The messages are encoded as an array of variable-length structures that
> contain the parameters to an NV_PRINTF call.  The format string and
> parameter count are stored in a special ELF image that contains only
> logging strings.  This image is not currently shipped with the Nvidia
> driver.
>
> There are two methods to extract the logs.
>
> OpenRM tries to load the logging ELF, and if present, parses the log
> buffers in real time and outputs the strings to the kernel console.
>
> Alternatively, and this is the method used by this patch, the buffers
> can be exposed to user space, and a user-space tool (along with the
> logging ELF image) can parse the buffer and dump the logs.
>
> This method has the advantage that it allows the buffers to be parsed
> even when the logging ELF file is not available to the user.  However,
> it has the disadvantage the debubfs entries need to remain until the
> driver is unloaded.
>
> The buffers are exposed via debugfs.  The debugfs entries must be
> created before GSP-RM is started, to ensure that they are available
> during GSP-RM initialization.
>
> If GSP-RM fails to initialize, then Nouveau immediately shuts down
> the GSP interface.  This would normally also deallocate the logging
> buffers, thereby preventing the user from capturing the debug logs.
> To avoid this, the keep-gsp-logging command line parameter can be
> specified.  This parmater is marked as *unsafe* (thereby taining the
> kernel) because the DMA buffer and debugfs entries are never
> deallocated, even if the driver unloads.  This gives the user the
> time to capture the logs, but it also means that resources can only
> be recovered by a reboot.
>
> An end-user can capture the logs using the following commands:
>
>     cp /sys/kernel/debug/nouveau/loginit loginit
>     cp /sys/kernel/debug/nouveau/logrm logrm
>     cp /sys/kernel/debug/nouveau/logintr logintr
>     cp /sys/kernel/debug/nouveau/logpmu logpmu
If we have 2 GPUs won't this conflict on driver load?

Do we need to at least make subdirs or if two early in boot to have
ids, use the pci path?

Dave.
>
> Since LOGPMU is not needed for GSP-RM initialization, it is only
> created if debugfs is available.  Otherwise, the
> NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT RPCs are ignored.
>
> Signed-off-by: Timur Tabi <ttabi at nvidia.com>
> ---
>  .../gpu/drm/nouveau/include/nvkm/subdev/gsp.h |  12 ++
>  .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c    | 182 +++++++++++++++++-
>  2 files changed, 190 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> index 3fbc57b16a05..999e3be3f38c 100644
> --- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> +++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> @@ -5,6 +5,8 @@
>  #include <core/falcon.h>
>  #include <core/firmware.h>
>
> +#include <linux/debugfs.h>
> +
>  #define GSP_PAGE_SHIFT 12
>  #define GSP_PAGE_SIZE  BIT(GSP_PAGE_SHIFT)
>
> @@ -217,6 +219,16 @@ struct nvkm_gsp {
>
>         /* The size of the registry RPC */
>         size_t registry_rpc_size;
> +
> +       /*
> +        * Logging buffers in debugfs.  The wrapper objects need to remain
> +        * in memory until the dentry is deleted.
> +        */
> +       struct dentry *debugfs_logging_dir;
> +       struct debugfs_blob_wrapper blob_init;
> +       struct debugfs_blob_wrapper blob_intr;
> +       struct debugfs_blob_wrapper blob_rm;
> +       struct debugfs_blob_wrapper blob_pmu;
>  };
>
>  static inline bool
> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> index d065389e3618..8dc2729f5321 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> @@ -1972,6 +1972,151 @@ r535_gsp_rmargs_init(struct nvkm_gsp *gsp, bool
resume)
>         return 0;
>  }
>
> +#define NV_GSP_MSG_EVENT_UCODE_LIBOS_CLASS_PMU         0xf3d722
> +
> +/**
> + * r535_gsp_msg_libos_print - capture log message from the PMU
> + * @priv: gsp pointer
> + * @fn: function number (ignored)
> + * @repv: pointer to libos print RPC
> + * @repc: message size
> + *
> + * See _kgspRpcUcodeLibosPrint
> + */
> +static int r535_gsp_msg_libos_print(void *priv, u32 fn, void *repv, u32
repc)
> +{
> +       struct nvkm_gsp *gsp = priv;
> +       struct nvkm_subdev *subdev = &gsp->subdev;
> +       struct {
> +               u32 ucodeEngDesc;
> +               u32 libosPrintBufSize;
> +               u8 libosPrintBuf[];
> +       } *rpc = repv;
> +       unsigned int data = rpc->ucodeEngDesc >> 8;
> +
> +       nvkm_debug(subdev, "received libos print from class 0x%x for
%u bytes\n",
> +                  data, rpc->libosPrintBufSize);
> +
> +       if (data != NV_GSP_MSG_EVENT_UCODE_LIBOS_CLASS_PMU) {
> +               nvkm_warn(subdev,
> +                         "received libos print from unknown class
0x%x\n",
> +                         data);
> +               return -ENOMSG;
> +       }
> +       if (rpc->libosPrintBufSize > GSP_PAGE_SIZE) {
> +               nvkm_error(subdev, "libos print is too large (%u
bytes)\n",
> +                          rpc->libosPrintBufSize);
> +               return -E2BIG;
> +
> +       }
> +       memcpy(gsp->blob_pmu.data, rpc->libosPrintBuf,
rpc->libosPrintBufSize);
> +
> +       return 0;
> +}
> +
> +/**
> + * r535_gsp_libos_debugfs_init - create logging debugfs entries
> + *
> + * Create the debugfs entries.  This exposes the log buffers to
> + * userspace so that an external tool can parse it.
> + *
> + * The 'logpmu' contains exception dumps from the PMU. It is
written via an
> + * RPC sent from GSP-RM and must be only 4KB.  We create it here because
it's
> + * only useful if there is a debugfs entry to expose it.  If we get the
PMU
> + * logging RPC and there is no debugfs entry, the RPC is just ignored.
> + *
> + * The blob_init, blob_rm, and blob_pmu objects can't be transient
> + * because debugfs_create_blob doesn't copy them.
> + *
> + * NOTE: OpenRM loads the logging elf image and prints the log messages
> + * in real-time. We may add that capability in the future, but that
> + * requires loading an ELF images that are not distributed with the
driver,
> + * and adding the parsing code to Nouveau.
> + *
> + * Ideally, this should be part of nouveau_debugfs_init(), but that
function
> + * is called much too late.  We really want to create these debugfs
entries
> + * before r535_gsp_booter_load() is called, so that if GSP-RM fails to
> + * initialize, there could still be a log to capture.
> + *
> + * If the unsafe command line pararameter 'keep-gsp-logging' is
specified,
> + * then the logging buffer and debugfs entries will be retained when the
> + * driver shuts down.  This is necessary to debug initialization failures,
> + * because otherwise the buffers will disappear before the logs can be
> + * captured.
> + */
> +static void r535_gsp_libos_debugfs_init(struct nvkm_gsp *gsp)
> +{
> +       struct dentry *dir_init, *dir_intr, *dir_rm, *dir_pmu;
> +       struct dentry *dir;
> +
> +       dir = debugfs_create_dir("nouveau", NULL);
> +       if (IS_ERR(dir)) {
> +               /* No debugfs */
> +               return;
> +       }
> +
> +       if (IS_ERR_OR_NULL(dir)) {
> +               nvkm_error(&gsp->subdev,
> +                          "error %li creating
/sys/kernel/debug/nouveau/\n", PTR_ERR(dir));
> +               return;
> +       }
> +
> +       gsp->blob_init.data = gsp->loginit.data;
> +       gsp->blob_init.size = gsp->loginit.size;
> +       dir_init = debugfs_create_blob("loginit", 0444, dir,
&gsp->blob_init);
> +       if (IS_ERR_OR_NULL(dir_init)) {
> +               nvkm_error(&gsp->subdev,
> +                          "failed to create
/sys/kernel/debug/nouveau/%s\n", "loginit");
> +               debugfs_remove(dir);
> +               return;
> +       }
> +
> +       gsp->blob_intr.data = gsp->logintr.data;
> +       gsp->blob_intr.size = gsp->logintr.size;
> +       dir_intr = debugfs_create_blob("logintr", 0444, dir,
&gsp->blob_intr);
> +       if (IS_ERR_OR_NULL(dir_intr)) {
> +               nvkm_error(&gsp->subdev,
> +                          "failed to create
/sys/kernel/debug/nouveau/%s\n", "logintr");
> +               debugfs_remove(dir);
> +               return;
> +       }
> +
> +       gsp->blob_rm.data = gsp->logrm.data;
> +       gsp->blob_rm.size = gsp->logrm.size;
> +       dir_rm = debugfs_create_blob("logrm", 0444, dir,
&gsp->blob_rm);
> +       if (IS_ERR_OR_NULL(dir_rm)) {
> +               nvkm_error(&gsp->subdev,
> +                          "failed to create
/sys/kernel/debug/nouveau/%s\n", "logrm");
> +               debugfs_remove(dir);
> +               return;
> +       }
> +
> +       /*
> +        * Since the PMU buffer is copied from an RPC, it doesn't need
to be
> +        * a DMA buffer.
> +        */
> +       gsp->blob_pmu.size = GSP_PAGE_SIZE;
> +       gsp->blob_pmu.data = kzalloc(gsp->blob_pmu.size, GFP_KERNEL);
> +       if (!gsp->blob_pmu.data) {
> +               debugfs_remove(dir);
> +               return;
> +       }
> +
> +       dir_pmu = debugfs_create_blob("logpmu", 0444, dir,
&gsp->blob_pmu);
> +       if (IS_ERR_OR_NULL(dir_pmu)) {
> +               nvkm_error(&gsp->subdev,
> +                          "failed to create
/sys/kernel/debug/nouveau/%s\n", "logpmu");
> +               kfree(gsp->blob_pmu.data);
> +               debugfs_remove(dir);
> +               return;
> +       }
> +
> +       r535_gsp_msg_ntfy_add(gsp, 0x0000100C, r535_gsp_msg_libos_print,
gsp);
> +       gsp->debugfs_logging_dir = dir;
> +
> +       nvkm_debug(&gsp->subdev, "created debugfs GSP-RM
logging entries\n");
> +}
> +
>  static inline u64
>  r535_gsp_libos_id8(const char *name)
>  {
> @@ -2021,7 +2166,11 @@ static void create_pte_array(u64 *ptes, dma_addr_t
addr, size_t size)
>   * written to directly by GSP-RM and can be any multiple of GSP_PAGE_SIZE.
>   *
>   * The physical address map for the log buffer is stored in the buffer
> - * itself, starting with offset 1. Offset 0 contains the "put"
pointer.
> + * itself, starting with offset 1. Offset 0 contains the "put"
pointer (pp).
> + * Initially, pp is equal to 0.  If the buffer has valid logging data in
it,
> + * then pp points to index into the buffer where the next logging entry
will
> + * be written.  Therefore, the logging data is valid if:
> + *   1 <= pp < sizeof(buffer)/sizeof(u64)
>   *
>   * The GSP only understands 4K pages (GSP_PAGE_SIZE), so even if the
kernel is
>   * configured for a larger page size (e.g. 64K pages), we need to give
> @@ -2092,6 +2241,9 @@ r535_gsp_libos_init(struct nvkm_gsp *gsp)
>         args[3].size = gsp->rmargs.size;
>         args[3].kind = LIBOS_MEMORY_REGION_CONTIGUOUS;
>         args[3].loc  = LIBOS_MEMORY_REGION_LOC_SYSMEM;
> +
> +       r535_gsp_libos_debugfs_init(gsp);
> +
>         return 0;
>  }
>
> @@ -2373,6 +2525,18 @@ r535_gsp_dtor_fws(struct nvkm_gsp *gsp)
>         gsp->fws.rm = NULL;
>  }
>
> +/*
> + * If GSP-RM load fails, then the GSP nvkm object will be deleted, the
> + * logging debugfs entries will be deleted, and it will not be possible to
> + * debug the load failure.  The keep_gsp_logging parameter tells Nouveau
> + * not to free these resources, even if the driver is unloading.  In this
> + * case, the only recovery is a reboot.
> + */
> +static bool keep_gsp_logging;
> +module_param_unsafe(keep_gsp_logging, bool, 0600);
> +MODULE_PARM_DESC(keep_gsp_logging,
> +                "Do not remove the GSP-RM logging debugfs entries
upon exit");
> +
>  void
>  r535_gsp_dtor(struct nvkm_gsp *gsp)
>  {
> @@ -2392,9 +2556,19 @@ r535_gsp_dtor(struct nvkm_gsp *gsp)
>         r535_gsp_dtor_fws(gsp);
>
>         nvkm_gsp_mem_dtor(gsp, &gsp->shm.mem);
> -       nvkm_gsp_mem_dtor(gsp, &gsp->loginit);
> -       nvkm_gsp_mem_dtor(gsp, &gsp->logintr);
> -       nvkm_gsp_mem_dtor(gsp, &gsp->logrm);
> +
> +       if (keep_gsp_logging && gsp->debugfs_logging_dir)
> +               nvkm_warn(&gsp->subdev,
> +                       "GSP-RM logging buffers retained, reboot
required to recover\n");
> +       else {
> +               debugfs_remove(gsp->debugfs_logging_dir);
> +               gsp->debugfs_logging_dir = NULL;
> +
> +               kfree(gsp->blob_pmu.data);
> +               nvkm_gsp_mem_dtor(gsp, &gsp->loginit);
> +               nvkm_gsp_mem_dtor(gsp, &gsp->logintr);
> +               nvkm_gsp_mem_dtor(gsp, &gsp->logrm);
> +       }
>  }
>
>  int
> --
> 2.34.1
>

Timur Tabi

2024-Feb-06 03:55 UTC

head link

[PATCH] drm/nouveau: expose GSP-RM logging buffers via debugfs

On Tue, 2024-02-06 at 10:35 +1000, David Airlie wrote:> > 
> > An end-user can capture the logs using the following commands:
> > 
> > ??? cp /sys/kernel/debug/nouveau/loginit loginit
> > ??? cp /sys/kernel/debug/nouveau/logrm logrm
> > ??? cp /sys/kernel/debug/nouveau/logintr logintr
> > ??? cp /sys/kernel/debug/nouveau/logpmu logpmu
> 
> If we have 2 GPUs won't this conflict on driver load?
> 
> Do we need to at least make subdirs or if two early in boot to have
> ids, use the pci path?
I knew I was forgetting something.  I will fix this to use per-GPU paths.  Any
suggestion as to what to use to differentiate?  PCI ID?  Is there some other
identifier?

Nouveau - Feb 2024 - [PATCH] drm/nouveau: expose GSP-RM logging buffers via debugfs

[PATCH] drm/nouveau: expose GSP-RM logging buffers via debugfs

[PATCH] drm/nouveau: expose GSP-RM logging buffers via debugfs