thr3ads.net - Nouveau - [Nouveau] [PATCH] nouveau/gsp: document some aspects of GSP-RM [Nov 2023]

Timur Tabi

2023-Nov-21 23:53 UTC

[Nouveau] [PATCH] nouveau/gsp: document some aspects of GSP-RM

Document a few aspects of communication with GSP-RM.  These comments
are derived from notes made during early development of GSP-RM
support in Nouveau, but were not included in the initial patch set.

Signed-off-by: Timur Tabi <ttabi at nvidia.com>
---
 .../common/shared/msgq/inc/msgq/msgq_priv.h   | 79 +++++++++++++++--
 .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c    | 86 ++++++++++++++++++-
 2 files changed, 154 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/shared/msgq/inc/msgq/msgq_priv.h b/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/shared/msgq/inc/msgq/msgq_priv.h
index 5a2f273d95c8..1e94bf087b23 100644
--- a/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/shared/msgq/inc/msgq/msgq_priv.h
+++ b/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/shared/msgq/inc/msgq/msgq_priv.h
@@ -26,21 +26,82 @@
  * DEALINGS IN THE SOFTWARE.
  */
 
+#define GSP_MESSAGE_COMMAND_QUEUE_SIZE	0x40000
+#define GSP_MESSAGE_STATUS_QUEUE_SIZE	0x40000
+
+/**
+ * msgqTxHeader -- TX queue data structure
+ * @version: the version of this structure, must be 0
+ * @size: the size of the entire queue, including this header
+ * @msgSize: the padded size of queue element, must be power-of-2, 16 is
+ *         minimum
+ * @msgCount: the number of elements in this queue
+ * @writePtr: head index of this queue
+ * @flags: 1 = swap the RX pointers
+ * @rxHdrOff: offset of readPtr in this structure
+ * @entryOff: offset of beginning of queue (msgqRxHeader), relative to
+ *          beginning of this structure
+ *
+ * The command queue is a queue of RPCs that are sent from the driver to the
+ * GSP.  The status queue is a queue of messages/responses from the GSP to the
+ * driver.  Although the driver allocates memory for both queues (inside the
+ * same memory block), the command queue is owned by the driver and the status
+ * queue is owned by the GSP.  In addition, the two headers must not share the
+ * same 4K page.
+ *
+ * Unfortunately, depsite the fact that the queue size is a field in this
+ * structure, the GSP has a hard-coded expectation of the sizes.  So the
+ * command queue size must be GSP_MESSAGE_COMMAND_QUEUE_SIZE and the status
+ * queue size must be GSP_MESSAGE_STATUS_QUEUE_SIZE.
+ *
+ * Each queue is prefixed with this data structure.  The idea is that a queue
+ * and its header are written to only by their owner.  That is, only the
+ * driver writes to the command queue and command queue header, and only the
+ * GSP writes to the status (receive) queue and its header.
+ *
+ * This is enforced by the concept of "swapping" the RX pointers.  This is
+ * why the 'flags' field must be set to 1.  'rxHdrOff' is how the GSP knows
+ * where the tail pointer of its status queue is.
+ *
+ * When the driver writes a new RPC to the command queue, it updates writePtr.
+ * When it reads a new message from the status queue, it updates readPtr.  In
+ * this way, the GSP knows when a new command is in the queue (it polls
+ * writePtr) and it knows how much free space is in the status queue (it
+ * checks readPtr).  The driver never cares about how much free space is in
+ * the status queue.
+ *
+ * As usual, producers write to the head pointer, and consumers read from the
+ * tail pointer.  When head == tail, the queue is empty.
+ *
+ * So to summarize:
+ * command.writePtr = head of command queue
+ * command.readPtr = tail of status queue
+ * status.writePtr = head of status queue
+ * status.readPtr = tail of command queue
+ */
 typedef struct
 {
-    NvU32 version;   // queue version
-    NvU32 size;      // bytes, page aligned
-    NvU32 msgSize;   // entry size, bytes, must be power-of-2, 16 is minimum
-    NvU32 msgCount;  // number of entries in queue
-    NvU32 writePtr;  // message id of next slot
-    NvU32 flags;     // if set it means "i want to swap RX"
-    NvU32 rxHdrOff;  // Offset of msgqRxHeader from start of backing store.
-    NvU32 entryOff;  // Offset of entries from start of backing store.
+	NvU32 version;
+	NvU32 size;
+	NvU32 msgSize;
+	NvU32 msgCount;
+	NvU32 writePtr;
+	NvU32 flags;
+	NvU32 rxHdrOff;
+	NvU32 entryOff;
 } msgqTxHeader;
 
+/**
+ * msgqRxHeader - RX queue data structure
+ * @readPtr: tail index of the other queue
+ *
+ * Although this is a separate struct, it could easily be merged into
+ * msgqTxHeader.  msgqTxHeader.rxHdrOff is simply the offset of readPtr
+ * from the beginning of msgqTxHeader.
+ */
 typedef struct
 {
-    NvU32 readPtr; // message id of last message read
+	NvU32 readPtr;
 } msgqRxHeader;
 
 #endif
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
index dc44f5c7833f..265c0a413ea8 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
@@ -1379,6 +1379,13 @@ r535_gsp_msg_post_event(void *priv, u32 fn, void *repv, u32 repc)
 	return 0;
 }
 
+/**
+ * r535_gsp_msg_run_cpu_sequencer() -- process I/O commands from the GSP
+ *
+ * The GSP sequencer is a list of I/O commands that the GSP can send to
+ * the driver to perform for various purposes.  The most common usage is to
+ * perform a special mid-initialization reset.
+ */
 static int
 r535_gsp_msg_run_cpu_sequencer(void *priv, u32 fn, void *repv, u32 repc)
 {
@@ -1628,8 +1635,8 @@ r535_gsp_shared_init(struct nvkm_gsp *gsp)
 	} *cmdq, *msgq;
 	int ret, i;
 
-	gsp->shm.cmdq.size = 0x40000;
-	gsp->shm.msgq.size = 0x40000;
+	gsp->shm.cmdq.size = GSP_MESSAGE_COMMAND_QUEUE_SIZE;
+	gsp->shm.msgq.size = GSP_MESSAGE_STATUS_QUEUE_SIZE;
 
 	gsp->shm.ptes.nr  = (gsp->shm.cmdq.size + gsp->shm.msgq.size) >> GSP_PAGE_SHIFT;
 	gsp->shm.ptes.nr += DIV_ROUND_UP(gsp->shm.ptes.nr * sizeof(u64), GSP_PAGE_SIZE);
@@ -1718,6 +1725,23 @@ r535_gsp_libos_id8(const char *name)
 	return id;
 }
 
+/**
+ * create_pte_array() - creates a PTE array of a physically contiguous buffer
+ * @ptes: pointer to the array
+ * @addr: base address of physically contiguous buffer (GSP_PAGE_SIZE aligned)
+ * @size: size of the buffer
+ *
+ * GSP-RM sometimes expects physically-contiguous buffers to have an array of
+ * "PTEs" for each page in that buffer.  Although in theory that allows for
+ * the buffer to be physically discontiguous, GSP-RM does not currently
+ * support that.
+ *
+ * In this case, the PTEs are DMA addresses of each page of the buffer.  Since
+ * the buffer is physically contiguous, calculating all the PTEs is simple
+ * math.
+ *
+ * See memdescGetPhysAddrsForGpu()
+ */
 static void create_pte_array(u64 *ptes, dma_addr_t addr, size_t size)
 {
 	unsigned int num_pages = DIV_ROUND_UP_ULL(size, GSP_PAGE_SIZE);
@@ -1727,6 +1751,35 @@ static void create_pte_array(u64 *ptes, dma_addr_t addr, size_t size)
 		ptes[i] = (u64)addr + (i << GSP_PAGE_SHIFT);
 }
 
+/**
+ * r535_gsp_libos_init() -- create the libos arguments structure
+ *
+ * The logging buffers are byte queues that contain encoded printf-like
+ * messages from GSP-RM.  They need to be decoded by a special application
+ * that can parse the buffers.
+ *
+ * The 'loginit' buffer contains logs from early GSP-RM init and
+ * exception dumps.  The 'logrm' buffer contains the subsequent logs. Both are
+ * written to directly by GSP-RM and can be any multiple of GSP_PAGE_SIZE.
+ *
+ * The physical address map for the log buffer is stored in the buffer
+ * itself, starting with offset 1. Offset 0 contains the "put" pointer.
+ *
+ * The GSP only understands 4K pages (GSP_PAGE_SIZE), so even if the kernel is
+ * configured for a larger page size (e.g. 64K pages), we need to give
+ * the GSP an array of 4K pages. Fortunately, since the buffer is
+ * physically contiguous, it's simple math to calculate the addresses.
+ *
+ * The buffers must be a multiple of GSP_PAGE_SIZE.  GSP-RM also currently
+ * ignores the @kind field for LOGINIT, LOGINTR, and LOGRM, but expects the
+ * buffers to be physically contiguous anyway.
+ *
+ * The memory allocated for the arguments must remain until the GSP sends the
+ * init_done RPC.
+ *
+ * See _kgspInitLibosLoggingStructures (allocates memory for buffers)
+ * See kgspSetupLibosInitArgs_IMPL (creates pLibosInitArgs[] array)
+ */
 static int
 r535_gsp_libos_init(struct nvkm_gsp *gsp)
 {
@@ -1837,6 +1890,35 @@ nvkm_gsp_radix3_dtor(struct nvkm_gsp *gsp, struct nvkm_gsp_radix3 *rx3)
 		nvkm_gsp_mem_dtor(gsp, &rx3->mem[i]);
 }
 
+/**
+ * nvkm_gsp_radix3_sg - build a radix3 table from a S/G list
+ *
+ * The GSP uses a three-level page table, called radix3, to map the firmware.
+ * Each 64-bit "pointer" in the table is either the bus address of an entry in
+ * the next table (for levels 0 and 1) or the bus address of the next page in
+ * the GSP firmware image itself.
+ *
+ * Level 0 contains a single entry in one page that points to the first page
+ * of level 1.
+ *
+ * Level 1, since it's also only one page in size, contains up to 512 entries,
+ * one for each page in Level 2.
+ *
+ * Level 2 can be up to 512 pages in size, and each of those entries points to
+ * the next page of the firmware image.  Since there can be up to 512*512
+ * pages, that limits the size of the firmware to 512*512*GSP_PAGE_SIZE = 1GB.
+ *
+ * Internally, the GSP has its window into system memory, but the base
+ * physical address of the aperture is not 0.  In fact, it varies depending on
+ * the GPU architure.  Since the GPU is a PCI device, this window is accessed
+ * via DMA and is therefore bound by IOMMU translation.  The end result is
+ * that GSP-RM must translate the bus addresses in the table to GSP physical
+ * addresses.  All this should happen transparently.
+ *
+ * Returns 0 on success, or negative error code
+ *
+ * See kgspCreateRadix3_IMPL
+ */
 static int
 nvkm_gsp_radix3_sg(struct nvkm_device *device, struct sg_table *sgt, u64 size,
 		   struct nvkm_gsp_radix3 *rx3)
-- 
2.34.1
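
The head/tail convention summarized in the msgqTxHeader comment can be
sketched in plain C.  This is a minimal model, not the GSP-RM or Nouveau
implementation: struct ring and the ring_*() helpers are hypothetical names,
the pointers are treated as element indices that wrap at msg_count, and
keeping one slot unused to tell a full ring from an empty one is an
assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of one queue; not the GSP-RM layout. */
struct ring {
	uint32_t msg_count;	/* number of elements, like msgCount */
	uint32_t write_ptr;	/* head index, advanced only by the producer */
	uint32_t read_ptr;	/* tail index, advanced only by the consumer */
};

/* head == tail means there is nothing to consume. */
static int ring_empty(const struct ring *r)
{
	return r->write_ptr == r->read_ptr;
}

/* Number of elements written but not yet consumed. */
static uint32_t ring_used(const struct ring *r)
{
	return (r->write_ptr + r->msg_count - r->read_ptr) % r->msg_count;
}

/*
 * Free elements as seen by the producer.  Assumption of this sketch: one
 * slot stays unused so that a full ring is not mistaken for an empty one.
 */
static uint32_t ring_free(const struct ring *r)
{
	return r->msg_count - 1 - ring_used(r);
}

/* Producer side: claim one element and advance the head. */
static int ring_produce(struct ring *r)
{
	assert(r->msg_count > 1);
	if (ring_free(r) == 0)
		return -1;	/* no room until the consumer moves the tail */
	r->write_ptr = (r->write_ptr + 1) % r->msg_count;
	return 0;
}

/* Consumer side: release one element and advance the tail. */
static int ring_consume(struct ring *r)
{
	if (ring_empty(r))
		return -1;	/* nothing to read */
	r->read_ptr = (r->read_ptr + 1) % r->msg_count;
	return 0;
}
```

With one such ring per direction, the driver would be the producer on the
command ring and the consumer on the status ring, so each side only ever
writes the writePtr and readPtr stored in its own header.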

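The "simple math" behind create_pte_array() and the ptes.nr computation in
r535_gsp_shared_init() can be tried out in user space.  The sketch below
assumes 4K GSP pages (a GSP_PAGE_SHIFT of 12, per the comments in the patch);
pte_count(), fill_pte_array() and shm_pte_count() are hypothetical stand-ins,
not kernel functions, and reading the second term of ptes.nr as room for the
pages that hold the PTE array itself is an interpretation of the patch
context rather than something the patch states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GSP_PAGE_SHIFT	12			/* assumed: 4K GSP pages */
#define GSP_PAGE_SIZE	(1ULL << GSP_PAGE_SHIFT)

/* Number of 4K pages needed to cover 'size' bytes, rounding up. */
static size_t pte_count(uint64_t size)
{
	return (size_t)((size + GSP_PAGE_SIZE - 1) >> GSP_PAGE_SHIFT);
}

/*
 * Fill 'ptes' with the address of every 4K page of a contiguous buffer.
 * The index is widened to 64 bits before shifting; that is a choice made
 * in this sketch so that large buffers cannot overflow the offset.
 */
static void fill_pte_array(uint64_t *ptes, uint64_t addr, uint64_t size)
{
	size_t i, n = pte_count(size);

	assert((addr & (GSP_PAGE_SIZE - 1)) == 0);	/* page aligned */
	for (i = 0; i < n; i++)
		ptes[i] = addr + ((uint64_t)i << GSP_PAGE_SHIFT);
}

/* Mirrors the two ptes.nr lines shown in the r535_gsp_shared_init() hunk. */
static uint64_t shm_pte_count(uint64_t cmdq_size, uint64_t msgq_size)
{
	uint64_t nr = (cmdq_size + msgq_size) >> GSP_PAGE_SHIFT;

	nr += (nr * sizeof(uint64_t) + GSP_PAGE_SIZE - 1) / GSP_PAGE_SIZE;
	return nr;
}
```

For the two 0x40000-byte queues in the patch, shm_pte_count() gives 128 pages
of queue memory plus one more page, 129 entries in total.
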
David Airlie

2023-Nov-22 00:52 UTC

[Nouveau] [PATCH] nouveau/gsp: document some aspects of GSP-RM

On Wed, Nov 22, 2023 at 9:53 AM Timur Tabi <ttabi at nvidia.com> wrote:
>
> Document a few aspects of communication with GSP-RM.  These comments
> are derived from notes made during early development of GSP-RM
> support in Nouveau, but were not included in the initial patch set.
>
> Signed-off-by: Timur Tabi <ttabi at nvidia.com>
> ---
>  .../common/shared/msgq/inc/msgq/msgq_priv.h   | 79 +++++++++++++++--
>  .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c    | 86 ++++++++++++++++++-
>  2 files changed, 154 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/shared/msgq/inc/msgq/msgq_priv.h b/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/shared/msgq/inc/msgq/msgq_priv.h
> index 5a2f273d95c8..1e94bf087b23 100644
> --- a/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/shared/msgq/inc/msgq/msgq_priv.h
> +++ b/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/shared/msgq/inc/msgq/msgq_priv.h
> @@ -26,21 +26,82 @@
>   * DEALINGS IN THE SOFTWARE.
>   */
>
> +#define GSP_MESSAGE_COMMAND_QUEUE_SIZE 0x40000
> +#define GSP_MESSAGE_STATUS_QUEUE_SIZE  0x40000
> +
> +/**
> + * msgqTxHeader -- TX queue data structure
> + * @version: the version of this structure, must be 0
> + * @size: the size of the entire queue, including this header
> + * @msgSize: the padded size of queue element, must be power-of-2, 16 is
> + *         minimum
I don't think this is true anymore; I think 4K has been the minimum size
since one of the 535-series releases.

> + * @msgCount: the number of elements in this queue
> + * @writePtr: head index of this queue
> + * @flags: 1 = swap the RX pointers
> + * @rxHdrOff: offset of readPtr in this structure
> + * @entryOff: offset of beginning of queue (msgqRxHeader), relative to
> + *          beginning of this structure
> + *
> + * The command queue is a queue of RPCs that are sent from the driver to the
> + * GSP.  The status queue is a queue of messages/responses from the GSP to the
> + * driver.  Although the driver allocates memory for both queues (inside the
> + * same memory block), the command queue is owned by the driver and the status
> + * queue is owned by the GSP.  In addition, the two headers must not share the
> + * same 4K page.
> + *
> + * Unfortunately, depsite the fact that the queue size is a field in this
^ typo
> + * structure, the GSP has a hard-coded expectation of the sizes.  So the
> + * command queue size must be GSP_MESSAGE_COMMAND_QUEUE_SIZE and the status
> + * queue size must be GSP_MESSAGE_STATUS_QUEUE_SIZE.
> + *
> + * Each queue is prefixed with this data structure.  The idea is that a queue
> + * and its header are written to only by their owner.  That is, only the
> + * driver writes to the command queue and command queue header, and only the
> + * GSP writes to the status (receive) queue and its header.
> + *
> + * This is enforced by the concept of "swapping" the RX pointers.  This is
> + * why the 'flags' field must be set to 1.  'rxHdrOff' is how the GSP knows
> + * where the tail pointer of its status queue is.
> + *
> + * When the driver writes a new RPC to the command queue, it updates writePtr.
> + * When it reads a new message from the status queue, it updates readPtr.  In
> + * this way, the GSP knows when a new command is in the queue (it polls
> + * writePtr) and it knows how much free space is in the status queue (it
> + * checks readPtr).  The driver never cares about how much free space is in
> + * the status queue.
> + *
> + * As usual, producers write to the head pointer, and consumers read from the
> + * tail pointer.  When head == tail, the queue is empty.
> + *
> + * So to summarize:
> + * command.writePtr = head of command queue
> + * command.readPtr = tail of status queue
> + * status.writePtr = head of status queue
> + * status.readPtr = tail of command queue
> + */
>  typedef struct
>  {
> -    NvU32 version;   // queue version
> -    NvU32 size;      // bytes, page aligned
> -    NvU32 msgSize;   // entry size, bytes, must be power-of-2, 16 is minimum
> -    NvU32 msgCount;  // number of entries in queue
> -    NvU32 writePtr;  // message id of next slot
> -    NvU32 flags;     // if set it means "i want to swap RX"
> -    NvU32 rxHdrOff;  // Offset of msgqRxHeader from start of backing store.
> -    NvU32 entryOff;  // Offset of entries from start of backing store.
> +       NvU32 version;
> +       NvU32 size;
> +       NvU32 msgSize;
> +       NvU32 msgCount;
> +       NvU32 writePtr;
> +       NvU32 flags;
> +       NvU32 rxHdrOff;
> +       NvU32 entryOff;
>  } msgqTxHeader;
>
> +/**
> + * msgqRxHeader - RX queue data structure
> + * @readPtr: tail index of the other queue
> + *
> + * Although this is a separate struct, it could easily be merged into
> + * msgqTxHeader.  msgqTxHeader.rxHdrOff is simply the offset of readPtr
> + * from the beginning of msgqTxHeader.
> + */
>  typedef struct
>  {
> -    NvU32 readPtr; // message id of last message read
> +       NvU32 readPtr;
>  } msgqRxHeader;
>
>  #endif
> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> index dc44f5c7833f..265c0a413ea8 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> @@ -1379,6 +1379,13 @@ r535_gsp_msg_post_event(void *priv, u32 fn, void *repv, u32 repc)
>         return 0;
>  }
>
> +/**
> + * r535_gsp_msg_run_cpu_sequencer() -- process I/O commands from the GSP
> + *
> + * The GSP sequencer is a list of I/O commands that the GSP can send to
> + * the driver to perform for various purposes.  The most common usage is to
> + * perform a special mid-initialization reset.
> + */
>  static int
>  r535_gsp_msg_run_cpu_sequencer(void *priv, u32 fn, void *repv, u32 repc)
>  {
> @@ -1628,8 +1635,8 @@ r535_gsp_shared_init(struct nvkm_gsp *gsp)
>         } *cmdq, *msgq;
>         int ret, i;
>
> -       gsp->shm.cmdq.size = 0x40000;
> -       gsp->shm.msgq.size = 0x40000;
> +       gsp->shm.cmdq.size = GSP_MESSAGE_COMMAND_QUEUE_SIZE;
> +       gsp->shm.msgq.size = GSP_MESSAGE_STATUS_QUEUE_SIZE;
>
>         gsp->shm.ptes.nr  = (gsp->shm.cmdq.size + gsp->shm.msgq.size) >> GSP_PAGE_SHIFT;
>         gsp->shm.ptes.nr += DIV_ROUND_UP(gsp->shm.ptes.nr * sizeof(u64), GSP_PAGE_SIZE);
> @@ -1718,6 +1725,23 @@ r535_gsp_libos_id8(const char *name)
>         return id;
>  }
>
> +/**
> + * create_pte_array() - creates a PTE array of a physically contiguous buffer
> + * @ptes: pointer to the array
> + * @addr: base address of physically contiguous buffer (GSP_PAGE_SIZE aligned)
> + * @size: size of the buffer
> + *
> + * GSP-RM sometimes expects physically-contiguous buffers to have an array of
> + * "PTEs" for each page in that buffer.  Although in theory that allows for
> + * the buffer to be physically discontiguous, GSP-RM does not currently
> + * support that.
> + *
> + * In this case, the PTEs are DMA addresses of each page of the buffer.  Since
> + * the buffer is physically contiguous, calculating all the PTEs is simple
> + * math.
> + *
> + * See memdescGetPhysAddrsForGpu()
> + */
>  static void create_pte_array(u64 *ptes, dma_addr_t addr, size_t size)
>  {
>         unsigned int num_pages = DIV_ROUND_UP_ULL(size, GSP_PAGE_SIZE);
> @@ -1727,6 +1751,35 @@ static void create_pte_array(u64 *ptes, dma_addr_t addr, size_t size)
>                 ptes[i] = (u64)addr + (i << GSP_PAGE_SHIFT);
>  }
>
> +/**
> + * r535_gsp_libos_init() -- create the libos arguments structure
> + *
> + * The logging buffers are byte queues that contain encoded printf-like
> + * messages from GSP-RM.  They need to be decoded by a special application
> + * that can parse the buffers.
> + *
> + * The 'loginit' buffer contains logs from early GSP-RM init and
> + * exception dumps.  The 'logrm' buffer contains the subsequent logs. Both are
> + * written to directly by GSP-RM and can be any multiple of GSP_PAGE_SIZE.
> + *
> + * The physical address map for the log buffer is stored in the buffer
> + * itself, starting with offset 1. Offset 0 contains the "put" pointer.
> + *
> + * The GSP only understands 4K pages (GSP_PAGE_SIZE), so even if the kernel is
> + * configured for a larger page size (e.g. 64K pages), we need to give
> + * the GSP an array of 4K pages. Fortunately, since the buffer is
> + * physically contiguous, it's simple math to calculate the addresses.
> + *
> + * The buffers must be a multiple of GSP_PAGE_SIZE.  GSP-RM also currently
> + * ignores the @kind field for LOGINIT, LOGINTR, and LOGRM, but expects the
> + * buffers to be physically contiguous anyway.
> + *
> + * The memory allocated for the arguments must remain until the GSP sends the
> + * init_done RPC.
> + *
> + * See _kgspInitLibosLoggingStructures (allocates memory for buffers)
> + * See kgspSetupLibosInitArgs_IMPL (creates pLibosInitArgs[] array)
> + */
>  static int
>  r535_gsp_libos_init(struct nvkm_gsp *gsp)
>  {
> @@ -1837,6 +1890,35 @@ nvkm_gsp_radix3_dtor(struct nvkm_gsp *gsp, struct nvkm_gsp_radix3 *rx3)
>                 nvkm_gsp_mem_dtor(gsp, &rx3->mem[i]);
>  }
>
> +/**
> + * nvkm_gsp_radix3_sg - build a radix3 table from a S/G list
> + *
> + * The GSP uses a three-level page table, called radix3, to map the firmware.
> + * Each 64-bit "pointer" in the table is either the bus address of an entry in
> + * the next table (for levels 0 and 1) or the bus address of the next page in
> + * the GSP firmware image itself.
> + *
> + * Level 0 contains a single entry in one page that points to the first page
> + * of level 1.
> + *
> + * Level 1, since it's also only one page in size, contains up to 512 entries,
> + * one for each page in Level 2.
> + *
> + * Level 2 can be up to 512 pages in size, and each of those entries points to
> + * the next page of the firmware image.  Since there can be up to 512*512
> + * pages, that limits the size of the firmware to 512*512*GSP_PAGE_SIZE = 1GB.
> + *
> + * Internally, the GSP has its window into system memory, but the base
> + * physical address of the aperture is not 0.  In fact, it varies depending on
> + * the GPU architure.  Since the GPU is a PCI device, this window is accessed
^ typo
> + * via DMA and is therefore bound by IOMMU translation.  The end result is
> + * that GSP-RM must translate the bus addresses in the table to GSP physical
> + * addresses.  All this should happen transparently.
> + *
> + * Returns 0 on success, or negative error code
> + *
> + * See kgspCreateRadix3_IMPL
> + */
>  static int
>  nvkm_gsp_radix3_sg(struct nvkm_device *device, struct sg_table *sgt, u64 size,
>
Otherwise seems fine; with those fixed, feel free to resend and add a
Reviewed-by: Dave Airlie <airlied at redhat.com> tag.

Dave.
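
The size limit stated in the radix3 comment quoted above follows from the
page and entry sizes, and can be checked with a few lines of C.  This is a
sketch under stated assumptions: GSP_PAGE_SIZE is taken to be 4K and each
table entry 64 bits, as the comment describes; struct radix3_sizes and the
radix3_*() helpers are hypothetical names and are not the nvkm_gsp_radix3
code.

```c
#include <assert.h>
#include <stdint.h>

#define GSP_PAGE_SIZE		4096ULL		/* assumed: 4K GSP pages */
/* 64-bit entries per 4K table page: 4096 / 8 = 512 */
#define RADIX3_ENTRIES_PER_PAGE	(GSP_PAGE_SIZE / sizeof(uint64_t))

/* Hypothetical result type: pages needed at each table level. */
struct radix3_sizes {
	uint64_t lvl0_pages;
	uint64_t lvl1_pages;
	uint64_t lvl2_pages;
};

/* Largest firmware image that a one-page level 1 can describe. */
static uint64_t radix3_max_fw_size(void)
{
	return RADIX3_ENTRIES_PER_PAGE * RADIX3_ENTRIES_PER_PAGE *
	       GSP_PAGE_SIZE;
}

/*
 * Work out how many pages each level needs for an image of 'fw_size'
 * bytes.  Returns 0 on success, -1 if the image is empty or too large.
 */
static int radix3_size(uint64_t fw_size, struct radix3_sizes *out)
{
	uint64_t fw_pages;

	assert(out);
	if (fw_size == 0 || fw_size > radix3_max_fw_size())
		return -1;

	fw_pages = (fw_size + GSP_PAGE_SIZE - 1) / GSP_PAGE_SIZE;

	/* Level 2 holds one entry per firmware page. */
	out->lvl2_pages = (fw_pages + RADIX3_ENTRIES_PER_PAGE - 1) /
			  RADIX3_ENTRIES_PER_PAGE;
	/* Level 1 is a single page with one entry per level 2 page. */
	out->lvl1_pages = 1;
	/* Level 0 is a single page with one entry pointing at level 1. */
	out->lvl0_pages = 1;
	return 0;
}
```

For a 64 MiB image, for example, this gives 16384 firmware pages and
therefore 32 level 2 pages, well inside the 512-page limit.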
