Timur Tabi
2023-Nov-21 23:53 UTC
[Nouveau] [PATCH] nouveau/gsp: document some aspects of GSP-RM
Document a few aspects of communication with GSP-RM. These comments are derived from notes made during early development of GSP-RM support in Nouveau, but were not included in the initial patch set. Signed-off-by: Timur Tabi <ttabi at nvidia.com> --- .../common/shared/msgq/inc/msgq/msgq_priv.h | 79 +++++++++++++++-- .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 86 ++++++++++++++++++- 2 files changed, 154 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/shared/msgq/inc/msgq/msgq_priv.h b/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/shared/msgq/inc/msgq/msgq_priv.h index 5a2f273d95c8..1e94bf087b23 100644 --- a/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/shared/msgq/inc/msgq/msgq_priv.h +++ b/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/shared/msgq/inc/msgq/msgq_priv.h @@ -26,21 +26,82 @@ * DEALINGS IN THE SOFTWARE. */ +#define GSP_MESSAGE_COMMAND_QUEUE_SIZE 0x40000 +#define GSP_MESSAGE_STATUS_QUEUE_SIZE 0x40000 + +/** + * msgqTxHeader -- TX queue data structure + * @version: the version of this structure, must be 0 + * @size: the size of the entire queue, including this header + * @msgSize: the padded size of queue element, must be power-of-2, 16 is + * minimum + * @msgCount: the number of elements in this queue + * @writePtr: head index of this queue + * @flags: 1 = swap the RX pointers + * @rxHdrOff: offset of readPtr in this structure + * @entryOff: offset of beginning of queue (msgqRxHeader), relative to + * beginning of this structure + * + * The command queue is a queue of RPCs that are sent from the driver to the + * GSP. The status queue is a queue of messages/responses from the GSP to the + * driver. Although the driver allocates memory for both queues (inside the + * same memory block), the command queue is owned by the driver and the status + * queue is owned by the GSP. In addition, the two headers must not share the + * same 4K page. 
+ * + * Unfortunately, depsite the fact that the queue size is a field in this + * structure, the GSP has a hard-coded expectation of the sizes. So the + * command queue size must be GSP_MESSAGE_COMMAND_QUEUE_SIZE and the status + * queue size must be GSP_MESSAGE_STATUS_QUEUE_SIZE. + * + * Each queue is prefixed with this data structure. The idea is that a queue + * and its header are written to only by their owner. That is, only the + * driver writes to the command queue and command queue header, and only the + * GSP writes to the status (receive) queue and its header. + * + * This is enforced by the concept of "swapping" the RX pointers. This is + * why the 'flags' field must be set to 1. 'rxHdrOff' is how the GSP knows + * where the where the tail pointer of its status queue. + * + * When the driver writes a new RPC to the command queue, it updates writePtr. + * When it reads a new message from the status queue, it updates readPtr. In + * this way, the GSP knows when a new command is in the queue (it polls + * writePtr) and it knows how much free space is in the status queue (it + * checks readPtr). The driver never cares about how much free space is in + * the status queue. + * + * As usual, producers write to the head pointer, and consumers read from the + * tail pointer. When head == tail, the queue is empty. + * + * So to summarize: + * command.writePtr = head of command queue + * command.readPtr = tail of status queue + * status.writePtr = head of status queue + * status.readPtr = tail of command queue + */ typedef struct { - NvU32 version; // queue version - NvU32 size; // bytes, page aligned - NvU32 msgSize; // entry size, bytes, must be power-of-2, 16 is minimum - NvU32 msgCount; // number of entries in queue - NvU32 writePtr; // message id of next slot - NvU32 flags; // if set it means "i want to swap RX" - NvU32 rxHdrOff; // Offset of msgqRxHeader from start of backing store. - NvU32 entryOff; // Offset of entries from start of backing store. 
+ NvU32 version; + NvU32 size; + NvU32 msgSize; + NvU32 msgCount; + NvU32 writePtr; + NvU32 flags; + NvU32 rxHdrOff; + NvU32 entryOff; } msgqTxHeader; +/** + * msgqRxHeader - RX queue data structure + * @readPtr: tail index of the other queue + * + * Although this is a separate struct, it could easily be merged into + * msgqTxHeader. msgqTxHeader.rxHdrOff is simply the offset of readPtr + * from the beginning of msgqTxHeader. + */ typedef struct { - NvU32 readPtr; // message id of last message read + NvU32 readPtr; } msgqRxHeader; #endif diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c index dc44f5c7833f..265c0a413ea8 100644 --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c @@ -1379,6 +1379,13 @@ r535_gsp_msg_post_event(void *priv, u32 fn, void *repv, u32 repc) return 0; } +/** + * r535_gsp_msg_run_cpu_sequencer() -- process I/O commands from the GSP + * + * The GSP sequencer is a list of I/O commands that the GSP can send to + * the driver to perform for various purposes. The most common usage is to + * perform a special mid-initialization reset. 
+ */ static int r535_gsp_msg_run_cpu_sequencer(void *priv, u32 fn, void *repv, u32 repc) { @@ -1628,8 +1635,8 @@ r535_gsp_shared_init(struct nvkm_gsp *gsp) } *cmdq, *msgq; int ret, i; - gsp->shm.cmdq.size = 0x40000; - gsp->shm.msgq.size = 0x40000; + gsp->shm.cmdq.size = GSP_MESSAGE_COMMAND_QUEUE_SIZE; + gsp->shm.msgq.size = GSP_MESSAGE_STATUS_QUEUE_SIZE; gsp->shm.ptes.nr = (gsp->shm.cmdq.size + gsp->shm.msgq.size) >> GSP_PAGE_SHIFT; gsp->shm.ptes.nr += DIV_ROUND_UP(gsp->shm.ptes.nr * sizeof(u64), GSP_PAGE_SIZE); @@ -1718,6 +1725,23 @@ r535_gsp_libos_id8(const char *name) return id; } +/** + * create_pte_array() - creates a PTE array of a physically contiguous buffer + * @ptes: pointer to the array + * @addr: base address of physically contiguous buffer (GSP_PAGE_SIZE aligned) + * @size: size of the buffer + * + * GSP-RM sometimes expects physically-contiguous buffers to have an array of + * "PTEs" for each page in that buffer. Although in theory that allows for + * the buffer to be physically discontiguous, GSP-RM does not currently + * support that. + * + * In this case, the PTEs are DMA addresses of each page of the buffer. Since + * the buffer is physically contiguous, calculating all the PTEs is simple + * math. + * + * See memdescGetPhysAddrsForGpu() + */ static void create_pte_array(u64 *ptes, dma_addr_t addr, size_t size) { unsigned int num_pages = DIV_ROUND_UP_ULL(size, GSP_PAGE_SIZE); @@ -1727,6 +1751,35 @@ static void create_pte_array(u64 *ptes, dma_addr_t addr, size_t size) ptes[i] = (u64)addr + (i << GSP_PAGE_SHIFT); } +/** + * r535_gsp_libos_init() -- create the libos arguments structure + * + * The logging buffers are byte queues that contain encoded printf-like + * messages from GSP-RM. They need to be decoded by a special application + * that can parse the buffers. + * + * The 'loginit' buffer contains logs from early GSP-RM init and + * exception dumps. The 'logrm' buffer contains the subsequent logs. 
Both are + * written to directly by GSP-RM and can be any multiple of GSP_PAGE_SIZE. + * + * The physical address map for the log buffer is stored in the buffer + * itself, starting with offset 1. Offset 0 contains the "put" pointer. + * + * The GSP only understands 4K pages (GSP_PAGE_SIZE), so even if the kernel is + * configured for a larger page size (e.g. 64K pages), we need to give + * the GSP an array of 4K pages. Fortunately, since the buffer is + * physically contiguous, it's simple math to calculate the addresses. + * + * The buffers must be a multiple of GSP_PAGE_SIZE. GSP-RM also currently + * ignores the @kind field for LOGINIT, LOGINTR, and LOGRM, but expects the + * buffers to be physically contiguous anyway. + * + * The memory allocated for the arguments must remain until the GSP sends the + * init_done RPC. + * + * See _kgspInitLibosLoggingStructures (allocates memory for buffers) + * See kgspSetupLibosInitArgs_IMPL (creates pLibosInitArgs[] array) + */ static int r535_gsp_libos_init(struct nvkm_gsp *gsp) { @@ -1837,6 +1890,35 @@ nvkm_gsp_radix3_dtor(struct nvkm_gsp *gsp, struct nvkm_gsp_radix3 *rx3) nvkm_gsp_mem_dtor(gsp, &rx3->mem[i]); } +/** + * nvkm_gsp_radix3_sg - build a radix3 table from a S/G list + * + * The GSP uses a three-level page table, called radix3, to map the firmware. + * Each 64-bit "pointer" in the table is either the bus address of an entry in + * the next table (for levels 0 and 1) or the bus address of the next page in + * the GSP firmware image itself. + * + * Level 0 contains a single entry in one page that points to the first page + * of level 1. + * + * Level 1, since it's also only one page in size, contains up to 512 entries, + * one for each page in Level 2. + * + * Level 2 can be up to 512 pages in size, and each of those entries points to + * the next page of the firmware image. Since there can be up to 512*512 + * pages, that limits the size of the firmware to 512*512*GSP_PAGE_SIZE = 1GB. 
+ * + * Internally, the GSP has its window into system memory, but the base + * physical address of the aperture is not 0. In fact, it varies depending on + * the GPU architure. Since the GPU is a PCI device, this window is accessed + * via DMA and is therefore bound by IOMMU translation. The end result is + * that GSP-RM must translate the bus addresses in the table to GSP physical + * addresses. All this should happen transparently. + * + * Returns 0 on success, or negative error code + * + * See kgspCreateRadix3_IMPL + */ static int nvkm_gsp_radix3_sg(struct nvkm_device *device, struct sg_table *sgt, u64 size, struct nvkm_gsp_radix3 *rx3) -- 2.34.1
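A note on the arithmetic in the create_pte_array() and radix3 comments above: both can be checked with a small standalone sketch. This is not code from the patch. The `sketch_` names are hypothetical, and the 4K page size is taken from the patch's own statement that the GSP only understands 4K pages. The sketch assumes 64-bit table entries, as the radix3 comment describes, so one page holds 4096 / 8 = 512 of them.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed from the patch text: the GSP only understands 4K pages. */
#define SKETCH_GSP_PAGE_SHIFT 12
#define SKETCH_GSP_PAGE_SIZE  (1ULL << SKETCH_GSP_PAGE_SHIFT)

/* 64-bit entries per GSP page: 4096 / 8 = 512. */
#define SKETCH_ENTRIES_PER_PAGE (SKETCH_GSP_PAGE_SIZE / sizeof(uint64_t))

/*
 * Largest image a radix3 table can map. Level 1 is one page, so it
 * addresses up to 512 level 2 pages, and each level 2 page addresses
 * up to 512 firmware pages.
 */
static uint64_t sketch_radix3_max_bytes(void)
{
	return SKETCH_ENTRIES_PER_PAGE * SKETCH_ENTRIES_PER_PAGE *
	       SKETCH_GSP_PAGE_SIZE;
}

/* Number of 4K pages needed to cover 'size' bytes, rounding up. */
static uint64_t sketch_num_pages(uint64_t size)
{
	return (size + SKETCH_GSP_PAGE_SIZE - 1) >> SKETCH_GSP_PAGE_SHIFT;
}

/*
 * Fill ptes[] with the address of every 4K page of a contiguous buffer.
 * Because the buffer is contiguous, entry i is simply base + i * 4K.
 * Returns the number of entries written.
 */
static uint64_t sketch_fill_ptes(uint64_t *ptes, uint64_t addr, uint64_t size)
{
	uint64_t i, n = sketch_num_pages(size);

	/* The base address is expected to be page aligned. */
	assert((addr & (SKETCH_GSP_PAGE_SIZE - 1)) == 0);

	for (i = 0; i < n; i++)
		ptes[i] = addr + (i << SKETCH_GSP_PAGE_SHIFT);

	return n;
}
```

With these definitions, sketch_radix3_max_bytes() evaluates to 2^30 bytes, matching the 1GB limit stated in the comment, and a 0x40000-byte queue covers 64 GSP pages.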
David Airlie
2023-Nov-22 00:52 UTC
[Nouveau] [PATCH] nouveau/gsp: document some aspects of GSP-RM
On Wed, Nov 22, 2023 at 9:53 AM Timur Tabi <ttabi at nvidia.com> wrote:
> > Document a few aspects of communication with GSP-RM. These comments > are derived from notes made during early development of GSP-RM > support in Nouveau, but were not included in the initial patch set. > > Signed-off-by: Timur Tabi <ttabi at nvidia.com> > --- > .../common/shared/msgq/inc/msgq/msgq_priv.h | 79 +++++++++++++++-- > .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 86 ++++++++++++++++++- > 2 files changed, 154 insertions(+), 11 deletions(-) > > diff --git a/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/shared/msgq/inc/msgq/msgq_priv.h b/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/shared/msgq/inc/msgq/msgq_priv.h > index 5a2f273d95c8..1e94bf087b23 100644 > --- a/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/shared/msgq/inc/msgq/msgq_priv.h > +++ b/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/shared/msgq/inc/msgq/msgq_priv.h > @@ -26,21 +26,82 @@ > * DEALINGS IN THE SOFTWARE. > */ > > +#define GSP_MESSAGE_COMMAND_QUEUE_SIZE 0x40000 > +#define GSP_MESSAGE_STATUS_QUEUE_SIZE 0x40000 > + > +/** > + * msgqTxHeader -- TX queue data structure > + * @version: the version of this structure, must be 0 > + * @size: the size of the entire queue, including this header > + * @msgSize: the padded size of queue element, must be power-of-2, 16 is > + * minimum

I don't think this is true anymore, I think 4k is the minimum size since one of the 535 series.

> + * @msgCount: the number of elements in this queue > + * @writePtr: head index of this queue > + * @flags: 1 = swap the RX pointers > + * @rxHdrOff: offset of readPtr in this structure > + * @entryOff: offset of beginning of queue (msgqRxHeader), relative to > + * beginning of this structure > + * > + * The command queue is a queue of RPCs that are sent from the driver to the > + * GSP. The status queue is a queue of messages/responses from the GSP to the > + * driver. 
Although the driver allocates memory for both queues (inside the > + * same memory block), the command queue is owned by the driver and the status > + * queue is owned by the GSP. In addition, the two headers must not share the > + * same 4K page. > + * > + * Unfortunately, depsite the fact that the queue size is a field in this

^ typo ("depsite" should be "despite")

> + * structure, the GSP has a hard-coded expectation of the sizes. So the > + * command queue size must be GSP_MESSAGE_COMMAND_QUEUE_SIZE and the status > + * queue size must be GSP_MESSAGE_STATUS_QUEUE_SIZE. > + * > + * Each queue is prefixed with this data structure. The idea is that a queue > + * and its header are written to only by their owner. That is, only the > + * driver writes to the command queue and command queue header, and only the > + * GSP writes to the status (receive) queue and its header. > + * > + * This is enforced by the concept of "swapping" the RX pointers. This is > + * why the 'flags' field must be set to 1. 'rxHdrOff' is how the GSP knows > + * where the where the tail pointer of its status queue. > + * > + * When the driver writes a new RPC to the command queue, it updates writePtr. > + * When it reads a new message from the status queue, it updates readPtr. In > + * this way, the GSP knows when a new command is in the queue (it polls > + * writePtr) and it knows how much free space is in the status queue (it > + * checks readPtr). The driver never cares about how much free space is in > + * the status queue. > + * > + * As usual, producers write to the head pointer, and consumers read from the > + * tail pointer. When head == tail, the queue is empty. 
> + * > + * So to summarize: > + * command.writePtr = head of command queue > + * command.readPtr = tail of status queue > + * status.writePtr = head of status queue > + * status.readPtr = tail of command queue > + */ > typedef struct > { > - NvU32 version; // queue version > - NvU32 size; // bytes, page aligned > - NvU32 msgSize; // entry size, bytes, must be power-of-2, 16 is minimum > - NvU32 msgCount; // number of entries in queue > - NvU32 writePtr; // message id of next slot > - NvU32 flags; // if set it means "i want to swap RX" > - NvU32 rxHdrOff; // Offset of msgqRxHeader from start of backing store. > - NvU32 entryOff; // Offset of entries from start of backing store. > + NvU32 version; > + NvU32 size; > + NvU32 msgSize; > + NvU32 msgCount; > + NvU32 writePtr; > + NvU32 flags; > + NvU32 rxHdrOff; > + NvU32 entryOff; > } msgqTxHeader; > > +/** > + * msgqRxHeader - RX queue data structure > + * @readPtr: tail index of the other queue > + * > + * Although this is a separate struct, it could easily be merged into > + * msgqTxHeader. msgqTxHeader.rxHdrOff is simply the offset of readPtr > + * from the beginning of msgqTxHeader. > + */ > typedef struct > { > - NvU32 readPtr; // message id of last message read > + NvU32 readPtr; > } msgqRxHeader; > > #endif > diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c > index dc44f5c7833f..265c0a413ea8 100644 > --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c > +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c > @@ -1379,6 +1379,13 @@ r535_gsp_msg_post_event(void *priv, u32 fn, void *repv, u32 repc) > return 0; > } > > +/** > + * r535_gsp_msg_run_cpu_sequencer() -- process I/O commands from the GSP > + * > + * The GSP sequencer is a list of I/O commands that the GSP can send to > + * the driver to perform for various purposes. The most common usage is to > + * perform a special mid-initialization reset. 
> + */ > static int > r535_gsp_msg_run_cpu_sequencer(void *priv, u32 fn, void *repv, u32 repc) > { > @@ -1628,8 +1635,8 @@ r535_gsp_shared_init(struct nvkm_gsp *gsp) > } *cmdq, *msgq; > int ret, i; > > - gsp->shm.cmdq.size = 0x40000; > - gsp->shm.msgq.size = 0x40000; > + gsp->shm.cmdq.size = GSP_MESSAGE_COMMAND_QUEUE_SIZE; > + gsp->shm.msgq.size = GSP_MESSAGE_STATUS_QUEUE_SIZE; > > gsp->shm.ptes.nr = (gsp->shm.cmdq.size + gsp->shm.msgq.size) >> GSP_PAGE_SHIFT; > gsp->shm.ptes.nr += DIV_ROUND_UP(gsp->shm.ptes.nr * sizeof(u64), GSP_PAGE_SIZE); > @@ -1718,6 +1725,23 @@ r535_gsp_libos_id8(const char *name) > return id; > } > > +/** > + * create_pte_array() - creates a PTE array of a physically contiguous buffer > + * @ptes: pointer to the array > + * @addr: base address of physically contiguous buffer (GSP_PAGE_SIZE aligned) > + * @size: size of the buffer > + * > + * GSP-RM sometimes expects physically-contiguous buffers to have an array of > + * "PTEs" for each page in that buffer. Although in theory that allows for > + * the buffer to be physically discontiguous, GSP-RM does not currently > + * support that. > + * > + * In this case, the PTEs are DMA addresses of each page of the buffer. Since > + * the buffer is physically contiguous, calculating all the PTEs is simple > + * math. > + * > + * See memdescGetPhysAddrsForGpu() > + */ > static void create_pte_array(u64 *ptes, dma_addr_t addr, size_t size) > { > unsigned int num_pages = DIV_ROUND_UP_ULL(size, GSP_PAGE_SIZE); > @@ -1727,6 +1751,35 @@ static void create_pte_array(u64 *ptes, dma_addr_t addr, size_t size) > ptes[i] = (u64)addr + (i << GSP_PAGE_SHIFT); > } > > +/** > + * r535_gsp_libos_init() -- create the libos arguments structure > + * > + * The logging buffers are byte queues that contain encoded printf-like > + * messages from GSP-RM. They need to be decoded by a special application > + * that can parse the buffers. 
> + * > + * The 'loginit' buffer contains logs from early GSP-RM init and > + * exception dumps. The 'logrm' buffer contains the subsequent logs. Both are > + * written to directly by GSP-RM and can be any multiple of GSP_PAGE_SIZE. > + * > + * The physical address map for the log buffer is stored in the buffer > + * itself, starting with offset 1. Offset 0 contains the "put" pointer. > + * > + * The GSP only understands 4K pages (GSP_PAGE_SIZE), so even if the kernel is > + * configured for a larger page size (e.g. 64K pages), we need to give > + * the GSP an array of 4K pages. Fortunately, since the buffer is > + * physically contiguous, it's simple math to calculate the addresses. > + * > + * The buffers must be a multiple of GSP_PAGE_SIZE. GSP-RM also currently > + * ignores the @kind field for LOGINIT, LOGINTR, and LOGRM, but expects the > + * buffers to be physically contiguous anyway. > + * > + * The memory allocated for the arguments must remain until the GSP sends the > + * init_done RPC. > + * > + * See _kgspInitLibosLoggingStructures (allocates memory for buffers) > + * See kgspSetupLibosInitArgs_IMPL (creates pLibosInitArgs[] array) > + */ > static int > r535_gsp_libos_init(struct nvkm_gsp *gsp) > { > @@ -1837,6 +1890,35 @@ nvkm_gsp_radix3_dtor(struct nvkm_gsp *gsp, struct nvkm_gsp_radix3 *rx3) > nvkm_gsp_mem_dtor(gsp, &rx3->mem[i]); > } > > +/** > + * nvkm_gsp_radix3_sg - build a radix3 table from a S/G list > + * > + * The GSP uses a three-level page table, called radix3, to map the firmware. > + * Each 64-bit "pointer" in the table is either the bus address of an entry in > + * the next table (for levels 0 and 1) or the bus address of the next page in > + * the GSP firmware image itself. > + * > + * Level 0 contains a single entry in one page that points to the first page > + * of level 1. > + * > + * Level 1, since it's also only one page in size, contains up to 512 entries, > + * one for each page in Level 2. 
> + * > + * Level 2 can be up to 512 pages in size, and each of those entries points to > + * the next page of the firmware image. Since there can be up to 512*512 > + * pages, that limits the size of the firmware to 512*512*GSP_PAGE_SIZE = 1GB. > + * > + * Internally, the GSP has its window into system memory, but the base > + * physical address of the aperture is not 0. In fact, it varies depending on > + * the GPU architure. Since the GPU is a PCI device, this window is accessed

^ typo ("architure" should be "architecture")

> + * via DMA and is therefore bound by IOMMU translation. The end result is > + * that GSP-RM must translate the bus addresses in the table to GSP physical > + * addresses. All this should happen transparently. > + * > + * Returns 0 on success, or negative error code > + * > + * See kgspCreateRadix3_IMPL > + */ > static int > nvkm_gsp_radix3_sg(struct nvkm_device *device, struct sg_table *sgt, u64 size, >

Otherwise seems fine. With those fixed, feel free to resend and add a Reviewed-by: Dave Airlie <airlied at redhat.com> tag.

Dave.
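
For anyone trying to follow the head/tail summary in the msgqTxHeader comment quoted above, here is a minimal sketch of the pointer ownership it describes. It is a simplified model, not the Nouveau or GSP-RM implementation. The `sketch_` names are hypothetical, and readPtr is folded into the same struct as writePtr, whereas the real layout locates it through rxHdrOff. The convention of leaving one slot unused to tell a full queue from an empty one is an assumption of this sketch. Each side writes only the fields in its own header and only reads the other side's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified header: writePtr and readPtr kept together (assumption). */
struct sketch_hdr {
	uint32_t msgCount;	/* number of elements in this queue */
	uint32_t writePtr;	/* head of this queue */
	uint32_t readPtr;	/* tail of the OTHER queue ("swapped" RX) */
};

struct sketch_shm {
	struct sketch_hdr cmd;		/* written only by the driver */
	struct sketch_hdr status;	/* written only by the GSP */
};

/* Command queue: head is cmd.writePtr, tail is status.readPtr. */
static bool sketch_cmdq_empty(const struct sketch_shm *s)
{
	return s->cmd.writePtr == s->status.readPtr;
}

/* Status queue: head is status.writePtr, tail is cmd.readPtr. */
static bool sketch_msgq_empty(const struct sketch_shm *s)
{
	return s->status.writePtr == s->cmd.readPtr;
}

/* Free command slots; one slot stays unused (assumption of this sketch). */
static uint32_t sketch_cmdq_free(const struct sketch_shm *s)
{
	uint32_t n = s->cmd.msgCount;

	return (s->status.readPtr + n - s->cmd.writePtr - 1) % n;
}

/* Driver posts one RPC: it only advances its own writePtr. */
static bool sketch_driver_post(struct sketch_shm *s)
{
	if (sketch_cmdq_free(s) == 0)
		return false;
	s->cmd.writePtr = (s->cmd.writePtr + 1) % s->cmd.msgCount;
	return true;
}

/* Driver consumes one status message: it only advances its own readPtr. */
static bool sketch_driver_consume(struct sketch_shm *s)
{
	if (sketch_msgq_empty(s))
		return false;
	s->cmd.readPtr = (s->cmd.readPtr + 1) % s->status.msgCount;
	return true;
}

/* GSP side of the model: consume one command, touching only status.*. */
static bool sketch_gsp_consume(struct sketch_shm *s)
{
	if (sketch_cmdq_empty(s))
		return false;
	s->status.readPtr = (s->status.readPtr + 1) % s->cmd.msgCount;
	return true;
}

/* GSP side of the model: post one status message (no free-space check). */
static bool sketch_gsp_post(struct sketch_shm *s)
{
	s->status.writePtr = (s->status.writePtr + 1) % s->status.msgCount;
	return true;
}
```

The point of the split is that neither side ever stores into the other side's header: the driver's view of how far the GSP has read comes from status.readPtr, and the GSP's view of how far the driver has read comes from cmd.readPtr.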