Hi folks:

Here are some fixes for weird bugs I encountered while enabling vGPU [1] on the
core driver, aka NVKM. They are exposed by the new RPCs required by vGPU.

For testing, I ran Unigine Heaven [2] on my RTX 4060 for 8 hours and the GL CTS
runner [3] (command line: ./cts-runner --type-gl40) from Khronos without any
problem.

v2:

- Remove the Fixes: tags, as vanilla nouveau isn't going to hit these bugs. (Danilo)
- Test the patchset on VK-GL-CTS. (Danilo)
- Rename r535_gsp_msgq_wait to gsp_msgq_recv. (Danilo)
- Remove the ambiguous empty line in PATCH 2. (Danilo)
- Introduce a data structure to hold the params of gsp_msgq_recv(). (Danilo)
- Document the params and the states they are related to. (Danilo)

[1] https://lore.kernel.org/kvm/20240922124951.1946072-1-zhiw at nvidia.com/T/#t
[2] https://benchmark.unigine.com/heaven
[3] https://github.com/KhronosGroup/VK-GL-CTS

Zhi Wang (3):
  nvkm/gsp: correctly advance the read pointer of GSP message queue
  nvkm: correctly calculate the available space of the GSP cmdq buffer
  nvkm: handle the return of large RPC

 .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 251 +++++++++++++-----
 1 file changed, 184 insertions(+), 67 deletions(-)

-- 
2.34.1
Zhi Wang
2024-Oct-17 07:19 UTC
[PATCH 1/3] nvkm/gsp: correctly advance the read pointer of GSP message queue
A GSP event message consists of three parts: a message header, an RPC header,
and a message body. GSP calculates the number of pages to write from the total
size of a GSP message. This behavior can be observed from the movement of the
write pointer.

However, nvkm takes only the size of the RPC header and message body as the
message size when advancing the read pointer. When handling a two-page GSP
message in the non-rollback case, it wrongly takes the message body of the
previous message as the message header of the next message. As the "message
length" found there tends to be zero, the calculation of the size to be copied
(0 - sizeof(message header)) underflows to "0xffffffxx". It also triggers a
kernel panic due to a NULL pointer dereference.

[ 547.614102] msg: 00000f90: ff ff ff ff ff ff ff ff 40 d7 18 fb 8b 00 00 00  ........@.......
[ 547.622533] msg: 00000fa0: 00 00 00 00 ff ff ff ff ff ff ff ff 00 00 00 00  ................
[ 547.630965] msg: 00000fb0: ff ff ff ff ff ff ff ff 00 00 00 00 ff ff ff ff  ................
[ 547.639397] msg: 00000fc0: ff ff ff ff 00 00 00 00 ff ff ff ff ff ff ff ff  ................
[ 547.647832] nvkm 0000:c1:00.0: gsp: peek msg rpc fn:0 len:0x0/0xffffffffffffffe0
[ 547.655225] nvkm 0000:c1:00.0: gsp: get msg rpc fn:0 len:0x0/0xffffffffffffffe0
[ 547.662532] BUG: kernel NULL pointer dereference, address: 0000000000000020
[ 547.669485] #PF: supervisor read access in kernel mode
[ 547.674624] #PF: error_code(0x0000) - not-present page
[ 547.679755] PGD 0 P4D 0
[ 547.682294] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 547.686643] CPU: 22 PID: 322 Comm: kworker/22:1 Tainted: G E 6.9.0-rc6+ #1
[ 547.694893] Hardware name: ASRockRack 1U1G-MILAN/N/ROMED8-NL, BIOS L3.12E 09/06/2022
[ 547.702626] Workqueue: events r535_gsp_msgq_work [nvkm]
[ 547.707921] RIP: 0010:r535_gsp_msg_recv+0x87/0x230 [nvkm]
[ 547.713375] Code: 00 8b 70 08 48 89 e1 31 d2 4c 89 f7 e8 12 f5 ff ff 48 89 c5 48 85 c0 0f 84 cf 00 00 00 48 81 fd 00 f0 ff ff 0f 87 c4 00 00 00 <8b> 55 10 41 8b 46 30 85 d2 0f 85 f6 00 00 00 83 f8 04 76 10 ba 05
[ 547.732119] RSP: 0018:ffffabe440f87e10 EFLAGS: 00010203
[ 547.737335] RAX: 0000000000000010 RBX: 0000000000000008 RCX: 000000000000003f
[ 547.744461] RDX: 0000000000000000 RSI: ffffabe4480a8030 RDI: 0000000000000010
[ 547.751585] RBP: 0000000000000010 R08: 0000000000000000 R09: ffffabe440f87bb0
[ 547.758707] R10: ffffabe440f87dc8 R11: 0000000000000010 R12: 0000000000000000
[ 547.765834] R13: 0000000000000000 R14: ffff9351df1e5000 R15: 0000000000000000
[ 547.772958] FS: 0000000000000000(0000) GS:ffff93708eb00000(0000) knlGS:0000000000000000
[ 547.781035] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 547.786771] CR2: 0000000000000020 CR3: 00000003cc220002 CR4: 0000000000770ef0
[ 547.793896] PKRU: 55555554
[ 547.796600] Call Trace:
[ 547.799046]  <TASK>
[ 547.801152]  ? __die+0x20/0x70
[ 547.804211]  ? page_fault_oops+0x75/0x170
[ 547.808221]  ? print_hex_dump+0x100/0x160
[ 547.812226]  ? exc_page_fault+0x64/0x150
[ 547.816152]  ? asm_exc_page_fault+0x22/0x30
[ 547.820341]  ? r535_gsp_msg_recv+0x87/0x230 [nvkm]
[ 547.825184]  r535_gsp_msgq_work+0x42/0x50 [nvkm]
[ 547.829845]  process_one_work+0x196/0x3d0
[ 547.833861]  worker_thread+0x2fc/0x410
[ 547.837613]  ? __pfx_worker_thread+0x10/0x10
[ 547.841885]  kthread+0xdf/0x110
[ 547.845031]  ? __pfx_kthread+0x10/0x10
[ 547.848775]  ret_from_fork+0x30/0x50
[ 547.852354]  ? __pfx_kthread+0x10/0x10
[ 547.856097]  ret_from_fork_asm+0x1a/0x30
[ 547.860019]  </TASK>
[ 547.862208] Modules linked in: nvkm(E) gsp_log(E) snd_seq_dummy(E) snd_hrtimer(E) snd_seq(E) snd_timer(E) snd_seq_device(E) snd(E) soundcore(E) rfkill(E) qrtr(E) vfat(E) fat(E) ipmi_ssif(E) amd_atl(E) intel_rapl_msr(E) intel_rapl_common(E) amd64_edac(E) mlx5_ib(E) edac_mce_amd(E) kvm_amd(E) ib_uverbs(E) kvm(E) ib_core(E) acpi_ipmi(E) ipmi_si(E) ipmi_devintf(E) mxm_wmi(E) joydev(E) rapl(E) ptdma(E) i2c_piix4(E) acpi_cpufreq(E) wmi_bmof(E) pcspkr(E) k10temp(E) ipmi_msghandler(E) xfs(E) libcrc32c(E) ast(E) i2c_algo_bit(E) drm_shmem_helper(E) crct10dif_pclmul(E) drm_kms_helper(E) ahci(E) crc32_pclmul(E) nvme_tcp(E) libahci(E) nvme(E) crc32c_intel(E) nvme_fabrics(E) cdc_ether(E) nvme_core(E) usbnet(E) mlx5_core(E) ghash_clmulni_intel(E) drm(E) libata(E) ccp(E) mii(E) t10_pi(E) mlxfw(E) sp5100_tco(E) psample(E) pci_hyperv_intf(E) wmi(E) dm_multipath(E) sunrpc(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) be2iscsi(E) bnx2i(E) cnic(E) uio(E) cxgb4i(E) cxgb4(E) tls(E) libcxgbi(E) libcxgb(E) qla4xxx(E)
[ 547.862283]  iscsi_boot_sysfs(E) iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) fuse(E) [last unloaded: gsp_log(E)]
[ 547.962691] CR2: 0000000000000020
[ 547.966003] ---[ end trace 0000000000000000 ]---
[ 549.012012] clocksource: Long readout interval, skipping watchdog check: cs_nsec: 1370499158 wd_nsec: 1370498904
[ 549.043676] pstore: backend (erst) writing error (-28)
[ 549.050924] RIP: 0010:r535_gsp_msg_recv+0x87/0x230 [nvkm]
[ 549.056389] Code: 00 8b 70 08 48 89 e1 31 d2 4c 89 f7 e8 12 f5 ff ff 48 89 c5 48 85 c0 0f 84 cf 00 00 00 48 81 fd 00 f0 ff ff 0f 87 c4 00 00 00 <8b> 55 10 41 8b 46 30 85 d2 0f 85 f6 00 00 00 83 f8 04 76 10 ba 05
[ 549.075138] RSP: 0018:ffffabe440f87e10 EFLAGS: 00010203
[ 549.080361] RAX: 0000000000000010 RBX: 0000000000000008 RCX: 000000000000003f
[ 549.087484] RDX: 0000000000000000 RSI: ffffabe4480a8030 RDI: 0000000000000010
[ 549.094609] RBP: 0000000000000010 R08: 0000000000000000 R09: ffffabe440f87bb0
[ 549.101733] R10: ffffabe440f87dc8 R11: 0000000000000010 R12: 0000000000000000
[ 549.108857] R13: 0000000000000000 R14: ffff9351df1e5000 R15: 0000000000000000
[ 549.115982] FS: 0000000000000000(0000) GS:ffff93708eb00000(0000) knlGS:0000000000000000
[ 549.124061] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 549.129807] CR2: 0000000000000020 CR3: 00000003cc220002 CR4: 0000000000770ef0
[ 549.136940] PKRU: 55555554
[ 549.139653] Kernel panic - not syncing: Fatal exception
[ 549.145054] Kernel Offset: 0x18c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 549.165074] ---[ end Kernel panic - not syncing: Fatal exception ]---

Also, nvkm wrongly advances the read pointer when handling a two-page GSP
message in the rollback case, where the GSP message is copied in two rounds.
For a two-page GSP message, nvkm first copies (GSP_PAGE_SIZE - header size)
bytes into the buffer and advances the read pointer by
DIV_ROUND_UP(size, GSP_PAGE_SIZE), i.e. by 1. Next, nvkm copies the remaining
(total size - (GSP_PAGE_SIZE - header size)) bytes into the buffer. Because the
message header was not accounted for in the first copy, this remainder is
always larger than one page, so the read pointer is advanced by
DIV_ROUND_UP(size larger than one page, GSP_PAGE_SIZE) = 2.
In the end, the read pointer is wrongly advanced by 3 when handling a two-page
GSP message in the rollback case.

Fix the problems by taking the total size of the message into account when
advancing the read pointer, and by calculating the read pointer once, after all
the copies, in the rollback case.

BTW: two-page GSP messages can be observed in the msgq when vGPU is enabled.

Signed-off-by: Zhi Wang <zhiw at nvidia.com>
---
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
index cf58f9da9139..736cde1987d0 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
@@ -121,6 +121,8 @@ r535_gsp_msgq_wait(struct nvkm_gsp *gsp, u32 repc, u32 *prepc, int *ptime)
 		return mqe->data;
 	}
 
+	size = ALIGN(repc + GSP_MSG_HDR_SIZE, GSP_PAGE_SIZE);
+
 	msg = kvmalloc(repc, GFP_KERNEL);
 	if (!msg)
 		return ERR_PTR(-ENOMEM);
@@ -129,19 +131,15 @@ r535_gsp_msgq_wait(struct nvkm_gsp *gsp, u32 repc, u32 *prepc, int *ptime)
 	len = min_t(u32, repc, len);
 	memcpy(msg, mqe->data, len);
 
-	rptr += DIV_ROUND_UP(len, GSP_PAGE_SIZE);
-	if (rptr == gsp->msgq.cnt)
-		rptr = 0;
-
 	repc -= len;
 
 	if (repc) {
 		mqe = (void *)((u8 *)gsp->shm.msgq.ptr + 0x1000 + 0 * 0x1000);
 		memcpy(msg + len, mqe, repc);
-
-		rptr += DIV_ROUND_UP(repc, GSP_PAGE_SIZE);
 	}
 
+	rptr = (rptr + DIV_ROUND_UP(size, GSP_PAGE_SIZE)) % gsp->msgq.cnt;
+
 	mb();
 	(*gsp->msgq.rptr) = rptr;
 	return msg;
-- 
2.34.1
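To make the read-pointer arithmetic above easier to check, here is a small
standalone sketch of the calculation the fix performs. The names (PAGE_SIZE,
HDR_SIZE, pages_for, advance_rptr) and the values in main() are illustrative
assumptions for this example, not the driver's identifiers; the actual change
is the r535_gsp_msgq_wait() hunk in the patch above.

/*
 * Sketch: advance the msgq read pointer by the number of pages covered by
 * the *total* message size (header + payload), once, after all copies.
 */
#include <stdio.h>

#define PAGE_SIZE 0x1000u
#define HDR_SIZE  32u                 /* stand-in for GSP_MSG_HDR_SIZE */

/* Round a byte count up to whole pages, like DIV_ROUND_UP() in the kernel. */
static unsigned int pages_for(unsigned int bytes)
{
	return (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
}

/* New read pointer: one advance for the whole message, wrapped with modulo. */
static unsigned int advance_rptr(unsigned int rptr, unsigned int payload,
				 unsigned int msgq_cnt)
{
	unsigned int total = HDR_SIZE + payload;

	return (rptr + pages_for(total)) % msgq_cnt;
}

int main(void)
{
	/* A payload of one full page makes the total spill into a second
	 * page, so the pointer advances by exactly 2 (not 1, not 3). */
	unsigned int payload = PAGE_SIZE;
	unsigned int rptr = 62, msgq_cnt = 63;   /* also exercises the wrap */

	printf("new rptr = %u\n", advance_rptr(rptr, payload, msgq_cnt));
	return 0;
}

With these example numbers the message spans two pages and the pointer wraps
from 62 to 1, which is the single-advance behavior the patch establishes for
both the rollback and non-rollback paths.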
Zhi Wang
2024-Oct-17 07:19 UTC
[PATCH 2/3] nvkm: correctly calculate the available space of the GSP cmdq buffer
r535_gsp_cmdq_push() waits for available pages in the GSP cmdq buffer when
handling a large RPC request. When it sees at least one available page in the
cmdq, it stops waiting and returns the number of free buffer pages in the
queue.

Unfortunately, before rolling back it always treats [write pointer, buf_size)
as available buffer pages and thus wrongly calculates the size of the data to
be copied. As a result, it can overwrite the RPC request that GSP is currently
reading, which causes a GSP hang due to the corrupted RPC request:

[ 549.209389] ------------[ cut here ]------------
[ 549.214010] WARNING: CPU: 8 PID: 6314 at drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:116 r535_gsp_msgq_wait+0xd0/0x190 [nvkm]
[ 549.225678] Modules linked in: nvkm(E+) gsp_log(E) snd_seq_dummy(E) snd_hrtimer(E) snd_seq(E) snd_timer(E) snd_seq_device(E) snd(E) soundcore(E) rfkill(E) qrtr(E) vfat(E) fat(E) ipmi_ssif(E) amd_atl(E) intel_rapl_msr(E) intel_rapl_common(E) mlx5_ib(E) amd64_edac(E) edac_mce_amd(E) kvm_amd(E) ib_uverbs(E) kvm(E) ib_core(E) acpi_ipmi(E) ipmi_si(E) mxm_wmi(E) ipmi_devintf(E) rapl(E) i2c_piix4(E) wmi_bmof(E) joydev(E) ptdma(E) acpi_cpufreq(E) k10temp(E) pcspkr(E) ipmi_msghandler(E) xfs(E) libcrc32c(E) ast(E) i2c_algo_bit(E) crct10dif_pclmul(E) drm_shmem_helper(E) nvme_tcp(E) crc32_pclmul(E) ahci(E) drm_kms_helper(E) libahci(E) nvme_fabrics(E) crc32c_intel(E) nvme(E) cdc_ether(E) mlx5_core(E) nvme_core(E) usbnet(E) drm(E) libata(E) ccp(E) ghash_clmulni_intel(E) mii(E) t10_pi(E) mlxfw(E) sp5100_tco(E) psample(E) pci_hyperv_intf(E) wmi(E) dm_multipath(E) sunrpc(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) be2iscsi(E) bnx2i(E) cnic(E) uio(E) cxgb4i(E) cxgb4(E) tls(E) libcxgbi(E) libcxgb(E) qla4xxx(E)
[ 549.225752]  iscsi_boot_sysfs(E) iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) fuse(E) [last unloaded: gsp_log(E)]
[ 549.326293] CPU: 8 PID: 6314 Comm: insmod Tainted: G E 6.9.0-rc6+ #1
[ 549.334039] Hardware name: ASRockRack 1U1G-MILAN/N/ROMED8-NL, BIOS L3.12E 09/06/2022
[ 549.341781] RIP: 0010:r535_gsp_msgq_wait+0xd0/0x190 [nvkm]
[ 549.347343] Code: 08 00 00 89 da c1 e2 0c 48 8d ac 11 00 10 00 00 48 8b 0c 24 48 85 c9 74 1f c1 e0 0c 4c 8d 6d 30 83 e8 30 89 01 e9 68 ff ff ff <0f> 0b 49 c7 c5 92 ff ff ff e9 5a ff ff ff ba ff ff ff ff be c0 0c
[ 549.366090] RSP: 0018:ffffacbccaaeb7d0 EFLAGS: 00010246
[ 549.371315] RAX: 0000000000000000 RBX: 0000000000000012 RCX: 0000000000923e28
[ 549.378451] RDX: 0000000000000000 RSI: 0000000055555554 RDI: ffffacbccaaeb730
[ 549.385590] RBP: 0000000000000001 R08: ffff8bd14d235f70 R09: ffff8bd14d235f70
[ 549.392721] R10: 0000000000000002 R11: ffff8bd14d233864 R12: 0000000000000020
[ 549.399854] R13: ffffacbccaaeb818 R14: 0000000000000020 R15: ffff8bb298c67000
[ 549.406988] FS: 00007f5179244740(0000) GS:ffff8bd14d200000(0000) knlGS:0000000000000000
[ 549.415076] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 549.420829] CR2: 00007fa844000010 CR3: 00000001567dc005 CR4: 0000000000770ef0
[ 549.427963] PKRU: 55555554
[ 549.430672] Call Trace:
[ 549.433126]  <TASK>
[ 549.435233]  ? __warn+0x7f/0x130
[ 549.438473]  ? r535_gsp_msgq_wait+0xd0/0x190 [nvkm]
[ 549.443426]  ? report_bug+0x18a/0x1a0
[ 549.447098]  ? handle_bug+0x3c/0x70
[ 549.450589]  ? exc_invalid_op+0x14/0x70
[ 549.454430]  ? asm_exc_invalid_op+0x16/0x20
[ 549.458619]  ? r535_gsp_msgq_wait+0xd0/0x190 [nvkm]
[ 549.463565]  r535_gsp_msg_recv+0x46/0x230 [nvkm]
[ 549.468257]  r535_gsp_rpc_push+0x106/0x160 [nvkm]
[ 549.473033]  r535_gsp_rpc_rm_ctrl_push+0x40/0x130 [nvkm]
[ 549.478422]  nvidia_grid_init_vgpu_types+0xbc/0xe0 [nvkm]
[ 549.483899]  nvidia_grid_init+0xb1/0xd0 [nvkm]
[ 549.488420]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 549.493213]  nvkm_device_pci_probe+0x305/0x420 [nvkm]
[ 549.498338]  local_pci_probe+0x46/0xa0
[ 549.502096]  pci_call_probe+0x56/0x170
[ 549.505851]  pci_device_probe+0x79/0xf0
[ 549.509690]  ? driver_sysfs_add+0x59/0xc0
[ 549.513702]  really_probe+0xd9/0x380
[ 549.517282]  __driver_probe_device+0x78/0x150
[ 549.521640]  driver_probe_device+0x1e/0x90
[ 549.525746]  __driver_attach+0xd2/0x1c0
[ 549.529594]  ? __pfx___driver_attach+0x10/0x10
[ 549.534045]  bus_for_each_dev+0x78/0xd0
[ 549.537893]  bus_add_driver+0x112/0x210
[ 549.541750]  driver_register+0x5c/0x120
[ 549.545596]  ? __pfx_nvkm_init+0x10/0x10 [nvkm]
[ 549.550224]  do_one_initcall+0x44/0x300
[ 549.554063]  ? do_init_module+0x23/0x240
[ 549.557989]  do_init_module+0x64/0x240

Calculate the available buffer pages before rolling back based on the result
of the wait.

Signed-off-by: Zhi Wang <zhiw at nvidia.com>
---
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
index 736cde1987d0..50ae56013344 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
@@ -161,7 +161,7 @@ r535_gsp_cmdq_push(struct nvkm_gsp *gsp, void *argv)
 	u64 *end;
 	u64 csum = 0;
 	int free, time = 1000000;
-	u32 wptr, size;
+	u32 wptr, size, step;
 	u32 off = 0;
 
 	argc = ALIGN(GSP_MSG_HDR_SIZE + argc, GSP_PAGE_SIZE);
@@ -195,7 +195,9 @@ r535_gsp_cmdq_push(struct nvkm_gsp *gsp, void *argv)
 	}
 
 	cqe = (void *)((u8 *)gsp->shm.cmdq.ptr + 0x1000 + wptr * 0x1000);
-	size = min_t(u32, argc, (gsp->cmdq.cnt - wptr) * GSP_PAGE_SIZE);
+	step = min_t(u32, free, (gsp->cmdq.cnt - wptr));
+	size = min_t(u32, argc, step * GSP_PAGE_SIZE);
+
 	memcpy(cqe, (u8 *)cmd + off, size);
 
 	wptr += DIV_ROUND_UP(size, 0x1000);
-- 
2.34.1
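As a companion to the fix above, the sketch below spells out the bounded copy
size in isolation. The helper names (copy_now, min_u32) and the numbers in
main() are assumptions made up for this example; in the driver the same bound
is computed inline in r535_gsp_cmdq_push() as shown in the patch.

/*
 * Sketch: how many bytes may be copied at the current cmdq write pointer
 * before wrapping, bounded by BOTH the free pages reported by the wait and
 * the pages left until the end of the ring -- not by the ring end alone,
 * which could overwrite requests GSP is still reading.
 */
#include <stdio.h>

#define PAGE_SIZE 0x1000u

static unsigned int min_u32(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

static unsigned int copy_now(unsigned int argc, unsigned int free_pages,
			     unsigned int wptr, unsigned int cmdq_cnt)
{
	unsigned int step = min_u32(free_pages, cmdq_cnt - wptr);

	return min_u32(argc, step * PAGE_SIZE);
}

int main(void)
{
	/* 8 pages of request, only 3 pages free, 5 pages to the ring end:
	 * copy 3 pages now and wait again for the rest. */
	printf("copy %u bytes now\n", copy_now(8 * PAGE_SIZE, 3, 27, 32));
	return 0;
}

The essential point is the extra min() against the free-page count returned by
the wait; without it, the old code could copy past data GSP had not consumed
yet.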
Zhi Wang
2024-Oct-17 07:19 UTC
[PATCH 3/3] nvkm: handle the return of large RPC
The max RPC size is 16 pages (including the RPC header). To send an RPC larger
than 16 pages, nvkm should split it into multiple RPCs and send them
accordingly. The first of the split RPCs has the expected function number,
while the rest of the split RPCs are sent with the function number
NV_VGPU_MSG_FUNCTION_CONTINUATION_RECORD.

GSP consumes the split RPCs from the cmdq and always writes the result back to
the msgq. The result is also formed as split RPCs.

However, while NVKM is able to send split RPCs when dealing with large RPCs,
it is not aware of handling the return of large RPCs, i.e. the split RPCs in
the msgq. Thus, it keeps dumping "unknown" RPC messages from the msgq, which
are actually CONTINUATION_RECORD messages, and discards them unexpectedly. As
a result, the caller is unable to consume the result from GSP.

Introduce the handling of split RPCs on the msgq path. Slightly refactor the
low-level part of receiving RPCs from the msgq and the RPC vehicle handling to
merge the split RPCs back into a large RPC before handing it to the upper
level, so the upper-level RPC APIs don't need to be heavily changed.

Signed-off-by: Zhi Wang <zhiw at nvidia.com>
---
 .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 237 +++++++++++++-----
 1 file changed, 177 insertions(+), 60 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
index 50ae56013344..9c422644c9e7 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
@@ -72,6 +72,21 @@ struct r535_gsp_msg {
 
 #define GSP_MSG_HDR_SIZE offsetof(struct r535_gsp_msg, data)
 
+struct nvfw_gsp_rpc {
+	u32 header_version;
+	u32 signature;
+	u32 length;
+	u32 function;
+	u32 rpc_result;
+	u32 rpc_result_private;
+	u32 sequence;
+	union {
+		u32 spare;
+		u32 cpuRmGfid;
+	};
+	u8 data[];
+};
+
 static int
 r535_rpc_status_to_errno(uint32_t rpc_status)
 {
@@ -86,16 +101,34 @@ r535_rpc_status_to_errno(uint32_t rpc_status)
 	}
 }
 
+struct gsp_msgq_recv_args {
+	/* timeout in us */
+	int time;
+	/* if set, peek the msgq, otherwise copy it */
+	u32 *prepc;
+	/*
+	 * the size (without message header) of message to
+	 * wait(when peek)/copy from the msgq
+	 */
+	u32 repc;
+	/* the message buffer */
+	u8 *msg;
+	/*
+	 * skip copying the rpc header, used when handling a large RPC.
+	 * rpc header only shows up in the first segment of a large RPC.
+	 */
+	bool skip_copy_rpc_header;
+};
+
 static void *
-r535_gsp_msgq_wait(struct nvkm_gsp *gsp, u32 repc, u32 *prepc, int *ptime)
+gsp_msgq_recv(struct nvkm_gsp *gsp, struct gsp_msgq_recv_args *args)
 {
 	struct r535_gsp_msg *mqe;
-	u32 size, rptr = *gsp->msgq.rptr;
+	u32 rptr = *gsp->msgq.rptr;
 	int used;
-	u8 *msg;
-	u32 len;
+	u32 size, len, repc;
 
-	size = DIV_ROUND_UP(GSP_MSG_HDR_SIZE + repc, GSP_PAGE_SIZE);
+	size = DIV_ROUND_UP(GSP_MSG_HDR_SIZE + args->repc, GSP_PAGE_SIZE);
 	if (WARN_ON(!size || size >= gsp->msgq.cnt))
 		return ERR_PTR(-EINVAL);
 
@@ -109,46 +142,149 @@ r535_gsp_msgq_wait(struct nvkm_gsp *gsp, u32 repc, u32 *prepc, int *ptime)
 			break;
 
 		usleep_range(1, 2);
-	} while (--(*ptime));
+	} while (--(args->time));
 
-	if (WARN_ON(!*ptime))
+	if (WARN_ON(!args->time))
 		return ERR_PTR(-ETIMEDOUT);
 
 	mqe = (void *)((u8 *)gsp->shm.msgq.ptr + 0x1000 + rptr * 0x1000);
 
-	if (prepc) {
-		*prepc = (used * GSP_PAGE_SIZE) - sizeof(*mqe);
+	if (args->prepc) {
+		*args->prepc = (used * GSP_PAGE_SIZE) - sizeof(*mqe);
 		return mqe->data;
 	}
 
+	repc = args->repc;
 	size = ALIGN(repc + GSP_MSG_HDR_SIZE, GSP_PAGE_SIZE);
 
-	msg = kvmalloc(repc, GFP_KERNEL);
-	if (!msg)
-		return ERR_PTR(-ENOMEM);
-
 	len = ((gsp->msgq.cnt - rptr) * GSP_PAGE_SIZE) - sizeof(*mqe);
 	len = min_t(u32, repc, len);
-	memcpy(msg, mqe->data, len);
+	if (!args->skip_copy_rpc_header)
+		memcpy(args->msg, mqe->data, len);
+	else
+		memcpy(args->msg, mqe->data + sizeof(struct nvfw_gsp_rpc),
+		       len - sizeof(struct nvfw_gsp_rpc));
 
 	repc -= len;
 
 	if (repc) {
 		mqe = (void *)((u8 *)gsp->shm.msgq.ptr + 0x1000 + 0 * 0x1000);
-		memcpy(msg + len, mqe, repc);
+		memcpy(args->msg + len, mqe, repc);
 	}
 
 	rptr = (rptr + DIV_ROUND_UP(size, GSP_PAGE_SIZE)) % gsp->msgq.cnt;
 
 	mb();
 	(*gsp->msgq.rptr) = rptr;
-	return msg;
+	return args->msg;
+}
+
+static void
+r535_gsp_msg_dump(struct nvkm_gsp *gsp, struct nvfw_gsp_rpc *msg, int lvl)
+{
+	if (gsp->subdev.debug >= lvl) {
+		nvkm_printk__(&gsp->subdev, lvl, info,
+			      "msg fn:%d len:0x%x/0x%zx res:0x%x resp:0x%x\n",
+			      msg->function, msg->length, msg->length - sizeof(*msg),
+			      msg->rpc_result, msg->rpc_result_private);
+		print_hex_dump(KERN_INFO, "msg: ", DUMP_PREFIX_OFFSET, 16, 1,
+			       msg->data, msg->length - sizeof(*msg), true);
+	}
+}
+
+static void *
+r535_gsp_msgq_recv_continuation(struct nvkm_gsp *gsp, u32 *payload_size,
+				u8 *buf, int time)
+{
+	struct nvkm_subdev *subdev = &gsp->subdev;
+	struct nvfw_gsp_rpc *msg;
+	struct gsp_msgq_recv_args args = { 0 };
+	u32 size;
+
+	/* Peek the header of message */
+	args.time = time;
+	args.repc = sizeof(*msg);
+	args.prepc = &size;
+
+	msg = gsp_msgq_recv(gsp, &args);
+	if (IS_ERR_OR_NULL(msg))
+		return msg;
+
+	if (msg->function != NV_VGPU_MSG_FUNCTION_CONTINUATION_RECORD) {
+		nvkm_error(subdev, "Not a continuation of a large RPC\n");
+		r535_gsp_msg_dump(gsp, msg, NV_DBG_ERROR);
+		return ERR_PTR(-EIO);
+	}
+
+	*payload_size = msg->length - sizeof(*msg);
+
+	/* Recv the continuation message */
+	args.time = time;
+	args.repc = msg->length;
+	args.prepc = NULL;
+	args.msg = buf;
+	args.skip_copy_rpc_header = true;
+
+	return gsp_msgq_recv(gsp, &args);
 }
 
 static void *
-r535_gsp_msgq_recv(struct nvkm_gsp *gsp, u32 repc, int *ptime)
+r535_gsp_msgq_recv(struct nvkm_gsp *gsp, u32 msg_repc, u32 total_repc,
+		   int time)
 {
-	return r535_gsp_msgq_wait(gsp, repc, NULL, ptime);
+	struct gsp_msgq_recv_args args = { 0 };
+	struct nvfw_gsp_rpc *msg;
+	const u32 max_msg_size = (16 * 0x1000) - sizeof(struct r535_gsp_msg);
+	const u32 max_rpc_size = max_msg_size - sizeof(*msg);
+	u32 repc = total_repc;
+	u8 *buf, *next;
+
+	if (WARN_ON(msg_repc > max_msg_size))
+		return NULL;
+
+	buf = kvmalloc(max_t(u32, msg_repc, total_repc + sizeof(*msg)), GFP_KERNEL);
+	if (!buf)
+		return ERR_PTR(-ENOMEM);
+
+	/* Recv the message */
+	args.time = time;
+	args.repc = msg_repc;
+	args.prepc = NULL;
+	args.msg = buf;
+	args.skip_copy_rpc_header = false;
+
+	msg = gsp_msgq_recv(gsp, &args);
+	if (IS_ERR_OR_NULL(msg)) {
+		kfree(buf);
+		return msg;
+	}
+
+	if (total_repc <= max_rpc_size)
+		return buf;
+
+	/* Gather the message from the following continuation messages. */
+	next = buf;
+
+	next += msg_repc;
+	repc -= msg_repc - sizeof(*msg);
+
+	while (repc) {
+		struct nvfw_gsp_rpc *cont_msg;
+		u32 size;
+
+		cont_msg = r535_gsp_msgq_recv_continuation(gsp, &size, next,
+							   time);
+		if (IS_ERR_OR_NULL(cont_msg)) {
+			kfree(buf);
+			return cont_msg;
+		}
+		repc -= size;
+		next += size;
+	}
+
+	/* Patch the message length. The caller sees a consolidated message */
+	msg->length = total_repc + sizeof(*msg);
+	return buf;
 }
 
 static int
@@ -234,54 +370,33 @@ r535_gsp_cmdq_get(struct nvkm_gsp *gsp, u32 argc)
 	return cmd->data;
 }
 
-struct nvfw_gsp_rpc {
-	u32 header_version;
-	u32 signature;
-	u32 length;
-	u32 function;
-	u32 rpc_result;
-	u32 rpc_result_private;
-	u32 sequence;
-	union {
-		u32 spare;
-		u32 cpuRmGfid;
-	};
-	u8 data[];
-};
-
 static void
 r535_gsp_msg_done(struct nvkm_gsp *gsp, struct nvfw_gsp_rpc *msg)
 {
 	kvfree(msg);
 }
 
-static void
-r535_gsp_msg_dump(struct nvkm_gsp *gsp, struct nvfw_gsp_rpc *msg, int lvl)
-{
-	if (gsp->subdev.debug >= lvl) {
-		nvkm_printk__(&gsp->subdev, lvl, info,
-			      "msg fn:%d len:0x%x/0x%zx res:0x%x resp:0x%x\n",
-			      msg->function, msg->length, msg->length - sizeof(*msg),
-			      msg->rpc_result, msg->rpc_result_private);
-		print_hex_dump(KERN_INFO, "msg: ", DUMP_PREFIX_OFFSET, 16, 1,
-			       msg->data, msg->length - sizeof(*msg), true);
-	}
-}
-
 static struct nvfw_gsp_rpc *
 r535_gsp_msg_recv(struct nvkm_gsp *gsp, int fn, u32 repc)
 {
 	struct nvkm_subdev *subdev = &gsp->subdev;
+	struct gsp_msgq_recv_args args = { 0 };
 	struct nvfw_gsp_rpc *msg;
 	int time = 4000000, i;
 	u32 size;
 
 retry:
-	msg = r535_gsp_msgq_wait(gsp, sizeof(*msg), &size, &time);
+	/* Peek the header of message */
+	args.time = time;
+	args.repc = sizeof(*msg);
+	args.prepc = &size;
+
+	msg = gsp_msgq_recv(gsp, &args);
 	if (IS_ERR_OR_NULL(msg))
 		return msg;
 
-	msg = r535_gsp_msgq_recv(gsp, msg->length, &time);
+	/* Recv the message */
+	msg = r535_gsp_msgq_recv(gsp, msg->length, repc, time);
 	if (IS_ERR_OR_NULL(msg))
 		return msg;
 
@@ -734,6 +849,7 @@ r535_gsp_rpc_push(struct nvkm_gsp *gsp, void *argv, bool wait, u32 repc)
 	mutex_lock(&gsp->cmdq.mutex);
 	if (rpc_size > max_rpc_size) {
 		const u32 fn = rpc->function;
+		u32 remain_rpc_size = rpc_size;
 
 		/* Adjust length, and send initial RPC. */
 		rpc->length = sizeof(*rpc) + max_rpc_size;
@@ -744,11 +860,11 @@ r535_gsp_rpc_push(struct nvkm_gsp *gsp, void *argv, bool wait, u32 repc)
 			goto done;
 
 		argv += max_rpc_size;
-		rpc_size -= max_rpc_size;
+		remain_rpc_size -= max_rpc_size;
 
 		/* Remaining chunks sent as CONTINUATION_RECORD RPCs. */
-		while (rpc_size) {
-			u32 size = min(rpc_size, max_rpc_size);
+		while (remain_rpc_size) {
+			u32 size = min(remain_rpc_size, max_rpc_size);
 			void *next;
 
 			next = r535_gsp_rpc_get(gsp, NV_VGPU_MSG_FUNCTION_CONTINUATION_RECORD, size);
@@ -764,19 +880,20 @@ r535_gsp_rpc_push(struct nvkm_gsp *gsp, void *argv, bool wait, u32 repc)
 			goto done;
 
 			argv += size;
-			remain_rpc_size -= size;
		}
 
 		/* Wait for reply. */
-		if (wait) {
-			rpc = r535_gsp_msg_recv(gsp, fn, repc);
-			if (!IS_ERR_OR_NULL(rpc))
+		rpc = r535_gsp_msg_recv(gsp, fn, rpc_size);
+		if (!IS_ERR_OR_NULL(rpc)) {
+			if (wait)
 				repv = rpc->data;
-			else
-				repv = rpc;
-		} else {
-			repv = NULL;
-		}
+			else {
+				nvkm_gsp_rpc_done(gsp, rpc);
+				repv = NULL;
+			}
+		} else
+			repv = wait ? rpc : NULL;
 	} else {
 		repv = r535_gsp_rpc_send(gsp, argv, wait, repc);
 	}

-- 
2.34.1
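For readers who want the reassembly flow in one place, here is a control-flow
sketch (sizes only, no actual copying) of how a large reply can be gathered
from split RPCs. recv_segment() and the 16-page limit constant are stand-ins
assumed for this example, not the driver's API; the real logic lives in
r535_gsp_msgq_recv()/r535_gsp_msgq_recv_continuation() in the patch above.

#include <stdio.h>

/* Illustrative per-segment payload limit: 16 pages minus a small header. */
#define MAX_RPC_PAYLOAD (16u * 0x1000u - 64u)

/* Pretend to read one msgq segment; GSP never returns more than the limit. */
static unsigned int recv_segment(unsigned int remaining)
{
	return remaining < MAX_RPC_PAYLOAD ? remaining : MAX_RPC_PAYLOAD;
}

/* Gather a reply of total_len payload bytes that GSP delivered as split RPCs. */
static unsigned int gather_reply(unsigned int total_len)
{
	/* the first segment carries the real function number */
	unsigned int off = recv_segment(total_len);

	/* every further segment arrives as a CONTINUATION_RECORD */
	while (off < total_len)
		off += recv_segment(total_len - off);

	/* the caller then sees one consolidated message of total_len bytes */
	return off;
}

int main(void)
{
	printf("reassembled %u bytes\n", gather_reply(150000));
	return 0;
}

Consolidating the segments at this low level is what lets the upper-level RPC
APIs stay unchanged, as the commit message notes.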
On Thu, Oct 17, 2024 at 12:19:19AM -0700, Zhi Wang wrote:
> Hi folks:
> 
> Here are some fixes of weird bugs I encountered when I was enabling vGPU [1] on
> core-driver aka NVKM. They are exposed because of the new RPCs required by
> vGPU.
> 
> For testing, I tried to run Uniengine Heaven[2] on my RTX 4060 for 8 hours and
> the GL CTS runner[3] (commandline: ./cts-runner --type-gl40) from Khronos
> without any problem.
> 
> v2:
> 
> - Remove the Fixes: tags as the vanilla nouveau aren't going to hit these bugs. (Danilo)
> - Test the patchset on VK-GL-CTS. (Danilo)
> - Remove the ambiguous empty line in PATCH 2. (Danilo)
> - Rename the r535_gsp_msgq_wait to gsp_msgq_recv. (Danilo)
> - Introduce a data structure to hold the params of gsp_msgq_recv(). (Danilo)
> - Document the params and the states they are related to. (Danilo)

When you send a v2, please make sure to pass `-v2` to `git format-patch`,
otherwise it's hard to distinguish patch versions from their subject.

> 
> [1] https://lore.kernel.org/kvm/20240922124951.1946072-1-zhiw at nvidia.com/T/#t
> [2] https://benchmark.unigine.com/heaven
> [3] https://github.com/KhronosGroup/VK-GL-CTS
> 
> Zhi Wang (3):
>   nvkm/gsp: correctly advance the read pointer of GSP message queue
>   nvkm: correctly calculate the available space of the GSP cmdq buffer
>   nvkm: handle the return of large RPC
> 
>  .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 251 +++++++++++++-----
>  1 file changed, 184 insertions(+), 67 deletions(-)
> 
> -- 
> 2.34.1
>
On 10/17/24 9:19 AM, Zhi Wang wrote:
> Hi folks:
> 
> Here are some fixes of weird bugs I encountered when I was enabling vGPU [1] on
> core-driver aka NVKM. They are exposed because of the new RPCs required by
> vGPU.
> 
> For testing, I tried to run Uniengine Heaven[2] on my RTX 4060 for 8 hours and
> the GL CTS runner[3] (commandline: ./cts-runner --type-gl40) from Khronos
> without any problem.
> 
> v2:
> 
> - Remove the Fixes: tags as the vanilla nouveau aren't going to hit these bugs. (Danilo)
> - Test the patchset on VK-GL-CTS. (Danilo)
> - Remove the ambiguous empty line in PATCH 2. (Danilo)
> - Rename the r535_gsp_msgq_wait to gsp_msgq_recv. (Danilo)
> - Introduce a data structure to hold the params of gsp_msgq_recv(). (Danilo)
> - Document the params and the states they are related to. (Danilo)
> 
> [1] https://lore.kernel.org/kvm/20240922124951.1946072-1-zhiw at nvidia.com/T/#t
> [2] https://benchmark.unigine.com/heaven
> [3] https://github.com/KhronosGroup/VK-GL-CTS
> 
> Zhi Wang (3):
>   nvkm/gsp: correctly advance the read pointer of GSP message queue
>   nvkm: correctly calculate the available space of the GSP cmdq buffer

Applied patches 1 and 2 to drm-misc-next, thanks!

Sorry for the delay,
Danilo

>   nvkm: handle the return of large RPC
> 
>  .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 251 +++++++++++++-----
>  1 file changed, 184 insertions(+), 67 deletions(-)
>