Frank Winter via llvm-dev
2020-Apr-30 19:09 UTC
[llvm-dev] AMDGPU workgroup size as metadata
From LLVM IR, how can you get the 'workgroup size' value?

It seems to be set by the AMDGPU backend as metadata since in AMDGPUMetadata.h there are things defined like

    constexpr char ReqdWorkGroupSize[] = "ReqdWorkGroupSize";

and

    struct Metadata final {
      /// 'reqd_work_group_size' attribute. Optional.
      std::vector<uint32_t> mReqdWorkGroupSize = std::vector<uint32_t>();
      ...
    }

Is this metadata set to the kernel function or to the module?

What IR instructions would give access to the value of, say, the workgroup size in dimension x?

Frank
Matt Arsenault via llvm-dev
2020-May-01 00:32 UTC
[llvm-dev] AMDGPU workgroup size as metadata
> On Apr 30, 2020, at 15:09, Frank Winter via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> From LLVM IR, how can you get the 'workgroup size' value?
> It seems to be set by the AMDGPU backend as metadata since in AMDGPUMetadata.h there are things defined like
>
>     constexpr char ReqdWorkGroupSize[] = "ReqdWorkGroupSize";
>
> and
>
>     struct Metadata final {
>       /// 'reqd_work_group_size' attribute. Optional.
>       std::vector<uint32_t> mReqdWorkGroupSize = std::vector<uint32_t>();
>       ...
>     }
>
> Is this metadata set to the kernel function or to the module?
>
> What IR instructions would give access to the value of, say, the workgroup size in dimension x?
>
> Frank

The code object metadata is only for statically known workgroup size information. The metadata you found here corresponds to !reqd_work_group_size, which in turn corresponds to the OpenCL attribute of the same name. We have a variety of other static attributes related to workgroup sizes, as documented here: https://llvm.org/docs/AMDGPUUsage.html#llvm-ir-attributes. The "uniform-work-group-size" attribute (corresponding to the OpenCL flag -cl-uniform-work-group-size) may also be of interest.

Dynamically, there isn't a single instruction to get the group size, and how it is implemented depends on the runtime/driver. You need to get a pointer to somewhere and load from it. For HSA/ROCm, these values are loaded from an ABI struct pointed to by a special kernel input SGPR. Recently the core implementation was moved into a clang builtin so we can annotate the load with !range metadata: https://github.com/llvm/llvm-project/blob/a1bd5cd539f9e2fd34e522b848e751342985e882/clang/lib/CodeGen/CGBuiltin.cpp#L13985.

You can see how these are used here: https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/amd-stg-open/ockl/src/workitem.cl

-Matt
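
To make the dynamic case concrete, here is a minimal host-side C++ sketch of "get a pointer to somewhere, and load from it". It is not the actual clang or device-library code. It assumes the ABI struct is the HSA AQL kernel dispatch packet, with the three 16-bit workgroup sizes starting at byte offset 4 and the three 32-bit grid sizes starting at byte offset 12; check those offsets against the HSA specification before relying on them. The names DispatchPacketPrefix, load_workgroup_size, load_grid_size and local_size are hypothetical and used only for illustration. On the device, the pointer would come from the special kernel input SGPR rather than from a host struct.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical host-side model of the leading fields of the dispatch packet.
// The layout is an assumption based on the HSA AQL kernel dispatch packet.
struct DispatchPacketPrefix {
    std::uint16_t header;
    std::uint16_t setup;
    std::uint16_t workgroup_size_x;
    std::uint16_t workgroup_size_y;
    std::uint16_t workgroup_size_z;
    std::uint16_t reserved0;
    std::uint32_t grid_size_x;
    std::uint32_t grid_size_y;
    std::uint32_t grid_size_z;
};

// Guard the assumed offsets so a layout mistake fails at compile time.
static_assert(offsetof(DispatchPacketPrefix, workgroup_size_x) == 4,
              "workgroup sizes assumed to start at byte 4");
static_assert(offsetof(DispatchPacketPrefix, grid_size_x) == 12,
              "grid sizes assumed to start at byte 12");

// Load the 16-bit workgroup size for dimension dim (0 = x, 1 = y, 2 = z)
// from a raw pointer to the packet.
inline std::uint16_t load_workgroup_size(const void* dispatch_ptr, unsigned dim) {
    std::uint16_t value;
    const auto* base = static_cast<const unsigned char*>(dispatch_ptr);
    std::memcpy(&value, base + 4 + 2 * dim, sizeof value);
    return value;
}

// Load the 32-bit grid size (in work-items) for dimension dim.
inline std::uint32_t load_grid_size(const void* dispatch_ptr, unsigned dim) {
    std::uint32_t value;
    const auto* base = static_cast<const unsigned char*>(dispatch_ptr);
    std::memcpy(&value, base + 12 + 4 * dim, sizeof value);
    return value;
}

// Size of the workgroup with index group_id in dimension dim. If the grid
// size is not a multiple of the workgroup size, the last group is smaller.
inline std::uint32_t local_size(const void* dispatch_ptr, unsigned dim,
                                std::uint32_t group_id) {
    const std::uint32_t group = load_workgroup_size(dispatch_ptr, dim);
    const std::uint32_t grid = load_grid_size(dispatch_ptr, dim);
    const std::uint32_t remaining = grid - group * group_id;
    return remaining < group ? remaining : group;
}
```

If every workgroup is known to be full, which is what "uniform-work-group-size" asserts, the remainder computation in local_size is unnecessary and the plain 16-bit load is enough.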