Ees via llvm-dev
2020-Sep-23 10:27 UTC
[llvm-dev] Information about the number of indices in memory accesses
Hi all,

For loads and stores I want to extract information about the number of indices accessed. For instance:

struct S { int X; int *Y; };

__global__ void kernel(int *A, int **B, struct S s) {
  int x = A[..][..];  // -> L: A[..][..]
  int y = *B[2];      // -> L: B[0][2]
  int z = s.Y[..];    // -> L: S.1[..]

  // etc..
}

I am performing some preprocessing on the IR to:

1. Move constant inline GEPs into instructions
2. For loads and stores without a GEP operand, explicitly create a (trivial) GEP with index 0

So now the pointer operand of every load and store is a GEP instruction.

For simple cases I am getting the right answer, but when the index expression becomes more complex, multiple GEPs are introduced. For instance:

*(A + 2*(blockDim.x*blockIdx.x + threadIdx.x + 1) + 2 + 3) = 5;

produces:

%6 = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x()
%7 = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()
%8 = mul i32 %6, %7
%9 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
%10 = add i32 %8, %9
%11 = add i32 %10, 1
%12 = mul i32 2, %11
%13 = zext i32 %12 to i64
%14 = getelementptr inbounds i32, i32* %0, i64 %13
%15 = getelementptr inbounds i32, i32* %14, i64 2
%16 = getelementptr inbounds i32, i32* %15, i64 3
store i32 5, i32* %16, align 4

So I guess relying on the number of GEPs to figure out the number of indices is only a heuristic. Is there a more robust way to go about it? Or some example I can look at?

PS: I'm only interested in CUDA kernels.

Ees
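As a concrete sketch of preprocessing step 2 above (the helper name ensureGEPOperand is hypothetical, and the code assumes 2020-era typed-pointer IR):

#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Operator.h"
#include "llvm/Support/Casting.h"

using namespace llvm;

// If the pointer operand of a load/store is not already a GEP, insert a
// trivial one-index GEP (index 0) in front of it, so that every access
// can later be treated uniformly.
static void ensureGEPOperand(Instruction &I) {
  Value *Ptr = getLoadStorePointerOperand(&I);
  if (!Ptr || isa<GEPOperator>(Ptr))
    return; // not a load/store, or already fed by a GEP
  IRBuilder<> B(&I);
  Value *GEP = B.CreateGEP(Ptr->getType()->getPointerElementType(), Ptr,
                           B.getInt64(0));
  I.setOperand(isa<StoreInst>(I) ? 1 : 0, GEP);
}

And a rough sketch of the counting heuristic itself, which merges consecutive single-index GEPs over the same scalar element type into one logical subscript (exactly what would fold %14, %15, %16 above into one index); the merging rule is an assumption about how clang splits linear offsets, not something LLVM guarantees:

#include "llvm/IR/Operator.h"
#include "llvm/IR/Type.h"
#include "llvm/IR/Value.h"

using namespace llvm;

// Hypothetical helper: count "logical" subscripts behind a load/store
// pointer operand. Chained single-index GEPs whose source element type
// is the same scalar type are one linear offset that the frontend split
// across instructions, so they are counted once.
static unsigned countLogicalIndices(Value *Ptr) {
  unsigned Count = 0;
  Type *PrevScalarTy = nullptr; // element type of the previous linear level
  while (auto *GEP = dyn_cast<GEPOperator>(Ptr)) {
    Type *SrcTy = GEP->getSourceElementType();
    bool IsLinear = GEP->getNumIndices() == 1 && !SrcTy->isAggregateType();
    if (!(IsLinear && SrcTy == PrevScalarTy))
      Count += GEP->getNumIndices(); // new logical subscript(s)
    PrevScalarTy = IsLinear ? SrcTy : nullptr;
    Ptr = GEP->getPointerOperand();
  }
  return Count;
}

This still breaks when the chain is interrupted by a load (as with int **B) or when distinct source subscripts happen to share an element type, which is what motivates the delinearization suggestion in the reply below.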
Ees via llvm-dev
2020-Oct-02 23:54 UTC
[llvm-dev] Information about the number of indices in memory accesses
Anyone? I'd really appreciate any hints to look up, as I'm somewhat stuck with this.

Cheers.

On 23-09-2020 12:27, Ees wrote:
> [snip]
Stefanos Baziotis via llvm-dev
2020-Oct-03 00:24 UTC
[llvm-dev] Information about the number of indices in memory accesses
Hi Ees,

SCEV Delinearization is the closest thing I know of, but it has its problems. For one, your expression has to be SCEVable. More importantly, SCEV delinearization tries to deduce something high-level (source-level, really) from a low-level IR in which much of that information has been lost. Since there is no 1-1 mapping from high-level code to LLVM IR, going backwards will always be imperfect; and because you too are trying to deduce a high-level property, I believe any solution will be imperfect.

Best,
Stefanos

On Sat, Oct 3, 2020, 02:55 Ees via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> [snip]
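For concreteness, a minimal sketch of how one might invoke SCEV delinearization on a store, assuming a ScalarEvolution &SE is available for the function. Around LLVM 11, delinearize() was a member of ScalarEvolution; in recent releases it is a free function in llvm/Analysis/Delinearization.h with the same parameters:

#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Try to recover the subscripts of a store's address via SCEV
// delinearization; returns how many subscripts were recovered
// (possibly zero, since delinearization can fail).
static unsigned numDelinearizedIndices(ScalarEvolution &SE, StoreInst *SI) {
  Value *Ptr = SI->getPointerOperand();
  if (!SE.isSCEVable(Ptr->getType()))
    return 0; // as noted above, the expression must be SCEVable

  const SCEV *PtrSCEV = SE.getSCEV(Ptr);
  const SCEV *Base = SE.getPointerBase(PtrSCEV);
  // Offset of the access relative to the base pointer.
  const SCEV *Offset = SE.getMinusSCEV(PtrSCEV, Base);

  SmallVector<const SCEV *, 4> Subscripts, Sizes;
  // LLVM 11-era member; in newer LLVM, call the free function
  // llvm::delinearize(SE, Offset, Subscripts, Sizes, ElementSize).
  SE.delinearize(Offset, Subscripts, Sizes, SE.getElementSize(SI));

  // Delinearization keys on AddRec (loop-dependent) expressions, so a
  // purely thread-id-based access in a loop-free CUDA kernel will often
  // yield no subscripts at all: one of the imperfections noted above.
  return Subscripts.size();
}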