Hi, I hope it's a right place to ask. I'm looking for a specific compiler transformation, that I'm not sure it exists, and I hoped someone could point me in the right direction. I think the easiest way to explain it is by using an example: Example (1): void zero(int64_t A, int32_t N) { for(int32_t i = 0; i < N; i++) { ((int*)A)[i] = 0; } } Example (2): void zero(int64_t A, int32_t N) { for(int32_t i = 0; i < N; i++) { *(int*)(A + i * 4) = 0; } } In principle both snippets do the same thing, they set zero to N elements of an array A (passed as an address of A). In (1) the compiler (armv8-a clang 11.0.1) can detect that the operation is a memset, and when compiled with -O3 it indeed emits a call to memset. However, the compiler cannot optimize code in (2). The big difference in emitted LLVM IR (when no optimizations are enabled) is that in (1) the compiler emits inttoptr to cast the address of A followed by getelementptr. Whereas for (2) the address of the array element is first calculated using basic integer operations (mul and add) and only then casted to pointer using inttoptr. I believe that the compiler can optimize (1) as getelementptr provides certain guarantees that enable safe optimizations, but simple cast of the element address with inttoptr is too broad. So, the question is if there is a transformation (pointers to proof-of-concept academic impelemntation would be equally appreciated) in LLVM toolchain that can transform problem (2) into (1), so it can be further optimized. I understand this may be a complex problem in the genral case, so asking for such transformation may sound a bit naive, but I thought I'd try. Many thanks, Igor Wodiany -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20220118/c4b1a1a2/attachment.html>
Jessica Clarke via llvm-dev
2022-Jan-18 18:15 UTC
[llvm-dev] Memory accesses transformation
On 18 Jan 2022, at 17:59, Igor Wodiany via llvm-dev <llvm-dev at lists.llvm.org> wrote:> > Hi, > > I hope it's a right place to ask. I'm looking for a specific compiler transformation, that I'm not sure it exists, and I hoped someone could point me in the right direction. > > I think the easiest way to explain it is by using an example: > > Example (1): > > void zero(int64_t A, int32_t N) { > for(int32_t i = 0; i < N; i++) { > ((int*)A)[i] = 0; > } > } > > Example (2): > > void zero(int64_t A, int32_t N) { > for(int32_t i = 0; i < N; i++) { > *(int*)(A + i * 4) = 0; > } > } > > In principle both snippets do the same thing, they set zero to N elements of an array A (passed as an address of A). In (1) the compiler (armv8-a clang 11.0.1) can detect that the operation is a memset, and when compiled with -O3 it indeed emits a call to memset. However, the compiler cannot optimize code in (2). > > The big difference in emitted LLVM IR (when no optimizations are enabled) is that in (1) the compiler emits inttoptr to cast the address of A followed by getelementptr. Whereas for (2) the address of the array element is first calculated using basic integer operations (mul and add) and only then casted to pointer using inttoptr. I believe that the compiler can optimize (1) as getelementptr provides certain guarantees that enable safe optimizations, but simple cast of the element address with inttoptr is too broad. > > So, the question is if there is a transformation (pointers to proof-of-concept academic impelemntation would be equally appreciated) in LLVM toolchain that can transform problem (2) into (1), so it can be further optimized. I understand this may be a complex problem in the genral case, so asking for such transformation may sound a bit naive, but I thought I'd try. > > Many thanks, > Igor WodianyI don’t think such a transformation would be sound in the case that you have two adjacent objects that A ends up covering in the latter case. In the former case by always using A as the pointer you’re only allowed to access whatever object A refers to, but in the latter case because you do the arithmetic first you’re allowed to access whatever object A + i*4 refers to, and thus can straddle multiple objects if they happen to be adjacent. So unless it can prove that’s not the case, which it can’t without more context, that’s all it can legally do. But I could be wrong, uintptr_t is messy and ill-defined. Jess