thr3ads.net - llvm dev - [llvm-dev] Memory accesses transformation [Jan 2022]

If this information is useful, please help other people find it:
Share via:

Igor Wodiany via llvm-dev

2022-Jan-18 17:59 UTC

[llvm-dev] Memory accesses transformation

Hi,

I hope it's a right place to ask. I'm looking for a specific compiler
transformation, that I'm not sure it exists, and I hoped someone could point
me in the right direction.

I think the easiest way to explain it is by using an example:

Example (1):

void zero(int64_t A, int32_t N) {
    for(int32_t i = 0; i < N; i++) {
        ((int*)A)[i] = 0;
    }
}

Example (2):

void zero(int64_t A, int32_t N) {
    for(int32_t i = 0; i < N; i++) {
        *(int*)(A + i * 4) = 0;
    }
}

In principle both snippets do the same thing, they set zero to N elements of an
array A (passed as an address of A). In (1) the compiler (armv8-a clang 11.0.1)
can detect that the operation is a memset, and when compiled with -O3 it indeed
emits a call to memset. However, the compiler cannot optimize code in (2).

The big difference in emitted LLVM IR (when no optimizations are enabled) is
that in (1) the compiler emits inttoptr to cast the address of A followed by
getelementptr. Whereas for (2) the address of the array element is first
calculated using basic integer operations (mul and add) and only then casted to
pointer using inttoptr. I believe that the compiler can optimize (1) as
getelementptr provides certain guarantees that enable safe optimizations, but
simple cast of the element address with inttoptr is too broad.

So, the question is if there is a transformation (pointers to proof-of-concept
academic impelemntation would be equally appreciated) in LLVM toolchain that can
transform problem (2) into (1), so it can be further optimized. I understand
this may be a complex problem in the genral case, so asking for such
transformation may sound a bit naive, but I thought I'd try.

Many thanks,
Igor Wodiany


-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20220118/c4b1a1a2/attachment.html>

Jessica Clarke via llvm-dev

2022-Jan-18 18:15 UTC

head link

[llvm-dev] Memory accesses transformation

On 18 Jan 2022, at 17:59, Igor Wodiany via llvm-dev <llvm-dev at
lists.llvm.org> wrote:> 
> Hi,
> 
> I hope it's a right place to ask. I'm looking for a specific
compiler transformation, that I'm not sure it exists, and I hoped someone
could point me in the right direction.
> 
> I think the easiest way to explain it is by using an example:
> 
> Example (1):
> 
> void zero(int64_t A, int32_t N) {
>     for(int32_t i = 0; i < N; i++) {
>         ((int*)A)[i] = 0;
>     }
> }
> 
> Example (2):
> 
> void zero(int64_t A, int32_t N) {
>     for(int32_t i = 0; i < N; i++) {
>         *(int*)(A + i * 4) = 0;
>     }
> }
> 
> In principle both snippets do the same thing, they set zero to N elements
of an array A (passed as an address of A). In (1) the compiler (armv8-a clang
11.0.1) can detect that the operation is a memset, and when compiled with -O3 it
indeed emits a call to memset. However, the compiler cannot optimize code in
(2).
> 
> The big difference in emitted LLVM IR (when no optimizations are enabled)
is that in (1) the compiler emits inttoptr to cast the address of A followed by
getelementptr. Whereas for (2) the address of the array element is first
calculated using basic integer operations (mul and add) and only then casted to
pointer using inttoptr. I believe that the compiler can optimize (1) as
getelementptr provides certain guarantees that enable safe optimizations, but
simple cast of the element address with inttoptr is too broad.
> 
> So, the question is if there is a transformation (pointers to
proof-of-concept academic impelemntation would be equally appreciated) in LLVM
toolchain that can transform problem (2) into (1), so it can be further
optimized. I understand this may be a complex problem in the genral case, so
asking for such transformation may sound a bit naive, but I thought I'd try.
> 
> Many thanks,
> Igor Wodiany
I don’t think such a transformation would be sound in the case that you
have two adjacent objects that A ends up covering in the latter case.
In the former case by always using A as the pointer you’re only allowed
to access whatever object A refers to, but in the latter case because
you do the arithmetic first you’re allowed to access whatever object A
+ i*4 refers to, and thus can straddle multiple objects if they happen
to be adjacent. So unless it can prove that’s not the case, which it
can’t without more context, that’s all it can legally do. But I could
be wrong, uintptr_t is messy and ill-defined.

Jess

llvm dev - Jan 2022 - Memory accesses transformation

[llvm-dev] Memory accesses transformation

[llvm-dev] Memory accesses transformation