thr3ads.net - llvm dev - [LLVMdev] Replacing a repetitive sequence of code with a loop [Jun 2015]

If this information is useful, please help other people find it:
Share via:

Frank Winter

2015-Jun-03 17:37 UTC

[LLVMdev] Replacing a repetitive sequence of code with a loop

Hey guys, in an HPC project I am working on I am given an LLVM program 
consisting of a linear sequence of repetitive junks of code with an 
uniform memory access pattern. Each code junk does the following: 1) 
loads some memory, 2) performs some arithmetic operations, 3) stores the 
result back to memory. The memory stride between consecutive junks is 
constant over the whole program, thus the whole program could be 
replaced with a single loop with it's loop body containing a generic 
version of said code junk.Here's an example (a short one, the real world 
program would be much longer):

define void @vec_plus_vec(float* noalias %arg0, float* noalias %arg1, 
float* noalias %arg2) {
entrypoint:
   %0 = bitcast float* %arg1 to <4 x float>*
   %1 = bitcast float* %arg2 to <4 x float>*
   %2 = load <4 x float>* %0, align 16
   %3 = load <4 x float>* %1, align 16
   %4 = fadd <4 x float> %3, %2
   %5 = bitcast float* %arg0 to <4 x float>*
   store <4 x float> %4, <4 x float>* %5, align 16
   %6 = getelementptr float* %arg1, i64 4
   %7 = getelementptr float* %arg2, i64 4
   %8 = getelementptr float* %arg0, i64 4
   %9 = bitcast float* %6 to <4 x float>*
   %10 = bitcast float* %7 to <4 x float>*
   %11 = load <4 x float>* %9, align 16
   %12 = load <4 x float>* %10, align 16
   %13 = fadd <4 x float> %12, %11
   %14 = bitcast float* %8 to <4 x float>*
   store <4 x float> %13, <4 x float>* %14, align 16
   %15 = getelementptr float* %arg1, i64 8
   %16 = getelementptr float* %arg2, i64 8
   %17 = getelementptr float* %arg0, i64 8
   %18 = bitcast float* %15 to <4 x float>*
   %19 = bitcast float* %16 to <4 x float>*
   %20 = load <4 x float>* %18, align 16
   %21 = load <4 x float>* %19, align 16
   %22 = fadd <4 x float> %21, %20
   %23 = bitcast float* %17 to <4 x float>*
   store <4 x float> %22, <4 x float>* %23, align 16
   %24 = getelementptr float* %arg1, i64 12
   %25 = getelementptr float* %arg2, i64 12
   %26 = getelementptr float* %arg0, i64 12
   %27 = bitcast float* %24 to <4 x float>*
   %28 = bitcast float* %25 to <4 x float>*
   %29 = load <4 x float>* %27, align 16
   %30 = load <4 x float>* %28, align 16
   %31 = fadd <4 x float> %30, %29
   %32 = bitcast float* %26 to <4 x float>*
   store <4 x float> %31, <4 x float>* %32, align 16
   ret void
}

The stride between two consecutive junks is 4*sizeof(float), thus could 
be condensed into a single loop. (The scenario with the constant stride 
between repetitive code junks is the first, most simple version. Later I 
will have to deal with arbitrary strides between junks. Requires an 
index array stored in the code as static data I guess)...

For this simple scenario what already existent LLVM pass would be 
closest to what I am trying to achieve?

Thanks,
Frank

Benjamin Kramer

2015-Jun-03 18:57 UTC

head link

[LLVMdev] Replacing a repetitive sequence of code with a loop

> On 03.06.2015, at 19:37, Frank Winter <fwinter at jlab.org> wrote:
> 
> Hey guys, in an HPC project I am working on I am given an LLVM program
consisting of a linear sequence of repetitive junks of code with an uniform
memory access pattern. Each code junk does the following: 1) loads some memory,
2) performs some arithmetic operations, 3) stores the result back to memory. The
memory stride between consecutive junks is constant over the whole program, thus
the whole program could be replaced with a single loop with it's loop body
containing a generic version of said code junk.Here's an example (a short
one, the real world program would be much longer):
> 
> define void @vec_plus_vec(float* noalias %arg0, float* noalias %arg1,
float* noalias %arg2) {
> entrypoint:
>  %0 = bitcast float* %arg1 to <4 x float>*
>  %1 = bitcast float* %arg2 to <4 x float>*
>  %2 = load <4 x float>* %0, align 16
>  %3 = load <4 x float>* %1, align 16
>  %4 = fadd <4 x float> %3, %2
>  %5 = bitcast float* %arg0 to <4 x float>*
>  store <4 x float> %4, <4 x float>* %5, align 16
>  %6 = getelementptr float* %arg1, i64 4
>  %7 = getelementptr float* %arg2, i64 4
>  %8 = getelementptr float* %arg0, i64 4
>  %9 = bitcast float* %6 to <4 x float>*
>  %10 = bitcast float* %7 to <4 x float>*
>  %11 = load <4 x float>* %9, align 16
>  %12 = load <4 x float>* %10, align 16
>  %13 = fadd <4 x float> %12, %11
>  %14 = bitcast float* %8 to <4 x float>*
>  store <4 x float> %13, <4 x float>* %14, align 16
>  %15 = getelementptr float* %arg1, i64 8
>  %16 = getelementptr float* %arg2, i64 8
>  %17 = getelementptr float* %arg0, i64 8
>  %18 = bitcast float* %15 to <4 x float>*
>  %19 = bitcast float* %16 to <4 x float>*
>  %20 = load <4 x float>* %18, align 16
>  %21 = load <4 x float>* %19, align 16
>  %22 = fadd <4 x float> %21, %20
>  %23 = bitcast float* %17 to <4 x float>*
>  store <4 x float> %22, <4 x float>* %23, align 16
>  %24 = getelementptr float* %arg1, i64 12
>  %25 = getelementptr float* %arg2, i64 12
>  %26 = getelementptr float* %arg0, i64 12
>  %27 = bitcast float* %24 to <4 x float>*
>  %28 = bitcast float* %25 to <4 x float>*
>  %29 = load <4 x float>* %27, align 16
>  %30 = load <4 x float>* %28, align 16
>  %31 = fadd <4 x float> %30, %29
>  %32 = bitcast float* %26 to <4 x float>*
>  store <4 x float> %31, <4 x float>* %32, align 16
>  ret void
> }
> 
> The stride between two consecutive junks is 4*sizeof(float), thus could be
condensed into a single loop. (The scenario with the constant stride between
repetitive code junks is the first, most simple version. Later I will have to
deal with arbitrary strides between junks. Requires an index array stored in the
code as static data I guess)...
> 
> For this simple scenario what already existent LLVM pass would be closest
to what I am trying to achieve?
There's a loop reroll pass in LLVM trunk that should do exactly this
transformation.

- Ben

Renato Golin

2015-Jun-03 20:13 UTC

head link

[LLVMdev] Replacing a repetitive sequence of code with a loop

On 3 June 2015 at 19:57, Benjamin Kramer <benny.kra at gmail.com>
wrote:> There's a loop reroll pass in LLVM trunk that should do exactly this
transformation.
Though that's a loop pass (runOnLoop). What you could do is add a
previous pass that would recognize the pattern and create a loop of 1
iteration around the code and then run the reroll pass.

If your pattern recognition is too good, it'll be redundant with the
reroll pass, so I'm not sure how to do that without duplicating effort
or destroying the IR (in case you're wrong). Adding 1-iteration loops
to all functions won't work either. You'll have to balance the
heuristics and make the pass optional, maybe with a pragma to help the
compiler.

cheers,
--renato

Reasonably Related Threads

Search for more possibly parallel threads

llvm dev - Jun 2015 - [LLVMdev] Replacing a repetitive sequence of code with a loop

[LLVMdev] Replacing a repetitive sequence of code with a loop

[LLVMdev] Replacing a repetitive sequence of code with a loop

[LLVMdev] Replacing a repetitive sequence of code with a loop

Reasonably Related Threads