thr3ads.net - llvm dev - [LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR [Jul 2013]

Stéphane Letz

2013-Jul-05 14:50 UTC

[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR

On 5 Jul 2013, at 04:11, Tobias Grosser <tobias at grosser.es> wrote:
> On 07/04/2013 01:39 PM, Stéphane Letz wrote:
>> Hi,
>>
>> Our DSL can generate C or directly generate LLVM IR. With LLVM 3.3, we
>> can vectorize the generated C code using clang with -O3, or clang with
>> -O1 followed by opt -O3 -vectorize-loops. But the LLVM IR that we
>> generate directly for the same program cannot be vectorized with
>> opt -O3 -vectorize-loops. So our guess is that our generated LLVM IR
>> lacks some information that the vectorization passes need to work
>> correctly.
>>
>> Any idea of what could be lacking?
>
> Without any knowledge about the code, guessing is hard. You may be missing
> the 'noalias' keyword or nsw/nuw flags, but there are many possibilities.
>
> If you add '-debug' to opt you may get some hints. Also, if you have a
> small test case, posting the LLVM IR may help.
>
> Cheers,
> Tobias

I made some progress:

1) I added a DataLayout to my generated LLVM Module, explicitly as a string.
BTW, is there any notion of a "default" DataLayout that could be used? How is
an LLVM Module supposed to know which DataLayout to use (in general)?

2) The resulting module still could not be vectorized with one run of
"opt -O3 -vectorize-loops -debug -S m1.ll -o m2.ll", but if I do it in two
steps like:

opt -O3 -vectorize-loops -debug -S m1.ll -o m2.ll

opt -O3 -vectorize-loops -debug -S m2.ll -o m3.ll

then it works.

Any idea?

Thanks.

Stéphane Letz
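
For readers following along: in textual IR, the DataLayout mentioned in point 1 is a module-level declaration, usually paired with a target triple. The fragment below is only a sketch. The layout string and the triple are assumptions (roughly what clang of that era emitted for x86-64 OS X), not values taken from this thread. A safer route is to compile any small C file with "clang -S -emit-llvm" for your target and copy the two lines from its output.

```llvm
; Assumed values for an x86-64 OS X target; copy the real ones from
; clang's output for your own target rather than trusting these.
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.8.0"
```

Both lines sit at the top of the .ll file, before any function definitions.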

Arnold Schwaighofer

2013-Jul-05 15:23 UTC

Hi Stephane,

Move the alloca for “i” into the entry block.

The IR coming into the loop vectorizer looks something like the following. The
loop vectorizer can't recognize one of the phis as an induction or
reduction, so it gives up.

You get this “odd” phi because SROA (which transforms allocas into SSA
variables) does not get rid of the “i” variable: the alloca for “i” is not in
the entry block, and SROA only works on allocas in the entry block. Later
passes do remove the variable, but they leave this odd IR behind.

opt -O3 -vectorize-loops -debug-only=loop-vectorize < test.ll

LV: Found a loop: code_block8
LV: Found an induction variable.
LV: PHI is not a poly recurrence.
LV: Found an unidentified PHI.  %storemerge8 = phi i32 [ 0, %code_block8.lr.ph ], [ %next_index, %code_block8 ]
LV: Can't vectorize the instructions or CFG
LV: Not vectorizing.

IR coming into the vectorizer:

code_block8:                                      ; preds = %code_block8.lr.ph, %code_block8
  %next_index10 = phi i32 [ %i.promoted, %code_block8.lr.ph ], [ %next_index, %code_block8 ]
  %storemerge8 = phi i32 [ 0, %code_block8.lr.ph ], [ %next_index, %code_block8 ]            ; <<< THIS phi is the problem.
  %20 = sext i32 %storemerge8 to i64
  %.sum = add i64 %20, %9
  %21 = getelementptr inbounds float* %11, i64 %.sum
  %22 = getelementptr inbounds float* %8, i64 %.sum
  %23 = load float* %22, align 4
  %24 = getelementptr inbounds float* %10, i64 %.sum
  %25 = load float* %24, align 4
  %26 = fadd float %23, %25
  store float %26, float* %21, align 4
  %next_index = add i32 %next_index10, 1
  %27 = icmp slt i32 %next_index, %16
  br i1 %27, label %code_block8, label %exec_block4.exit_block6_crit_edge

exec_block.return_crit_edge:                      ; preds = %exit_block6
  br label %return

return:                                           ; preds = %exec_block.return_crit_edge, %block_code
  ret void
}
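
To illustrate the advice about the entry block, here is a minimal sketch of front-end output that keeps the loop counter's alloca in the entry block. The function @add_arrays and all value names are hypothetical and are not taken from the poster's generated code. It uses the typed-pointer syntax of LLVM 3.3 to match the dump above, and the 'noalias' and 'nsw' annotations follow the hints given earlier in the thread.

```llvm
; Hypothetical example in LLVM 3.3 syntax: out[i] = a[i] + b[i] for i in [0, n).
define void @add_arrays(float* noalias %out, float* noalias %a,
                        float* noalias %b, i32 %n) {
entry:
  ; The alloca lives in the entry block, so SROA can promote "i" to an
  ; SSA value before the loop vectorizer runs.
  %i = alloca i32, align 4
  store i32 0, i32* %i, align 4
  br label %loop.cond

loop.cond:
  %i.val = load i32* %i, align 4
  %cmp = icmp slt i32 %i.val, %n
  br i1 %cmp, label %loop.body, label %exit

loop.body:
  %idx = sext i32 %i.val to i64
  %pa = getelementptr inbounds float* %a, i64 %idx
  %pb = getelementptr inbounds float* %b, i64 %idx
  %po = getelementptr inbounds float* %out, i64 %idx
  %va = load float* %pa, align 4
  %vb = load float* %pb, align 4
  %sum = fadd float %va, %vb
  store float %sum, float* %po, align 4
  ; Increment the counter through memory; promotion turns this into a phi.
  %i.next = add nsw i32 %i.val, 1
  store i32 %i.next, i32* %i, align 4
  br label %loop.cond

exit:
  ret void
}
```

A front end that needs a new local while emitting a nested block can still insert its alloca at the start of the entry block and only emit the stores and loads at the point of use. Running the same "opt -O3 -vectorize-loops -debug-only=loop-vectorize" command on such a file should show whether the vectorizer now recognizes the induction variable.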

Stéphane Letz

2013-Jul-05 15:43 UTC

1) The "entry" block is the first block of the function, right?

2) Do you mean that *all* allocas in a function always have to be in the
entry block?

Thanks.

Stéphane
