thr3ads.net - llvm dev - [LLVMdev] Auto-vectorization and phi nodes [Feb 2013]

Vesa Norilo

2013-Feb-19 10:40 UTC

[LLVMdev] Auto-vectorization and phi nodes

Hi all,

Sorry if this is a dumb question, an FAQ, or the wrong list!

I'm currently investigating LLVM vectorization of my generated code. My 
codegen emits a lot of recursions that step through arrays via pointers. 
The recursions are nicely optimized into loops, but the loop 
vectorization can't seem to work on them because of phi nodes that point 
to gep nodes.

Here is some simple IR to demonstrate. It vectorizes nicely with opt -O3
-vectorize-loops -force-vector-width until I uncomment the phi/gep nodes
(and comment out the indexed geps that they replace).

define void @add_vector(float* noalias %a, float* noalias %b, float* noalias %c, i32 %num)
{
Top:
         br label %Loop
Loop:
         %i = phi i32 [0,%Top],[%i.next,%Loop]

;        phi and gep - won't vectorize
;        %a.ptr = phi float* [%a,%Top],[%a.next,%Loop]
;        %b.ptr = phi float* [%b,%Top],[%b.next,%Loop]
;        %c.ptr = phi float* [%c,%Top],[%c.next,%Loop]

;        %a.next = getelementptr float* %a.ptr, i32 1
;        %b.next = getelementptr float* %b.ptr, i32 1
;        %c.next = getelementptr float* %c.ptr, i32 1

;        induction variable as index - will vectorize
         %a.ptr = getelementptr float* %a, i32 %i
         %b.ptr = getelementptr float* %b, i32 %i
         %c.ptr = getelementptr float* %c, i32 %i

         %a.val = load float* %a.ptr
         %b.val = load float* %b.ptr
         %sum = fadd float %a.val, %b.val
         store float %sum, float* %c.ptr

         %i.next = add i32 %i, 1
         %more = icmp slt i32 %i.next, %num
         br i1 %more, label %Loop, label %End
End:
         ret void
}

So it seems that the loop vectorizer would like the pointer stepping to
be converted to base+index. However, as expected, clang doesn't care
whether the C code is written with pointer arithmetic or array indexing.
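
For reference, the two styles in question look roughly like this in C. This is only a sketch: the function names are made up for illustration, `restrict` stands in for the `noalias` attributes in the IR above, and the `for` loops skip the body when `num` is zero, whereas the IR loop always runs at least once.

```c
#include <stddef.h>

/* Base+index form: every address is computed from the base pointer
   and the induction variable i. */
void add_indexed(const float *restrict a, const float *restrict b,
                 float *restrict c, int num) {
    for (int i = 0; i < num; ++i)
        c[i] = a[i] + b[i];
}

/* Pointer-stepping form: each pointer is advanced by one element per
   iteration, which corresponds to a phi node fed by a gep in the IR. */
void add_stepped(const float *restrict a, const float *restrict b,
                 float *restrict c, int num) {
    for (int i = 0; i < num; ++i)
        *c++ = *a++ + *b++;
}
```

Both functions compute the same element-wise sum; they differ only in how the addresses are formed.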

Is there a pass that converts simple pointer arithmetic to base+index? 
If not, should I write one (shouldn't be too hard for my limited use 
case) or try to emit more vector-friendly code from the front end?

Thanks a bunch!
Vesa Norilo

Hal Finkel

2013-Feb-19 15:51 UTC

----- Original Message -----
> From: "Vesa Norilo" <vnorilo at siba.fi>
> To: llvmdev at cs.uiuc.edu
> Sent: Tuesday, February 19, 2013 4:40:26 AM
> Subject: [LLVMdev] Auto-vectorization and phi nodes
> 
> Is there a pass that converts simple pointer arithmetic to
> base+index?

As I recall, loop strength reduction (LSR) can do this, but it runs only
very late in the compilation process, well after vectorization. It would
probably be better to update the loop vectorizer to handle this directly.
Nadav?

 -Hal

> If not, should I write one (shouldn't be too hard for my limited use
> case) or try to emit more vector-friendly code from the front end?
> 
> Thanks a bunch!
> Vesa Norilo

Nadav Rotem

2013-Feb-19 17:22 UTC

Hi Vesa,

The IndVars pass rewrites the induction variables so that SCEV can analyze
them, which enables other optimizations. This is the canonicalization phase.
Later on, LSR lowers the canonicalized induction variables to induction
variables that map nicely to the target's addressing modes. In many cases it
can remove some of the induction variables.

I suspect that the loop vectorizer does not vectorize the code because SCEV
fails to detect the induction variable. Can you run the loop vectorizer with
the '-debug' option and check why it fails?
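
To illustrate what detecting the induction variable would mean here: the pointer phi in the example starts at the base pointer and advances by one element per iteration, so after k iterations it should equal base + k. The C sketch below only demonstrates that equivalence. The function names are hypothetical, and this is not a description of how SCEV is implemented.

```c
#include <stddef.h>

/* Walk a pointer the way the phi/gep pair does and return its value
   after `iters` iterations. */
const float *step_pointer(const float *base, size_t iters) {
    const float *p = base;              /* phi: equals base on loop entry */
    for (size_t k = 0; k < iters; ++k)
        p = p + 1;                      /* gep: advance by one element */
    return p;
}

/* Closed form of the same recurrence: base plus the iteration count. */
const float *indexed_pointer(const float *base, size_t iters) {
    return base + iters;
}
```

If the analysis recognizes this recurrence, the stepped pointer can be treated like the base+index form that already vectorizes.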

Thanks,
Nadav 


