Hi all,

Sorry if this is a dumb question, a FAQ, or the wrong list!

I'm currently investigating LLVM vectorization of my generated code. My codegen emits a lot of recursions that step through arrays via pointers. The recursions are nicely optimized into loops, but the loop vectorizer can't seem to work on them because of phi nodes that point to gep nodes.

Some simple IR to demonstrate; it vectorizes nicely with opt -O3 -vectorize-loops -force-vector-width until I uncomment the phi/gep nodes.

define void @add_vector(float* noalias %a, float* noalias %b, float* noalias %c, i32 %num)
{
Top:
  br label %Loop
Loop:
  %i = phi i32 [0,%Top],[%i.next,%Loop]

  ; phi and gep - won't vectorize
  ; %a.ptr = phi float* [%a,%Top],[%a.next,%Loop]
  ; %b.ptr = phi float* [%b,%Top],[%b.next,%Loop]
  ; %c.ptr = phi float* [%c,%Top],[%c.next,%Loop]

  ; %a.next = getelementptr float* %a.ptr, i32 1
  ; %b.next = getelementptr float* %b.ptr, i32 1
  ; %c.next = getelementptr float* %c.ptr, i32 1

  ; induction variable as index - will vectorize
  %a.ptr = getelementptr float* %a, i32 %i
  %b.ptr = getelementptr float* %b, i32 %i
  %c.ptr = getelementptr float* %c, i32 %i

  %a.val = load float* %a.ptr
  %b.val = load float* %b.ptr
  %sum = fadd float %a.val, %b.val
  store float %sum, float* %c.ptr

  %i.next = add i32 %i, 1
  %more = icmp slt i32 %i.next, %num
  br i1 %more, label %Loop, label %End
End:
  ret void
}

So it seems that the loop vectorizer would like the pointer stepping to be converted to base+index. However, as expected, clang doesn't care whether C code is written as pointer arithmetic or table index.

Is there a pass that converts simple pointer arithmetic to base+index? If not, should I write one (shouldn't be too hard for my limited use case) or try to emit more vector-friendly code from the front end?

Thanks a bunch!
Vesa Norilo
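For readers following along, the two addressing styles contrasted above correspond roughly to the C below. This is only a sketch: the names add_vector_indexed, add_vector_stepped and check_forms_agree are made up for illustration and do not appear in the original message, and the for loops skip the body when num <= 0, whereas the IR as written always runs it once.

```c
#include <assert.h>

/* Base + index form: every address is derived from the loop counter i.
   Corresponds to the "induction variable as index" variant of the IR. */
void add_vector_indexed(const float *restrict a, const float *restrict b,
                        float *restrict c, int num)
{
    for (int i = 0; i < num; ++i)
        c[i] = a[i] + b[i];
}

/* Pointer-stepping form: each pointer is its own loop-carried value.
   Corresponds to the commented-out phi/gep variant of the IR. */
void add_vector_stepped(const float *restrict a, const float *restrict b,
                        float *restrict c, int num)
{
    for (int i = 0; i < num; ++i)
        *c++ = *a++ + *b++;
}

/* Hypothetical self-check: both forms must write the same results. */
void check_forms_agree(void)
{
    enum { N = 8 };
    float a[N], b[N], x[N], y[N];
    for (int i = 0; i < N; ++i) {
        a[i] = (float)i;
        b[i] = 2.0f * (float)i;
    }
    add_vector_indexed(a, b, x, N);
    add_vector_stepped(a, b, y, N);
    for (int i = 0; i < N; ++i)
        assert(x[i] == y[i]);
}
```

Both functions compute the same thing; the only difference is whether the addresses are recomputed from one counter or carried across iterations as separate variables, which is the distinction the question is about.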
----- Original Message -----
> From: "Vesa Norilo" <vnorilo at siba.fi>
> To: llvmdev at cs.uiuc.edu
> Sent: Tuesday, February 19, 2013 4:40:26 AM
> Subject: [LLVMdev] Auto-vectorization and phi nodes
>
> [...]
>
> Is there a pass that converts simple pointer arithmetic to
> base+index?

As I recall, loop strength reduction can do this; but that happens only very late in the compilation process (well after vectorization). It would probably be better to update the loop vectorizer to deal with this directly. Nadav?

-Hal

> If not, should I write one (shouldn't be too hard for my limited use
> case) or try to emit more vector-friendly code from the front end?
>
> Thanks a bunch!
> Vesa Norilo
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Hi Vesa,

The IndVars pass rewrites the induction variables so that SCEV can analyze them, which in turn enables other optimizations. This is the canonicalization phase. Later on, LSR lowers the canonicalized induction variables to induction variables that map nicely to the target's addressing modes. In many cases it can remove some of the induction variables.

I suspect that the loop vectorizer does not vectorize the code because SCEV fails to detect the induction variable. Can you run the loop vectorizer with the '-debug' option and check why it fails?

Thanks,
Nadav

On Feb 19, 2013, at 7:51 AM, Hal Finkel <hfinkel at anl.gov> wrote:

>> Is there a pass that converts simple pointer arithmetic to
>> base+index?
>
> As I recall, loop strength reduction can do this; but that happens only very late in the compilation process (well after vectorization). It would probably be better to update the loop vectorizer to deal with this directly. Nadav?
>
> -Hal
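To make the SCEV point concrete: a pointer phi that advances by one element per iteration is an affine recurrence, which SCEV writes in the form {start,+,step}. The C sketch below walks a pointer the way the phi/gep pair does and checks it against the closed form base + k at every iteration. It illustrates the property an analysis would have to recognize; it is not a description of how the vectorizer is implemented, and the names addrec_at and check_pointer_recurrence are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: value of the recurrence {base,+,1 element}
   at iteration k, i.e. the base+index form of the address. */
static const float *addrec_at(const float *base, size_t k)
{
    return base + k;
}

/* Steps a pointer like the phi/gep pair and asserts that it matches the
   closed form on every iteration. Returns the number of iterations run.
   trip_count must not exceed the number of elements behind base. */
size_t check_pointer_recurrence(const float *base, size_t trip_count)
{
    const float *p = base;      /* %a.ptr = phi [%a, %Top], [%a.next, %Loop] */
    size_t k;
    for (k = 0; k < trip_count; ++k) {
        assert(p == addrec_at(base, k));
        p = p + 1;              /* %a.next = getelementptr %a.ptr, 1 */
    }
    return k;
}
```

Because the stepped pointer always equals base + k, rewriting it to base+index form changes nothing about the addresses touched; whether a given LLVM version performs or recognizes that rewrite before vectorization is exactly what the '-debug' output should reveal.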