Stéphane Letz
2013-Jul-05 14:50 UTC
[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR
On 5 Jul 2013, at 04:11, Tobias Grosser <tobias at grosser.es> wrote:

> On 07/04/2013 01:39 PM, Stéphane Letz wrote:
>> Hi,
>>
>> Our DSL can generate C or directly generate LLVM IR. With LLVM 3.3, we can vectorize the C code it produces using clang with -O3, or clang with -O1 then opt -O3 -vectorize-loops. But the same program generated directly as LLVM IR cannot be vectorized with opt -O3 -vectorize-loops. So our guess is that our generated LLVM IR lacks some information that the vectorization passes need to work correctly.
>>
>> Any idea of what could be lacking?
>
> Without any knowledge about the code, guessing is hard. You may be missing the 'noalias' keyword or nsw/nuw flags, but there are many possibilities.
>
> If you add '-debug' to opt you may get some hints. Also, if you have a small test case, posting the LLVM-IR may help.
>
> Cheers,
> Tobias

I made some progress:

1) I added a DataLayout to my generated LLVM Module, explicitly as a string. BTW, is there any notion of a "default" DataLayout that could be used? How is an LLVM Module supposed to know which DataLayout to use (in general)?

2) Next, the resulting module could not be vectorized with "opt -O3 -vectorize-loops -debug -S m1.ll -o m2.ll", but if I do it in two steps like:

    opt -O3 -vectorize-loops -debug -S m1.ll -o m2.ll
    opt -O3 -vectorize-loops -debug -S m2.ll -o m3.ll

then it works.

Any idea?

Thanks.

Stéphane Letz
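The thread does not settle the "default" DataLayout question. As a hedged illustration only, this is roughly what an explicit module header looks like in textual IR. The exact strings below are an assumption for an x86_64 Linux target with a clang of that era; a safer approach is to copy both lines from the output of `clang -S -emit-llvm` on your own machine instead of trusting this sketch.

```llvm
; Assumed example header for x86_64 Linux (LLVM 3.3 era).
; Copy the real strings from your own `clang -S -emit-llvm` output.
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
```

Emitting the triple together with the layout keeps the two consistent for the target the DSL is compiling for.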
Arnold Schwaighofer
2013-Jul-05 15:23 UTC
[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR
On Jul 5, 2013, at 9:50 AM, Stéphane Letz <letz at grame.fr> wrote:

> I made some progress:
>
> 1) I added a DataLayout to my generated LLVM Module, explicitly as a string. BTW, is there any notion of a "default" DataLayout that could be used? How is an LLVM Module supposed to know which DataLayout to use (in general)?
>
> 2) Next, the resulting module could not be vectorized with "opt -O3 -vectorize-loops -debug -S m1.ll -o m2.ll", but if I do it in two steps like:
>
>     opt -O3 -vectorize-loops -debug -S m1.ll -o m2.ll
>     opt -O3 -vectorize-loops -debug -S m2.ll -o m3.ll
>
> then it works.
>
> Any idea?
>
> Thanks.
>
> Stéphane Letz

Hi Stephane,

Move the alloca for "i" into the entry block.

The IR coming into the loop vectorizer looks something like the following. The loop vectorizer can't recognize one of the phis as an induction or reduction, so it gives up.

The reason why you have this "odd" phi is that SROA (which transforms allocas into SSA variables) does not get rid of the "i" variable, because the alloca for "i" is not in the entry block; SROA only works on allocas in the entry block. Later passes do get rid of it, but they leave this odd IR around.

    opt -O3 -vectorize-loops -debug-only=loop-vectorize < test.ll

    LV: Found a loop: code_block8
    LV: Found an induction variable.
    LV: PHI is not a poly recurrence.
    LV: Found an unidentified PHI.  %storemerge8 = phi i32 [ 0, %code_block8.lr.ph ], [ %next_index, %code_block8 ]
    LV: Can't vectorize the instructions or CFG
    LV: Not vectorizing.

IR coming into the vectorizer:

    code_block8:                                      ; preds = %code_block8.lr.ph, %code_block8
      %next_index10 = phi i32 [ %i.promoted, %code_block8.lr.ph ], [ %next_index, %code_block8 ]
      %storemerge8 = phi i32 [ 0, %code_block8.lr.ph ], [ %next_index, %code_block8 ]  ; <<< THIS phi is the problem.
      %20 = sext i32 %storemerge8 to i64
      %.sum = add i64 %20, %9
      %21 = getelementptr inbounds float* %11, i64 %.sum
      %22 = getelementptr inbounds float* %8, i64 %.sum
      %23 = load float* %22, align 4
      %24 = getelementptr inbounds float* %10, i64 %.sum
      %25 = load float* %24, align 4
      %26 = fadd float %23, %25
      store float %26, float* %21, align 4
      %next_index = add i32 %next_index10, 1
      %27 = icmp slt i32 %next_index, %16
      br i1 %27, label %code_block8, label %exec_block4.exit_block6_crit_edge

    exec_block.return_crit_edge:                      ; preds = %exit_block6
      br label %return

    return:                                           ; preds = %exec_block.return_crit_edge, %block_code
      ret void
    }
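To make the advice above concrete, here is a minimal sketch of the shape a front end might emit, written in LLVM 3.3 syntax. It is not the original test case: the function @add, its parameters, the 'noalias' attributes and every value name are hypothetical. The only point it illustrates is that the alloca for the loop counter sits in the first block of the function, so the alloca-promotion passes can see it.

```llvm
; Hypothetical reduced example, not the code from the thread.
define void @add(float* noalias %out, float* noalias %a, float* noalias %b, i32 %n) {
entry:
  ; The counter's alloca is in the entry block, where SROA looks for it.
  %i = alloca i32
  store i32 0, i32* %i
  br label %loop.cond

loop.cond:
  %iv = load i32* %i
  %cmp = icmp slt i32 %iv, %n
  br i1 %cmp, label %loop.body, label %exit

loop.body:
  ; out[i] = a[i] + b[i]
  %idx = sext i32 %iv to i64
  %pa = getelementptr inbounds float* %a, i64 %idx
  %pb = getelementptr inbounds float* %b, i64 %idx
  %va = load float* %pa, align 4
  %vb = load float* %pb, align 4
  %sum = fadd float %va, %vb
  %po = getelementptr inbounds float* %out, i64 %idx
  store float %sum, float* %po, align 4
  ; i = i + 1
  %next = add nsw i32 %iv, 1
  store i32 %next, i32* %i
  br label %loop.cond

exit:
  ret void
}
```

If the same `%i = alloca i32` were emitted in a later block instead (for example the block that sets up the loop), it would match the situation described above, where SROA skips the variable. Running `opt -O3 -vectorize-loops -debug-only=loop-vectorize` on both variants, as in the command shown earlier, is one way to compare what the vectorizer reports.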
Stéphane Letz
2013-Jul-05 15:43 UTC
[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR
On 5 Jul 2013, at 17:23, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:

> Hi Stephane,
>
> Move the alloca for "i" into the entry block.
>
> The IR coming into the loop vectorizer looks something like the following. The loop vectorizer can't recognize one of the phis as an induction or reduction, so it gives up.
>
> The reason why you have this "odd" phi is that SROA (which transforms allocas into SSA variables) does not get rid of the "i" variable, because the alloca for "i" is not in the entry block; SROA only works on allocas in the entry block. Later passes do get rid of it, but they leave this odd IR around.
>
> [...]

1) The "entry" block is the first block of the function, right?

2) Do you mean that *all* "alloca" instructions in a function always have to be in the entry block?

Thanks.

Stéphane