Displaying 20 results from an estimated 414 matches for "prehead".

2015 Jun 26
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
For example, I have the following IR code, for.cond.preheader: ; preds = %if.end18 %mul = mul i32 %12, %3 %cmp21128 = icmp sgt i32 %mul, 0 br i1 %cmp21128, label %for.body.preheader, label %return for.body.preheader: ; preds = %for.cond.preheader %19 = mul i32 %12, %3 %20 = add i32 %19,...
2012 May 04
0
[LLVMdev] Extending GetElementPointer, or Premature Linearization Considered Harmful
...riggs <preston.briggs at gmail.com> wrote: > > which produces > > %arrayidx24 = getelementptr inbounds [100 x [100 x i64]]* %A, i64 > %arrayidx21.sum, i64 %add1411, i64 %add > store i64 0, i64* %arrayidx24, align 8 > {{{(5 + ((3 + %n) * %n)),+,(2 * %n * %n)}<%for.cond1.preheader>,+,(4 * %n)}<%for.cond4.preheader>,+,6}<%for.cond7.preheader> This expression is not straightforward because LLVM always folds the loop invariant in the AddExpr into the AddRecExpr. If I understand the AddRecExpr correctly, the above SCEV is equivalent to: (5 + ((3 + %n) * %n)) + (...
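A minimal sketch of how the nested add recurrence quoted above can be read, assuming the usual affine interpretation in which {start,+,step}<loop> has the value start + step * i on iteration i of that loop. The function names are hypothetical, and the subscript A[1 + 2*i][3 + 4*j][5 + 6*k] with inner dimensions n x n is an assumption taken from the surrounding thread, not from this excerpt.

```cpp
#include <cstdint>

// Value of {{{5 + (3 + N) * N,+,2 * N * N}<outer>,+,4 * N}<middle>,+,6}<inner>
// on iteration I of the outer loop, J of the middle loop, K of the inner loop.
// Each affine level contributes its step times that loop's iteration number.
std::int64_t evalNestedAddRec(std::int64_t N, std::int64_t I, std::int64_t J,
                              std::int64_t K) {
  const std::int64_t Start = 5 + (3 + N) * N; // loop-invariant part
  const std::int64_t StepI = 2 * N * N;       // step of the outer loop
  const std::int64_t StepJ = 4 * N;           // step of the middle loop
  const std::int64_t StepK = 6;               // step of the inner loop
  return Start + StepI * I + StepJ * J + StepK * K;
}

// Assumed source subscript: A[1 + 2*I][3 + 4*J][5 + 6*K], linearized by hand
// for an array whose two inner dimensions both have extent N.
std::int64_t linearIndex(std::int64_t N, std::int64_t I, std::int64_t J,
                         std::int64_t K) {
  return ((1 + 2 * I) * N + (3 + 4 * J)) * N + (5 + 6 * K);
}
```

Under those assumptions the two functions agree for every (I, J, K), which is why the three simple subscripts are no longer visible once the constant offsets have been folded into the start value.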
2019 Oct 30
2
How to make ScalarEvolution recompute SCEV values?
...r the unknown SCEVs, or, is there a way to re-run ScalarEvolution and LoopInfo analysis pass during my pass? This is my current CloneLoop function: Loop *cloneLoop(Function *F, Loop *L, LoopInfo *LI, const Twine &NameSuffix, ValueToValueMapTy &VMap) { // original preheader of the loop const auto PreHeader = L->getLoopPreheader(); // keep track of the original predecessors std::set<BasicBlock *> AllPredecessors; for (auto PredIt = pred_begin(PreHeader), E = pred_end(PreHeader); PredIt != E; PredIt++) AllPredecessors.ins...
2019 May 28
6
Making loop guards part of canonical loop structure
...eed to handle both guarded loops and non-guarded loops. For example, the current loop fusion pass needs to check whether two loops are control flow equivalent before fusing them (i.e., if the first loop executes, the second loop is guaranteed to execute). This is currently done by checking that the preheader of the first loop dominates the preheader of the second loop, and the preheader of the second loop post-dominates the preheader of the first loop. When one (or both) of the loops has a guard, then this check no longer works. If the loop guard were part of the canonical form, then this check could...
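The preheader check described in this excerpt can be sketched on a toy control-flow graph. This is an illustration only, not the loop fusion pass's code: the `Cfg` type, the function names, and the two example graphs are all hypothetical, and dominance is computed by brute-force reachability rather than with LLVM's DominatorTree and PostDominatorTree.

```cpp
#include <vector>

// Toy CFG (hypothetical, not an LLVM type): Succ[b] lists successors of block b.
using Cfg = std::vector<std::vector<int>>;

// True if To can be reached from From along a path that never enters Avoid.
bool reachesAvoiding(const Cfg &Succ, int From, int To, int Avoid) {
  if (From == Avoid)
    return false;
  std::vector<bool> Seen(Succ.size(), false);
  std::vector<int> Work{From};
  Seen[From] = true;
  while (!Work.empty()) {
    int B = Work.back();
    Work.pop_back();
    if (B == To)
      return true;
    for (int S : Succ[B]) {
      if (S != Avoid && !Seen[S]) {
        Seen[S] = true;
        Work.push_back(S);
      }
    }
  }
  return false;
}

// A dominates B if every path from Entry to B passes through A.
bool dominates(const Cfg &Succ, int Entry, int A, int B) {
  return A == B || !reachesAvoiding(Succ, Entry, B, A);
}

// A post-dominates B if every path from B to Exit passes through A.
bool postDominates(const Cfg &Succ, int Exit, int A, int B) {
  return A == B || !reachesAvoiding(Succ, B, Exit, A);
}

// The check from the excerpt, applied to the two loops' preheaders.
bool controlFlowEquivalent(const Cfg &Succ, int Entry, int Exit, int FirstPH,
                           int SecondPH) {
  return dominates(Succ, Entry, FirstPH, SecondPH) &&
         postDominates(Succ, Exit, SecondPH, FirstPH);
}

// 0 entry, 1 preheader A, 2 loop A, 3 preheader B, 4 loop B, 5 exit.
Cfg unguardedPair() { return {{1}, {2}, {2, 3}, {4}, {4, 5}, {}}; }

// 0 entry, 1 preheader A, 2 loop A, 3 guard of B, 4 preheader B, 5 loop B, 6 exit.
Cfg guardedPair() { return {{1}, {2}, {2, 3}, {4, 6}, {5}, {5, 6}, {}}; }
```

In the guarded graph the first preheader still dominates the second, but the guard's edge to the exit bypasses the second preheader, so the post-dominance half fails. That is the situation the excerpt describes when it says the check no longer works once a loop has a guard.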
2017 Jan 20
3
[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines
...D28406 "[InstCombine] icmp sgt (shl nsw X, C1), C0 --> icmp sgt X, C0 >> C1" The Loop Vectorizer generates code with more instructions: ==== Loop Vectorizer from rL292492 ==== for.body5: ; preds = %for.inc16.for.body5_crit_edge, %for.cond.preheader %indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ] %1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1, %for.cond.preheader ] %count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ] %i.119 = phi i64 [...
2013 Aug 16
2
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
...r = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1 >> ; Function Attrs: nounwind uwtable >> define i32 @main(i32 %argc, i8** nocapture readonly %argv) { >> entry: >> %cmp = icmp eq i32 %argc, 2 >> br i1 %cmp, label %cond.end, label %for.cond2.preheader.lr.ph >> cond.end: >> %arrayidx = getelementptr inbounds i8** %argv, i64 1 >> %0 = load i8** %arrayidx, align 8 >> %call = tail call i32 (i8*, ...)* bitcast (i32 (...)* @atoi to i32 (i8*, >> ...)*)(i8* %0) #3 >> %cmp117 = icmp sgt i32 %call, 0 >>...
2013 Aug 15
0
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
...n as: > > @.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1 > ; Function Attrs: nounwind uwtable > define i32 @main(i32 %argc, i8** nocapture readonly %argv) { > entry: > %cmp = icmp eq i32 %argc, 2 > br i1 %cmp, label %cond.end, label %for.cond2.preheader.lr.ph > cond.end: > %arrayidx = getelementptr inbounds i8** %argv, i64 1 > %0 = load i8** %arrayidx, align 8 > %call = tail call i32 (i8*, ...)* bitcast (i32 (...)* @atoi to i32 (i8*, > ...)*)(i8* %0) #3 > %cmp117 = icmp sgt i32 %call, 0 > br i1 %cmp117, label %for....
2012 May 04
3
[LLVMdev] Extending GetElementPointer, or Premature Linearization Considered Harmful
...or (long int j = 0; j < n; j++) for (long int k = 0; k < n; k++) A[1 + 2*i][3 + 4*j][5 + 6*k] = 0; } we'll see %arrayidx12 = getelementptr inbounds [100 x [100 x i64]]* %A, i64 %add109, i64 %add88, i64 %add store i64 0, i64* %arrayidx12, align 8 {1,+,2}<%for.cond1.preheader> {3,+,4}<%for.cond4.preheader> {5,+,6}<%for.body6> which looks great; 3 simple indices, no problem. But consider this: void z2(long int n, long int A[][n][n][100][100]) { for (long int i = 0; i < n; i++) for (long int j = 0; j < n; j++) for (long in...
2015 Sep 03
2
[RFC] New pass: LoopExitValues
...evgep values that already exist. *** Code after LSR *** ; Function Attrs: nounwind optsize define void @matrix_mul(i32 %Size, i32* nocapture %Dst, i32* nocapture readonly %Src, i32 %Val) #0 { entry: %cmp.25 = icmp eq i32 %Size, 0 br i1 %cmp.25, label %for.cond.cleanup, label %for.body.4.lr.ph.preheader for.body.4.lr.ph.preheader: ; preds = %entry %0 = shl i32 %Size, 2 br label %for.body.4.lr.ph for.body.4.lr.ph: ; preds = %for.body.4.lr.ph.preheader, %for.cond.cleanup.3 %lsr.iv5 = phi i32* [ %Src, %for.body.4.lr.ph.preheader ], [ %2,...
2013 Aug 15
4
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
...by "clang -O1") is shown as: @.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1 ; Function Attrs: nounwind uwtable define i32 @main(i32 %argc, i8** nocapture readonly %argv) { entry: %cmp = icmp eq i32 %argc, 2 br i1 %cmp, label %cond.end, label %for.cond2.preheader.lr.ph cond.end: %arrayidx = getelementptr inbounds i8** %argv, i64 1 %0 = load i8** %arrayidx, align 8 %call = tail call i32 (i8*, ...)* bitcast (i32 (...)* @atoi to i32 (i8*, ...)*)(i8* %0) #3 %cmp117 = icmp sgt i32 %call, 0 br i1 %cmp117, label...
2012 Dec 31
3
[LLVMdev] [DragonEgg] [Polly] Should we expect DragonEgg to produce identical LLVM IR for identical GIMPLE?
..."161.i": ; preds = %"160.i", %"159.i" call void bitcast (void (...)* @_gfortran_cpu_time_4 to void (float*)*)(float* %start.i) nounwind %204 = load i32* %ns.i, align 4 %205 = icmp sgt i32 %204, 0 br i1 %205, label %"162.preheader.i", label %"170.i" "162.preheader.i": ; preds = %"161.i" %206 = bitcast i8* %x.0.0.i to float* %207 = add i64 %y.3.2.0.0.i, %y.3.1.0.0.i %208 = bitcast i8* %142 to float* %.pre.i = load i32* %ny.i, align 4 %209 = icmp sg...
2013 Aug 16
0
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
...addr constant [4 x i8] c"%d\0A\00", align 1 >>> ; Function Attrs: nounwind uwtable >>> define i32 @main(i32 %argc, i8** nocapture readonly %argv) { >>> entry: >>> %cmp = icmp eq i32 %argc, 2 >>> br i1 %cmp, label %cond.end, label %for.cond2.preheader.lr.ph >>> cond.end: >>> %arrayidx = getelementptr inbounds i8** %argv, i64 1 >>> %0 = load i8** %arrayidx, align 8 >>> %call = tail call i32 (i8*, ...)* bitcast (i32 (...)* @atoi to i32 >>> (i8*, >>> ...)*)(i8* %0) #3 >>> %cmp...
2019 Aug 26
2
SCEV related question
...} Here is the IR before the pass where I expect SCEV to return trip-count value ; Function Attrs: nofree norecurse nounwind uwtable writeonly define dso_local void @topup(i32* nocapture %a, i64 %i) local_unnamed_addr #0 { entry: %cmp3 = icmp ult i64 %i, 16 br i1 %cmp3, label %for.body.preheader, label %for.end for.body.preheader: ; preds = %entry br label %for.body for.body: ; preds = %for.body.preheader, %for.body %i.addr.04 = phi i64 [ %inc, %for.body ], [ %i, %for.body.preheader ] %arrayidx = getelemen...
2017 Apr 13
3
Question on induction variable simplification pass
...We are mainly interested in the backedge taken count of the inner loop. Before indvars, the backedge information computed by ScalarEvolution is as follows- Outer loop- backedge-taken count is 39 max backedge-taken count is 39 Inner loop- backedge-taken count is {-2,+,1}<nsw><%for.cond1.preheader> max backedge-taken count is 37 After indvars, the backedge information computed by ScalarEvolution is as follows- Outer loop- backedge-taken count is 39 max backedge-taken count is 39 Inner loop- backedge-taken count is (-1 + (zext i32 {-1,+,1}<nsw><%for.cond1.preheader> to i6...
2016 Feb 24
4
Oddity w/MachineBlockPlacement and Loops
...he code might be able to give some guidance. Fair warning, I'm trying to describe a problem in code I don't really understand, so if something doesn't make sense, assume I misunderstood something. The problematic case I'm seeing is that cold blocks are being placed between the preheader and header of a hot loop. This has the result of adding a bunch of cold code spread throughout the code rather than grouped all together at the end of the function. From what I can tell tracing through the code, the critical decision that goes wrong is when we're visiting the preheader...
2017 Jan 22
2
[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines
...D28406 "[InstCombine] icmp sgt (shl nsw X, C1), C0 --> icmp sgt X, C0 >> C1" The Loop Vectorizer generates code with more instructions: ==== Loop Vectorizer from rL292492 ==== for.body5: ; preds = %for.inc16.for.body5_crit_edge, %for.cond.preheader %indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ] %1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1, %for.cond.preheader ] %count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ] %i.119 = phi i64 [...
2019 May 30
2
Making loop guards part of canonical loop structure
.... If the loop's latch branch alone handles the iteration count, including the possibility of 0, then the loop cannot be converted to a hardware loop, because hardware loops must iterate at least once. If the entire loop is guarded against zero iteration count, we can put the loop setup in the preheader, since at that point the loop is guaranteed to execute at least once. I am strongly in favor of having some way to create loop guards, even if they are trivial. -- Krzysztof Parzyszek  kparzysz at quicinc.com   LLVM compiler development -----Original Message----- From: llvm-dev <llvm-dev-b...
2015 Jun 24
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
Hi, Is LLVM able to generate code for the following? %mul = mul <2 x i32> %1, %2, where %1 and %2 are of type <2 x i32>. I am running it on a Haswell processor with LLVM-3.4.2. It seems that it generates really complicated code with vpaddq, vpmuludq, vpsllq, vpsrlq. Thanks, Zhi
2016 Jun 28
2
Instruction selection problem with type i64 - mistaken as v8i64?
...LLVM back end with the Mips MSA vector extensions (from the Mips back end) I have encountered an error when compiling with llc: the instruction selector uses a vector register instead of a scalar register with type i64 . I have the following part of LLVM IR program: vector.body.preheader: ; preds = %min.iters.checked br label %vector.body vector.body: ; preds = %vector.body.preheader, %vector.body %index = phi i64 [ %index.next, %vector.body ], [ 0, %vector.body.preheader ] %vec.ph...
2017 Jan 22
2
[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines
...D28406 "[InstCombine] icmp sgt (shl nsw X, C1), C0 --> icmp sgt X, C0 >> C1" The Loop Vectorizer generates code with more instructions: ==== Loop Vectorizer from rL292492 ==== for.body5: ; preds = %for.inc16.for.body5_crit_edge, %for.cond.preheader %indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ] %1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1, %for.cond.preheader ] %count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ] %i.119 = phi i64 [...