thr3ads.net - search: "prehead"

[LLVMdev] Can LLVM vectorize <2 x i32> type

2015 Jun 26

2

[LLVMdev] Can LLVM vectorize <2 x i32> type

For example, I have the following IR code, for.cond.preheader: ; preds = %if.end18 %mul = mul i32 %12, %3 %cmp21128 = icmp sgt i32 %mul, 0 br i1 %cmp21128, label %for.body.preheader, label %return for.body.preheader: ; preds = %for.cond.preheader %19 = mul i32 %12, %3 %20 = add i32 %19,...

[LLVMdev] Extending GetElementPointer, or Premature Linearization Considered Harmful

2012 May 04

0

[LLVMdev] Extending GetElementPointer, or Premature Linearization Considered Harmful

...riggs <preston.briggs at gmail.com> wrote: > > which produces > > %arrayidx24 = getelementptr inbounds [100 x [100 x i64]]* %A, i64 > %arrayidx21.sum, i64 %add1411, i64 %add > store i64 0, i64* %arrayidx24, align 8 > {{{(5 + ((3 + %n) * %n)),+,(2 * %n * %n)}<%for.cond1.preheader>,+,(4 * %n)}<%for.cond4.preheader>,+,6}<%for.cond7.preheader> This expression is not straightforward because LLVM always folds the loop invariant in the AddExpr into the AddRecExpr. If I understand the AddRecExpr correctly, the above SCEV is equivalent to: (5 + ((3 + %n) * %n)) + (...

How to make ScalarEvolution recompute SCEV values?

2019 Oct 30

2

How to make ScalarEvolution recompute SCEV values?

...r the unknown SCEVs, or, is there a way to re-run ScalarEvolution and LoopInfo analysis pass during my pass? This is my current CloneLoop function: Loop *cloneLoop(Function *F, Loop *L, LoopInfo *LI, const Twine &NameSuffix, ValueToValueMapTy &VMap) { // original preheader of the loop const auto PreHeader = L->getLoopPreheader(); // keep track of the original predecessors std::set<BasicBlock *> AllPredecessors; for (auto PredIt = pred_begin(PreHeader), E = pred_end(PreHeader); PredIt != E; PredIt++) AllPredecessors.ins...

Making loop guards part of canonical loop structure

2019 May 28

6

Making loop guards part of canonical loop structure

...eed to handle both guarded loops and non-guarded loops. For example, the current loop fusion pass needs to check whether two loops are control flow equivalent before fusing them (i.e., if the first loop executes, the second loop is guaranteed to execute). This is currently done by checking that the preheader of the first loop dominates the preheader of the second loop, and the preheader of the second loop post-dominates the preheader of the first loop. When one (or both) of the loops has a guard, this check no longer works. If the loop guard were part of the canonical form, then this check could...
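
The dominance test described in this snippet can be tried on a toy control-flow graph. The following is a minimal C++ sketch, not the loop fusion pass itself: blocks are plain integers, the graph is assumed to have a single exit, every block is assumed reachable from the entry and able to reach the exit, and the names `dominators`, `reversed`, and `controlFlowEquivalent` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Graph = std::vector<std::vector<int>>; // successors of each block

// Builds the graph with every edge flipped (successors become predecessors).
Graph reversed(const Graph &succ) {
  Graph rev(succ.size());
  for (std::size_t b = 0; b < succ.size(); ++b)
    for (int s : succ[b])
      rev[s].push_back(static_cast<int>(b));
  return rev;
}

// Iterative dominator sets: dom[b][d] is true when block d dominates block b.
std::vector<std::vector<bool>> dominators(const Graph &succ, int entry) {
  const std::size_t n = succ.size();
  const Graph pred = reversed(succ);
  std::vector<std::vector<bool>> dom(n, std::vector<bool>(n, true));
  dom[entry].assign(n, false);
  dom[entry][entry] = true;

  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t b = 0; b < n; ++b) {
      if (static_cast<int>(b) == entry)
        continue;
      // Intersect the dominator sets of all predecessors, then add b itself.
      std::vector<bool> next(n, true);
      for (int p : pred[b])
        for (std::size_t d = 0; d < n; ++d)
          next[d] = next[d] && dom[p][d];
      next[b] = true;
      if (next != dom[b]) {
        dom[b] = next;
        changed = true;
      }
    }
  }
  return dom;
}

// True when block a dominates block b and block b post-dominates block a.
// Post-dominance is computed as dominance on the reversed graph from `exit`.
bool controlFlowEquivalent(const Graph &succ, int entry, int exit,
                           int a, int b) {
  const auto dom = dominators(succ, entry);
  const auto pdom = dominators(reversed(succ), exit);
  return dom[b][a] && pdom[a][b];
}
```

With two unguarded loops in sequence, `{{1},{2},{2,3},{4},{4,5},{}}` (preheaders 1 and 3, exit 5), the check succeeds. With a guard branching around each loop, `{{1},{2,4},{2+1},{3,4},{5,7},{6},{6,7},{}}` written out as `{{1},{2,4},{3},{3,4},{5,7},{6},{6,7},{}}` (preheaders 2 and 5, exit 7), the first preheader no longer dominates the second, so the check fails even though the loops may still be fusable.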

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

2017 Jan 20

3

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

...D28406 "[InstCombine] icmp sgt (shl nsw X, C1), C0 --> icmp sgt X, C0 >> C1" The Loop Vectorizer generates code with more instructions: ==== Loop Vectorizer from rL292492 ==== for.body5: ; preds = %for.inc16.for.body5_crit_edge, %for.cond.preheader %indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ] %1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1, %for.cond.preheader ] %count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ] %i.119 = phi i64 [...
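
The fold named in this thread, `icmp sgt (shl nsw X, C1), C0 --> icmp sgt X, C0 >> C1`, can be sanity-checked by brute force at a small width. The following is a minimal C++ sketch, not InstCombine's implementation: it models i8 values with plain `int`, treats `nsw` as "the shifted value still fits in i8", and models the arithmetic right shift as floor division. The names `floorShiftRight` and `foldHoldsForI8` are hypothetical.

```cpp
#include <cassert>

// Floor division by 2^shift; models an arithmetic shift right.
static int floorShiftRight(int value, int shift) {
  const int divisor = 1 << shift;
  int q = value / divisor;               // C++ division truncates toward zero
  if (value % divisor != 0 && value < 0)
    --q;                                 // round toward negative infinity
  return q;
}

// Returns true if, for every i8 value X whose left shift by c1 does not
// overflow (the nsw precondition), (X << c1) > c0 agrees with X > (c0 >> c1).
bool foldHoldsForI8(int c1, int c0) {
  for (int x = -128; x <= 127; ++x) {
    const int shifted = x * (1 << c1);   // mathematical value of X << c1
    if (shifted < -128 || shifted > 127)
      continue;                          // nsw violated, so X is out of scope
    const bool before = shifted > c0;
    const bool after = x > floorShiftRight(c0, c1);
    if (before != after)
      return false;
  }
  return true;
}
```

The check only says the two comparisons agree; it says nothing about which form later passes such as the Loop Vectorizer handle better, which is what the thread is about.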

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

2013 Aug 16

2

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

...r = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1 >> ; Function Attrs: nounwind uwtable >> define i32 @main(i32 %argc, i8** nocapture readonly %argv) { >> entry: >> %cmp = icmp eq i32 %argc, 2 >> br i1 %cmp, label %cond.end, label %for.cond2.preheader.lr.ph >> cond.end: >> %arrayidx = getelementptr inbounds i8** %argv, i64 1 >> %0 = load i8** %arrayidx, align 8 >> %call = tail call i32 (i8*, ...)* bitcast (i32 (...)* @atoi to i32 (i8*, >> ...)*)(i8* %0) #3 >> %cmp117 = icmp sgt i32 %call, 0 >>...

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

2013 Aug 15

0

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

...n as: > > @.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1 > ; Function Attrs: nounwind uwtable > define i32 @main(i32 %argc, i8** nocapture readonly %argv) { > entry: > %cmp = icmp eq i32 %argc, 2 > br i1 %cmp, label %cond.end, label %for.cond2.preheader.lr.ph > cond.end: > %arrayidx = getelementptr inbounds i8** %argv, i64 1 > %0 = load i8** %arrayidx, align 8 > %call = tail call i32 (i8*, ...)* bitcast (i32 (...)* @atoi to i32 (i8*, > ...)*)(i8* %0) #3 > %cmp117 = icmp sgt i32 %call, 0 > br i1 %cmp117, label %for....

[LLVMdev] Extending GetElementPointer, or Premature Linearization Considered Harmful

2012 May 04

3

[LLVMdev] Extending GetElementPointer, or Premature Linearization Considered Harmful

...or (long int j = 0; j < n; j++) for (long int k = 0; k < n; k++) A[1 + 2*i][3 + 4*j][5 + 6*k] = 0; } we'll see %arrayidx12 = getelementptr inbounds [100 x [100 x i64]]* %A, i64 %add109, i64 %add88, i64 %add store i64 0, i64* %arrayidx12, align 8 {1,+,2}<%for.cond1.preheader> {3,+,4}<%for.cond4.preheader> {5,+,6}<%for.body6> which looks great; 3 simple indices, no problem. But consider this: void z2(long int n, long int A[][n][n][100][100]) { for (long int i = 0; i < n; i++) for (long int j = 0; j < n; j++) for (long in...
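
To see why the three recurrences in this snippet count as "simple indices", recall that an affine add recurrence `{start,+,step}` takes the value `start + step * iter` on iteration `iter` of its loop. The following is a minimal C++ sketch under that reading; the names `AffineAddRec` and `subscriptsMatch` are hypothetical and not part of ScalarEvolution.

```cpp
#include <cassert>
#include <cstdint>

// An affine add recurrence {start,+,step}: its value on iteration `iter`
// (counting from 0) of the associated loop is start + step * iter.
struct AffineAddRec {
  std::int64_t start;
  std::int64_t step;
  std::int64_t at(std::int64_t iter) const { return start + step * iter; }
};

// Checks that {1,+,2}, {3,+,4} and {5,+,6} reproduce the subscripts of
// A[1 + 2*i][3 + 4*j][5 + 6*k] for every induction value below n.
bool subscriptsMatch(std::int64_t n) {
  const AffineAddRec I{1, 2}, J{3, 4}, K{5, 6};
  for (std::int64_t t = 0; t < n; ++t) {
    if (I.at(t) != 1 + 2 * t) return false;
    if (J.at(t) != 3 + 4 * t) return false;
    if (K.at(t) != 5 + 6 * t) return false;
  }
  return true;
}
```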

[RFC] New pass: LoopExitValues

2015 Sep 03

2

[RFC] New pass: LoopExitValues

...evgep values that already exist. *** Code after LSR *** ; Function Attrs: nounwind optsize define void @matrix_mul(i32 %Size, i32* nocapture %Dst, i32* nocapture readonly %Src, i32 %Val) #0 { entry: %cmp.25 = icmp eq i32 %Size, 0 br i1 %cmp.25, label %for.cond.cleanup, label %for.body.4.lr.ph.preheader for.body.4.lr.ph.preheader: ; preds = %entry %0 = shl i32 %Size, 2 br label %for.body.4.lr.ph for.body.4.lr.ph: ; preds = %for.body.4.lr.ph.preheader, %for.cond.cleanup.3 %lsr.iv5 = phi i32* [ %Src, %for.body.4.lr.ph.preheader ], [ %2,...

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

2013 Aug 15

4

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

...by "clang -O1") is shown as: @.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1 ; Function Attrs: nounwind uwtable define i32 @main(i32 %argc, i8** nocapture readonly %argv) { entry: %cmp = icmp eq i32 %argc, 2 br i1 %cmp, label %cond.end, label %for.cond2.preheader.lr.ph cond.end: %arrayidx = getelementptr inbounds i8** %argv, i64 1 %0 = load i8** %arrayidx, align 8 %call = tail call i32 (i8*, ...)* bitcast (i32 (...)* @atoi to i32 (i8*, ...)*)(i8* %0) #3 %cmp117 = icmp sgt i32 %call, 0 br i1 %cmp117, label...

[LLVMdev] [DragonEgg] [Polly] Should we expect DragonEgg to produce identical LLVM IR for identical GIMPLE?

2012 Dec 31

3

[LLVMdev] [DragonEgg] [Polly] Should we expect DragonEgg to produce identical LLVM IR for identical GIMPLE?

..."161.i": ; preds = %"160.i", %"159.i" call void bitcast (void (...)* @_gfortran_cpu_time_4 to void (float*)*)(float* %start.i) nounwind %204 = load i32* %ns.i, align 4 %205 = icmp sgt i32 %204, 0 br i1 %205, label %"162.preheader.i", label %"170.i" "162.preheader.i": ; preds = %"161.i" %206 = bitcast i8* %x.0.0.i to float* %207 = add i64 %y.3.2.0.0.i, %y.3.1.0.0.i %208 = bitcast i8* %142 to float* %.pre.i = load i32* %ny.i, align 4 %209 = icmp sg...

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

2013 Aug 16

0

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

...addr constant [4 x i8] c"%d\0A\00", align 1 >>> ; Function Attrs: nounwind uwtable >>> define i32 @main(i32 %argc, i8** nocapture readonly %argv) { >>> entry: >>> %cmp = icmp eq i32 %argc, 2 >>> br i1 %cmp, label %cond.end, label %for.cond2.preheader.lr.ph >>> cond.end: >>> %arrayidx = getelementptr inbounds i8** %argv, i64 1 >>> %0 = load i8** %arrayidx, align 8 >>> %call = tail call i32 (i8*, ...)* bitcast (i32 (...)* @atoi to i32 >>> (i8*, >>> ...)*)(i8* %0) #3 >>> %cmp...

SCEV related question

2019 Aug 26

2

SCEV related question

...} Here is the IR before the pass where I expect SCEV to return trip-count value ; Function Attrs: nofree norecurse nounwind uwtable writeonly define dso_local void @topup(i32* nocapture %a, i64 %i) local_unnamed_addr #0 { entry: %cmp3 = icmp ult i64 %i, 16 br i1 %cmp3, label %for.body.preheader, label %for.end for.body.preheader: ; preds = %entry br label %for.body for.body: ; preds = %for.body.preheader, %for.body %i.addr.04 = phi i64 [ %inc, %for.body ], [ %i, %for.body.preheader ] %arrayidx = getelemen...

Question on induction variable simplification pass

2017 Apr 13

3

Question on induction variable simplification pass

...We are mainly interested in the backedge taken count of the inner loop. Before indvars, the backedge information computed by ScalarEvolution is as follows- Outer loop- backedge-taken count is 39 max backedge-taken count is 39 Inner loop- backedge-taken count is {-2,+,1}<nsw><%for.cond1.preheader> max backedge-taken count is 37 After indvars, the backedge information computed by ScalarEvolution is as follows- Outer loop- backedge-taken count is 39 max backedge-taken count is 39 Inner loop- backedge-taken count is (-1 + (zext i32 {-1,+,1}<nsw><%for.cond1.preheader> to i6...

Oddity w/MachineBlockPlacement and Loops

2016 Feb 24

4

Oddity w/MachineBlockPlacement and Loops

...he code might be able to give some guidance. Fair warning, I'm trying to describe a problem in code I don't really understand, so if something doesn't make sense, assume I misunderstood something. The problematic case I'm seeing is that cold blocks are being placed between the preheader and header of a hot loop. This has the result of adding a bunch of cold code spread throughout the code rather than grouped all together at the end of the function. From what I can tell tracing through the code, the critical decision that goes wrong is when we're visiting the preheader...

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

2017 Jan 22

2

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

...D28406 "[InstCombine] icmp sgt (shl nsw X, C1), C0 --> icmp sgt X, C0 >> C1" The Loop Vectorizer generates code with more instructions: ==== Loop Vectorizer from rL292492 ==== for.body5: ; preds = %for.inc16.for.body5_crit_edge, %for.cond.preheader %indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ] %1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1, %for.cond.preheader ] %count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ] %i.119 = phi i64 [...

Making loop guards part of canonical loop structure

2019 May 30

2

Making loop guards part of canonical loop structure

.... If the loop's latch branch alone handles the iteration count, including the possibility of 0, then the loop cannot be converted to a hardware loop, because hardware loops must iterate at least once. If the entire loop is guarded against zero iteration count, we can put the loop setup in the preheader, since at that point the loop is guaranteed to execute at least once. I am strongly in favor of having some way to create loop guards, even if they are trivial. -- Krzysztof Parzyszek kparzysz at quicinc.com LLVM compiler development -----Original Message----- From: llvm-dev <llvm-dev-b...
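
The difference between a loop whose latch handles a zero trip count and a guarded loop can be shown at source level. The following is a minimal C++ sketch, not compiler output: `sumPlain` and `sumGuarded` are hypothetical names, and the `remaining` counter only stands in for the kind of setup a hardware loop would need.

```cpp
#include <cassert>

// Plain form: the loop condition itself handles a trip count of zero.
int sumPlain(const int *data, int n) {
  int sum = 0;
  for (int i = 0; i < n; ++i)
    sum += data[i];
  return sum;
}

// Guarded form: past the guard the body is known to run at least once,
// so the loop setup can sit directly before the loop, where a preheader
// would be, and the loop itself becomes a count-down do-while.
int sumGuarded(const int *data, int n) {
  int sum = 0;
  if (n > 0) {                    // loop guard against zero iterations
    int remaining = n;            // loop setup
    const int *p = data;
    do {
      sum += *p++;
    } while (--remaining != 0);   // latch: counts down to zero
  }
  return sum;
}
```

Both functions return the same result for every `n`; only the second has a body that is guaranteed to execute once the guard has passed.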

[LLVMdev] Can LLVM vectorize <2 x i32> type

2015 Jun 24

2

[LLVMdev] Can LLVM vectorize <2 x i32> type

Hi, Is LLVM able to generate code for the following? %mul = mul <2 x i32> %1, %2, where %1 and %2 are of type <2 x i32>. I am running it on a Haswell processor with LLVM-3.4.2. It seems to generate really complicated code with vpaddq, vpmuludq, vpsllq, vpsrlq. Thanks, Zhi

Instruction selection problem with type i64 - mistaken as v8i64?

2016 Jun 28

2

Instruction selection problem with type i64 - mistaken as v8i64?

...LLVM back end with the Mips MSA vector extensions (from the Mips back end) I have encountered an error when compiling with llc: the instruction selector uses a vector register instead of a scalar register with type i64 . I have the following part of LLVM IR program: vector.body.preheader: ; preds = %min.iters.checked br label %vector.body vector.body: ; preds = %vector.body.preheader, %vector.body %index = phi i64 [ %index.next, %vector.body ], [ 0, %vector.body.preheader ] %vec.ph...
