thr3ads.net - search: "preheaders"

Displaying 20 results from an estimated 414 matches for "preheaders".

[LLVMdev] Can LLVM vectorize <2 x i32> type

2015 Jun 26

For example, I have the following IR code,

for.cond.preheader:                               ; preds = %if.end18
  %mul = mul i32 %12, %3
  %cmp21128 = icmp sgt i32 %mul, 0
  br i1 %cmp21128, label %for.body.preheader, label %return

for.body.preheader:                               ; preds = %for.cond.preheader
  %19 = mul i32 %12, %3
  %20 = add i32 %19, -1
  %21 = zext i32 %20 to i64
  %22 =

[LLVMdev] Extending GetElementPointer, or Premature Linearization Considered Harmful

2012 May 04

Hi Preston,

On Fri, May 4, 2012 at 9:12 AM, Preston Briggs <preston.briggs at gmail.com> wrote:
>
> which produces
>
> %arrayidx24 = getelementptr inbounds [100 x [100 x i64]]* %A, i64 %arrayidx21.sum, i64 %add1411, i64 %add
> store i64 0, i64* %arrayidx24, align 8
> {{{(5 + ((3 + %n) * %n)),+,(2 * %n * %n)}<%for.cond1.preheader>,+,(4 *

How to make ScalarEvolution recompute SCEV values?

2019 Oct 30

Hello all, I’m pretty new to LLVM. I'm writing a pass for loop optimization. I clone and rearrange loops, setting the cloned loop as the original loop’s parent. This can be done multiple times, until there is no more work to do. The trouble is, after the first time I do this, the cloned loop's SCEVs become unknown types when they should be AddRecExpr. If I re-run the whole pass on the

Making loop guards part of canonical loop structure

2019 May 28

...trivial guard will be created. I have not looked at the implementation of this extensively yet because I wanted feedback on this direction first. However, my initial thought is to modify LoopSimplify to add the concept of a loop guard, and provide similar guarantees that it currently provides for preheaders and exit blocks. Specifically, if a guard block is not found, one is created and inserted immediately before the loop preheader to guarantee a structure similar to the one in the example above. Note that as with the preheader and exit blocks, it is possible that subsequent passes (e.g., SimplifyCFG...
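
The guard / preheader / exit layout described above can be sketched in C for readers new to the terms. This is a minimal illustration under our own assumptions: `sum_guarded` is a hypothetical function, the block labels in the comments are informal, and the guarded do-while form only approximates the shape of a rotated loop in LLVM. It is not the proposal's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: a guarded, rotated loop written out by hand.
 * Comments label the parts informally; LLVM names its blocks differently. */
long sum_guarded(const long *a, size_t n)
{
    long sum = 0;
    if (n > 0) {            /* guard: branch around the loop when it would not run */
        size_t i = 0;       /* preheader: one-time setup before entering the loop */
        do {                /* header: the body executes at least once here */
            sum += a[i];
            i++;
        } while (i < n);    /* latch: take the backedge while i < n */
    }
    return sum;             /* exit: reached whether or not the loop ran */
}
```

With `n == 0` the guard skips both the preheader and the body, so `a` is never dereferenced. Making that branch an explicit, guaranteed block is what the proposal asks LoopSimplify to provide.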

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

2017 Jan 20

Hi, We found that today's 17.30%/11.37% performance regressions in LNT SingleSource/Benchmarks/Shootout/sieve on LNT-AArch64-A53-O3__clang_DEV__aarch64 and LNT-Thumb2v7-A15-O3__clang_DEV__thumbv7 (http://llvm.org/perf/db_default/v4/nts/daily_report/2017/1/20?filter-machine-regex=aarch64%7Carm%7Cthumb%7Cgreen) are caused by changes [rL292492] in InstCombine: https://reviews.llvm.org/D28406

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

2013 Aug 16

Hi Sebpop,

Thanks for your explanation. I noticed that Polly would finally run the SROA pass to transform these load/store instructions into scalar operations. Is it possible to run such a pass before polly-dependence analysis?

Star Tan

At 2013-08-15 21:12:53, "Sebastian Pop" <sebpop at gmail.com> wrote:
>Codeprepare and independent blocks are introducing these loads and

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

2013 Aug 15

Codeprepare and independent blocks are introducing these loads and stores. These are prepasses that polly runs prior to building the dependence graph to transform scalar dependences into data dependences. Ether was working on eliminating the rewrite of scalar dependences.

On Thu, Aug 15, 2013 at 5:32 AM, Star Tan <tanmx_star at yeah.net> wrote:
> Hi all,
>
> I have investigated the

[LLVMdev] Extending GetElementPointer, or Premature Linearization Considered Harmful

2012 May 04

Is there any chance of replacing/extending the GEP instruction? As noted in the GEP FAQ, GEPs don't support variable-length arrays; when the front ends have to support VLAs, they linearize the subscript expressions, throwing away information. The FAQ suggests that folks interested in writing an analysis that understands array indices (I'm thinking of dependence analysis) should be
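
The information loss from linearizing VLA subscripts can be shown in C99. In the sketch below (hypothetical function names), `get_vla` indexes an n-by-m variable-length array with two separate subscripts, while `get_linear` computes the single flattened offset `i * m + j` that a front end effectively produces when the inner dimension `m` is only known at run time.

```c
#include <assert.h>
#include <stddef.h>

/* C99 variable-length array parameter: row and column subscripts stay distinct. */
long get_vla(size_t n, size_t m, long A[n][m], size_t i, size_t j)
{
    return A[i][j];
}

/* Linearized access: one offset i*m + j, computed from the run-time width m.
 * From the offset alone, (i=1, j=2) and (i=0, j=5) are indistinguishable. */
long get_linear(size_t m, const long *A, size_t i, size_t j)
{
    return A[i * m + j];
}
```

Both calls `get_linear(3, A, 1, 2)` and `get_linear(3, A, 0, 5)` read the same element. A dependence analysis that only sees the flattened offset cannot tell which subscript varied without first proving `j < m`, which is the kind of reasoning the two-subscript form gives for free.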

[RFC] New pass: LoopExitValues

2015 Sep 03

On Wed, Sep 2, 2015 at 5:36 AM, James Molloy <james at jamesmolloy.co.uk> wrote:
> Hi,
>
> Coremark really isn't a good enough test - have you run the LLVM test suite
> with this patch, and what were the performance differences?

For the test suite single source benches, 235 tests improved performance, 2 regressed and 705 were unchanged. That seems very optimistic.

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

2013 Aug 15

Hi all, I have investigated the 6X extra compile-time overhead when Polly compiles the simple nestedloop benchmark in the LLVM test suite (http://188.40.87.11:8000/db_default/v4/nts/31?compare_to=28&baseline=28). Preliminary results show that this compile-time overhead is caused by the complicated polly-dependence analysis. However, the key seems to be the polly-prepare pass, which introduces

[LLVMdev] [DragonEgg] [Polly] Should we expect DragonEgg to produce identical LLVM IR for identical GIMPLE?

2012 Dec 31

Dear all, In our compiler we use a modified version of LLVM Polly, which is very sensitive to proper code generation. Among a number of limitations, the loop region (enclosed by the phi node on the induction variable and the branch) is required to be free of additional memory-dependent branches. In other words, there must be no conditional "br" instructions below phi nodes. The problem we are facing

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

2013 Aug 16

I do not think that running SROA before polly is a good idea: it would defeat the purpose of the code preparation passes that polly intentionally schedules for the data dependence analysis. If you remove the data references before polly runs, you would miss them in the dependence graph: that could lead to incorrect transforms.

On Thu, Aug 15, 2013 at 7:28 PM, Star Tan <tanmx_star at

SCEV related question

2019 Aug 26

Here is the original C code:

void topup(int a[], unsigned long i) {
  for (; i < 16; i++) {
    a[i] = 1;
  }
}

Here is the IR before the pass where I expect SCEV to return a trip-count value:

; Function Attrs: nofree norecurse nounwind uwtable writeonly
define dso_local void @topup(i32* nocapture %a, i64 %i) local_unnamed_addr #0 {
entry:
  %cmp3 = icmp ult i64 %i, 16
  br i1
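
The expected value can be worked out by hand. The loop is only entered when `i < 16` (the `%cmp3` check in `entry`), and then the body runs `16 - i` times. ScalarEvolution reports a backedge-taken count, which is one less (`15 - i`). The C sketch below uses hypothetical helper names and checks the closed form against a brute-force count of the same loop header.

```c
#include <assert.h>

/* Hypothetical helper: closed-form iteration count of the loop in topup()
 * for a starting value i. Zero when the loop is never entered. */
unsigned long topup_trip_count(unsigned long i)
{
    return i < 16 ? 16 - i : 0;
}

/* Brute-force reference: run the same loop header and count iterations. */
unsigned long topup_trip_count_by_simulation(unsigned long i)
{
    unsigned long count = 0;
    for (; i < 16; i++)
        count++;
    return count;
}
```

Because `i` is unsigned, the subtraction `16 - i` is only safe under the `i < 16` guard. That is why the closed form needs the conditional, just as the IR needs the `%cmp3` branch.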

Question on induction variable simplification pass

2017 Apr 13

Hi all, It looks like the induction variable simplification pass prefers doing a zero-extension to compute the wider trip count of loops when extending the IV. This can sometimes result in loss of information, making ScalarEvolution's analysis conservative, which can lead to missed performance opportunities. For example, consider this loop nest: int i, j; for(i=0; i< 40; i++) for(j=0;
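
Why the choice of extension matters can be seen on plain 32-bit values. For a value known to be non-negative, such as a bound derived from `i < 40`, zero-extension and sign-extension give the same 64-bit result. They differ only for negative inputs, so an analysis that later sees a `zext` instead of a `sext` may have to reason about a different expression. The helpers below are a hypothetical C illustration of the two conversions, not code from the pass.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extension: treat the 32 bits as unsigned, then widen to 64 bits. */
int64_t widen_zext(int32_t x)
{
    return (int64_t)(uint32_t)x;
}

/* Sign-extension: widen while preserving the signed value. */
int64_t widen_sext(int32_t x)
{
    return (int64_t)x;
}
```

For `39` both return `39`. For `-1`, `widen_sext` returns `-1` while `widen_zext` returns `4294967295`.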

Oddity w/MachineBlockPlacement and Loops

2016 Feb 24

I'm getting some odd behavior out of MBP and was hoping someone with knowledge of the code might be able to give some guidance. Fair warning, I'm trying to describe a problem in code I don't really understand, so if something doesn't make sense, assume I misunderstood something. The problematic case I'm seeing is that cold blocks are being placed between the preheader and

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

2017 Jan 22

Hi Sanjay,

The benchmark source file: http://www.llvm.org/viewvc/llvm-project/test-suite/trunk/SingleSource/Benchmarks/Shootout/sieve.c?view=markup

Clang options used to produce the initial IR:
clang -DNDEBUG -O3 -DNDEBUG -mcpu=cortex-a53 -fomit-frame-pointer -O3 -DNDEBUG -w -Werror=date-time -c sieve.c -S -emit-llvm -mllvm -disable-llvm-optzns --target=aarch64-arm-linux

Opt options:
opt -O3

Making loop guards part of canonical loop structure

2019 May 30

... > I have not looked at the implementation of this extensively yet because I wanted feedback on this direction first. However, my initial thought is to modify LoopSimplify to add the concept of a loop guard, and provide similar guarantees that it currently provides for preheaders and exit blocks. Specifically, if a guard block is not found, one is created and inserted immediately before the loop preheader to guarantee a structure similar to the one in the example above. Note that as with the preheader and exit blocks, it is possible that subsequent p...

[LLVMdev] Can LLVM vectorize <2 x i32> type

2015 Jun 24

Hi, Is LLVM able to generate code for the following instruction? %mul = mul <2 x i32> %1, %2, where %1 and %2 are of type <2 x i32>. I am running it on a Haswell processor with LLVM 3.4.2. It seems to generate really complicated code with vpaddq, vpmuludq, vpsllq, vpsrlq. Thanks, Zhi
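
One way to produce such a multiply from C source uses the GCC/Clang `vector_size` extension. This is an assumption on our part, not something from the original post, and `v2i32`, `mul2`, and `lane` are hypothetical names. Compiling the file with `clang -S -emit-llvm` should show a `mul <2 x i32>` instruction. The target assembly then reflects how the backend legalizes that type.

```c
#include <assert.h>
#include <stdint.h>

/* GCC/Clang vector extension: two int32_t lanes in 8 bytes,
 * the C counterpart of LLVM's <2 x i32>. */
typedef int32_t v2i32 __attribute__((vector_size(8)));

/* Lane-wise 32-bit multiply. */
v2i32 mul2(v2i32 a, v2i32 b)
{
    return a * b;
}

/* Read one lane of a vector (k must be 0 or 1). */
int32_t lane(v2i32 v, int k)
{
    return v[k];
}
```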

Instruction selection problem with type i64 - mistaken as v8i64?

2016 Jun 28

Hello. I am writing a back end in which I combined the existing BPF LLVM back end with the Mips MSA vector extensions (from the Mips back end). I have encountered an error when compiling with llc: the instruction selector uses a vector register instead of a scalar register for type i64. I have the following part of an LLVM IR program: vector.body.preheader:

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

2017 Jan 22

Thank you for the information. I’ll build clang without the hack and re-run the benchmark tomorrow.

-Evgeny

From: Sanjay Patel [mailto:spatel at rotateright.com]
Sent: Sunday, January 22, 2017 8:00 PM
To: Evgeny Astigeevich
Cc: llvm-dev; nd
Subject: Re: [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

> Do you mean to
