Displaying 20 results from an estimated 414 matches for "preheaders".
Did you mean:
preheader
2015 Jun 26
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
For example, I have the following IR code,
for.cond.preheader: ; preds = %if.end18
%mul = mul i32 %12, %3
%cmp21128 = icmp sgt i32 %mul, 0
br i1 %cmp21128, label %for.body.preheader, label %return
for.body.preheader: ; preds =
%for.cond.preheader
%19 = mul i32 %12, %3
%20 = add i32 %19, -1
%21 = zext i32 %20 to i64
%22 =
2012 May 04
0
[LLVMdev] Extending GetElementPointer, or Premature Linearization Considered Harmful
Hi Preston,
On Fri, May 4, 2012 at 9:12 AM, Preston Briggs <preston.briggs at gmail.com> wrote:
>
> which produces
>
> %arrayidx24 = getelementptr inbounds [100 x [100 x i64]]* %A, i64
> %arrayidx21.sum, i64 %add1411, i64 %add
> store i64 0, i64* %arrayidx24, align 8
> {{{(5 + ((3 + %n) * %n)),+,(2 * %n * %n)}<%for.cond1.preheader>,+,(4 *
2019 Oct 30
2
How to make ScalarEvolution recompute SCEV values?
Hello all,
I’m pretty new to LLVM.
I'm writing a pass for loop optimization. I clone and rearrange loops, setting the cloned loop as the original loop’s parent. This can be done multiple times, until there is no more work to do. The trouble is, after the first time I do this, the cloned loop's SCEVs become unknown types when they should be AddRecExpr.
If I re-run the whole pass on the
2019 May 28
6
Making loop guards part of canonical loop structure
...trivial guard will be
created.
I have not looked at the implementation of this
extensively yet because I wanted feedback on this direction first. However, my
initial thought is to modify LoopSimplify to add the concept of a loop guard,
and provide similar guarantees that it currently provides for preheaders and exit
blocks. Specifically, if a guard block is not found, one is created and inserted
immediately before the loop preheader to guarantee a structure similar to the
one in the example above. Note that as with the preheader and exit blocks, it is
possible that subsequent passes (e.g., SimplifyCFG...
2017 Jan 20
3
[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines
Hi,
We found that today's 17.30%/11.37% performance regressions in LNT SingleSource/Benchmarks/Shootout/sieve on LNT-AArch64-A53-O3__clang_DEV__aarch64 and LNT-Thumb2v7-A15-O3__clang_DEV__thumbv7 (http://llvm.org/perf/db_default/v4/nts/daily_report/2017/1/20?filter-machine-regex=aarch64%7Carm%7Cthumb%7Cgreen) are caused by changes [rL292492] in InstCombine:
https://reviews.llvm.org/D28406
2013 Aug 16
2
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
Hi Sebpop,
Thanks for your explanation.
I noticed that Polly would finally run the SROA pass to transform these load/store instructions into scalar operations. Is it possible to run such a pass before polly-dependence analysis?
Star Tan
At 2013-08-15 21:12:53,"Sebastian Pop" <sebpop at gmail.com> wrote:
>Codeprepare and independent blocks are introducing these loads and
2013 Aug 15
0
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
Codeprepare and independent blocks are introducing these loads and stores.
These are prepasses that polly runs prior to building the dependence graph
to transform scalar dependences into data dependences.
Ether was working on eliminating the rewrite of scalar dependences.
On Thu, Aug 15, 2013 at 5:32 AM, Star Tan <tanmx_star at yeah.net> wrote:
> Hi all,
>
> I have investigated the
2012 May 04
3
[LLVMdev] Extending GetElementPointer, or Premature Linearization Considered Harmful
Is there any chance of replacing/extending the GEP instruction?
As noted in the GEP FAQ, GEPs don't support variable-length arrays; when
the front ends have to support VLAs, they linearize the subscript
expressions, throwing away information. The FAQ suggests that folks
interested in writing an analysis that understands array indices (I'm
thinking of dependence analysis) should be
2015 Sep 03
2
[RFC] New pass: LoopExitValues
On Wed, Sep 2, 2015 at 5:36 AM, James Molloy <james at jamesmolloy.co.uk> wrote:
> Hi,
>
> Coremark really isn't a good enough test - have you run the LLVM test suite
> with this patch, and what were the performance differences?
For the test suite single source benches, the 235 tests improved
performance, 2 regressed and 705 were unchanged. That seems very
optimistic.
2013 Aug 15
4
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
Hi all,
I have investigated the 6X extra compile-time overhead when Polly compiles the simple nestedloop benchmark in LLVM-testsuite. (http://188.40.87.11:8000/db_default/v4/nts/31?compare_to=28&baseline=28). Preliminary results show that such compile-time overhead is resulted by the complicated polly-dependence analysis. However, the key seems to be the polly-prepare pass, which introduces
2012 Dec 31
3
[LLVMdev] [DragonEgg] [Polly] Should we expect DragonEgg to produce identical LLVM IR for identical GIMPLE?
Dear all,
In our compiler we use a modified version LLVM Polly, which is very
sensitive to proper code generation. Among the number of limitations, the
loop region (enclosed by phi node on induction variable and branch) is
required to be free of additional memory-dependent branches. In other
words, there must be no conditional "br" instructions below phi nodes. The
problem we are facing
2013 Aug 16
0
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
I do not think that running SROA before polly is a good idea:
it would defeat the purpose of the code preparation passes that
polly intentionally schedules for the data dependence analysis.
If you remove the data references before polly runs, you would
miss them in the dependence graph: that could lead to incorrect
transforms.
On Thu, Aug 15, 2013 at 7:28 PM, Star Tan <tanmx_star at
2019 Aug 26
2
SCEV related question
Here is original C code:
void topup(int a[], unsigned long i) {
for (; i < 16; i++) {
a[i] = 1;
}
}
Here is the IR before the pass where I expect SCEV to return trip-count
value
; Function Attrs: nofree norecurse nounwind uwtable writeonly
define dso_local void @topup(i32* nocapture %a, i64 %i) local_unnamed_addr
#0 {
entry:
%cmp3 = icmp ult i64 %i, 16
br i1
2017 Apr 13
3
Question on induction variable simplification pass
Hi all,
It looks like the induction variable simplification pass prefers doing a zero-extension to compute the wider trip count of loops when extending the IV. This can sometimes result in loss of information making ScalarEvolution's analysis conservative which can lead to missed performance opportunities.
For example, consider this loopnest-
int i, j;
for(i=0; i< 40; i++)
for(j=0;
2016 Feb 24
4
Oddity w/MachineBlockPlacement and Loops
I'm getting some odd behavior out of MBP and was hoping someone
knowledge of the code might be able to give some guidance. Fair
warning, I'm trying to describe a problem in code I don't really
understand, so if something doesn't make sense, assume I misunderstood
something.
The problematic case I'm seeing is that cold blocks are being placed
between the preheader and
2017 Jan 22
2
[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines
Hi Sanjay,
The benchmark source file: http://www.llvm.org/viewvc/llvm-project/test-suite/trunk/SingleSource/Benchmarks/Shootout/sieve.c?view=markup
Clang options used to produce the initial IR: clang -DNDEBUG -O3 -DNDEBUG -mcpu=cortex-a53 -fomit-frame-pointer -O3 -DNDEBUG -w -Werror=date-time -c sieve.c -S -emit-llvm -mllvm -disable-llvm-optzns --target=aarch64-arm-linux
Opt options: opt -O3
2019 May 30
2
Making loop guards part of canonical loop structure
...gt;
> I have not looked at the implementation of this extensively yet
> because I wanted feedback on this direction first. However, my initial
> thought is to modify LoopSimplify to add the concept of a loop guard,
> and provide similar guarantees that it currently provides for
> preheaders and exit blocks. Specifically, if a guard block is not
> found, one is created and inserted immediately before the loop
> preheader to guarantee a structure similar to the one in the example
> above. Note that as with the preheader and exit blocks, it is possible
> that subsequent p...
2015 Jun 24
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
Hi,
Is LLVM be able to generate code for the following code?
%mul = mul <2 x i32> %1, %2, where %1 and %2 are <2 x i32> type.
I am running it on a Haswell processor with LLVM-3.4.2. It seems that it
will generates really complicated code with vpaddq, vpmuludq, vpsllq,
vpsrlq.
Thanks,
Zhi
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
2016 Jun 28
2
Instruction selection problem with type i64 - mistaken as v8i64?
Hello.
I am writing a back end in which I combined the existing BPF LLVM back end with the
Mips MSA vector extensions (from the Mips back end)
I have encountered an error when compiling with llc: the instruction selector uses a
vector register instead of a scalar register with type i64 .
I have the following part of LLVM IR program:
vector.body.preheader:
2017 Jan 22
2
[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines
Thank you for information.
I’ll build clang without the hack and re-run the benchmark tomorrow.
-Evgeny
From: Sanjay Patel [mailto:spatel at rotateright.com]
Sent: Sunday, January 22, 2017 8:00 PM
To: Evgeny Astigeevich
Cc: llvm-dev; nd
Subject: Re: [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines
> Do you mean to