similar to: Question about LLVM region analysis

Displaying 20 results from an estimated 2000 matches similar to: "Question about LLVM region analysis"

2013 Aug 15
0
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
Codeprepare and independent blocks are introducing these loads and stores. These are prepasses that polly runs prior to building the dependence graph to transform scalar dependences into data dependences. Ether was working on eliminating the rewrite of scalar dependences. On Thu, Aug 15, 2013 at 5:32 AM, Star Tan <tanmx_star at yeah.net> wrote: > Hi all, > > I have investigated the
2013 Aug 16
0
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
I do not think that running SROA before polly is a good idea: it would defeat the purpose of the code preparation passes that polly intentionally schedules for the data dependence analysis. If you remove the data references before polly runs, you would miss them in the dependence graph: that could lead to incorrect transforms. On Thu, Aug 15, 2013 at 7:28 PM, Star Tan <tanmx_star at
2013 Aug 16
2
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
Hi Sebpop, Thanks for your explanation. I noticed that Polly would finally run the SROA pass to transform these load/store instructions into scalar operations. Is it possible to run such a pass before polly-dependence analysis? Star Tan At 2013-08-15 21:12:53,"Sebastian Pop" <sebpop at gmail.com> wrote: >Codeprepare and independent blocks are introducing these loads and
2013 Aug 15
4
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
Hi all, I have investigated the 6X extra compile-time overhead when Polly compiles the simple nestedloop benchmark in LLVM-testsuite. (http://188.40.87.11:8000/db_default/v4/nts/31?compare_to=28&baseline=28). Preliminary results show that such compile-time overhead is resulted by the complicated polly-dependence analysis. However, the key seems to be the polly-prepare pass, which introduces
2013 Aug 16
0
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
On 08/15/2013 03:32 AM, Star Tan wrote: > Hi all, Hi, I tried to reproduce your findings, but could not do so. > I have investigated the 6X extra compile-time overhead when Polly compiles the simple nestedloop benchmark in LLVM-testsuite. (http://188.40.87.11:8000/db_default/v4/nts/31?compare_to=28&baseline=28). Preliminary results show that such compile-time overhead is resulted by
2011 Dec 14
0
[LLVMdev] Help with hazards
The scoreboard hazard detector that I've added for the PPC 440 is not detecting hazards as it should (which certainly could be my fault somehow, but...). For example, it will produce a schedule that looks like... SU(28): 0x127969b0: f64,ch = LFD 0x12793aa0, 0x1277b4f0, 0x127965b0<Mem:LD8[%scevgep100](tbaa=!"double")> [ORD=41] [ID=28] SU(46): 0x12796ab0: f64 = FADD 0x127969b0,
2012 Dec 10
3
[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)
Hello all, I wanted to get some feedback on this patch for ScalarEvolution. It addresses a performance problem I am seeing for simple benchmark. Starting with this C code: 01: signed char foo(void) 02: { 03: const int count = 8000; 04: signed char result = 0; 05: int j; 06: 07: for (j = 0; j < count; ++j) { 08: result += (result_t)(3); 09: } 10: 11: return result; 12: } I
2018 Jun 29
2
Cleaning up ‘br i1 false’ cases in CodeGenPrepare
Hi, I have come across a couple of cases where the code generated after CodeGenPrepare pass has "br i1 false .." with both true and false conditions preserved and this propagates further and remains the same in the final assembly code/executable. In CodeGenPrepare::runOnFunction, ConstantFoldTerminator (which handles the br i1 false condition) is called only once and if after the
2018 May 16
0
GlobalAddress lowering strategy
I've been looking at GlobalAddress lowering strategy recently and was wondering if anyone had any ideas, insights, or experience to share. ## Background When lowering global address, which is typically done in FooTargetLowering::LowerGlobalAddress you have the option of folding in the global offset into the global address or else emitting the base globaladdress and a separate ADD node for the
2013 Jul 03
2
[LLVMdev] [Polly] Assert in Scope construction
Should have changed the subject line... --- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation > -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Sergei Larin > Sent: Wednesday, July 03, 2013 12:29 PM > To: 'Tobias Grosser' > Cc: 'llvmdev'
2013 Jul 05
0
[LLVMdev] [Polly] Assert in Scope construction
Hi Sergei, On Thu, Jul 4, 2013 at 1:36 AM, Sergei Larin <slarin at codeaurora.org> wrote: > Should have changed the subject line... > > --- > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted > by > The Linux Foundation > > > > -----Original Message----- > > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at
2019 Sep 24
2
An issue with "lifetime.start" and storing "undef"
Hi All, In order to confine the scope of a local variable allocated using "alloca", we had written a pass named 'undeflocal'. This helps avoid pseudo phi nodes inserted because of the path (carrying an undefined value) starting from the "entry" block. This unwanted phi instruction was preventing some of the optimizations from getting triggered later on. For example,
2013 Nov 08
1
[LLVMdev] loop vectorizer and storing to uniform addresses
I changed the input C to using a 64 bit type for the loop index (this eliminates 'sext' instructions in the IR) Here the IR produced with clang -O0 define float @foo(i64 %start, i64 %end, float* %A) #0 { entry: %start.addr = alloca i64, align 8 %end.addr = alloca i64, align 8 %A.addr = alloca float*, align 8 %sum = alloca [4 x float], align 16 %i = alloca i64, align 8
2012 May 04
0
[LLVMdev] Extending GetElementPointer, or Premature Linearization Considered Harmful
Hi Preston, On Fri, May 4, 2012 at 9:12 AM, Preston Briggs <preston.briggs at gmail.com> wrote: > > which produces > > %arrayidx24 = getelementptr inbounds [100 x [100 x i64]]* %A, i64 > %arrayidx21.sum, i64 %add1411, i64 %add > store i64 0, i64* %arrayidx24, align 8 > {{{(5 + ((3 + %n) * %n)),+,(2 * %n * %n)}<%for.cond1.preheader>,+,(4 *
2015 Jun 26
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
For example, I have the following IR code, for.cond.preheader: ; preds = %if.end18 %mul = mul i32 %12, %3 %cmp21128 = icmp sgt i32 %mul, 0 br i1 %cmp21128, label %for.body.preheader, label %return for.body.preheader: ; preds = %for.cond.preheader %19 = mul i32 %12, %3 %20 = add i32 %19, -1 %21 = zext i32 %20 to i64 %22 =
2019 Oct 30
2
How to make ScalarEvolution recompute SCEV values?
Hello all, I’m pretty new to LLVM. I'm writing a pass for loop optimization. I clone and rearrange loops, setting the cloned loop as the original loop’s parent. This can be done multiple times, until there is no more work to do. The trouble is, after the first time I do this, the cloned loop's SCEVs become unknown types when they should be AddRecExpr. If I re-run the whole pass on the
2012 Dec 31
3
[LLVMdev] [DragonEgg] [Polly] Should we expect DragonEgg to produce identical LLVM IR for identical GIMPLE?
Dear all, In our compiler we use a modified version LLVM Polly, which is very sensitive to proper code generation. Among the number of limitations, the loop region (enclosed by phi node on induction variable and branch) is required to be free of additional memory-dependent branches. In other words, there must be no conditional "br" instructions below phi nodes. The problem we are facing
2019 May 30
2
Making loop guards part of canonical loop structure
On Hexagon, unguarded loops cannot be converted to hardware loops. If the loop's latch branch alone handles the iteration count, including the possibility of 0, then the loop cannot be converted to a hardware loop, because hardware loops must iterate at least once. If the entire loop is guarded against zero iteration count, we can put the loop setup in the preheader, since at that point the
2009 May 08
2
[LLVMdev] Splitting a basic block, replacing it's terminator
Hi, I want to insert a conditional branch in the middle of a basic block. To that end, I am doing these steps: (1) Split the basic block: bb->splitBasicBlock() (2) Remove the old terminator: succ->removePredecessor(bb) bb->getTerminator()->getParent() (3) Adding a new terminator: BranchInst::Create(ifTrue, ifFalse, cnd, "", bb); That seems to work, but later passes
2015 Sep 10
2
[RFC] New pass: LoopExitValues
Which cases does this pass handle which aren't otherwise optimized out by passes like GlobalValueNumbering or DeadCodeElimination? Thanks, Jake VanAdrighem On Thu, Sep 10, 2015 at 2:35 PM, Steve King <steve at metrokings.com> wrote: > Hello LLVM, > It seems this thread has gone cold. Is there some low risk way for > the community to take the new pass for a test drive? >