
Displaying 20 results from an estimated 2000 matches similar to: "[LLVMdev] Help with hazards"

2011 Oct 22
0
[LLVMdev] Instruction Scheduling Itineraries
On Oct 21, 2011, at 12:15 AM, James Molloy wrote: > Hi Andy, > > Could you describe how this would be done? In the current ARM itineraries > (say C-A9 for example), the superscalar issue stage is modelled as taking 1 > cycle. If it were to take 2 cycles instead, as far as I can tell the hazard > analyser would stall because both FUs would be acquired. > > I would
2013 Aug 15
0
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
The Codeprepare and independent blocks passes introduce these loads and stores. They are pre-passes that Polly runs, prior to building the dependence graph, to transform scalar dependences into data dependences. Ether was working on eliminating the rewrite of scalar dependences. On Thu, Aug 15, 2013 at 5:32 AM, Star Tan <tanmx_star at yeah.net> wrote: > Hi all, > > I have investigated the
2013 Aug 16
0
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
I do not think that running SROA before Polly is a good idea: it would defeat the purpose of the code preparation passes that Polly intentionally schedules ahead of the data dependence analysis. If you remove the data references before Polly runs, you would miss them in the dependence graph, and that could lead to incorrect transforms. On Thu, Aug 15, 2013 at 7:28 PM, Star Tan <tanmx_star at
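A conceptual sketch (plain C++ at the source level, with invented function and variable names, not actual Polly pass output) of what the code preparation described above does: a value that flows through a scalar temporary is demoted to a memory slot, so the same flow of values becomes a store/load pair that the dependence analysis can model. This is also why running SROA or mem2reg first would promote the slot back to a scalar and undo exactly this preparation.

    // Before preparation: the value flows through the scalar 'tmp',
    // a scalar dependence the polyhedral model cannot see as a memory access.
    void before(const float *A, float *B, int n) {
      for (int i = 0; i < n; ++i) {
        float tmp = A[i] * 2.0f;
        B[i] = tmp + 1.0f;
      }
    }

    // After a preparation-style rewrite: 'tmp' lives in memory, so the same
    // value flow now appears as a store followed by a load (a data dependence).
    void after(const float *A, float *B, int n) {
      float tmp_slot;               // stands in for a dedicated alloca
      for (int i = 0; i < n; ++i) {
        tmp_slot = A[i] * 2.0f;     // store into the slot
        B[i] = tmp_slot + 1.0f;     // reload from the slot
      }
    }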
2013 Aug 16
2
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
Hi Sebpop, Thanks for your explanation. I noticed that Polly eventually runs the SROA pass to transform these load/store instructions back into scalar operations. Is it possible to run such a pass before the polly-dependence analysis? Star Tan At 2013-08-15 21:12:53, "Sebastian Pop" <sebpop at gmail.com> wrote: >Codeprepare and independent blocks are introducing these loads and
2013 Nov 01
0
[LLVMdev] loop vectorizer: this loop is not worth vectorizing
When coming from C, it was probably the loop unroller and the SLP vectorizer that vectorized the code. Potentially I could do the same in the IR. However, the loop body that is generated in the IR can get very large, so the loop unroller will refuse to unroll the loop in a large number of (important) cases. Isn't there a way to convince the loop vectorizer that it should
2013 Nov 01
2
[LLVMdev] loop vectorizer: this loop is not worth vectorizing
I am trying a setup where the single loop is rewritten as two loops. This avoids the 'rem' and 'div' instructions in the index calculation (which give the loop vectorizer a hard time). However, with this setup the loop vectorizer complains about a too-small loop. LV: Checking a loop in "main" LV: Found a loop: L3 LV: Found a loop with a very small trip count. This loop
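A hypothetical sketch of the rewrite described above (the array names, element type, and the width parameter W are invented, not taken from the poster's code): the flat loop recovers a 2-D index with div/rem, which defeats the vectorizer's analysis of the access pattern, while the split form uses only affine, unit-stride indices.

    #include <cstddef>

    // Flat form: i / W and i % W appear in the index calculation,
    // which the loop vectorizer has a hard time reasoning about.
    void flat(float *out, const float *in, std::size_t rows, std::size_t W) {
      for (std::size_t i = 0; i < rows * W; ++i) {
        std::size_t r = i / W, c = i % W;
        out[r * W + c] = in[r * W + c] * 2.0f;
      }
    }

    // Split form: two nested loops, no div/rem; the inner loop has a simple
    // affine, unit-stride access pattern.
    void split(float *out, const float *in, std::size_t rows, std::size_t W) {
      for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < W; ++c)
          out[r * W + c] = in[r * W + c] * 2.0f;
    }

If the inner trip count ends up as a small compile-time constant, the split form can then run into the 'very small trip count' bailout quoted in the message above.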
2013 Aug 15
4
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
Hi all, I have investigated the 6X extra compile-time overhead when Polly compiles the simple nestedloop benchmark in the LLVM test-suite. (http://188.40.87.11:8000/db_default/v4/nts/31?compare_to=28&baseline=28). Preliminary results show that this compile-time overhead results from the complicated polly-dependence analysis. However, the key seems to be the polly-prepare pass, which introduces
2013 Aug 16
0
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
On 08/15/2013 03:32 AM, Star Tan wrote: > Hi all, Hi, I tried to reproduce your findings, but could not do so. > I have investigated the 6X extra compile-time overhead when Polly compiles the simple nestedloop benchmark in the LLVM test-suite. (http://188.40.87.11:8000/db_default/v4/nts/31?compare_to=28&baseline=28). Preliminary results show that this compile-time overhead results from
2013 Nov 03
2
[LLVMdev] loop vectorizer issue
Hello, I was trying to trace LLVM's loop vectorizer. I wrote a simple loop with a clear dependency, but found that the debug output says 'we can vectorize this loop'. Here is my loop with the dependency: for(int k=20;k<50;k++) dataY[k] = dataY[k-1]; And the debug prints: LV: Checking a loop in "main" LV: Found a loop: for.body4 LV: Found an
2013 Nov 03
0
[LLVMdev] loop vectorizer issue
Notice that the code you provided is, for globals and stack allocations at least, semantically equivalent to: int a = dataY[19]; for(int k = 20; k < 50; k++) dataY[k] = a; That is why the load you see missing was redundant: it was probably hoisted by GVN/PRE and replaced with "%.pre". H. On Sun, Nov 3, 2013 at 11:26 AM, Sara Elshobaky <sara.elshobaky at gmail.com> wrote: >
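A minimal, self-contained check (a hypothetical harness, not the original poster's code) that the loop with the apparent carried dependence and the hoisted form really do compute the same thing, which is why GVN/PRE is free to rewrite one into the other and why the result then vectorizes trivially:

    #include <cassert>

    int main() {
      int dataY[50], ref[50];
      for (int k = 0; k < 50; ++k)
        dataY[k] = ref[k] = 3 * k + 7;   // arbitrary initial contents

      // Original loop: each iteration reads the element written by the previous one...
      for (int k = 20; k < 50; ++k)
        dataY[k] = dataY[k - 1];

      // ...but every iteration ultimately copies dataY[19], so this is equivalent:
      int a = ref[19];
      for (int k = 20; k < 50; ++k)
        ref[k] = a;

      for (int k = 0; k < 50; ++k)
        assert(dataY[k] == ref[k]);      // the two arrays match element for element
      return 0;
    }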
2013 Nov 03
0
[LLVMdev] loop vectorizer issue
Hi Sara, the loop vectorizer does not run on the C code but on the LLVM IR this C code was lowered to. Before the loop vectorizer runs, many other optimizations change the shape of this IR. As you can see in the LLVM IR you referenced below, a preceding LLVM IR transformation has changed your loop from: > for(int k=20;k<50;k++) > dataY[k] = dataY[k-1]; to > int a = dataY[19]; >
2012 Jul 30
0
[LLVMdev] IR optimization pass ideas for backend porting before ISel
Hi LLVMers, I'm writing an LLVM backend for C*Core, an ISA derived from Motorola M*Core. I was wondering whether anyone has written IR-level optimization passes that help with backend porting before ISel, such as a transformation from GEPs to integer conversion/arithmetic instructions, and PHI combining. Here's the bubble-sort example. The IR code below was changed by hand, and I am trying to write
2013 Nov 03
3
[LLVMdev] loop vectorizer issue
Actually, what I meant is that in my original loop there is a dependency between every two consecutive iterations. So how can the loop vectorizer say 'we can vectorize this loop'? for(int k=20;k<50;k++) dataY[k] = dataY[k-1]; From: Henrique Santos [mailto:henrique.nazare.santos at gmail.com] Sent: Sunday, November 03, 2013 4:28 PM To: Sara Elshobaky Cc: <llvmdev at
2015 Mar 19
2
[LLVMdev] Cast to SCEVAddRecExpr
Hi Nick, Thanks for looking into it. I have tried that as well, but it didn't work. The "AddExpr->getOperand(0)" node is: "(4 * (sext i32 {2,+,2}<%for.body4> to i64))<nsw>". When I cast this to "SCEVAddRecExpr", it returns NULL. Regards, Ashutosh -----Original Message----- From: Nick Lewycky [mailto:nicholas at mxc.ca] Sent: Thursday, March 19,
2015 Mar 19
3
[LLVMdev] Cast to SCEVAddRecExpr
Yes, I can get a "SCEVAddRecExpr" from the operands of "(sext i32 {2,+,2}<%for.body4> to i64)". So whenever a SCEV cast to "SCEVAddRecExpr" fails, we have to drill down for such patterns? Is that the right way? Regards, Ashutosh -----Original Message----- From: Nick Lewycky [mailto:nicholas at mxc.ca] Sent: Thursday, March 19, 2015 1:02 PM To: Nema, Ashutosh Cc:
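A minimal sketch of the "drill down" idea discussed here, assuming a reasonably recent LLVM; findAddRec is a hypothetical helper (not an existing LLVM API) that looks through cast and n-ary wrappers, such as the sext and the multiply in the expression quoted above, until it finds an add recurrence:

    #include "llvm/Analysis/ScalarEvolution.h"
    #include "llvm/Analysis/ScalarEvolutionExpressions.h"
    using namespace llvm;

    // Hypothetical helper: return the first add recurrence nested inside S,
    // or nullptr if there is none.
    static const SCEVAddRecExpr *findAddRec(const SCEV *S) {
      if (const auto *AR = dyn_cast<SCEVAddRecExpr>(S))
        return AR;                               // already a {start,+,step} expression
      if (const auto *Cast = dyn_cast<SCEVCastExpr>(S))
        return findAddRec(Cast->getOperand());   // look through sext/zext/trunc
      if (const auto *NAry = dyn_cast<SCEVNAryExpr>(S))
        for (const SCEV *Op : NAry->operands())  // look inside add/mul/min/max
          if (const SCEVAddRecExpr *AR = findAddRec(Op))
            return AR;
      return nullptr;
    }

    // Possible use inside a pass, given ScalarEvolution &SE and a pointer Value *Ptr:
    //   if (const SCEVAddRecExpr *AR = findAddRec(SE.getSCEV(Ptr)))
    //     ... AR->getStart(), AR->getStepRecurrence(SE), AR->getLoop() ...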
2016 Jun 06
2
Instruction Itineraries: question about operand latencies
In our architecture loads from certain memory locations take a long time to complete (on the order of 150 clock cycles). Since we don't have a way to tell at compile time if the address being loaded from lies in slow or fast memory, I've gone ahead and made all of the load numbers high, such as: InstrItinData< II_LOAD1, [InstrStage<150, [AGU]>]>, However, I see that
2013 Feb 12
0
[LLVMdev] DFAPacketizer
Hi, I looked a bit through the mail archives and found this question answered in Oct 2011 (see below). It is interesting to find this in the ARM backend, considering your answer. Can you give more information? For example, is this a temporary deficiency in the DFAPacketizer? What is the IIC_iMOVi itinerary doing below? Thanks, Jonas Thu Oct 6 15:11:25 CDT 2011: Hello Hal. > Is there
2015 Mar 31
2
[LLVMdev] Cast to SCEVAddRecExpr
Sorry, there was a typo in the test case; please ignore the previous mail. Consider the case below: for (j=1; j < itr; j++) { - - - - for (i=1; i < itr; i++) { temp = var[i << 1]; - - - - - } } In the above example, we are unable to get a "SCEVAddRecExpr" for "var[i << 1]". Its "SCEVAddRecExpr" is computable in the *Outer Loop*. I
2015 Nov 16
3
DFAPacketizer, Scheduling and LoadLatency
I'm unclear on how the DFAPacketizer and the scheduler know that a given instruction is a load. Here is what I'm talking about. Let's assume my VLIW target is described as follows: def MyTargetItineraries : ProcessorItineraries<[Slot0, Slot1], [], [ .............................. InstrItinData<RI, [InstrStage<1, [Slot0, Slot1]>]>,
2013 Feb 18
0
[LLVMdev] DFAPacketizer
Hi Anshu, Would there be any interest in extending this algorithm to handle more extensive models, such as VLIW scheduling based on FUs and bundle space, i.e. handling multiple stages? I might do it and commit it, if there is acceptance and guidance... Jonas ________________________________ From: Anshuman Dasgupta [mailto:adasgupt at codeaurora.org] Sent: Tuesday, February 12, 2013 4:47 PM