thr3ads.net - similar to: "[LLVMdev] loop vectorizer issue"

Displaying 20 results from an estimated 900 matches similar to: "[LLVMdev] loop vectorizer issue"

2013 Nov 03

[LLVMdev] loop vectorizer issue

Actually, what I meant by my original loop is that there is a dependency between every two consecutive iterations. So how can the loop vectorizer say 'we can vectorize this loop'? for(int k=20;k<50;k++) dataY[k] = dataY[k-1]; From: Henrique Santos [mailto:henrique.nazare.santos at gmail.com] Sent: Sunday, November 03, 2013 4:28 PM To: Sara Elshobaky Cc: <llvmdev at

2013 Nov 03

[LLVMdev] loop vectorizer issue

Notice that the code you provided, at least for globals and stack allocations, is semantically equivalent to: int a = d[19]; for(int k = 20; k < 50; k++) dataY[k] = a; As such, the load you see missing was redundant, probably hoisted by GVN/PRE and replaced with "%.pre". H. On Sun, Nov 3, 2013 at 11:26 AM, Sara Elshobaky <sara.elshobaky at gmail.com> wrote: >
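The equivalence described in this reply can be checked with a small C sketch. It assumes that the `d[19]` in the message refers to `dataY[19]`; the function names `copy_carried`, `copy_hoisted` and `forms_agree` are hypothetical and only serve the illustration.

```c
#include <string.h>

#define N 50

/* Original form from the thread: each iteration reads the element
   that the previous iteration just wrote. */
void copy_carried(int *dataY) {
    for (int k = 20; k < N; k++)
        dataY[k] = dataY[k - 1];
}

/* Form after the redundant load is removed: dataY[19] is read once
   before the loop, and the loop body only stores. */
void copy_hoisted(int *dataY) {
    int a = dataY[19];
    for (int k = 20; k < N; k++)
        dataY[k] = a;
}

/* Returns 1 if both forms leave identical arrays behind. */
int forms_agree(void) {
    int x[N], y[N];
    for (int i = 0; i < N; i++)
        x[i] = y[i] = i * 3 + 1;
    copy_carried(x);
    copy_hoisted(y);
    return memcmp(x, y, sizeof x) == 0;
}
```

In the second form no iteration reads memory written by another iteration, which is presumably why the vectorizer no longer sees a loop-carried dependence.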

2013 Nov 03

[LLVMdev] loop vectorizer issue

Hi Sarah, the loop vectorizer runs not on the C code but on the LLVM IR this C code was lowered to. Before the loop vectorizer runs, many other optimizations change the shape of this IR. You can see in the LLVM IR you referenced below that a preceding LLVM IR transformation has changed your loop from: > for(int k=20;k<50;k++) > dataY[k] = dataY[k-1]; to > int a = d[19]; >

2011 Dec 14

[LLVMdev] Help with hazards

The scoreboard hazard detector that I've added for the PPC 440 is not detecting hazards as it should (which certainly could be my fault somehow, but...). For example, it will produce a schedule that looks like... SU(28): 0x127969b0: f64,ch = LFD 0x12793aa0, 0x1277b4f0, 0x127965b0<Mem:LD8[%scevgep100](tbaa=!"double")> [ORD=41] [ID=28] SU(46): 0x12796ab0: f64 = FADD 0x127969b0,

2013 Aug 15

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

Codeprepare and independent blocks are introducing these loads and stores. These are prepasses that polly runs prior to building the dependence graph to transform scalar dependences into data dependences. Ether was working on eliminating the rewrite of scalar dependences. On Thu, Aug 15, 2013 at 5:32 AM, Star Tan <tanmx_star at yeah.net> wrote: > Hi all, > > I have investigated the

2013 Aug 16

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

Hi Sebpop, Thanks for your explanation. I noticed that Polly would finally run the SROA pass to transform these load/store instructions into scalar operations. Is it possible to run such a pass before polly-dependence analysis? Star Tan At 2013-08-15 21:12:53,"Sebastian Pop" <sebpop at gmail.com> wrote: >Codeprepare and independent blocks are introducing these loads and

2013 Aug 16

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

I do not think that running SROA before polly is a good idea: it would defeat the purpose of the code preparation passes that polly intentionally schedules for the data dependence analysis. If you remove the data references before polly runs, you would miss them in the dependence graph: that could lead to incorrect transforms. On Thu, Aug 15, 2013 at 7:28 PM, Star Tan <tanmx_star at

2013 Oct 07

[LLVMdev] llvm jit

So, what is the use of the profile passes in LLVM? Also, does llvm detect hot blocks of code for recompilation? On Mon, Oct 7, 2013 at 4:44 PM, Amara Emerson <amara.emerson at arm.com> wrote: > No, the JIT does not do any profile guided optimizations for any > architecture. It just uses the static compilation components before loading > the object into memory and running its own

2014 Sep 01

[LLVMdev] Modify a module at runtime in MCJIT

Hello, I'm using MCJIT to run some loops on my ARM processor. I was trying to perform some runtime optimizations on some functions, and this requires recompiling the function at runtime. I know that this feature is not available yet in MCJIT, and to modify a function I have to create a new module with the newly optimized code. My questions are: - The newly created module can be

2013 Aug 15

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

Hi all, I have investigated the 6X extra compile-time overhead when Polly compiles the simple nestedloop benchmark in LLVM-testsuite. (http://188.40.87.11:8000/db_default/v4/nts/31?compare_to=28&baseline=28). Preliminary results show that this compile-time overhead results from the complicated polly-dependence analysis. However, the key seems to be the polly-prepare pass, which introduces

2013 Oct 04

[LLVMdev] Runtime optimizer

Hello, Please, I need more information on the runtime optimizer used in the LLVM JIT. - Where can I find it in the LLVM source code? - Are those runtime optimizations done on the LLVM representation code or on the machine code? Sara

2014 Jan 26

[LLVMdev] Number of instructions executed

Hello, I'm executing my bytecode program with the lli tool using MCJIT. I need a way to find statistics about the number of instructions executed for my program. The -stats option does not include this value; is there any other way to find out? I need this information to compare different versions of my bytecode program. Please advise. Thanks in advance, Sara Elshobaky

2018 Sep 19

Regarding Dependence distance dump

On Wed, Sep 19, 2018 at 4:58 AM Venkataramanan Kumar < venkataramanan.kumar.llvm at gmail.com> wrote: > Hi, > > I tried to see when this behavior changed in LLVM. > It seems to start from. > --snip-- > commit 95e5d37d5868ebde2302bc302c1e0af407c5646d > Author: Sebastian Pop <sebpop at gmail.com> > Date: Tue Mar 6 21:55:59 2018 +0000 > > DA: remove

2013 Nov 01

[LLVMdev] loop vectorizer: this loop is not worth vectorizing

I am trying a setup where the one loop is rewritten as two loops. This avoids the 'rem' and 'div' instructions in the index calculation (which give the loop vectorizer a hard time). However, with this setup the loop vectorizer complains about a too small loop. LV: Checking a loop in "main" LV: Found a loop: L3 LV: Found a loop with a very small trip count. This loop
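The kind of rewrite described here can be sketched in C as follows. The post does not show its loops, so everything in this example is an assumption: the kernel, the names `scale_flat`, `scale_nested` and `kernels_agree`, and the choice of a nested (rather than sequential) two-loop form are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical single loop: the flat index is decomposed with
   '/' and '%' on every iteration. */
void scale_flat(float *out, const float *in, const float *w,
                size_t rows, size_t cols) {
    for (size_t i = 0; i < rows * cols; i++) {
        size_t r = i / cols;   /* div in the index calculation */
        size_t c = i % cols;   /* rem in the index calculation */
        out[r * cols + c] = in[r * cols + c] * w[c];
    }
}

/* Same computation as two nested loops: the row and column indices
   are the loop counters, so no div or rem remains. */
void scale_nested(float *out, const float *in, const float *w,
                  size_t rows, size_t cols) {
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            out[r * cols + c] = in[r * cols + c] * w[c];
}

/* Returns 1 if both versions produce the same output. */
int kernels_agree(void) {
    enum { ROWS = 5, COLS = 4 };
    float in[ROWS * COLS], w[COLS], a[ROWS * COLS], b[ROWS * COLS];
    for (size_t i = 0; i < ROWS * COLS; i++) in[i] = (float)i + 0.5f;
    for (size_t c = 0; c < COLS; c++) w[c] = (float)(c + 1);
    scale_flat(a, in, w, ROWS, COLS);
    scale_nested(b, in, w, ROWS, COLS);
    for (size_t i = 0; i < ROWS * COLS; i++)
        if (a[i] != b[i]) return 0;
    return 1;
}
```

The trade-off the post runs into is visible in this form: the inner loop now runs only `cols` iterations, so when that count is small and known, the vectorizer may decide the loop is too short to be worth vectorizing.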

2013 Nov 01

[LLVMdev] loop vectorizer: this loop is not worth vectorizing

In the case when coming from C it was probably the loop unroller and SLP vectorizer which vectorized the code. Potentially I could do the same in the IR. However, the loop body that is generated in the IR can get very large. Thus, the loop unroller will refuse to unroll the loop in a large number of (important) cases. Isn't there a way to convince the loop vectorizer that it should

2014 Feb 25

[LLVMdev] noinline attribute problem

Hello, I have the following simple C code below. It should return '8' as a result, but the returned result is wrong: it returns '1'. When I remove the line with '__attribute__((noinline))', the returned result is correct ('8'). Any idea? Please advise, as I need to get the assembly code of the 'getTexSize' function alone. Note: I compile using the

2015 Mar 19

[LLVMdev] Cast to SCEVAddRecExpr

Hi Nick, Thanks for looking into it. I have tried that as well but it didn't work. The "AddExpr->getOperand(0)" node is: " (4 * (sext i32 {2,+,2}<%for.body4> to i64))<nsw>" When I cast this to "SCEVAddRecExpr" it returns NULL. Regards, Ashutosh -----Original Message----- From: Nick Lewycky [mailto:nicholas at mxc.ca] Sent: Thursday, March 19,

2015 Mar 19

[LLVMdev] Cast to SCEVAddRecExpr

Yes, I can get "SCEVAddRecExpr" from the operands of "(sext i32 {2,+,2}<%for.body4> to i64)". So whenever a SCEV cast to "SCEVAddRecExpr" fails, we have to drill down for such patterns? Is that the right way? Regards, Ashutosh -----Original Message----- From: Nick Lewycky [mailto:nicholas at mxc.ca] Sent: Thursday, March 19, 2015 1:02 PM To: Nema, Ashutosh Cc:

2013 Oct 07

[LLVMdev] llvm jit

Hello, I have a question about the LLVM JIT. Does it use the profile information generated during runtime to enhance the generated code for ARM processors? According to 'LLVM: A Compilation Framework for Lifelong Program Analysis <http://llvm.org/pubs/2004-01-30-CGO-LLVM.html>' it is available, but I can't find it in the current source code. I really appreciate any help. Thanks in

2015 Mar 19

[LLVMdev] Cast to SCEVAddRecExpr

Hi, I'm trying to cast one of the SCEV nodes to "SCEVAddRecExpr". Every time, the cast returns NULL, and I'm unable to do this. SCEV Node: ((4 * (sext i32 {2,+,2}<%for.body4> to i64))<nsw> + %var)<nsw> Casting: const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(SCEVNode); 'var' is of type float pointer (float*). Without 'sext' it works, but
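One plausible reason the 'sext' gets in the way, offered here as an assumption rather than as the answer given in the thread, is that a sign-extended 32-bit recurrence equals the corresponding 64-bit recurrence only while the 32-bit value does not wrap. The C sketch below models the i32 recurrence {2,+,2} with plain integer arithmetic; it does not use the LLVM API, and the function names are hypothetical.

```c
#include <stdint.h>

/* Value of the 32-bit recurrence {start,+,step} at iteration n,
   computed with wraparound modulo 2^32 and then sign-extended to
   64 bits. The unsigned-to-signed conversion is implementation-defined
   before C23; common compilers use two's complement wrapping. */
int64_t sext_of_narrow_rec(int32_t start, int32_t step, uint32_t n) {
    uint32_t v = (uint32_t)start + (uint32_t)step * n;
    return (int64_t)(int32_t)v;
}

/* Value of the 64-bit recurrence {sext(start),+,sext(step)} at
   iteration n, computed without any 32-bit wraparound. */
int64_t wide_rec(int32_t start, int32_t step, uint32_t n) {
    return (int64_t)start + (int64_t)step * (int64_t)n;
}
```

For small iteration counts the two functions agree, but at n = 2^30 the narrow form has wrapped to a negative value while the wide form has not. An analysis can therefore move the extension inside the recurrence only when it can rule out that wrap, and until then the top-level node stays an add or a multiply rather than an add-recurrence.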
