thr3ads.net - similar to: "Missing vectorization of loop due to load late in the loop"

SCEV related question

2019 Aug 26

Here is the original C code:
    void topup(int a[], unsigned long i) {
      for (; i < 16; i++) {
        a[i] = 1;
      }
    }
Here is the IR before the pass where I expect SCEV to return the trip-count value:
    ; Function Attrs: nofree norecurse nounwind uwtable writeonly
    define dso_local void @topup(i32* nocapture %a, i64 %i) local_unnamed_addr #0 {
    entry:
      %cmp3 = icmp ult i64 %i, 16
      br i1
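As a hedged illustration of the trip count the poster expects for `topup`: the loop body runs 16 - i times when i < 16 and not at all otherwise. The helper names `topup_count` and `topup_trip_count` are hypothetical; they appear neither in the post nor in LLVM.

```c
/* Same loop as topup() in the post, instrumented to count how many
   times the body executes. topup_count is a hypothetical name. */
unsigned long topup_count(int a[], unsigned long i) {
    unsigned long iterations = 0;
    for (; i < 16; i++) {
        a[i] = 1;
        iterations++;
    }
    return iterations;
}

/* Closed form for the same quantity: 16 - i when the loop is entered,
   0 otherwise. Also a hypothetical helper. */
unsigned long topup_trip_count(unsigned long i) {
    return i < 16 ? 16 - i : 0;
}
```

For any starting index, the counted value and the closed form agree, which is the value a trip-count analysis would be expected to report symbolically.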

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

2013 Aug 16

Hi Sebpop, Thanks for your explanation. I noticed that Polly would finally run the SROA pass to transform these load/store instructions into scalar operations. Is it possible to run such a pass before polly-dependence analysis? Star Tan At 2013-08-15 21:12:53,"Sebastian Pop" <sebpop at gmail.com> wrote: >Codeprepare and independent blocks are introducing these loads and

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

2013 Aug 15

Hi all, I have investigated the 6X extra compile-time overhead when Polly compiles the simple nestedloop benchmark in LLVM-testsuite. (http://188.40.87.11:8000/db_default/v4/nts/31?compare_to=28&baseline=28). Preliminary results show that this compile-time overhead results from the complicated polly-dependence analysis. However, the key seems to be the polly-prepare pass, which introduces

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

2013 Aug 15

Codeprepare and independent blocks are introducing these loads and stores. These are prepasses that polly runs prior to building the dependence graph to transform scalar dependences into data dependences. Ether was working on eliminating the rewrite of scalar dependences. On Thu, Aug 15, 2013 at 5:32 AM, Star Tan <tanmx_star at yeah.net> wrote: > Hi all, > > I have investigated the

SCEV related question

2019 Aug 25

Hello, I am playing with the SCEV codebase for the first time. I am trying to find out why ScalarEvolution is not able to give the correct back-edge-taken count for an expression. In my case, control flow reaches howFarToZero(), and in that function I have the following SCEV expressions: Start = (15 + (-1 * %i)) (which is set to the Distance SCEV), Step = 1. Now, first of all, should I expect Start as ConstantSCEV (15)
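A small sketch of the arithmetic behind that expression, assuming (this is not stated in the snippet) that the loop in question is a guarded `i < 16` loop that increments `i` by one. Evaluated in 64-bit wrapping arithmetic, 15 + (-1 * i) depends on the runtime value of `%i`, and it matches the number of times the back edge is taken only when the guard i < 16 holds. The function names are hypothetical.

```c
#include <stdint.h>

/* The expression from the post, 15 + (-1 * i), evaluated with 64-bit
   wrapping (modular) arithmetic. UINT64_MAX plays the role of -1. */
uint64_t distance_expr(uint64_t i) {
    return UINT64_C(15) + UINT64_MAX * i;
}

/* Counts how often the back edge is taken in the guarded loop
   if (i < 16) do { i++; } while (i < 16);
   This loop shape is an assumption made for illustration. */
uint64_t backedges_taken(uint64_t i) {
    uint64_t taken = 0;
    if (i < 16) {
        for (;;) {
            i++;
            if (!(i < 16))
                break;      /* loop exit: back edge not taken */
            taken++;        /* back edge taken */
        }
    }
    return taken;
}
```

Inside the guarded range the two functions agree; outside it the expression wraps around to a huge unsigned value, so it is only meaningful together with the guard.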

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

2013 Aug 16

I do not think that running SROA before polly is a good idea: it would defeat the purpose of the code preparation passes that polly intentionally schedules for the data dependence analysis. If you remove the data references before polly runs, you would miss them in the dependence graph: that could lead to incorrect transforms. On Thu, Aug 15, 2013 at 7:28 PM, Star Tan <tanmx_star at

Dereferenceable load semantics & LICM

2017 Mar 31

On Fri, Mar 31, 2017 at 10:23 AM, Sanjoy Das <sanjoy at playingwithpointers.com > wrote: > Hi Piotr, > > On March 31, 2017 at 9:07:42 AM, Piotr Padlewski > (piotr.padlewski at gmail.com) wrote: > > Hi all, > > I have a question about dereferenceable metadata on load instruction. I > > have a patch (https://reviews.llvm.org/D31539) for LICM that hoists >

pgamma discontinuity (PR#7307)

2004 Oct 22

Full_Name: Morten Welinder
Version: 2
OS: Solaris/space/gcc2.95.2
Submission from: (NULL) (65.213.85.217)
I changed src/nmath/standalone/test.c to read:
---------------------------------------------------------------------------------
    #define MATHLIB_STANDALONE 1
    #include <Rmath.h>
    #include <stdio.h>
    int main() {
      double x;
      for (x = 99990; x <= 100009; x++)
        printf

Question on induction variable simplification pass

2017 Apr 13

Hi all, It looks like the induction variable simplification pass prefers doing a zero-extension to compute the wider trip count of loops when extending the IV. This can sometimes result in loss of information making ScalarEvolution's analysis conservative which can lead to missed performance opportunities. For example, consider this loopnest- int i, j; for(i=0; i< 40; i++) for(j=0;

Making loop guards part of canonical loop structure

2019 May 28

Hi all, TL;DR: Should the loop guard branch be part of the canonical loop structure? Background: ----------- While working on the next set of improvements to loop fusion, we discovered that loop rotation will put a guard around a loop if it cannot prove the loop will execute at least once, as seen in the following (simplified) example: entry: br i1 %cmp1, label %for.body.lr.ph, label
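The guard described above can be sketched at the source level. This is a hand-written approximation, not compiler output: loop rotation turns a top-tested loop into a bottom-tested one, and when the first iteration cannot be proven to run, the initial test stays in front of the loop as a guard. The function names are hypothetical.

```c
/* Top-tested form, as a programmer would write it. */
long sum_top_tested(const long *a, long n) {
    long s = 0;
    for (long i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Rotated form: the initial test becomes a guard around a
   bottom-tested (do-while) loop. */
long sum_rotated(const long *a, long n) {
    long s = 0;
    long i = 0;
    if (i < n) {            /* loop guard: skips the loop when n <= 0 */
        do {
            s += a[i];
            i++;
        } while (i < n);    /* loop test now sits at the bottom (latch) */
    }
    return s;
}
```

Both functions compute the same result for every `n`; the guard is what keeps the do-while body from running when the original loop would have executed zero times.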

LLVM Loop vectorizer - 2 vector.body blocks appear

2016 Aug 01

Hello. Mikhail, with the more recent version of the LoopVectorize.cpp code (retrieved at the beginning of July 2016) I ran the following piece of C code: void foo(long *A, long *B, long *C, long N) { for (long i = 0; i < N; ++i) { C[i] = A[i] + B[i]; } } The vectorized LLVM program I obtain contains 2 vector.body blocks - one named

[PATCHish] IfConversion; lost edges for some diamonds

2017 Jan 10

On Tue, Jan 10, 2017 at 2:31 AM, Peter A Jonsson <pj at sics.se> wrote: > Hi Kyle, > > my apologies for mailing you directly but it seems new user creation is > disabled on the llvm bugzilla. > > We sometimes lose edges during IfConversion of diamonds and it’s not > obvious how to reproduce on an upstream target. The documentation for > HasFallThrough says *may*

[LLVMdev] Rotated loop identification

2013 Feb 06

On Feb 4, 2013, at 10:48 AM, Michele Scandale <michele.scandale at gmail.com> wrote: > Dear all, > > I'm working on a late IR target dependent optimization on loops. A part of this > optimization requires deriving "by hand" the trip-count expression of a given > loop. In order to handle these cases correctly I need to check if the loop has > an entry guard

[LLVMdev] Rotated loop identification

2013 Feb 07

Thanks for your reply. Maybe it wasn't so clear, but the optimization I'm writing is target-dependent and so it's declared inside the target backend and is run after the independent optimizer. So I cannot run my pass just after LoopRotate and before InstCombine. I can still use ScalarEvolution, but the instruction combiner can in some cases simplify the entry guard condition. A small

[LLVMdev] : Predication on SIMD architectures and LLVM

2012 Nov 01

On Wed, Oct 31, 2012 at 09:13:43PM +0100, Bjorn De Sutter wrote: > Hi all, > > I am working on a CGRA backend (something like a 2D VLIW), and we also absolutely need predication. I extended the IfConversion pass to allow it to be executed multiple times and to predicate already predicated code. This is necessary to predicate code with nested conditional statements. At this point, we

[RFC] New pass: LoopExitValues

2015 Sep 01

On Mon, Aug 31, 2015 at 5:52 PM, Jake VanAdrighem <jvanadrighem at gmail.com> wrote: > Do you have some specific performance measurements? Averaging 4 runs of 10000 iterations each of Coremark on my X86_64 desktop showed: -O2 performance: +2.9% faster with the L.E.V. pass -Os size: 1.5% smaller with the L.E.V. pass In the case of Coremark, the benefit comes mainly from the matrix

[LLVMdev] another question

2002 Sep 16

In the section explaining "dyn_cast" there are the following lines of code: if (AllocationInst *AI = dyn_cast<AllocationInst>(Val)) { ... } I cannot understand how you take an operand, a value, and cast it into an Instruction. Can you explain it for me? Another common example is: // Loop over all of the phi nodes in a basic block BasicBlock::iterator BBI =
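One way to picture a checked downcast of this kind is a kind tag stored in the base object: the cast yields the more specific pointer when the tag matches and a null pointer otherwise, which is why it can sit inside an `if` condition. The sketch below is plain C with hypothetical types and functions (`Value`, `AllocationInst`, `dyn_cast_allocation`, `allocated_elements`); it is an analogy only, not LLVM's implementation.

```c
#include <stddef.h>

/* Hypothetical kind tag that every value carries. */
typedef enum { KIND_ARGUMENT, KIND_ALLOCATION } ValueKind;

typedef struct Value {
    ValueKind kind;
} Value;

/* "Derived" type: the base struct is the first member, so a pointer to
   the derived object can be converted to a pointer to the base and back. */
typedef struct AllocationInst {
    Value base;
    unsigned num_elements;
} AllocationInst;

/* Checked downcast: returns the derived pointer only when the tag says
   the object really is an AllocationInst, otherwise NULL. Accepting a
   NULL argument is a choice made for this sketch. */
AllocationInst *dyn_cast_allocation(Value *v) {
    if (v != NULL && v->kind == KIND_ALLOCATION)
        return (AllocationInst *)v;
    return NULL;
}

/* Usage mirroring the if-with-cast idiom from the question. */
unsigned allocated_elements(Value *v) {
    AllocationInst *ai;
    if ((ai = dyn_cast_allocation(v)) != NULL)
        return ai->num_elements;
    return 0;
}
```

The operand is not converted into something new; the cast only checks what the object already is and hands back a more specific view of it.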

[LLVMdev] Rotated loop identification

2013 Feb 04

Dear all, I'm working on a late IR target dependent optimization on loops. A part of this optimization requires deriving "by hand" the trip-count expression of a given loop. In order to handle these cases correctly I need to check if the loop has an entry guard or not. The problem I have is that starting from the information I derive during my analysis (initial IV value, last IV

[LLVMdev] : Predication on SIMD architectures and LLVM

2012 Oct 31

Hi all, I am working on a CGRA backend (something like a 2D VLIW), and we also absolutely need predication. I extended the IfConversion pass to allow it to be executed multiple times and to predicate already predicated code. This is necessary to predicate code with nested conditional statements. At this point, we support or, and, and conditional predicates (see Scott Mahlke's papers on this
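To illustrate what predicating nested conditional statements means, here is a source-level sketch of the general technique (an assumption for illustration, not the poster's CGRA backend or LLVM's IfConversion pass): each statement gets a predicate, and the predicate of an inner statement is the logical AND of the enclosing conditions, so the nested control flow is replaced by guarded operations. The function names are hypothetical.

```c
/* Branching version with nested conditionals. */
int nested_branches(int x, int c1, int c2) {
    if (c1) {
        x += 1;
        if (c2)
            x *= 2;
    }
    return x;
}

/* If-converted version: every statement is guarded by a predicate
   instead of being placed under a branch. */
int nested_predicated(int x, int c1, int c2) {
    int p1 = (c1 != 0);          /* predicate of the outer block */
    int p2 = p1 & (c2 != 0);     /* inner predicate = p1 AND c2 */
    x = p1 ? x + 1 : x;          /* executes under p1 */
    x = p2 ? x * 2 : x;          /* executes under p2 */
    return x;
}
```

Computing `p2` from `p1` is the step that corresponds to predicating code that is already predicated.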

[LLVMdev] If Conversion and predicated returns

2013 Apr 10

Evan, et al., I've come across a small issue when using the if conversion pass in PPC to generate conditional returns. Here's a small example:
** Before if conversion **
BB#0: derived from LLVM BB %entry
    %R3<def> = LI 0
    %CR0<def> = CMPLWI %R3, 0
    BCC 68, %CR0, <BB#3>
    Successors according to CFG: BB#3(16) BB#1(16)
BB#1: derived from LLVM BB
