similar to: Missing vectorization of loop due to load late in the loop

Displaying 20 results from an estimated 5000 matches similar to: "Missing vectorization of loop due to load late in the loop"

2019 Aug 26
2
SCEV related question
Here is original C code: void topup(int a[], unsigned long i) { for (; i < 16; i++) { a[i] = 1; } } Here is the IR before the pass where I expect SCEV to return trip-count value ; Function Attrs: nofree norecurse nounwind uwtable writeonly define dso_local void @topup(i32* nocapture %a, i64 %i) local_unnamed_addr #0 { entry: %cmp3 = icmp ult i64 %i, 16 br i1
2013 Aug 16
2
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
Hi Sebpop, Thanks for your explanation. I noticed that Polly would finally run the SROA pass to transform these load/store instructions into scalar operations. Is it possible to run such a pass before polly-dependence analysis? Star Tan At 2013-08-15 21:12:53,"Sebastian Pop" <sebpop at gmail.com> wrote: >Codeprepare and independent blocks are introducing these loads and
2013 Aug 15
4
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
Hi all, I have investigated the 6X extra compile-time overhead when Polly compiles the simple nestedloop benchmark in LLVM-testsuite. (http://188.40.87.11:8000/db_default/v4/nts/31?compare_to=28&baseline=28). Preliminary results show that such compile-time overhead is resulted by the complicated polly-dependence analysis. However, the key seems to be the polly-prepare pass, which introduces
2013 Aug 15
0
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
Codeprepare and independent blocks are introducing these loads and stores. These are prepasses that polly runs prior to building the dependence graph to transform scalar dependences into data dependences. Ether was working on eliminating the rewrite of scalar dependences. On Thu, Aug 15, 2013 at 5:32 AM, Star Tan <tanmx_star at yeah.net> wrote: > Hi all, > > I have investigated the
2019 Aug 25
2
SCEV related question
Hello, I am first time paying with SCEV codebase. I am trying to find out why ScalarEvolution is not able to give correct back edge taken count for an expression. So in my case flow reaches to howFarToZero() and in that function, I have following expressions as SCEV Start = (15 + (-1 * %i) (which is set to Distance SCEV) Step = 1 now, first of all, should I expect Start as ConstantSCEV (15)
2013 Aug 16
0
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
I do not think that running SROA before polly is a good idea: it would defeat the purpose of the code preparation passes that polly intentionally schedules for the data dependence analysis. If you remove the data references before polly runs, you would miss them in the dependence graph: that could lead to incorrect transforms. On Thu, Aug 15, 2013 at 7:28 PM, Star Tan <tanmx_star at
2017 Mar 31
2
Dereferenceable load semantics & LICM
On Fri, Mar 31, 2017 at 10:23 AM, Sanjoy Das <sanjoy at playingwithpointers.com > wrote: > Hi Piotr, > > On March 31, 2017 at 9:07:42 AM, Piotr Padlewski > (piotr.padlewski at gmail.com) wrote: > > Hi all, > > I have a question about dereferenceable metadata on load instruction. I > > have a patch (https://reviews.llvm.org/D31539) for LICM that hoists >
2004 Oct 22
3
pgamma discontinuity (PR#7307)
Full_Name: Morten Welinder Version: 2 OS: Solaris/space/gcc2.95.2 Submission from: (NULL) (65.213.85.217) I changed src/nmath/standalone/test.c to read: --------------------------------------------------------------------------------- #define MATHLIB_STANDALONE 1 #include <Rmath.h> #include <stdio.h> int main() { double x; for (x = 99990; x <= 100009; x++) printf
2017 Apr 13
3
Question on induction variable simplification pass
Hi all, It looks like the induction variable simplification pass prefers doing a zero-extension to compute the wider trip count of loops when extending the IV. This can sometimes result in loss of information making ScalarEvolution's analysis conservative which can lead to missed performance opportunities. For example, consider this loopnest- int i, j; for(i=0; i< 40; i++) for(j=0;
2019 May 28
6
Making loop guards part of canonical loop structure
Hi all, TL;DR: Should the loop guard branch be part of the canonical loop structure? Background: ----------- While working on the next set of improvements to loop fusion, we discovered that loop rotation will put a guard around a loop if it cannot prove the loop will execute at least once, as seen in the following (simplified) example: entry: br i1 %cmp1, label %for.body.lr.ph, label
2016 Aug 01
2
LLVM Loop vectorizer - 2 vector.body blocks appear
Hello. Mikhail, with the more recent version of the LoopVectorize.cpp code (retrieved at the beginning of July 2016) I ran the following piece of C code: void foo(long *A, long *B, long *C, long N) { for (long i = 0; i < N; ++i) { C[i] = A[i] + B[i]; } } The vectorized LLVM program I obtain contains 2 vector.body blocks - one named
2017 Jan 10
2
[PATCHish] IfConversion; lost edges for some diamonds
On Tue, Jan 10, 2017 at 2:31 AM, Peter A Jonsson <pj at sics.se> wrote: > Hi Kyle, > > my apologies for mailing you directly but it seems new user creation is > disabled on the llvm bugzilla. > > We sometime lose edges during IfConversion of diamonds and it’s not > obvious how to reproduce on an upstream target. The documentation for > HasFallThrough says *may*
2013 Feb 06
0
[LLVMdev] Rotated loop identification
On Feb 4, 2013, at 10:48 AM, Michele Scandale <michele.scandale at gmail.com> wrote: > Dear all, > > I'm working on a late IR target dependent optimization on loops. A part of this > optimization requires to derive "by hand" the trip-count expression of a given > loop. In order to handle correctly these cases I need to check if the loop has > an entry guard
2013 Feb 07
2
[LLVMdev] Rotated loop identification
Thanks for your reply. Maybe it wasn't so clear, but the optimization I'm writing is target-dependent and so it's declared inside the target backend and is run after the independent optimizer. So I cannot run my pass just after LoopRotate and before InstCombine. I can still use ScalarEvolution, but the instruction combiner can in some cases simplify the entry guard condition. A small
2012 Nov 01
0
[LLVMdev] : Predication on SIMD architectures and LLVM
On Wed, Oct 31, 2012 at 09:13:43PM +0100, Bjorn De Sutter wrote: > Hi all, > > I am working on a CGRA backend (something like a 2D VLIW), and we also absolutely need predication. I extended the IfConversion pass to allow it to be executed multiple times and to predicate already predicated code. This is necessary to predicate code with nested conditional statements. At this point, we
2015 Sep 01
2
[RFC] New pass: LoopExitValues
On Mon, Aug 31, 2015 at 5:52 PM, Jake VanAdrighem <jvanadrighem at gmail.com> wrote: > Do you have some specific performance measurements? Averaging 4 runs of 10000 iterations each of Coremark on my X86_64 desktop showed: -O2 performance: +2.9% faster with the L.E.V. pass -Os size: 1.5% smaller with the L.E.V. pass In the case of Coremark, the benefit comes mainly from the matrix
2002 Sep 16
4
[LLVMdev] another question
In the section expaining "dyn_cast" There are following lines of code: if (AllocationInst *AI = dyn_cast<AllocationInst>(Val)) { ... } I cannot understand how you take a operand, a value, and cast it into a Instruction. Can you explain it for me? Another common example is: // Loop over all of the phi nodes in a basic block BasicBlock::iterator BBI =
2013 Feb 04
3
[LLVMdev] Rotated loop identification
Dear all, I'm working on a late IR target dependent optimization on loops. A part of this optimization requires to derive "by hand" the trip-count expression of a given loop. In order to handle correctly these cases I need to check if the loop has an entry guard or not. The problem I have is that starting from the information I derive during my analysis (initial IV value, last IV
2012 Oct 31
3
[LLVMdev] : Predication on SIMD architectures and LLVM
Hi all, I am working on a CGRA backend (something like a 2D VLIW), and we also absolutely need predication. I extended the IfConversion pass to allow it to be executed multiple times and to predicate already predicated code. This is necessary to predicate code with nested conditional statements. At this point, we support or, and, and conditional predicates (see Scott Mahlke's papers on this
2013 Apr 10
3
[LLVMdev] If Conversion and predicated returns
Evan, et al., I've come across a small issue when using the if conversion pass in PPC to generate conditional returns. Here's a small example: ** Before if conversion ** BB#0: derived from LLVM BB %entry %R3<def> = LI 0 %CR0<def> = CMPLWI %R3, 0 BCC 68, %CR0, <BB#3> Successors according to CFG: BB#3(16) BB#1(16) BB#1: derived from LLVM BB