thr3ads.net - similar to: "LoopVectorize fails to vectorize more complex loops"

Displaying 20 results from an estimated 6000 matches similar to: "LoopVectorize fails to vectorize more complex loops"

LoopVectorize fails to vectorize code with condition on reduction

2018 Jun 11

Hello. I'm not able to vectorize this simple C loop, which does what could be called a predicated sum-reduction:

    #define NMAX 1000
    int colOccupied[NMAX];

    /* Count the columns that are not occupied. */
    int Func(int N) {
        int numSol = 0;
        for (int c = 0; c < N; c++) {
            if (colOccupied[c] == 0)
                numSol++;
        }
        return numSol;
    }

The compiler
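A common way to restate a predicated reduction like the one in this post is to add the result of the comparison instead of branching around the increment. The sketch below assumes the same `colOccupied` array; the function name `count_free_select` is hypothetical, and whether a particular LLVM version vectorizes either form is something to check (for example with clang's `-Rpass=loop-vectorize` remarks) rather than assume.

```c
#define NMAX 1000
int colOccupied[NMAX];

/* Hypothetical branch-free variant of the loop in the post:
   the comparison yields 0 or 1, which is added unconditionally,
   so the loop body contains no control flow. */
int count_free_select(int N) {
    int numSol = 0;
    for (int c = 0; c < N; c++)
        numSol += (colOccupied[c] == 0);
    return numSol;
}
```

Both forms compute the same count; the branch-free one simply makes the select-then-add shape explicit in the source.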

Separate LoopVectorize LLVM pass

2017 Apr 14

Hello. I am trying to create my own LoopVectorize.cpp pass as a separate pass from the LLVM trunk, as described in http://llvm.org/docs/CMake.html#embedding-llvm-in-your-project. Did anybody try something like this? I added close to the end of the .cpp file: /* this line seems to be required - it allows to run this pass as an embedded pass by giving opt -my-loop-vectorize

LoopVectorize module - some possible enhancements

2016 Aug 21

Hello, Michael, I'd like to ask if we can enhance the LoopVectorize LLVM module (I am currently using a version from Jul 2016). More exactly: - do you envision to support in the near future LLVM IR gather and scatter intrinsics (as described at http://llvm.org/docs/LangRef.html#llvm-masked-gather-intrinsics and scatter)? I see you have defined some methods that should

[LLVMdev] RFC: Loop versioning for LICM

2015 Feb 26

Hi Ashutosh, Have you been following the recent Loop Access Analysis work? LAA was split out from the Loop Vectorizer, which had been performing the kind of loop versioning that you describe. The main reason was to be able to share this functionality with other passes. Loop Access Analysis is an analysis pass that computes basic memory dependences and the runtime checks. The versioning decision
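The kind of versioning discussed in this thread can be pictured at the source level: a runtime overlap check selects between a loop the optimizer may treat as alias-free and the original conservative loop. This is only a hand-written C sketch of the idea, not what LAA or the vectorizer actually emit; `ranges_overlap` and `add_one_versioned` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical runtime check: do the byte ranges [p, p+bytes) and
   [q, q+bytes) overlap? Addresses are compared as integers. */
int ranges_overlap(const void *p, const void *q, size_t bytes) {
    uintptr_t a = (uintptr_t)p, b = (uintptr_t)q;
    return a < b + bytes && b < a + bytes;
}

/* Two versions of the same loop, selected by the runtime check. */
void add_one_versioned(int *dst, const int *src, size_t n) {
    if (!ranges_overlap(dst, src, n * sizeof(int))) {
        /* No-alias version: restrict tells the compiler the
           two arrays are independent inside this block. */
        int *restrict d = dst;
        const int *restrict s = src;
        for (size_t i = 0; i < n; i++)
            d[i] = s[i] + 1;
    } else {
        /* Conservative version: original sequential semantics. */
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] + 1;
    }
}
```

The cost of this scheme is code size (two loop bodies) plus the check itself, which is why sharing one analysis for the checks across passes is attractive.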

[LLVMdev] RFC: Loop versioning for LICM

2015 Feb 26

I would like to propose a new loop multi-versioning optimization for LICM. For now I have kept it LICM-only, but it could be used in multiple places. The main motivation is to unblock optimizations that are stuck because of memory alias dependencies. Most of the time, when alias analysis is unsure about a memory access, it reports may-alias. This uncertainty from alias analysis restricts some of the memory based

LLVM Loop vectorizer - 2 vector.body blocks appear

2016 Aug 01

Hello. Mikhail, with the more recent version of the LoopVectorize.cpp code (retrieved at the beginning of July 2016) I ran the following piece of C code:

    void foo(long *A, long *B, long *C, long N) {
        for (long i = 0; i < N; ++i) {
            C[i] = A[i] + B[i];
        }
    }

The vectorized LLVM program I obtain contains 2 vector.body blocks - one named

[LLVMdev] LLVM loop vectorizer

2016 Feb 18

Hi Alex, I'm not aware of efforts on loop coalescing in LLVM, but probably Polly can do something like this. Also, one related thought: it might be worth making it a separate pass, not a part of the loop vectorizer. LLVM already has several 'utility' passes (e.g. loop rotation), which primarily aim at enabling other passes. Thanks, Michael > On Feb 15, 2016, at 6:44 AM, RCU

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

2012 Dec 10

Hello all, I wanted to get some feedback on this patch for ScalarEvolution. It addresses a performance problem I am seeing for a simple benchmark. Starting with this C code:

    signed char foo(void)
    {
        const int count = 8000;
        signed char result = 0;
        int j;

        for (j = 0; j < count; ++j) {
            result += (result_t)(3);
        }

        return result;
    }

I

[LLVMdev] LLVM loop vectorizer

2016 Jun 04

Hi Alex, I think the changes you want are actually not vectorizer related. Vectorizer just uses data provided by other passes. What you probably might want is to look into routine Loop::getStartLoc() (see lib/Analysis/LoopInfo.cpp). If you find a way to improve it, patches are welcome:) Thanks, Michael > On Jun 3, 2016, at 6:13 PM, Alex Susu <alex.e.susu at gmail.com> wrote: >

[LLVMdev] LLVM loop vectorizer

2016 Jun 07

Hi Alex, This has been very recently fixed by Hal. See http://reviews.llvm.org/rL270771 Adam > On Jun 4, 2016, at 3:13 AM, Alex Susu via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > Hello. > Mikhail, I come back to this older thread. > I need to do a few changes to LoopVectorize.cpp. > > One of them is related to figuring out the exact C source line

[IndVarSimplify] Narrow IV's are not eliminated resulting in inefficient code

2016 Apr 20

Hi, Would you be able to kindly check and assist with the IndVarSimplify / SCEV problem I got in the latest LLVM, please? Sometimes IndVarSimplify may not eliminate narrow IV's when there actually exists such a possibility. This may affect other LLVM passes and result in inefficient code. The reproducing test 'indvar_test.cpp' is attached. The problem is with the second

[LLVMdev] LLVM loop vectorizer

2015 Jul 08

Hello. I am trying to vectorize a CSR SpMV (sparse matrix-vector multiplication) procedure, but the LLVM loop vectorizer is not able to handle such code. I am using clang and llvm version 3.4 (on Ubuntu 12.10). I use the -fvectorize option with clang and -loop-vectorize with opt-3.4. The CSR SpMV function is inspired from
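The poster's function is cut off in this excerpt, so the following is only a generic textbook CSR SpMV kernel for orientation, not the code from the thread. All names (`spmv_csr`, `row_ptr`, `col_idx`, `val`) are assumptions. It shows the two features that typically make such loops hard for a loop vectorizer: an inner trip count that depends on data, and an indirect load `x[col_idx[j]]` that needs a gather.

```c
#include <stddef.h>

/* y = A * x for a matrix with n_rows rows stored in CSR form.
   row_ptr has n_rows + 1 entries; col_idx and val each have
   row_ptr[n_rows] entries (one per stored nonzero). */
void spmv_csr(size_t n_rows, const size_t *row_ptr, const size_t *col_idx,
              const double *val, const double *x, double *y) {
    for (size_t i = 0; i < n_rows; i++) {
        double sum = 0.0;
        /* Trip count varies per row: row_ptr[i+1] - row_ptr[i]. */
        for (size_t j = row_ptr[i]; j < row_ptr[i + 1]; j++)
            sum += val[j] * x[col_idx[j]];  /* indirect (gather) load */
        y[i] = sum;
    }
}
```

For the 2x2 matrix [[1, 2], [0, 3]] and x = (1, 2), the arrays would be `row_ptr = {0, 2, 3}`, `col_idx = {0, 1, 1}`, `val = {1, 2, 3}`.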

[LLVMdev] E = L->begin() in LoopVectorize

2014 Mar 18

Hi, I'm studying loop vectorizer. I don't understand the code yet. But it looks not right to assign L->begin() to E. Is it a typo? Thanks, Liang diff --git a/lib/Transforms/Vectorize/LoopVectorize.cpp b/lib/Transforms/Vectorize/LoopVectorize.cpp index 435c005..87b5d79 100644 --- a/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/lib/Transforms/Vectorize/LoopVectorize.cpp @@

[LLVMdev] E = L->begin() in LoopVectorize

2014 Mar 18

Looking at it now, curious why no tests failed. On Tue, Mar 18, 2014 at 2:48 PM, Jim Grosbach <grosbach at apple.com> wrote: > Almost certainly, yes. Nice catch! > > > On Mar 18, 2014, at 2:38 PM, Liang Wang <netcasper at gmail.com> wrote: > > > Hi, > > > > I'm studying loop vectorizer. I don't understand the code yet. But > > it

[LLVMdev] loop vectorizer and storing to uniform addresses

2013 Nov 08

I changed the input C to use a 64-bit type for the loop index (this eliminates 'sext' instructions in the IR). Here is the IR produced with clang -O0:

    define float @foo(i64 %start, i64 %end, float* %A) #0 {
    entry:
      %start.addr = alloca i64, align 8
      %end.addr = alloca i64, align 8
      %A.addr = alloca float*, align 8
      %sum = alloca [4 x float], align 16
      %i = alloca i64, align 8

[LLVMdev] Loss of precision with very large branch weights

2015 Apr 24

In PR 22718, we are looking at issues with long-running applications producing non-representative frequencies. For example, in these two loops:

    int g = 0;
    __attribute__((noinline)) void bar() { g++; }

    extern int printf(const char*, ...);

    int main() {
      int i, j, k;
      for (i = 0; i < 1000000; i++)
        bar();

      printf ("g = %d\n", g);
      g = 0;
      for (i = 0; i < 500000; i++)

[LLVMdev] RFC: Loop versioning for LICM

2015 Mar 04

> On Mar 3, 2015, at 1:29 AM, Nema, Ashutosh <Ashutosh.Nema at amd.com> wrote: > > Hi Adam, > > Thanks for looking into LoopVersioning work. > > I have gone through recent LoopAccessAnalysis changes and found some of the stuff > overlaps (i.e. runtime memory check, loop access analysis etc.). LoopVersioning can > use

TableGen - Help to implement a form of gather/scatter operations for Mips MSA

2016 Dec 09

Hello. I read on page 4 of http://www.cs.fsu.edu/~whalley/cda5155/chap4.pdf that gather and scatter operations exist for Mips, named LVI and SVI, respectively. Did anyone think of implementing in the LLVM Mips back end (part of the MSA vector instructions) gather and scatter operations? If so, can you share with me the TableGen spec? (I tried to start from LD_DESC_BASE, but it

[LLVMdev] Loop Vectorization and Store-Load Forwarding issue

2015 Jun 12

I have been looking into this small test case (Part A) where loop vectorization is disabled due to a possible store-load forwarding conflict (Part B). As you can see, due to the presence of dependence distance 2, the loop is vectorizable only for a width of 2. However, the presence of dependence distance 15 (due to y[j-15]) results in a store-load forwarding issue, as the store packet of y[16:17] (iteration

[LLVMdev] Packed instructions generaetd by LoopVectorize?

2013 Apr 03

Hi, I have a question about LoopVectorize. I wrote a simple test case, a dot product loop, and found that packed instructions are generated when the input arrays are integer, but not when they are float or double. If I modify the float example in http://llvm.org/docs/Vectorizers.html by adding restrict to the input arrays, packed instructions are generated. Although it should not be required I tried
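For reference, the kind of dot-product loop described here looks roughly like the sketch below. The poster's actual test case is not shown in this excerpt, so the function names `dot_i32` and `dot_f32` are hypothetical. One point worth keeping in mind when comparing the two: vectorizing a float sum reorders the additions, so compilers normally require reassociation to be permitted (for example via clang's `-ffast-math`), whereas integer addition can be reordered freely.

```c
#include <stddef.h>

/* Integer dot product: the reduction can be reordered without
   changing the result. */
int dot_i32(const int *restrict a, const int *restrict b, size_t n) {
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Float dot product: same loop shape, but reordering the additions
   can change rounding, so vectorizing the reduction usually depends
   on the floating-point options in effect. */
float dot_f32(const float *restrict a, const float *restrict b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```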
