thr3ads.net - similar to: "[LLVMdev] [BBVectorizer] Obvious vectorization benefit, but req-chain is too short"

Displaying 20 results from an estimated 30000 matches similar to: "[LLVMdev] [BBVectorizer] Obvious vectorization benefit, but req-chain is too short"

[LLVMdev] [BBVectorizer] Obvious vectorization benefit, but req-chain is too short

2012 Feb 04

[LLVMdev] [BBVectorizer] Obvious vectorization benefit, but req-chain is too short

On Fri, 2012-02-03 at 10:28 +0100, Tobias Grosser wrote: > Hi Hal, > > this is one of the first test cases, I would love to have improved > vectorizer support. I sent it out earlier, but I think it is a good time > to look into it again, after the vectorizer was committed. > > The basic examples is a set of scalar loads that load for consecutive > elements and store

[LLVMdev] [BBVectorizer] Obvious vectorization benefit, but req-chain is too short

2012 Feb 04

[LLVMdev] [BBVectorizer] Obvious vectorization benefit, but req-chain is too short

Hello, Thanks for your work on the bb-vectorizer. It looks like a promising pass to be used for multi-work-item-vectorization in pocl. On 02/04/2012 06:21 AM, Hal Finkel wrote: > Try it now (after r149761). If this "solution" causes other problems, > then we may need to think of something more sophisticated. I wonder if the case where a store is the last user of the value could

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

2011 Nov 23

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

On Mon, 2011-11-21 at 21:22 -0600, Hal Finkel wrote: > On Mon, 2011-11-21 at 11:55 -0600, Hal Finkel wrote: > > Tobias, > > > > I've attached an updated patch. It contains a few bug fixes and many > > (refactoring and coding-convention) changes inspired by your comments. > > > > I'm currently trying to fix the bug responsible for causing a compile

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

2011 Nov 22

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

On Mon, 2011-11-21 at 11:55 -0600, Hal Finkel wrote: > Tobias, > > I've attached an updated patch. It contains a few bug fixes and many > (refactoring and coding-convention) changes inspired by your comments. > > I'm currently trying to fix the bug responsible for causing a compile > failure when compiling >

[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR

2013 Jul 04

[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR

Hi, Our DSL can generate C or directly generate LLVM IR. With LLVM 3.3, we can vectorize the C produced code using clang with -O3, or clang with -O1 then opt -O3 -vectorize-loops. But the same program generating LLVM IR version cannot be vectorized with opt -O3 -vectorize-loops. So our guess is that our generated LLVM IR lacks some informations that are needed by the vectorization passes to

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

2011 Dec 02

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

On 11/23/2011 05:52 PM, Hal Finkel wrote: > On Mon, 2011-11-21 at 21:22 -0600, Hal Finkel wrote: >> > On Mon, 2011-11-21 at 11:55 -0600, Hal Finkel wrote: >>> > > Tobias, >>> > > >>> > > I've attached an updated patch. It contains a few bug fixes and many >>> > > (refactoring and coding-convention) changes inspired

[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR

2013 Jul 05

[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR

On 07/04/2013 01:39 PM, Stéphane Letz wrote: > Hi, > > Our DSL can generate C or directly generate LLVM IR. With LLVM 3.3, we can vectorize the C produced code using clang with -O3, or clang with -O1 then opt -O3 -vectorize-loops. But the same program generating LLVM IR version cannot be vectorized with opt -O3 -vectorize-loops. So our guess is that our generated LLVM IR lacks some

[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR

2013 Jul 05

[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR

Le 5 juil. 2013 à 04:11, Tobias Grosser <tobias at grosser.es> a écrit : > On 07/04/2013 01:39 PM, Stéphane Letz wrote: >> Hi, >> >> Our DSL can generate C or directly generate LLVM IR. With LLVM 3.3, we can vectorize the C produced code using clang with -O3, or clang with -O1 then opt -O3 -vectorize-loops. But the same program generating LLVM IR version cannot be

[LLVMdev] [Vectorization] Mis match in code generated

2014 Sep 19

[LLVMdev] [Vectorization] Mis match in code generated

Hi Arnold, Thanks for your reply. I tried test case as suggested by you. *void foo(int *a, int *sum) {*sum = a[0]+a[1]+a[2]+a[3]+a[4]+a[5]+a[6]+a[7]+a[8]+a[9]+a[10]+a[11]+a[12]+a[13]+a[14]+a[15];}* so that it has a 'store' in its IR. *IR before vectorization :*target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128" target triple =

[LLVMdev] [Vectorization] Mis match in code generated

2014 Nov 10

[LLVMdev] [Vectorization] Mis match in code generated

Hi Suyog, Thanks for looking at this. This has recently got itself onto my TODO list too. > I am not sure how much all this will improve the code quality for horizontal reduction > (donno how frequently such pattern of horizontal reduction from same array occurs in real world/SPECS). Actually the main loop of 470.lbm can be SLP vectorized like this. We have three parts to it: A fully

[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR

2013 Jul 05

[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR

On Jul 5, 2013, at 9:50 AM, Stéphane Letz <letz at grame.fr> wrote: > > Le 5 juil. 2013 à 04:11, Tobias Grosser <tobias at grosser.es> a écrit : > >> On 07/04/2013 01:39 PM, Stéphane Letz wrote: >>> Hi, >>> >>> Our DSL can generate C or directly generate LLVM IR. With LLVM 3.3, we can vectorize the C produced code using clang with -O3, or

[LLVMdev] loop vectorizer

2013 Oct 30

[LLVMdev] loop vectorizer

----- Original Message ----- > > > I ran the BB vectorizer as I guess this is the SLP vectorizer. No, while the BB vectorizer is doing a form of SLP vectorization, there is a separate SLP vectorization pass which uses a different algorithm. You can pass -vectorize-slp to opt. -Hal > > BBV: using target information > BBV: fusing loop #1 for for.body in _Z3barmmPfS_S_...

[LLVMdev] loop vectorizer

2013 Oct 30

[LLVMdev] loop vectorizer

The SLP vectorizer apparently did something in the prologue of the function (where storing of arguments on the stack happens) which then got eliminated later on (since I don't see any vector instructions in the final IR). Below the debug output of the SLP pass: Args: opt -O1 -vectorize-slp -debug loop.ll -S SLP: Analyzing blocks in _Z3barmmPfS_S_. SLP: Found 2 stores to vectorize. SLP:

[LLVMdev] loop vectorizer

2013 Oct 30

[LLVMdev] loop vectorizer

The debug messages are misleading. They should read “trying to vectorize a list of …”; The problem is that the SCEV analysis is unable to detect that C[ir0] and C[ir1] are consecutive. Is this loop from an important benchmark ? Thanks, Nadav On Oct 30, 2013, at 11:13 AM, Frank Winter <fwinter at jlab.org> wrote: > The SLP vectorizer apparently did something in the prologue of the

[LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its

2013 Jun 26

[LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its

Sent from my iPhone... On Jun 25, 2013, at 8:14 AM, Hal Finkel <hfinkel at anl.gov> wrote: > ----- Original Message ----- >> >> >> >> On Jun 24, 2013, at 4:24 PM, Hal Finkel < hfinkel at anl.gov > wrote: >> >> >> >> >> Indvars should ideally preserve NSW flags whenever possible. However, >> we don't want to

[LLVMdev] loop vectorizer

2013 Oct 30

[LLVMdev] loop vectorizer

On 30 October 2013 09:25, Nadav Rotem <nrotem at apple.com> wrote: > The access pattern to arrays a and b is non-linear. Unrolled loops are > usually handled by the SLP-vectorizer. Are ir0 and ir1 consecutive for all > values for i ? > Based on his list of values, it seems that the induction stride is linear within each block of 4 iterations, but it's not a clear

[LLVMdev] loop vectorizer: this loop is not worth vectorizing

2013 Nov 01

[LLVMdev] loop vectorizer: this loop is not worth vectorizing

I am trying a setup where the one loop is rewritten as two loops. This avoids the 'rem' and 'div' instructions in the index calculation (which give the loop vectorizer a hard time). However, with this setup the loop vectorizer complains about a too small loop. LV: Checking a loop in "main" LV: Found a loop: L3 LV: Found a loop with a very small trip count. This loop

[LLVMdev] loop vectorizer misses opportunity, exploit

2013 Oct 31

[LLVMdev] loop vectorizer misses opportunity, exploit

----- Original Message ----- > > Hi Nadav, > > that's the whole point of it. I can't in general make the index > calculation simpler. The example given is the simplest non-trivial > index function that is needed. It might well be that it's that > simple that the index calculation in this case can be thrown aways > altogether and - as you say - be replaced by

[LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its

2013 Jun 25

[LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its

----- Original Message ----- > > > > On Jun 24, 2013, at 4:24 PM, Hal Finkel < hfinkel at anl.gov > wrote: > > > > > Indvars should ideally preserve NSW flags whenever possible. However, > we don't want to rely on SCEV to preserve them. SCEV expressions are > implicitly reassociated and uniqued in a flow-insensitive universe > independent of the

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

2013 Nov 06

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

The following IR implements the following nested loop: for (int i = start ; i < end ; ++i ) for (int p = 0 ; p < 4 ; ++p ) a[i*4+p] = b[i*4+p] + c[i*4+p]; define void @main(i64 %arg0, i64 %arg1, i1 %arg2, i64 %arg3, float* noalias %arg4, float* noalias %arg5, float* noalias %arg6) { entrypoint: br i1 %arg2, label %L0, label %L1 L0:

similar to: [LLVMdev] [BBVectorizer] Obvious vectorization benefit, but req-chain is too short