thr3ads.net - search: "vectorizor"

Displaying 20 results from an estimated 31 matches for "vectorizor".

2012 Feb 07

[LLVMdev] Vectorization: Next Steps

...we'll need a better loop analysis pass in order for this to > > happen. Some of this was started in LoopDependenceAnalysis, but that > > pass is not yet finished. We'll need something like this to recognize > > affine memory references, etc. > > I think that a loop vectorizor and a basic block vectorizer both make perfect sense and are important for different classes of code. However, I don't think that we should go down the path of trying to use a "basic block vectorizor + loop unrolling" serve the purpose of a loop vectorizer. Trying to make a BBVector...

2012 Feb 09

[LLVMdev] Vectorization: Next Steps

...t, which is a layering violation) but the idea is sound. This does mean that running "opt -vectorize foo.bc" would not get the same optimization as running clang with the target you want enabled though. We already have this problem with -loop-reduce though. >> I think that a loop vectorizor and a basic block vectorizer both make perfect sense and are important for different classes of code. However, I don't think that we should go down the path of trying to use a "basic block vectorizor + loop unrolling" serve the purpose of a loop vectorizer. Trying to make a BBVector...

2007 Apr 02

[LLVMdev] PR400 - alignment for LD/ST

...hat traverses an array, but subsequent optimization/analysis passes can. I think that either form of information would be easy to get, but I don't know what the tradeoffs are (loss of generality or loss of precision). Devang, do you have any thoughts on this or idea of how it would impact a vectorizor? > Also, I've noticed that some transformations on Loads/Stores don't preserve > either volatility or alignment information. I'm not surprised about the alignment piece (it hasn't been filled in yet) but not preserving volatility is definitely a bug. -Chris -- http://n...

2012 Feb 06

[LLVMdev] Vectorization: Next Steps

...n pass. I > think that we'll need a better loop analysis pass in order for this to > happen. Some of this was started in LoopDependenceAnalysis, but that > pass is not yet finished. We'll need something like this to recognize > affine memory references, etc. I think that a loop vectorizor and a basic block vectorizer both make perfect sense and are important for different classes of code. However, I don't think that we should go down the path of trying to use a "basic block vectorizor + loop unrolling" serve the purpose of a loop vectorizer. Trying to make a BBVector...

2007 Apr 02

[LLVMdev] PR400 - alignment for LD/ST

On Apr 2, 2007, at 2:12 PM, Chris Lattner wrote: > > Devang, do you have any thoughts on this or idea of how it would > impact a > vectorizor? When you say "load is multiple of 4 bytes away from a 8-byte aligned data" it is not clear whether it is 16-byte aligned or not. However, "load is 4 bytes away from a 8-byte aligned data" is clear - it is aligned at 12-byte and not 16-byte. However, that means, for loop...

2006 Jul 31

[LLVMdev] Auto-vectorization in GCC 4.0

...te code generation work inside llvmgcc4. Another issue is that gimple has various different forms (high gimple, low gimple, and several other minor forms). We work on "high gimple", so optimizations that require low gimple or later forms won't work. I don't know what the gcc vectorizor uses, but IIRC it runs late in the pipeline, so it probably is low-gimple. I don't think the gimple->llvm translator can't handle low gimple, but it may be possible to extend it. -Chris >> On Jul 31, 2006, at 1:10 PM, Devang Patel wrote: >> >> > llvmgcc4 emit...

2012 Feb 09

[LLVMdev] Vectorization: Next Steps

...s a layering violation) but the idea is sound. This does mean that running "opt -vectorize foo.bc" would not get the same optimization as running clang with the target you want enabled though. We already have this problem with -loop-reduce though. > > >> I think that a loop vectorizor and a basic block vectorizer both make perfect sense and are important for different classes of code. However, I don't think that we should go down the path of trying to use a "basic block vectorizor + loop unrolling" serve the purpose of a loop vectorizer. Trying to make a BBVector...

2012 Feb 13

[LLVMdev] Vectorization: Next Steps

...es be available to the pass, as it is now, when called from a full-compilation driver (like clang)? Or are you suggesting that I propose some object like TLI that might be available in 'opt' even though TLI itself is not available there? Thanks again, Hal > >> I think that a loop vectorizor and a basic block vectorizer both make perfect sense and are important for different classes of code. However, I don't think that we should go down the path of trying to use a "basic block vectorizor + loop unrolling" serve the purpose of a loop vectorizer. Trying to make a BBVector...

2009 Apr 01

[LLVMdev] GSoC 2009: Auto-vectorization

...e one above to start with). Many people have talked about this, but no code has gone in yet. 2. We need some interface that can be implemented by a target machine to describe what the vector capabilities of the machine are. 3. We need the transformation code. Starting with a simple loop vectorizor (as opposed to an SLP system) would make sense to me. Once the basics are in place, it can be feature-crept to support new idioms (reductions, alignment analysis, etc). The first cut doesn't need to provide an immediate speedup, but should have many testcases that demonstrates the loop...

2015 Nov 13

[RFC] Introducing a vector reduction add instruction.

Hi When a reduction instruction is vectorized in a loop, it will be turned into an instruction with vector operands of the same operation type. This new instruction has a special property that can give us more flexibility during instruction selection later: this operation is valid as long as the reduction of all elements of the result vector is identical to the reduction of all elements of its

2016 May 07

About Clang llvm PGO

...oticed. For GCC case, PGO itself contributes about 15% performance boost. The majority of the performance improvement comes from loop vectorization. Note that trunk GCC does not turn on vectorization at O2, but O3 or O2 with PGO. LLVM also vectorizes the key loops. However compared with GCC's vectorizor, LLVM's auto-vectorizer produces worse code (e.g, long sequence of instructions to do sign extension etc): ~6.5instr/iter vs ~9instr/iter. GCC also does loop unroll after vectorization which also helped a little more. LLVM's vectorization actually hurts performance a little. We will look...

2012 Feb 03

[LLVMdev] Vectorization: Next Steps

As some of you may know, I committed my basic-block autovectorization pass a few days ago. I encourage anyone interested to try it out (pass -vectorize to opt or -mllvm -vectorize to clang) and provide feedback. Especially in combination with -unroll-allow-partial, I have observed some significant benchmark speedups, but, I have also observed some significant slowdowns. I would like to share my

2011 Oct 29

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

...data-flow consistency. This works > > > > only > > > > within one basic block, but can do loop vectorization in > > > > combination > > > > with (partial) unrolling. The basic idea was inspired by the > > > > Vienna MAP > > > > Vectorizor, which has been used to vectorize FFT kernels, but the > > > > algorithm used here is different. > > > > > > > > To try it, use -bb-vectorize with opt. There are a few options: > > > > -bb-vectorize-req-chain-depth: default: 3 -- The depth of the >...

2011 Oct 29

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

...rks > > > > > only > > > > > within one basic block, but can do loop vectorization in > > > > > combination > > > > > with (partial) unrolling. The basic idea was inspired by the > > > > > Vienna MAP > > > > > Vectorizor, which has been used to vectorize FFT kernels, but the > > > > > algorithm used here is different. > > > > > > > > > > To try it, use -bb-vectorize with opt. There are a few options: > > > > > -bb-vectorize-req-chain-depth: default: 3 -- T...

2007 Apr 02

[LLVMdev] PR400 - alignment for LD/ST

On Apr 2, 2007, at 5:01 PM, Devang Patel wrote: > > On Apr 2, 2007, at 2:12 PM, Chris Lattner wrote: >> >> Devang, do you have any thoughts on this or idea of how it would >> impact a >> vectorizor? > > When you say "load is multiple of 4 bytes away from a 8-byte aligned > data" > it is not clear whether it is 16-byte aligned or not. However, "load > is 4 bytes > away from a 8-byte aligned data" is clear - it is aligned at 12-byte > and not > 16-byt...

2007 Apr 02

[LLVMdev] PR400 - alignment for LD/ST

On Apr 2, 2007, at 2:01 PM, Chris Lattner wrote: > On Mon, 2 Apr 2007, Christopher Lamb wrote: >> Here's a related question. It seems that there might be a benefit >> in knowing >> about two alignment values for a load/store. The alignment of the >> load/store >> itself, but potentially also the alignment of the base pointer >> used for the

2007 Apr 23

[LLVMdev] Instruction pattern type inference problem

On Apr 23, 2007, at 5:08 PM, Chris Lattner wrote: > On Sun, 22 Apr 2007, Christopher Lamb wrote: >> 1. Is there a good reason that v2f32 types are excluded from the >> isFloatingPoint filter? Looks like a bug to me. >> >> v2f32 = 22, // 2 x f32 >> v4f32 = 23, // 4 x f32 <== start ?? >> v2f64 = 24, // 2 x

2006 Jul 31

[LLVMdev] Auto-vectorization in GCC 4.0

On Jul 31, 2006, at 11:14 AM, Vikram Adve wrote: > Does llvmgcc4 convert the high-level AST to LLVM (like llvmgcc3x) > or does it go from GIMPL to LLVM? If the latter, would it be > possible to allow some TreeSSA optimizations before emitting LLVM? llvmgcc4 intercepts high-level GCC trees to GIMPLE tree transformation routines to get trees that are suitable for LLVM byte code.

2007 Apr 02

[LLVMdev] PR400 - alignment for LD/ST

On Mon, 2 Apr 2007, Christopher Lamb wrote: > Here's a related question. It seems that there might be a benefit in knowing > about two alignment values for a load/store. The alignment of the load/store > itself, but potentially also the alignment of the base pointer used for the > load/store. Having an alignment attribute on pointer types would solve both > these issues, but

2007 Apr 23

[LLVMdev] Instruction pattern type inference problem

On Sun, 22 Apr 2007, Christopher Lamb wrote: > 1. Is there a good reason that v2f32 types are excluded from the > isFloatingPoint filter? Looks like a bug to me. > > v2f32 = 22, // 2 x f32 > v4f32 = 23, // 4 x f32 <== start ?? > v2f64 = 24, // 2 x f64 <== end > > static inline bool isFloatingPoint(ValueType VT) {
