similar to: [LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR

Displaying 20 results from an estimated 400 matches similar to: "[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR"

2013 Jul 05
0
[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR
On 07/04/2013 01:39 PM, Stéphane Letz wrote: > Hi, > > Our DSL can generate C or directly generate LLVM IR. With LLVM 3.3, we can vectorize the C produced code using clang with -O3, or clang with -O1 then opt -O3 -vectorize-loops. But the same program generating LLVM IR version cannot be vectorized with opt -O3 -vectorize-loops. So our guess is that our generated LLVM IR lacks some
2013 Jul 05
2
[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR
On 5 Jul 2013, at 04:11, Tobias Grosser <tobias at grosser.es> wrote: > On 07/04/2013 01:39 PM, Stéphane Letz wrote: >> Hi, >> >> Our DSL can generate C or directly generate LLVM IR. With LLVM 3.3, we can vectorize the C produced code using clang with -O3, or clang with -O1 then opt -O3 -vectorize-loops. But the same program generating LLVM IR version cannot be
2013 Jul 05
0
[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR
On Jul 5, 2013, at 9:50 AM, Stéphane Letz <letz at grame.fr> wrote: > > On 5 Jul 2013, at 04:11, Tobias Grosser <tobias at grosser.es> wrote: > >> On 07/04/2013 01:39 PM, Stéphane Letz wrote: >>> Hi, >>> >>> Our DSL can generate C or directly generate LLVM IR. With LLVM 3.3, we can vectorize the C produced code using clang with -O3, or
2013 Jul 05
2
[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR
On 5 Jul 2013, at 17:23, Arnold Schwaighofer <aschwaighofer at apple.com> wrote: > > On Jul 5, 2013, at 9:50 AM, Stéphane Letz <letz at grame.fr> wrote: > >> >> On 5 Jul 2013, at 04:11, Tobias Grosser <tobias at grosser.es> wrote: >> >>> On 07/04/2013 01:39 PM, Stéphane Letz wrote: >>>> Hi, >>>>
2010 May 29
3
[LLVMdev] Vectorized LLVM IR
On 29 May 2010, at 01:08, Bill Wendling wrote: > Hi Stéphane, > > The SSE support in the LLVM backend is fine. What is the code that's generated? Do you have some short examples of where LLVM doesn't do as well as the equivalent scalar code? > > -bw > > On May 28, 2010, at 12:13 PM, Stéphane Letz wrote: We are actually testing LLVM for the Faust language
2010 May 28
0
[LLVMdev] Vectorized LLVM IR
Hi Stéphane, The SSE support in the LLVM backend is fine. What is the code that's generated? Do you have some short examples of where LLVM doesn't do as well as the equivalent scalar code? -bw On May 28, 2010, at 12:13 PM, Stéphane Letz wrote: > Hi, > > We are experimenting directly generating vectorized LLVM IR (using <8 x float> kind of types), then compiling the code
2010 May 29
0
[LLVMdev] Vectorized LLVM IR
On Sat, May 29, 2010 at 12:42 AM, Stéphane Letz <letz at grame.fr> wrote: > > On 29 May 2010, at 01:08, Bill Wendling wrote: > >> Hi Stéphane, >> >> The SSE support in the LLVM backend is fine. What is the code that's generated? Do you have some short examples of where LLVM doesn't do as well as the equivalent scalar code? >> >> -bw >>
2010 May 28
3
[LLVMdev] Vectorized LLVM IR
Hi, We are experimenting with directly generating vectorized LLVM IR (using <8 x float> kind of types), then compiling the code to SSE on a 64-bit machine. Right now the equivalent code in scalar mode still outperforms the SSE one. What is the quality of the SSE support in the X86 LLVM backend? Are there any specific things to be aware of to improve the speed? Thanks Stéphane Letz
2013 Oct 30
2
[LLVMdev] loop vectorizer
The loop vectorizer seems to be not able to vectorize the following code: void bar(std::uint64_t start, std::uint64_t end, float * __restrict__ c, float * __restrict__ a, float * __restrict__ b) { const std::uint64_t inner = 4; for (std::uint64_t i = start ; i < end ; ++i ) { const std::uint64_t ir0 = ( (i/inner) * 2 + 0 ) * inner + i%4; const std::uint64_t ir1 = ( (i/inner)
2013 Oct 30
3
[LLVMdev] loop vectorizer
----- Original Message ----- > > > I ran the BB vectorizer as I guess this is the SLP vectorizer. No, while the BB vectorizer is doing a form of SLP vectorization, there is a separate SLP vectorization pass which uses a different algorithm. You can pass -vectorize-slp to opt. -Hal > > BBV: using target information > BBV: fusing loop #1 for for.body in _Z3barmmPfS_S_...
2013 Oct 30
0
[LLVMdev] loop vectorizer
The SLP vectorizer apparently did something in the prologue of the function (where storing of arguments on the stack happens) which then got eliminated later on (since I don't see any vector instructions in the final IR). Below the debug output of the SLP pass: Args: opt -O1 -vectorize-slp -debug loop.ll -S SLP: Analyzing blocks in _Z3barmmPfS_S_. SLP: Found 2 stores to vectorize. SLP:
2013 Oct 30
2
[LLVMdev] loop vectorizer
The debug messages are misleading. They should read “trying to vectorize a list of …”. The problem is that the SCEV analysis is unable to detect that C[ir0] and C[ir1] are consecutive. Is this loop from an important benchmark? Thanks, Nadav On Oct 30, 2013, at 11:13 AM, Frank Winter <fwinter at jlab.org> wrote: > The SLP vectorizer apparently did something in the prologue of the
2010 May 29
1
[LLVMdev] Vectorized LLVM IR
On Sat, May 29, 2010 at 1:18 AM, Eli Friedman <eli.friedman at gmail.com> wrote: > On Sat, May 29, 2010 at 12:42 AM, Stéphane Letz <letz at grame.fr> wrote: >> >> On 29 May 2010, at 01:08, Bill Wendling wrote: >> >>> Hi Stéphane, >>> >>> The SSE support in the LLVM backend is fine. What is the code that's generated? Do you have some
2013 Oct 30
0
[LLVMdev] loop vectorizer
Hi Frank, The access pattern to arrays a and b is non-linear. Unrolled loops are usually handled by the SLP-vectorizer. Are ir0 and ir1 consecutive for all values of i? Thanks, Nadav On Oct 30, 2013, at 9:05 AM, Frank Winter <fwinter at jlab.org> wrote: > The loop vectorizer seems to be not able to vectorize the following code: > > void bar(std::uint64_t start,
2013 Oct 30
0
[LLVMdev] loop vectorizer
Well, they are not directly consecutive. They are consecutive with a constant offset or stride: ir1 = ir0 + 4 If I rewrite the function in this form void bar(std::uint64_t start, std::uint64_t end, float * __restrict__ c, float * __restrict__ a, float * __restrict__ b) { const std::uint64_t inner = 4; for (std::uint64_t i = start ; i < end ; ++i ) { const std::uint64_t ir0 = (
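As a small, hedged illustration of the index pattern discussed in the loop vectorizer thread above: the sketch below evaluates ir0 exactly as it appears in the original loop and derives ir1 only from the relation stated in the reply (ir1 = ir0 + 4), since the full ir1 expression is cut off in these excerpts. The helper names `ir0_index` and `ir1_index` are hypothetical and do not come from the thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers; only the index expressions are taken from the thread.
constexpr std::uint64_t inner = 4;

// ir0 as written in the original loop body.
constexpr std::uint64_t ir0_index(std::uint64_t i) {
    return ((i / inner) * 2 + 0) * inner + i % 4;
}

// ir1 via the relation stated in the reply (ir1 = ir0 + 4); the full
// expression is truncated in the excerpt, so it is not reproduced here.
constexpr std::uint64_t ir1_index(std::uint64_t i) {
    return ir0_index(i) + 4;
}

// Within a block of four iterations the accesses are contiguous...
static_assert(ir0_index(3) == 3, "ir0 is contiguous inside the first block");
// ...but the next block starts eight elements later, not four.
static_assert(ir0_index(4) == 8, "ir0 skips four elements between blocks");
```

For i = 0..7, ir0 takes the values 0, 1, 2, 3, 8, 9, 10, 11 and ir1 takes 4, 5, 6, 7, 12, 13, 14, 15, so each index is unit-stride only within a group of four iterations.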
2014 Sep 02
2
[LLVMdev] Preserving NSW/NUW bits
David/All, Just a quick question about NSW/NUW bits, if you've got a second. I noticed you've been doing a little work on this as of late. I have a bit of code that looks like the following: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 %2 = add i64 %indvars.iv.next, -1 %tmp = trunc i64 %2 to i32 %cmp = icmp slt i32 %tmp, %0 br i1 %cmp, label %for.body, label
2015 Aug 20
2
loop unrolling introduces conditional branch
Hi, I want to use the loop unrolling pass; however, I find that loop unrolling introduces a conditional branch at the end of every "unrolled" part. For example, consider the following code: void foo( int n, int array_x[]) { for (int i=0; i < n; i++) array_x[i] = i; } Then I use this command "opt-3.5 try.bc -mem2reg -loops -loop-simplify -loop-rotate -lcssa
2015 Aug 22
3
loop unrolling introduces conditional branch
Hi, Mehdi, For example, I have this very simple source code: void foo( int n, int array_x[]) { for (int i=0; i < n; i++) array_x[i] = i; } After I use "clang -emit-llvm -o bc_from_clang.bc -c try.cc", I get bc_from_clang.bc. With my code (using the LLVM IRBuilder API), I get bc_from_api.bc. Please find these two files attached. I have also pasted the IR here.
2017 Dec 19
4
MemorySSA question
Hi, I am new to MemorySSA and wanted to understand its capabilities. Hence I wrote the following program (test.c): int N; void test(int *restrict a, int *restrict b, int *restrict c, int *restrict d, int *restrict e) { int i; for (i = 0; i < N; i = i + 5) { a[i] = b[i] + c[i]; } for (i = 0; i < N - 5; i = i + 5) { e[i] = a[i] * d[i]; } } I compiled this program using
2014 Sep 24
3
[LLVMdev] Loops Prevent Function Pointer Inlining?
I've CC'ed Chad Rosier as I think this behaviour is a side-effect of his revert of IndVarSimplify.cpp (git c6b1a7e577a0b9e9cff9f9b7ac35a2cde7c448d8, SVN 217962). The change basically makes the IndVar pass change: ; <label>:4 ; preds = %6, %0 %i.0 = phi i32 [ 0, %0 ], [ %11, %6 ] %5 = icmp eq i32 %i.0, 0 br i1 %5, label %6, label %17 To: