thr3ads.net - similar to: "[LLVMdev] LLVM Loop Vectorizer puzzle"

Displaying 20 results from an estimated 6000 matches similar to: "[LLVMdev] LLVM Loop Vectorizer puzzle"

2013 May 23

[LLVMdev] LLVM Loop Vectorizer puzzle

Hi, The TinyTripCountVectorThreshold only applies to loops with a known (constant) trip count. If a loop has a trip count below this value we don’t attempt to vectorize the loop. The loop below has an unknown trip count. Once we decide to vectorize a loop, we emit code to check whether we can execute one iteration of the vectorized body. This is the code quoted below. On May 22, 2013, at 10:23
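The decision described in this snippet can be sketched roughly as follows. This is an illustrative model only, not the actual LoopVectorize.cpp code: the names `kTinyTripCountThreshold`, `isTooSmallToVectorize` and `vectorIterations` are hypothetical, the value 16 is taken from another message in this listing, and a trip count of 0 is assumed to mean "not a known constant".

```cpp
#include <cstddef>

// Hypothetical stand-in for TinyTripCountVectorThreshold (16 per a later message).
constexpr unsigned kTinyTripCountThreshold = 16;

// Compile-time decision: only a known (non-zero) constant trip count below
// the threshold blocks vectorization. 0 is assumed to mean "unknown".
bool isTooSmallToVectorize(unsigned constantTripCount) {
  return constantTripCount > 0u && constantTripCount < kTinyTripCountThreshold;
}

// Runtime decision for an unknown trip count n and vector width vf: how many
// iterations the vector body covers. If not even one vector iteration fits,
// the guard fails and every iteration runs in the scalar loop.
std::size_t vectorIterations(std::size_t n, std::size_t vf) {
  if (vf == 0 || n < vf)
    return 0;          // runtime check fails: scalar loop only
  return n - (n % vf); // the remainder is left to the scalar loop
}
```

Under these assumptions, a loop with an unknown trip count is never rejected by the threshold; it is vectorized and protected by the runtime guard instead.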

[LLVMdev] LLVM Loop Vectorizer puzzle

2013 May 23

Hi, Just from personal interest, is there a canonical way in IR+metadata to express "This small constant trip-count loop is desired to be converted into a sequence of vector operations directly"? I.e., mapping a 4 element i32 loop into a linear sequence of <4 x i32> operations. Obviously this may not always be a win, but I'm just wondering if there's a way to communicate

[LLVMdev] LLVM Loop Vectorizer puzzle

2013 May 23

----- Original Message -----
> Hi,
>
> Just from personal interest, is there a canonical way in IR+metadata to express "This small constant trip-count loop is desired to be converted into a sequence of vector operations directly"? Ie, mapping a 4 element i32 loop into a linear sequence of <4 x i32> operations. Obviously this may not

[LLVMdev] Polly issue

2013 May 20

Hi, When I test "matmul" in the polly directory, I get the following performance data:

//===============================
--> 12. Compare the runtime of the executables
time ./matmul.normalopt.exe
0:23.53 real, 23.48 user, 0.00 sys
time ./matmul.polly.interchanged.exe
0:22.86 real, 22.82 user, 0.01 sys
time ./matmul.polly.interchanged+tiled.exe
0:22.87 real, 22.83 user, 0.00 sys

[LLVMdev] Are 7th LLVM Developer meeting's PPTs available?

2013 Dec 02

Hi, everyone: The 7th LLVM developer meeting was held on November 6-7, 2013. I can't find any information other than the abstracts. Any help? Thanks. Regards, maxs

[LLVMdev] LLVM Loop Vectorizer is enabled by default???

2013 Jun 03

In http://llvm.org/docs/Vectorizers.html, it says "LLVM’s Loop Vectorizer is now enabled by default for -O3". But when I use the following command:

opt -O3 -debug-pass=Arguments test.ll -o /dev/null

I can't see the "loop-vectorize" option in the result. Any advice? My opt version is:

=====================================
$ opt --version
LLVM (http://llvm.org/): LLVM

[LLVMdev] Trip count and Loop Vectorizer

2013 Sep 27

Hi Nadav, Thanks for the response. I forgot to mention that there is an upper limit of 16 for the trip count check: TinyTripCountVectorThreshold = 16; if (TC > 0u && TC < TinyTripCountVectorThreshold). So right now, for any loop with a trip count of 0 or a value >= 16, LV will unroll. With the change to the lower bound, it will also include the loop with 0 trip count. SCEV returns 0

[LLVMdev] Trip count and Loop Vectorizer

2013 Sep 27

Hi Sriram, Thanks for performing this analysis. The problem here, both for memcpy and the vectorizer, is that we can’t predict the size of “n”, even though the only use of ’n’ is for the loop bound for the alloca [4 x [8 x i32]]. If you change the unroll condition to TC >= 0 then you will disable loop unrolling for all loops because getSmallConstantTripCount returns an unsigned number. You
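The point about the unsigned return value can be illustrated with a small sketch. It is not LLVM code: `originalCheck` and `proposedCheck` are hypothetical names, the bound 16 comes from the threshold quoted in the neighbouring message, and a trip count of 0 is assumed to stand for "unknown".

```cpp
// Both functions model the "trip count is tiny" test on an unsigned value.

// Lower bound as quoted in the thread: 0 (assumed to mean "unknown")
// is excluded, so loops with an unknown trip count are not flagged.
bool originalCheck(unsigned tc) {
  return tc > 0u && tc < 16u;
}

// Lower bound changed to >= 0: for an unsigned value this part is always
// true, so the test collapses to "tc < 16" and now also matches 0.
bool proposedCheck(unsigned tc) {
  return tc >= 0u && tc < 16u;
}
```

In this model the changed check flags every loop whose trip count could not be computed, which is consistent with the concern raised in the reply.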

[LLVMdev] Trip count and Loop Vectorizer

2013 Sep 27

Hi, I am trying to get a small loop to *not vectorize* for cases where it doesn't make sense. For instance, this loop:

    void foo(int a[4][8], int n) {
      int b[4][8];
      for (int i = 0; i < 4; i++) {
        for (int j = 0; j < n; j++) {
          a[i][j] = b[i][j];
        }
      }
    }

* Has a maximum of 8 ints to copy. LLVM tries to use Memcpy for the inner loop. It is not helpful to perform

[LLVMdev] loop vectorizer

2013 Nov 06

On 06/11/13 08:54, Arnold wrote:
> Sent from my iPhone
>
> On Nov 5, 2013, at 7:39 PM, Frank Winter <fwinter at jlab.org <mailto:fwinter at jlab.org>> wrote:
>
>> Good that you bring this up. I still have no solution to this vectorization problem.
>>
>> However, I can rewrite the code and insert a second loop which

[LLVMdev] assert in InnerLoopVectorizer::createEmptyLoop

2012 Dec 21

I am seeing an assert when I compile the attached program with clang: $ clang -fno-strict-aliasing -target mips64el-unknown-linux -O3 -fomit-frame-pointer -S test1.c -o test1.ll -emit-llvm It asserts when LoopVectorize.cpp:506 is executed. It looks like it is complaining because it is trying to zero-extend an i64 (type Count->getType() returns i64) to an i32 (IdxTy). if (Count->getType()

[LLVMdev] loop vectorizer

2013 Nov 06

Sent from my iPhone

> On Nov 5, 2013, at 7:39 PM, Frank Winter <fwinter at jlab.org> wrote:
>
> Good that you bring this up. I still have no solution to this vectorization problem.
>
> However, I can rewrite the code and insert a second loop which eliminates the 'urem' and 'div' instructions in the index calculations. In this case, the inner loop's trip

[LLVMdev] loop vectorizer

2013 Nov 06

Good that you bring this up. I still have no solution to this vectorization problem. However, I can rewrite the code and insert a second loop which eliminates the 'urem' and 'div' instructions in the index calculations. In this case, the inner loop's trip count would be equal to the SIMD length and the loop vectorizer ignores the loop. Unrolling the loop and SLP is not an
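The kind of rewrite mentioned here might look like the sketch below. The original code is not shown in this snippet, so everything in it is an assumption: the function names, the scaling operation and the `simd` parameter are hypothetical and only illustrate replacing per-iteration division and remainder with a second, inner loop.

```cpp
#include <cstddef>

// Flat loop: every iteration recomputes a block and lane index with a
// division and a remainder. Precondition: simd > 0.
void scaleFlat(float *out, const float *in, std::size_t n, std::size_t simd) {
  for (std::size_t i = 0; i < n; ++i) {
    std::size_t block = i / simd;
    std::size_t lane = i % simd;
    out[block * simd + lane] = 2.0f * in[block * simd + lane];
  }
}

// Rewritten with a second loop: no division or remainder in the body.
// The inner trip count equals the SIMD length. Assumes simd > 0 and
// that n is a multiple of simd.
void scaleNested(float *out, const float *in, std::size_t n, std::size_t simd) {
  for (std::size_t block = 0; block < n / simd; ++block)
    for (std::size_t lane = 0; lane < simd; ++lane)
      out[block * simd + lane] = 2.0f * in[block * simd + lane];
}
```

Both versions compute the same result when `n` is a multiple of `simd`; the second form is the shape in which, according to the message, the inner trip count equals the SIMD length.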

puzzle about contrasts

2008 Sep 09

Hi, I'm trying to redefine the contrasts for a linear model. With a 2 level factor, x, with levels A and B, a two level factor outputs A and B - A from an lm fit, say lm(y ~ x). I would like to set the contrasts so that the coefficients output are -0.5 (A + B) and B - A, but I can't get the sign correct for the first coefficient (Intercept). Here is a toy example, set.seed(12161952) y

[LLVMdev] assert in InnerLoopVectorizer::createEmptyLoop

2012 Dec 21

Please file a bug for these types of issues.

On Fri, Dec 21, 2012 at 1:49 PM, Akira Hatanaka <ahatanak at gmail.com> wrote:
> I am seeing an assert when I compile the attached program with clang:
>
> $ clang -fno-strict-aliasing -target mips64el-unknown-linux -O3 -fomit-frame-pointer -S test1.c -o test1.ll -emit-llvm
>
> It asserts when LoopVectorize.cpp:506 is

[LLVMdev] LLVM Loop Vectorizer puzzle

2013 May 23

On Thu, May 23, 2013 at 12:02 PM, Nadav Rotem <nrotem at apple.com> wrote:
>
> On May 23, 2013, at 8:52 AM, "Redmond, Paul" <paul.redmond at intel.com> wrote:
>
> !0 = metadata !{ metadata !1, metadata !2 }
> !1 = metadata !{ metadata !"llvm.loop.parallel" }
> !2 = metadata !{ metadata !"llvm.vectorization.vector_width", i32 8

[LLVMdev] LLVM Loop Vectorizer puzzle

2013 May 23

On May 23, 2013, at 8:52 AM, "Redmond, Paul" <paul.redmond at intel.com> wrote:
>
> !0 = metadata !{ metadata !1, metadata !2 }
> !1 = metadata !{ metadata !"llvm.loop.parallel" }
> !2 = metadata !{ metadata !"llvm.vectorization.vector_width", i32 8 }
>
> I'm not even sure you would need the llvm.loop.parallel anymore since the

[LLVMdev] LLVM Loop Vectorizer puzzle

2013 May 23

On May 23, 2013, at 10:37 AM, Cameron McInally <cameron.mcinally at nyu.edu> wrote:
> In all fairness, I do not believe that ivdep is an ICC-specific pragma. There are many compilers that support ivdep and lots of legacy (and modern) codes that benefit from it. Seems silly, to me at least, to reinvent the wheel.

Hi Cameron, The history of the ivdep pragma is fascinating. I did not

[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR

2013 Jul 05

On 07/04/2013 01:39 PM, Stéphane Letz wrote:
> Hi,
>
> Our DSL can generate C or directly generate LLVM IR. With LLVM 3.3, we can vectorize the C produced code using clang with -O3, or clang with -O1 then opt -O3 -vectorize-loops. But the same program generating LLVM IR version cannot be vectorized with opt -O3 -vectorize-loops. So our guess is that our generated LLVM IR lacks some

[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR

2013 Jul 04

Hi, Our DSL can generate C or directly generate LLVM IR. With LLVM 3.3, we can vectorize the C produced code using clang with -O3, or clang with -O1 then opt -O3 -vectorize-loops. But the same program generating LLVM IR version cannot be vectorized with opt -O3 -vectorize-loops. So our guess is that our generated LLVM IR lacks some information that is needed by the vectorization passes to
