thr3ads.net - similar to: "Vector Shuffle chain lowering to X86 instructions simplification inconsistencies"

Displaying 20 results from an estimated 3000 matches similar to: "Vector Shuffle chain lowering to X86 instructions simplification inconsistencies"

Loop Unrolling Fail in Simple Vectorized loop

2016 Oct 13

Loop Unrolling Fail in Simple Vectorized loop

If count > MAX_UINT-4 your loop loops indefinitely with an increment of 4, I think. On Thu, Oct 13, 2016 at 4:42 PM, Charith Mendis via llvm-dev < llvm-dev at lists.llvm.org> wrote: > So, I tried unrolling the following simple loop. > > int unroll(unsigned * a, unsigned * b, unsigned *c, unsigned count){ > > for(unsigned i=0; i<count; i++){ > > a[i] =

Loop Unrolling Fail in Simple Vectorized loop

2016 Oct 13

Loop Unrolling Fail in Simple Vectorized loop

Thanks for the explanation. But I am a little confused with the following fact. Can't LLVM keep vectorizable_elements as a symbolic value and convert the loop to say; for(unsigned i = 0; i < vectorizable_elements ; i += 2){ //main loop } for(unsigned i=0 ; i < vectorizable_elements % 2; i++){ //fix up } Why does it have to reason about the range of vectorizable_elements? Even

llc error

2016 Sep 03

llc error

I updated to the latest revision and now llvm does not build and quits cmake with CMake Error at cmake/modules/LLVMProcessSources.cmake:83 (message): Found unknown source file ../llvm-revec/lib/CodeGen/MachineFunctionAnalysis.cpp Please update ../llvm-revec/lib/CodeGen/CMakeLists.txt Thanks On Sat, Sep 3, 2016 at 2:09 AM, Craig Topper <craig.topper at gmail.com> wrote: >

Vectorization in LLVM x86 backend

2017 Aug 21

Vectorization in LLVM x86 backend

I isolated the LLVM IR and the X86 instructions emitted for the function and are attached herewith and it is clearly emitting vector instructions. I am having a hard time figuring out where the vector instructions are formulated. For sure SLP and Loop vectorizer is not doing anything. On Mon, Aug 21, 2017 at 11:56 AM, Craig Topper <craig.topper at gmail.com> wrote: > The X86 backend

Loop Unrolling Fail in Simple Vectorized loop

2016 Oct 12

Loop Unrolling Fail in Simple Vectorized loop

Hi all, Attached herewith is a simple vectorized function with loops performing a simple shuffle. I want all loops (inner and outer) to be unrolled by 2 and as such used -unroll-count=2 The inner loops(with k as the induction variable and having constant trip counts) unroll fully, but the outer loop with (j) fails to unroll. The llvm code is also attached with inner loops fully unrolled. To

llc error

2016 Sep 03

llc error

Hi all, The attached LLVM assembly file fails to generate x86 code when compiled using llc. compilation command - ../llvm-build/bin/llc -filetype=asm -march=x86-64 -mcpu=core-avx2 ex4.ll The error message is, LLVM ERROR: Cannot select: t95: v8f32 = X86ISD::SUBV_BROADCAST t17 t17: v4f32,ch = load<LD16[%scevgep](tbaa=<0x4dbcd98>)> t0, t16, undef:i64 t16: i64 = add t2,

Changing Alignment of global variables in LLVM

2017 Oct 03

Changing Alignment of global variables in LLVM

If I know for sure I am accessing 32 byte chunks at a time, how can I go about changing the alignment of @u? Should I use DataLayout's reset method? I couldn't find a method to change alignment of one global variable. Thanks On Tue, Oct 3, 2017 at 6:34 PM, Matthias Braun <mbraun at apple.com> wrote: > The effective alignment is part of the load and store operations. Updating

Changing Alignment of global variables in LLVM

2017 Oct 03

Changing Alignment of global variables in LLVM

What is the best way to change the alignment of global variables and allocated structures in LLVM during one of its optimization passes? For example, I want to change, @u = internal unnamed_addr global [5 x [65 x [65 x [65 x double]]]] zeroinitializer, align 16 to align to 32 bytes. How can this be accomplished so that all other references in the code accessing this structure are also

Vectorization in LLVM x86 backend

2017 Aug 21

Vectorization in LLVM x86 backend

Hi all, Recently I compiled the attached .c file using Clang with "-mavx2 -mfma -m32 -O3" optimization flags. First I used -emit-llvm and inspected the LLVM IR and there are no vector instructions. Then I got the assembly output of the file in it I can clearly see vector instructions in it. Neither the SLPVectorizer or the LoopVectorizer is however doing any vectorization (also

Getting the symbolic expression for an address calculation

2016 Oct 04

Getting the symbolic expression for an address calculation

How do you generate a SCEVAddRecExpr from a SCEV? It tried dyn_casting and it seems like that the SCEV returned by getSCEV is not a SCEVAddRecExpr. Thanks On Fri, Sep 30, 2016 at 4:16 PM, Friedman, Eli <efriedma at codeaurora.org> wrote: > On 9/30/2016 12:16 PM, Charith Mendis via llvm-dev wrote: > >> >> Hi all, >> >> What is the best way to get the symbolic

Getting the symbolic expression for an address calculation

2016 Sep 30

Getting the symbolic expression for an address calculation

Hi all, What is the best way to get the symbolic expression for an address calculation in llvm specially when memory addresses are calculated within a loop. Use case: I want to know what loop induction variables are used for a particular address calculation and in what symbolic context. Thereby, I want to identify which stores and loads will be contiguous in memory if I unroll each of the

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 30

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

Wow. Somehow, I forgot about vbroadcast and vpbroadcast. =[ Sorry about that. I'll fix those. On Fri, Sep 26, 2014 at 3:39 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com > wrote: > Hi Chandler, > > Here is another test. > > When looking at the AVX codegen, I noticed that, when using the new > shuffle lowering, we no longer emit a single vbroadcastss in the case

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 23

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

On Tue, Sep 23, 2014 at 2:35 PM, Simon Pilgrim <llvm-dev at redking.me.uk> wrote: > If you don’t want to spend time on this, I’d be happy to create a > candidate patch for review? I’ve been unclear if you were taking patches > for your shuffle work prior to it becoming the default. While I'm happy to work on it, I'm even more happy to have patches. =D -------------- next

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 19

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

Hi Chandler, I have tested the new shuffle lowering on a AMD Jaguar cpu (which is AVX but not AVX2). On this particular target, there is a delay when output data from an execution unit is used as input to another execution unit of a different cluster. For example, There are 6 executions units which are divided into 3 execution clusters of Float(FPM,FPA), Vector Integer (MMXA,MMXB,IMM), and Store

[LLVMdev] Heads up! Planning to remove old vector shuffle lowering this week...

2015 Jan 04

[LLVMdev] Heads up! Planning to remove old vector shuffle lowering this week...

On Sun, Jan 4, 2015 at 3:20 PM, Simon Pilgrim <llvm-dev at redking.me.uk> wrote: > On 24 Nov 2014, at 17:53, Chandler Carruth <chandlerc at gmail.com> wrote: > > > I'll be skimming the PRs to see if there are any really critical > regressions, but so far it looks pretty good. > > > > If you are actively disabling the new vector shuffling and have some PR

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 20

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

After some adding some serious ninja-ry to the new shuffle lowering... On Fri, Sep 19, 2014 at 11:53 AM, Quentin Colombet <qcolombet at apple.com> wrote: > 2. none_useless_shuflle none > Instead of using a single move to materialize a zero extended constant > into a vector register, we explicitly zeroed a vector register and use a > shuffle. > ... this test case is fixed,

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

2015 Jan 25

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

I ran the benchmarking subset of test-suite on a btver2 machine and optimizing for btver2 (so enabling AVX codegen). I don't see anything outside of the noise with x86-experimental-vector-shuffle-legality=1. On Fri, Jan 23, 2015 at 5:19 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com > wrote: > Hi Chandler, > > On Fri, Jan 23, 2015 at 8:15 AM, Chandler Carruth

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

2015 Jan 23

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

Greetings LLVM hackers and x86 vector shufflers! I would like to flip on another chunk of the new vector shuffling, specifically the logic to mark ~all shuffles as "legal". This can be tested today with the flag "-x86-experimental-vector-shuffle-legality". I would essentially like to make this the default (by removing the "false" path). Doing this will allow me to

[LLVMdev] Generalizing shuffle vector

2008 Sep 30

[LLVMdev] Generalizing shuffle vector

Hi, The current definition of shuffle vector is <result> = shufflevector <n x <ty>> <v1>, <n x <ty>> <v2>, <n x i32> <mask> ; yields <n x <ty>> The first two operands of a 'shufflevector' instruction are vectors with types that match each other and types that match the result of the instruction. The third

[LLVMdev] Generalizing shuffle vector

2008 Sep 30

[LLVMdev] Generalizing shuffle vector

On Mon, Sep 29, 2008 at 8:11 PM, Mon Ping Wang <wangmp at apple.com> wrote: > The problem with generating insert and extracts is that we can generate poor > code > %tmp16 = extractelement <4 x float> %f4b, i32 0 > %f8a = insertelement <8 x float> %f8a, float %tmp16, i32 0 > %tmp18 = extractelement <4 x float> %f4b, i32 1 > %f8c

similar to: Vector Shuffle chain lowering to X86 instructions simplification inconsistencies