Displaying 20 results from an estimated 3000 matches similar to: "Vector Shuffle chain lowering to X86 instructions simplification inconsistencies"
2016 Oct 13
2
Loop Unrolling Fail in Simple Vectorized loop
If count > MAX_UINT-4 your loop loops indefinitely with an increment of 4,
I think.
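The wraparound concern can be reproduced quickly with a narrower counter. The following is a minimal sketch, not code from the thread: the helper name `iterations_step4_u8` and the `cap` guard are hypothetical, and `uint8_t` stands in for `unsigned` so the wrap happens after a handful of iterations instead of billions.

```c
#include <stdint.h>

/* Hypothetical helper (not from the thread): counts the trips of
 *     for (i = 0; i < count; i += 4)
 * using an 8-bit counter as a small-scale stand-in for `unsigned`.
 * Returns `cap` if the loop has still not exited after `cap` trips,
 * which is how this sketch detects the "loops indefinitely" case. */
unsigned iterations_step4_u8(uint8_t count, unsigned cap) {
    unsigned iters = 0;
    /* The explicit cast makes the modulo-256 wraparound visible. */
    for (uint8_t i = 0; i < count; i = (uint8_t)(i + 4)) {
        if (++iters >= cap) {
            return cap; /* give up: the counter keeps wrapping below count */
        }
    }
    return iters;
}
```

With `count` equal to 200 the loop exits normally. With `count` equal to 254 the counter goes from 252 back to 0 and never reaches the bound, so the helper returns `cap`.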
On Thu, Oct 13, 2016 at 4:42 PM, Charith Mendis via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> So, I tried unrolling the following simple loop.
>
> int unroll(unsigned * a, unsigned * b, unsigned *c, unsigned count){
>
> for(unsigned i=0; i<count; i++){
>
> a[i] =
2016 Oct 13
2
Loop Unrolling Fail in Simple Vectorized loop
Thanks for the explanation. But I am a little confused by the following
fact. Can't LLVM keep vectorizable_elements as a symbolic value and convert
the loop to, say:
for(unsigned i = 0; i < vectorizable_elements ; i += 2){
//main loop
}
for(unsigned i=0 ; i < vectorizable_elements % 2; i++){
//fix up
}
Why does it have to reason about the range of vectorizable_elements? Even
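For reference, a main loop plus remainder loop of the kind asked about above could be sketched in C as follows. This is an illustration under assumptions, not what LLVM emits: the function name `add_unrolled_by_2` and the element-wise add body are hypothetical (the loop body in the quoted message is cut off), and the remainder loop continues from where the main loop stopped instead of restarting at 0.

```c
#include <stddef.h>

/* Hypothetical example: a[i] = b[i] + c[i], processed two elements per
 * main-loop trip, followed by a remainder loop for an odd final element. */
void add_unrolled_by_2(unsigned *a, const unsigned *b, const unsigned *c,
                       size_t n) {
    /* Largest even number <= n; the main loop never reads past it. */
    size_t main_end = n - (n % 2);
    size_t i = 0;

    /* Main loop: i stays even and below main_end, so i + 2 cannot wrap. */
    for (; i < main_end; i += 2) {
        a[i] = b[i] + c[i];
        a[i + 1] = b[i + 1] + c[i + 1];
    }

    /* Fix-up loop: handles the at most one leftover element. */
    for (; i < n; i++) {
        a[i] = b[i] + c[i];
    }
}
```

Bounding the main loop by `n - (n % 2)` keeps the bound symbolic while ensuring the increment by 2 lands exactly on the bound, which sidesteps the wraparound question for this hand-written version.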
2016 Sep 03
2
llc error
I updated to the latest revision and now llvm does not build and quits
cmake with
CMake Error at cmake/modules/LLVMProcessSources.cmake:83 (message):
Found unknown source file
../llvm-revec/lib/CodeGen/MachineFunctionAnalysis.cpp
Please update
../llvm-revec/lib/CodeGen/CMakeLists.txt
Thanks
On Sat, Sep 3, 2016 at 2:09 AM, Craig Topper <craig.topper at gmail.com> wrote:
>
2017 Aug 21
2
Vectorization in LLVM x86 backend
I isolated the LLVM IR and the X86 instructions emitted for the function;
both are attached herewith, and the backend is clearly emitting vector
instructions. I am having a hard time figuring out where the vector
instructions are formed. The SLP and loop vectorizers are definitely not doing anything.
On Mon, Aug 21, 2017 at 11:56 AM, Craig Topper <craig.topper at gmail.com>
wrote:
> The X86 backend
2016 Oct 12
2
Loop Unrolling Fail in Simple Vectorized loop
Hi all,
Attached herewith is a simple vectorized function with loops performing a
simple shuffle.
I want all loops (inner and outer) to be unrolled by 2, and as such I used
-unroll-count=2.
The inner loops (with k as the induction variable and constant trip
counts) unroll fully, but the outer loop (with j) fails to unroll.
The llvm code is also attached with inner loops fully unrolled.
To
2016 Sep 03
4
llc error
Hi all,
The attached LLVM assembly file fails to generate x86 code when compiled
using llc.
compilation command - ../llvm-build/bin/llc -filetype=asm -march=x86-64
-mcpu=core-avx2 ex4.ll
The error message is,
LLVM ERROR: Cannot select: t95: v8f32 = X86ISD::SUBV_BROADCAST t17
t17: v4f32,ch = load<LD16[%scevgep](tbaa=<0x4dbcd98>)> t0, t16, undef:i64
t16: i64 = add t2,
2017 Oct 03
2
Changing Alignment of global variables in LLVM
If I know for sure I am accessing 32 byte chunks at a time, how can I go
about changing the alignment of @u?
Should I use DataLayout's reset method? I couldn't find a method to change
alignment of one global variable.
Thanks
On Tue, Oct 3, 2017 at 6:34 PM, Matthias Braun <mbraun at apple.com> wrote:
> The effective alignment is part of the load and store operations. Updating
2017 Oct 03
2
Changing Alignment of global variables in LLVM
What is the best way to change the alignment of global variables and
allocated structures in LLVM during one of its optimization passes?
For example, I want to change,
@u = internal unnamed_addr global [5 x [65 x [65 x [65 x double]]]]
zeroinitializer, align 16
to align to 32 bytes.
How can this be accomplished so that all other references in the code
accessing this structure are also
2017 Aug 21
2
Vectorization in LLVM x86 backend
Hi all,
Recently I compiled the attached .c file using Clang with "-mavx2 -mfma
-m32 -O3" optimization flags.
First I used -emit-llvm and inspected the LLVM IR and there are no vector
instructions. Then I generated the assembly output for the file, and in it
I can clearly see vector instructions.
Neither the SLPVectorizer nor the LoopVectorizer is, however, doing any
vectorization (also
2016 Oct 04
2
Getting the symbolic expression for an address calculation
How do you generate a SCEVAddRecExpr from a SCEV? I tried dyn_casting and
it seems that the SCEV returned by getSCEV is not a SCEVAddRecExpr.
Thanks
On Fri, Sep 30, 2016 at 4:16 PM, Friedman, Eli <efriedma at codeaurora.org>
wrote:
> On 9/30/2016 12:16 PM, Charith Mendis via llvm-dev wrote:
>
>>
>> Hi all,
>>
>> What is the best way to get the symbolic
2016 Sep 30
2
Getting the symbolic expression for an address calculation
Hi all,
What is the best way to get the symbolic expression for an address
calculation in LLVM, especially when memory addresses are calculated within
a loop?
Use case: I want to know what loop induction variables are used for a
particular address calculation and in what symbolic context. Thereby, I
want to identify which stores and loads will be contiguous in memory if I
unroll each of the
2014 Sep 30
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Wow. Somehow, I forgot about vbroadcast and vpbroadcast. =[ Sorry about
that. I'll fix those.
On Fri, Sep 26, 2014 at 3:39 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com
> wrote:
> Hi Chandler,
>
> Here is another test.
>
> When looking at the AVX codegen, I noticed that, when using the new
> shuffle lowering, we no longer emit a single vbroadcastss in the case
2014 Sep 23
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Tue, Sep 23, 2014 at 2:35 PM, Simon Pilgrim <llvm-dev at redking.me.uk>
wrote:
> If you don’t want to spend time on this, I’d be happy to create a
> candidate patch for review? I’ve been unclear if you were taking patches
> for your shuffle work prior to it becoming the default.
While I'm happy to work on it, I'm even more happy to have patches. =D
2014 Sep 19
4
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Hi Chandler,
I have tested the new shuffle lowering on an AMD Jaguar CPU (which is
AVX but not AVX2).
On this particular target, there is a delay when output data from an
execution unit is used as input to another execution unit of a
different cluster. For example, there are 6 execution units which are
divided into 3 execution clusters of Float(FPM,FPA), Vector Integer
(MMXA,MMXB,IMM), and Store
2015 Jan 04
2
[LLVMdev] Heads up! Planning to remove old vector shuffle lowering this week...
On Sun, Jan 4, 2015 at 3:20 PM, Simon Pilgrim <llvm-dev at redking.me.uk>
wrote:
> On 24 Nov 2014, at 17:53, Chandler Carruth <chandlerc at gmail.com> wrote:
>
> > I'll be skimming the PRs to see if there are any really critical
> regressions, but so far it looks pretty good.
> >
> > If you are actively disabling the new vector shuffling and have some PR
2014 Sep 20
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
After adding some serious ninja-ry to the new shuffle lowering...
On Fri, Sep 19, 2014 at 11:53 AM, Quentin Colombet <qcolombet at apple.com>
wrote:
> 2. none_useless_shuflle none
> Instead of using a single move to materialize a zero extended constant
> into a vector register, we explicitly zeroed a vector register and use a
> shuffle.
>
... this test case is fixed,
2015 Jan 25
4
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
I ran the benchmarking subset of test-suite on a btver2 machine and
optimizing for btver2 (so enabling AVX codegen).
I don't see anything outside of the noise with
x86-experimental-vector-shuffle-legality=1.
On Fri, Jan 23, 2015 at 5:19 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com
> wrote:
> Hi Chandler,
>
> On Fri, Jan 23, 2015 at 8:15 AM, Chandler Carruth
2015 Jan 23
5
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
Greetings LLVM hackers and x86 vector shufflers!
I would like to flip on another chunk of the new vector shuffling,
specifically the logic to mark ~all shuffles as "legal".
This can be tested today with the flag
"-x86-experimental-vector-shuffle-legality". I would essentially like to
make this the default (by removing the "false" path). Doing this will allow
me to
2008 Sep 30
4
[LLVMdev] Generalizing shuffle vector
Hi,
The current definition of shuffle vector is
<result> = shufflevector <n x <ty>> <v1>, <n x <ty>> <v2>, <n x
i32> <mask> ; yields <n x <ty>>
The first two operands of a 'shufflevector' instruction are vectors
with types that match each other and types that match the result of
the instruction. The third
2008 Sep 30
0
[LLVMdev] Generalizing shuffle vector
On Mon, Sep 29, 2008 at 8:11 PM, Mon Ping Wang <wangmp at apple.com> wrote:
> The problem with generating insert and extracts is that we can generate poor
> code
> %tmp16 = extractelement <4 x float> %f4b, i32 0
> %f8a = insertelement <8 x float> %f8a, float %tmp16, i32 0
> %tmp18 = extractelement <4 x float> %f4b, i32 1
> %f8c