search for: vpermilp

Displaying 17 results from an estimated 17 matches for "vpermilp".

2015 Jan 29
2
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...As for the others, I'm working on reducing them, but for now, here are
> some raw observations, in case any of it rings a bell:
>
> Very cool, and thanks for the analysis!
>
> Another problem I'm seeing is that in some cases we can't fold memory
> anymore:
>   vpermilps $-0x6d, -0xXX(%rdx), %xmm2 ## xmm2 = mem[3,0,1,2]
>   vblendps $0x1, %xmm2, %xmm0, %xmm0
> becomes:
>   vmovaps -0xXX(%rdx), %xmm2
>   vshufps $0x3, %xmm0, %xmm2, %xmm3 ## xmm3 = xmm2[3,0],xmm0[0,0]
>   vshufps $-0x68, %xmm0, %xmm3, %xmm0 ## xmm0 =...
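The `mem[3,0,1,2]` annotation in the snippet above can be reproduced from the immediate operand. For the 128-bit form of `vpermilps`, each pair of bits in the 8-bit immediate names the source lane for one destination lane, and the assembly printer shows the byte as a signed value (`-0x6d` is `0x93`). The following is a minimal Python sketch of that decoding, not code from LLVM; the function name `decode_vpermilps_imm` is hypothetical.

```python
def decode_vpermilps_imm(imm):
    """Return the four source-lane indices selected by a vpermilps imm8.

    Illustrative helper only (hypothetical name); models the 128-bit form.
    """
    # The disassembly may print the byte as a signed value, e.g. -0x6d.
    imm &= 0xFF
    # Destination lane i reads the source lane given by bits [2*i+1 : 2*i].
    return [(imm >> (2 * lane)) & 0b11 for lane in range(4)]


print(decode_vpermilps_imm(-0x6d))  # prints [3, 0, 1, 2]
```

The same helper applied to `$65` or `$0` gives `[1, 0, 0, 1]` and `[0, 0, 0, 0]`, matching the annotations in the other results on this page.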
2015 Jan 30
4
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...some raw observations, in case any of it rings a bell:
>>>
>> Very cool, and thanks for the analysis!
>>
>>> Another problem I'm seeing is that in some cases we can't fold memory
>>> anymore:
>>>   vpermilps $-0x6d, -0xXX(%rdx), %xmm2 ## xmm2 = mem[3,0,1,2]
>>>   vblendps $0x1, %xmm2, %xmm0, %xmm0
>>> becomes:
>>>   vmovaps -0xXX(%rdx), %xmm2
>>>   vshufps $0x3, %xmm0, %xmm2, %xmm3 ## xmm3 = xmm2[3,0],xmm0[0,0]
>>>   vshufps...
2015 Jan 29
0
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...ng them, but for now, here are
>> some raw observations, in case any of it rings a bell:
>>
> Very cool, and thanks for the analysis!
>
>> Another problem I'm seeing is that in some cases we can't fold memory
>> anymore:
>>   vpermilps $-0x6d, -0xXX(%rdx), %xmm2 ## xmm2 = mem[3,0,1,2]
>>   vblendps $0x1, %xmm2, %xmm0, %xmm0
>> becomes:
>>   vmovaps -0xXX(%rdx), %xmm2
>>   vshufps $0x3, %xmm0, %xmm2, %xmm3 ## xmm3 = xmm2[3,0],xmm0[0,0]
>>   vshufps $-0x68, %xmm0, %x...
2015 Jan 30
0
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...rings a bell:
>>>>
>>> Very cool, and thanks for the analysis!
>>>
>>>> Another problem I'm seeing is that in some cases we can't fold memory
>>>> anymore:
>>>>   vpermilps $-0x6d, -0xXX(%rdx), %xmm2 ## xmm2 = mem[3,0,1,2]
>>>>   vblendps $0x1, %xmm2, %xmm0, %xmm0
>>>> becomes:
>>>>   vmovaps -0xXX(%rdx), %xmm2
>>>>   vshufps $0x3, %xmm0, %xmm2, %xmm3 ## xmm3 = xmm2[3,0],xmm0[...
2014 Sep 20
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Sat, Sep 20, 2014 at 7:12 AM, Simon Pilgrim <llvm-dev at redking.me.uk> wrote: > Hi Andrea / Chandler / Quentin, > > If AVX is available I would expect the vpermilps/vpermilpd instruction to > be used for all float/double single vector shuffles, especially as it can > deal with the folded load case as well - this would avoid the integer/float > execution domain transfer issue with using vpshufd. > Yes, this is the obvious solution to folding memor...
2014 Sep 30
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...ng the new
> shuffle lowering, we no longer emit a single vbroadcastss in the case
> where the shuffle performs a splat of a scalar float loaded from
> memory.
>
> For example:
> (with -mcpu=corei7-avx -x86-experimental-vector-shuffle-lowering)
>   vmovss (%rdi), %xmm0
>   vpermilps $0, %xmm0, %xmm0 # xmm0 = xmm0[0,0,0,0]
>
> Instead of:
> (with -mcpu=corei7-avx)
>   vbroadcastss (%rdi), %xmm0
>
> I have attached a small reproducible for it.
>
> Basically, the old shuffle lowering logic calls function
> 'NormalizeVectorShuffle' to handle shu...
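The two sequences compared in the snippet above compute the same register value, which is why the two-instruction form counts as a missed optimization and not a miscompile. A rough Python model of the three instructions, assuming 4-element lists stand in for an xmm register and using hypothetical helper names, might look like this:

```python
def vmovss_load(scalar):
    # Loading a scalar from memory with vmovss fills lane 0 and zeroes the
    # remaining lanes of the destination register.
    return [scalar, 0.0, 0.0, 0.0]


def vpermilps(imm, src):
    # Each destination lane reads the source lane named by two bits of imm.
    imm &= 0xFF
    return [src[(imm >> (2 * lane)) & 0b11] for lane in range(4)]


def vbroadcastss_load(scalar):
    # Replicates the loaded scalar into every lane.
    return [scalar] * 4


# New lowering (two instructions) versus the single broadcast-from-memory.
two_step = vpermilps(0, vmovss_load(1.5))
one_step = vbroadcastss_load(1.5)
print(two_step == one_step)  # prints True
```

This only models the data movement; it says nothing about the latency or throughput difference between the two sequences, which is the actual concern raised in the thread.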
2015 Jan 23
5
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
Greetings LLVM hackers and x86 vector shufflers! I would like to flip on another chunk of the new vector shuffling, specifically the logic to mark ~all shuffles as "legal". This can be tested today with the flag "-x86-experimental-vector-shuffle-legality". I would essentially like to make this the default (by removing the "false" path). Doing this will allow me to
2014 Sep 23
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Tue, Sep 23, 2014 at 2:35 PM, Simon Pilgrim <llvm-dev at redking.me.uk> wrote:
> If you don’t want to spend time on this, I’d be happy to create a
> candidate patch for review? I’ve been unclear if you were taking patches
> for your shuffle work prior to it becoming the default.

While I'm happy to work on it, I'm even more happy to have patches. =D
2013 Feb 19
2
[LLVMdev] Is it a bug or am I missing something ?
...imilar behavior with a slightly different code (using AVX):
        pushl %ebp
.Ltmp5:
        .cfi_def_cfa_offset 8
.Ltmp6:
        .cfi_offset %ebp, -8
        movl %esp, %ebp
.Ltmp7:
        .cfi_def_cfa_register %ebp
        movl 12(%ebp), %eax
        .loc 1 9 0 prologue_end # shufxbug.cl:9:0
.Ltmp8:
        vpermilps $65, 304(%eax), %xmm0 # xmm0 = mem[1,0,0,1]
        vxorps %xmm1, %xmm1, %xmm1
        vinsertf128 $1, %xmm1, %ymm0, %ymm0
        movl 16(%ebp), %eax
        .loc 1 10 0 # shufxbug.cl:10:0
        vmovups %ymm0, 608(%eax)
        .loc 1 11 0 # shufxbug.cl:11:0
        popl %e...
2014 Sep 23
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Sun, Sep 21, 2014 at 1:15 PM, Simon Pilgrim <llvm-dev at redking.me.uk> wrote: > On 20 Sep 2014, at 19:44, Chandler Carruth <chandlerc at google.com> wrote: > > > If AVX is available I would expect the vpermilps/vpermilpd instruction > to be used for all float/double single vector shuffles, especially as it > can deal with the folded load case as well - this would avoid the > integer/float execution domain transfer issue with using vpshufd. > > > > Yes, this is the obvious solution to...
2020 Aug 31
2
Vectorization of math function failed?
...00 00 00 00          callq  e <_Z4fct1Dv4_f+0xe>
   e: c5 f8 29 44 24 30       vmovaps %xmm0,0x30(%rsp)
  14: c5 fa 16 04 24          vmovshdup (%rsp),%xmm0
  19: e8 00 00 00 00          callq  1e <_Z4fct1Dv4_f+0x1e>
  1e: c5 f8 29 44 24 20       vmovaps %xmm0,0x20(%rsp)
  24: c4 e3 79 05 04 24 01    vpermilpd $0x1,(%rsp),%xmm0
  2b: e8 00 00 00 00          callq  30 <_Z4fct1Dv4_f+0x30>
  30: c5 f9 29 44 24 10       vmovapd %xmm0,0x10(%rsp)
  36: c4 e3 79 04 04 24 e7    vpermilps $0xe7,(%rsp),%xmm0
  3d: e8 00 00 00 00          callq  42 <_Z4fct1Dv4_f+0x42>
  42: c5 f8 28 4c 24 30       vmovaps 0x30(%rsp...
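The disassembly above appears to show a 4-float vector being processed one element at a time: before each call, a different shuffle of the vector saved at `(%rsp)` moves the next element into lane 0 of `%xmm0`, where a scalar float argument is passed. The Python sketch below models the three shuffles on a list of four labels to show which element ends up in lane 0. It is an interpretation of the truncated snippet under that assumption, and the helper names are illustrative only.

```python
def vmovshdup(src):
    # Duplicates the odd-indexed elements into the even positions.
    return [src[1], src[1], src[3], src[3]]


def vpermilpd(imm, src):
    # 128-bit form: the register is treated as two 64-bit halves (two floats
    # each here). Bit 0 picks the half for the low result, bit 1 for the high.
    halves = [src[0:2], src[2:4]]
    return halves[imm & 1] + halves[(imm >> 1) & 1]


def vpermilps(imm, src):
    # Each destination lane reads the source lane named by two bits of imm.
    imm &= 0xFF
    return [src[(imm >> (2 * lane)) & 0b11] for lane in range(4)]


vec = ["e0", "e1", "e2", "e3"]  # stand-in for the vector stored at (%rsp)
lane0_per_call = [
    vec[0],                      # first call: lane 0 is already element 0
    vmovshdup(vec)[0],           # vmovshdup (%rsp),%xmm0
    vpermilpd(0x1, vec)[0],      # vpermilpd $0x1,(%rsp),%xmm0
    vpermilps(0xe7, vec)[0],     # vpermilps $0xe7,(%rsp),%xmm0
]
print(lane0_per_call)  # prints ['e0', 'e1', 'e2', 'e3']
```

If that reading is right, the four calls are the scalarized form of the math function, which matches the question in the thread title about vectorization not happening.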
2013 Feb 19
0
[LLVMdev] Is it a bug or am I missing something ?
<<<<<<<<<<<<<<<<<<<<<<<<<<
; ModuleID = 'shufxbug.ll'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32"
target triple = "i386-pc-linux-gnu"

define void @sample_test(<4 x float>* nocapture
2014 Sep 19
4
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Hi Chandler, I have tested the new shuffle lowering on an AMD Jaguar CPU (which supports AVX but not AVX2). On this particular target, there is a delay when output data from an execution unit is used as input to another execution unit of a different cluster. For example, there are 6 execution units, which are divided into 3 execution clusters of Float (FPM, FPA), Vector Integer (MMXA, MMXB, IMM), and Store
2015 Jan 25
4
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
I ran the benchmarking subset of test-suite on a btver2 machine and optimizing for btver2 (so enabling AVX codegen). I don't see anything outside of the noise with x86-experimental-vector-shuffle-legality=1. On Fri, Jan 23, 2015 at 5:19 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com > wrote: > Hi Chandler, > > On Fri, Jan 23, 2015 at 8:15 AM, Chandler Carruth
2018 Aug 06
2
[PATCH] D50328: [X86][SSE] Combine (some) target shuffles with multiple uses
[NOTE: Removed Phab and reviewers]
> ================
> Comment at: test/CodeGen/X86/2012-01-12-extract-sv.ll:12
> +; CHECK-NEXT: vblendps {{.*#+}} xmm1 = xmm1[0],xmm2[1,2,3]
> +; CHECK-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,0,0,0]
> ; CHECK-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
> ----------------
> greened wrote:
>> Can we make this test less brittle by using FileCheck variables?
>> This goes for pretty much every test in this patch.
> I'm sorry but no - its...
2014 Sep 20
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
After adding some serious ninja-ry to the new shuffle lowering...

On Fri, Sep 19, 2014 at 11:53 AM, Quentin Colombet <qcolombet at apple.com> wrote:
> 2. none_useless_shuflle none
> Instead of using a single move to materialize a zero extended constant
> into a vector register, we explicitly zeroed a vector register and use a
> shuffle.

... this test case is fixed,
2014 Sep 10
13
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Tue, Sep 9, 2014 at 11:39 PM, Chandler Carruth <chandlerc at google.com> wrote: > Awesome, thanks for all the information! > > See below: > > On Tue, Sep 9, 2014 at 6:13 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> > wrote: >> >> You have already mentioned how the new shuffle lowering is missing >> some features; for example, you explicitly