Displaying 17 results from an estimated 17 matches for "vpermilp".
2015 Jan 29
2
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...As for the others, I'm working on reducing them, but for now, here are
> some raw observations, in case any of it rings a bell:
>
Very cool, and thanks for the analysis!
>
>
> Another problem I'm seeing is that in some cases we can't fold memory
> anymore:
> vpermilps $-0x6d, -0xXX(%rdx), %xmm2 ## xmm2 = mem[3,0,1,2]
> vblendps $0x1, %xmm2, %xmm0, %xmm0
> becomes:
> vmovaps -0xXX(%rdx), %xmm2
> vshufps $0x3, %xmm0, %xmm2, %xmm3 ## xmm3 = xmm2[3,0],xmm0[0,0]
> vshufps $-0x68, %xmm0, %xmm3, %xmm0 ## xmm0 =...
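The lane comments in the listing above can be reproduced from the immediates. The following is a minimal Python sketch, assuming the 128-bit forms of vpermilps and vshufps, where each 2-bit field of the immediate byte selects one source lane. The helper names are hypothetical and are not LLVM functions.

```python
def permilps_lanes(imm):
    """Return the four source-lane indices chosen by a 128-bit vpermilps immediate."""
    imm &= 0xFF  # the disassembly prints the byte as signed; keep only the low 8 bits
    return [(imm >> (2 * i)) & 0b11 for i in range(4)]


def shufps_lanes(imm):
    """Return (first_source_lanes, second_source_lanes) for a 128-bit vshufps immediate.

    Result lanes 0-1 are taken from the first source, lanes 2-3 from the second.
    """
    lanes = permilps_lanes(imm)
    return lanes[:2], lanes[2:]


print(permilps_lanes(-0x6D))  # [3, 0, 1, 2], matching "xmm2 = mem[3,0,1,2]"
print(shufps_lanes(0x3))      # ([3, 0], [0, 0]), matching "xmm3 = xmm2[3,0],xmm0[0,0]"
```

Masking with 0xFF is what turns the signed form $-0x6d into the byte 0x93 that the hardware actually sees.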
2015 Jan 30
4
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...some raw observations, in case any of it rings a bell:
>>>
>>
>> Very cool, and thanks for the analysis!
>>
>>
>>>
>>>
>>> Another problem I'm seeing is that in some cases we can't fold memory
>>> anymore:
>>> vpermilps $-0x6d, -0xXX(%rdx), %xmm2 ## xmm2 = mem[3,0,1,2]
>>> vblendps $0x1, %xmm2, %xmm0, %xmm0
>>> becomes:
>>> vmovaps -0xXX(%rdx), %xmm2
>>> vshufps $0x3, %xmm0, %xmm2, %xmm3 ## xmm3 = xmm2[3,0],xmm0[0,0]
>>> vshufps...
2015 Jan 29
0
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...ng them, but for now, here are
>> some raw observations, in case any of it rings a bell:
>>
>
> Very cool, and thanks for the analysis!
>
>
>>
>>
>> Another problem I'm seeing is that in some cases we can't fold memory
>> anymore:
>> vpermilps $-0x6d, -0xXX(%rdx), %xmm2 ## xmm2 = mem[3,0,1,2]
>> vblendps $0x1, %xmm2, %xmm0, %xmm0
>> becomes:
>> vmovaps -0xXX(%rdx), %xmm2
>> vshufps $0x3, %xmm0, %xmm2, %xmm3 ## xmm3 = xmm2[3,0],xmm0[0,0]
>> vshufps $-0x68, %xmm0, %x...
2015 Jan 30
0
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...rings a bell:
>>>>
>>>
>>> Very cool, and thanks for the analysis!
>>>
>>>
>>>>
>>>>
>>>> Another problem I'm seeing is that in some cases we can't fold memory
>>>> anymore:
>>>> vpermilps $-0x6d, -0xXX(%rdx), %xmm2 ## xmm2 = mem[3,0,1,2]
>>>> vblendps $0x1, %xmm2, %xmm0, %xmm0
>>>> becomes:
>>>> vmovaps -0xXX(%rdx), %xmm2
>>>> vshufps $0x3, %xmm0, %xmm2, %xmm3 ## xmm3 =
>>>> xmm2[3,0],xmm0[...
2014 Sep 20
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Sat, Sep 20, 2014 at 7:12 AM, Simon Pilgrim <llvm-dev at redking.me.uk>
wrote:
> Hi Andrea / Chandler / Quentin,
>
> If AVX is available I would expect the vpermilps/vpermilpd instruction to
> be used for all float/double single vector shuffles, especially as it can
> deal with the folded load case as well - this would avoid the integer/float
> execution domain transfer issue with using vpshufd.
>
Yes, this is the obvious solution to folding memor...
2014 Sep 30
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...ng the new
> shuffle lowering, we no longer emit a single vbroadcastss in the case
> where the shuffle performs a splat of a scalar float loaded from
> memory.
>
> For example:
> (with -mcpu=corei7-avx -x86-experimental-vector-shuffle-lowering)
> vmovss (%rdi), %xmm0
> vpermilps $0, %xmm0, %xmm0 # xmm0 = xmm0[0,0,0,0]
>
> Instead of:
> (with -mcpu=corei7-avx)
> vbroadcastss (%rdi), %xmm0
>
> I have attached a small reproducible for it.
>
> Basically, the old shuffle lowering logic calls function
> 'NormalizeVectorShuffle' to handle shu...
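To see why the two sequences above compute the same value, here is a small Python model of the three instructions on 4-lane float vectors. It is only a sketch under the assumption that the VEX-encoded scalar load zeroes the upper lanes; the function names mirror the mnemonics but are otherwise hypothetical.

```python
def vmovss_load(mem):
    # Scalar load from memory: lane 0 receives the float, lanes 1-3 are zeroed.
    return [mem[0], 0.0, 0.0, 0.0]


def vpermilps(src, imm):
    # Each 2-bit field of the immediate selects the source lane for one result lane.
    imm &= 0xFF
    return [src[(imm >> (2 * i)) & 0b11] for i in range(4)]


def vbroadcastss(mem):
    # Load one float and replicate it into all four lanes.
    return [mem[0]] * 4


mem = [1.5]
two_instructions = vpermilps(vmovss_load(mem), 0)  # vmovss + vpermilps $0
one_instruction = vbroadcastss(mem)                # vbroadcastss
print(two_instructions == one_instruction)         # True
```

The results agree, so the regression reported here is about instruction count, not correctness.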
2015 Jan 23
5
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
Greetings LLVM hackers and x86 vector shufflers!
I would like to flip on another chunk of the new vector shuffling,
specifically the logic to mark ~all shuffles as "legal".
This can be tested today with the flag
"-x86-experimental-vector-shuffle-legality". I would essentially like to
make this the default (by removing the "false" path). Doing this will allow
me to
2014 Sep 23
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Tue, Sep 23, 2014 at 2:35 PM, Simon Pilgrim <llvm-dev at redking.me.uk>
wrote:
> If you don’t want to spend time on this, I’d be happy to create a
> candidate patch for review? I’ve been unclear if you were taking patches
> for your shuffle work prior to it becoming the default.
While I'm happy to work on it, I'm even more happy to have patches. =D
2013 Feb 19
2
[LLVMdev] Is it a bug or am I missing something ?
...imilar behavior with a slightly different code (using AVX):
pushl %ebp
.Ltmp5:
.cfi_def_cfa_offset 8
.Ltmp6:
.cfi_offset %ebp, -8
movl %esp, %ebp
.Ltmp7:
.cfi_def_cfa_register %ebp
movl 12(%ebp), %eax
.loc 1 9 0 prologue_end # shufxbug.cl:9:0
.Ltmp8:
vpermilps $65, 304(%eax), %xmm0 # xmm0 = mem[1,0,0,1]
vxorps %xmm1, %xmm1, %xmm1
vinsertf128 $1, %xmm1, %ymm0, %ymm0
movl 16(%ebp), %eax
.loc 1 10 0 # shufxbug.cl:10:0
vmovups %ymm0, 608(%eax)
.loc 1 11 0 # shufxbug.cl:11:0
popl %e...
2014 Sep 23
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Sun, Sep 21, 2014 at 1:15 PM, Simon Pilgrim <llvm-dev at redking.me.uk>
wrote:
> On 20 Sep 2014, at 19:44, Chandler Carruth <chandlerc at google.com> wrote:
>
> > If AVX is available I would expect the vpermilps/vpermilpd instruction
> to be used for all float/double single vector shuffles, especially as it
> can deal with the folded load case as well - this would avoid the
> integer/float execution domain transfer issue with using vpshufd.
> >
> > Yes, this is the obvious solution to...
2020 Aug 31
2
Vectorization of math function failed?
...00 00 00 00 callq e <_Z4fct1Dv4_f+0xe>
e: c5 f8 29 44 24 30 vmovaps %xmm0,0x30(%rsp)
14: c5 fa 16 04 24 vmovshdup (%rsp),%xmm0
19: e8 00 00 00 00 callq 1e <_Z4fct1Dv4_f+0x1e>
1e: c5 f8 29 44 24 20 vmovaps %xmm0,0x20(%rsp)
24: c4 e3 79 05 04 24 01 vpermilpd $0x1,(%rsp),%xmm0
2b: e8 00 00 00 00 callq 30 <_Z4fct1Dv4_f+0x30>
30: c5 f9 29 44 24 10 vmovapd %xmm0,0x10(%rsp)
36: c4 e3 79 04 04 24 e7 vpermilps $0xe7,(%rsp),%xmm0
3d: e8 00 00 00 00 callq 42 <_Z4fct1Dv4_f+0x42>
42: c5 f8 28 4c 24 30 vmovaps 0x30(%rsp...
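The three shuffles before the later calls each appear to move a different float element into lane 0, which would be consistent with the function being called once per element rather than vectorized. The truncated disassembly does not confirm this, so the Python sketch below only decodes the lane selection, assuming the 128-bit forms; the helper names are hypothetical.

```python
def permilps_lanes(imm):
    """Source float lane for each result lane of a 128-bit vpermilps."""
    imm &= 0xFF
    return [(imm >> (2 * i)) & 0b11 for i in range(4)]


def permilpd_as_float_lanes(imm):
    """128-bit vpermilpd: bit i picks the low (0) or high (1) double for result double i.

    The result is expressed in float lanes, since each double spans two of them.
    """
    lanes = []
    for i in range(2):
        d = (imm >> i) & 1
        lanes += [2 * d, 2 * d + 1]
    return lanes


# vmovshdup duplicates the odd float lanes into the even ones.
MOVSHDUP_LANES = [1, 1, 3, 3]

lane0_sources = [
    MOVSHDUP_LANES[0],                # vmovshdup (%rsp),%xmm0
    permilpd_as_float_lanes(0x1)[0],  # vpermilpd $0x1,(%rsp),%xmm0
    permilps_lanes(0xE7)[0],          # vpermilps $0xe7,(%rsp),%xmm0
]
print(lane0_sources)  # [1, 2, 3]
```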
2013 Feb 19
0
[LLVMdev] Is it a bug or am I missing something ?
<<<<<<<<<<<<<<<<<<<<<<<<<<
; ModuleID = 'shufxbug.ll'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32"
target triple = "i386-pc-linux-gnu"
define void @sample_test(<4 x float>* nocapture
2014 Sep 19
4
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Hi Chandler,
I have tested the new shuffle lowering on an AMD Jaguar CPU (which supports
AVX but not AVX2).
On this particular target, there is a delay when output data from an
execution unit is used as input to another execution unit of a
different cluster. For example, there are 6 execution units, which are
divided into 3 execution clusters of Float(FPM,FPA), Vector Integer
(MMXA,MMXB,IMM), and Store
2015 Jan 25
4
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
I ran the benchmarking subset of test-suite on a btver2 machine,
optimizing for btver2 (so enabling AVX codegen).
I don't see anything outside of the noise with
x86-experimental-vector-shuffle-legality=1.
On Fri, Jan 23, 2015 at 5:19 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com
> wrote:
> Hi Chandler,
>
> On Fri, Jan 23, 2015 at 8:15 AM, Chandler Carruth
2018 Aug 06
2
[PATCH] D50328: [X86][SSE] Combine (some) target shuffles with multiple uses
[NOTE: Removed Phab and reviewers]
> ================
> Comment at: test/CodeGen/X86/2012-01-12-extract-sv.ll:12
> +; CHECK-NEXT: vblendps {{.*#+}} xmm1 = xmm1[0],xmm2[1,2,3]
> +; CHECK-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,0,0,0]
> ; CHECK-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
> ----------------
> greened wrote:
>> Can we make this test less brittle by using FileCheck variables?
>> This goes for pretty much every test in this patch.
> I'm sorry but no - its...
2014 Sep 20
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
After adding some serious ninja-ry to the new shuffle lowering...
On Fri, Sep 19, 2014 at 11:53 AM, Quentin Colombet <qcolombet at apple.com>
wrote:
> 2. none_useless_shuflle none
> Instead of using a single move to materialize a zero-extended constant
> into a vector register, we explicitly zero a vector register and use a
> shuffle.
>
... this test case is fixed,
2014 Sep 10
13
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Tue, Sep 9, 2014 at 11:39 PM, Chandler Carruth <chandlerc at google.com> wrote:
> Awesome, thanks for all the information!
>
> See below:
>
> On Tue, Sep 9, 2014 at 6:13 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com>
> wrote:
>>
>> You have already mentioned how the new shuffle lowering is missing
>> some features; for example, you explicitly