Displaying 17 results from an estimated 17 matches for "vinsertps".
2014 Sep 05
3
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com>
wrote:
> Unfortunately, another team, while doing internal testing, has seen the
> new path generating illegal insertps masks. A sample here:
>
> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 =
> xmm4[0,1],xmm1[2],x...
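The immediates above are the problem: insertps takes a single byte, so $256 and $416 cannot be encoded and only their low 8 bits survive. A minimal C sketch (not from the thread; field layout per the Intel manual for the register-source form) of how that byte is built:

#include <stdint.h>

/* insertps imm8, register-source form:
     bits 7:6  COUNT_S - which lane of the source to take
     bits 5:4  COUNT_D - which lane of the destination to write
     bits 3:0  ZMASK   - lanes to zero afterwards                        */
static uint8_t insertps_imm(unsigned count_s, unsigned count_d, unsigned zmask)
{
    return (uint8_t)(((count_s & 3) << 6) | ((count_d & 3) << 4) | (zmask & 0xf));
}

/* 256 & 0xff == 0x00 (source lane 0 -> dest lane 0) and 416 & 0xff == 0xa0
   (source lane 2 -> dest lane 2), which matches the disassembly comments,
   but the out-of-range values themselves are not encodable.              */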
2014 Sep 05
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...> On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com>
>> wrote:
>>>
>>> Unfortunately, another team, while doing internal testing, has seen the
>>> new path generating illegal insertps masks. A sample here:
>>>
>>> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
>>> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>>> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
>>> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 =
>...
2014 Sep 06
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...>>
>> On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com>
>> wrote:
>>
>>
>> Unfortunately, another team, while doing internal testing, has seen the
>> new path generating illegal insertps masks. A sample here:
>>
>> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
>> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
>> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 =
>> xmm4[0...
2014 Sep 08
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...o:rob.lougher at gmail.com>>
>>>> wrote:
>>>>>
>>>>> Unfortunately, another team, while doing internal testing, has seen the
>>>>> new path generating illegal insertps masks. A sample here:
>>>>>
>>>>> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
>>>>> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>>>>> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
>>>>> vinsertps $416, %xmm1, %xm...
2014 Sep 04
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Greetings all,
As you may have noticed, there is a new vector shuffle lowering path in the
X86 backend. You can try it out with the
'-x86-experimental-vector-shuffle-lowering' flag to llc, or '-mllvm
-x86-experimental-vector-shuffle-lowering' to clang. Please test it out!
There may be some correctness bugs; I'm still fuzz testing it to shake them
out. But I expect fairly few
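For reference, a minimal way to try this, in the invocation style used elsewhere in these threads (the input file names are just illustrative):
llc -mcpu=corei7-avx -x86-experimental-vector-shuffle-lowering test.ll
clang -O2 -mllvm -x86-experimental-vector-shuffle-lowering test.c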
2014 Sep 09
5
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...tor <4 x float> %A, <4 x float> %B, <4 x i32> <i32 4,
i32 5, i32 2, i32 7>
ret <4 x float> %1
}
;;;
llc (-mcpu=corei7-avx):
vblendps $11, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0,1],xmm1[2],xmm0[3]
llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):
vinsertps $-96, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1],xmm1[2],xmm0[3]
3) When a shuffle performs an insert at index 0 we always generate an
insertps, while a movss would do a better job.
;;;
define <4 x float> @baz(<4 x float> %A, <4 x float> %B) {
%1 = shufflevector <4 x float>...
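The two lowerings being compared both pick one lane from the second register and keep the other three; a hedged SSE4.1 intrinsics rendering of that same single-lane merge (variable names invented, not from the thread):

#include <smmintrin.h>  /* SSE4.1 */

/* result = { x[0], x[1], y[2], x[3] }, as in the disassembly comments */
__m128 via_blendps(__m128 x, __m128 y)
{
    return _mm_blend_ps(x, y, 0x4);   /* bit 2 set: take lane 2 from y */
}

__m128 via_insertps(__m128 x, __m128 y)
{
    return _mm_insert_ps(x, y, 0xa0); /* y[2] -> lane 2 of x; 0xa0 is the $-96 above */
}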
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
...<__dso_handle+0x28>
40051d: vpsubd %xmm1,%xmm0,%xmm0
400521: vmovq %xmm0,%rax
400526: movslq %eax,%rcx
400529: sar $0x20,%rax
40052d: vpextrq $0x1,%xmm0,%rdx
400533: movslq %edx,%rsi
400536: sar $0x20,%rdx
40053a: vmovss 0x4006c0(,%rcx,4),%xmm0
400543: vinsertps $0x10,0x4006c0(,%rax,4),%xmm0,%xmm0
40054e: vinsertps $0x20,0x4006c0(,%rsi,4),%xmm0,%xmm0
400559: vinsertps $0x30,0x4006c0(,%rdx,4),%xmm0,%xmm0
400564: vmulps 0x144(%rip),%xmm0,%xmm0 # 4006b0
<__dso_handle+0x38>
40056c: vmovaps %xmm0,0x20046c(%rip) # 6009e0 <r...
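A hedged C sketch (function and table names invented) of the lane-extraction-plus-gather pattern in the disassembly above: four 32-bit indices are pulled out of an XMM register one at a time and used to index a float table, and the loaded values are reassembled before the vmulps:

#include <immintrin.h>

extern float table[];   /* stands in for the array at 0x4006c0 */

__m128 gather4(__m128i idx, __m128 scale)
{
    /* extract each 32-bit lane (the vmovq/vpextrq + movslq/sar sequence) */
    int i0 = _mm_extract_epi32(idx, 0);
    int i1 = _mm_extract_epi32(idx, 1);
    int i2 = _mm_extract_epi32(idx, 2);
    int i3 = _mm_extract_epi32(idx, 3);
    /* rebuild a vector from the table entries (vmovss + three vinsertps) */
    __m128 v = _mm_set_ps(table[i3], table[i2], table[i1], table[i0]);
    return _mm_mul_ps(v, scale);      /* vmulps */
}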
2020 Aug 31
2
Vectorization of math function failed?
...00 00 callq 30 <_Z4fct1Dv4_f+0x30>
30: c5 f9 29 44 24 10 vmovapd %xmm0,0x10(%rsp)
36: c4 e3 79 04 04 24 e7 vpermilps $0xe7,(%rsp),%xmm0
3d: e8 00 00 00 00 callq 42 <_Z4fct1Dv4_f+0x42>
42: c5 f8 28 4c 24 30 vmovaps 0x30(%rsp),%xmm1
48: c4 e3 71 21 4c 24 20 vinsertps $0x10,0x20(%rsp),%xmm1,%xmm1
4f: 10
50: c4 e3 71 21 4c 24 10 vinsertps $0x20,0x10(%rsp),%xmm1,%xmm1
57: 20
58: c4 e3 71 21 c0 30 vinsertps $0x30,%xmm0,%xmm1,%xmm0
5e: 48 83 c4 48 add $0x48,%rsp
62: c3 retq
63: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,...
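A hedged C sketch of the kind of source that produces code like this: a scalar math call applied per lane (shown here as expf purely for illustration; the actual callee sits behind a relocation), which without a vector math library is lowered to four scalar calls with the lanes extracted and then reinserted via vinsertps:

#include <math.h>

typedef float v4f __attribute__((vector_size(16)));  /* GCC/Clang extension */

v4f fct1(v4f x)
{
    v4f r;
    for (int i = 0; i < 4; ++i)
        r[i] = expf(x[i]);   /* each iteration becomes one of the callq's */
    return r;
}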
2014 Sep 09
1
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...5, i32 2, i32 7>
>> ret <4 x float> %1
>> }
>> ;;;
>>
>> llc (-mcpu=corei7-avx):
>> vblendps $11, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0,1],xmm1[2],xmm0[3]
>>
>> llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):
>> vinsertps $-96, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1],xmm1[2],xmm0[3]
>>
>>
>> 3) When a shuffle performs an insert at index 0 we always generate an
>> insertps, while a movss would do a better job.
>> ;;;
>> define <4 x float> @baz(<4 x float> %A, <4 x f...
2015 Jan 29
2
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...%xmm2, %xmm3 ## xmm3 = xmm2[3,0],xmm0[0,0]
> vshufps $-0x68, %xmm0, %xmm3, %xmm0 ## xmm0 = xmm3[0,2],xmm0[1,2]
>
>
> Also, I see differences when some loads are shuffled, that I'm a bit
> conflicted about:
> vmovaps -0xXX(%rbp), %xmm3
> ...
> vinsertps $0xc0, %xmm4, %xmm3, %xmm5 ## xmm5 = xmm4[3],xmm3[1,2,3]
> becomes:
> vpermilps $-0x6d, -0xXX(%rbp), %xmm2 ## xmm2 = mem[3,0,1,2]
> ...
> vinsertps $0xc0, %xmm4, %xmm2, %xmm2 ## xmm2 = xmm4[3],xmm2[1,2,3]
>
> Note that the second version does the shuffle in...
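For readers decoding the immediates: $0xc0 selects source lane 3 into destination lane 0, and $-0x6d is 0x93, i.e. the [3,0,1,2] permute shown in the comment. A hedged intrinsics rendering of each sequence (the two come from different surrounding code, so no equivalence is implied; names invented):

#include <immintrin.h>

__m128 before_seq(const float *slot, __m128 a)   /* vmovaps + vinsertps */
{
    __m128 t = _mm_load_ps(slot);
    return _mm_insert_ps(t, a, 0xc0);            /* a[3] -> lane 0 of t */
}

__m128 after_seq(const float *slot, __m128 a)    /* vpermilps (load folded) + vinsertps */
{
    __m128 t = _mm_permute_ps(_mm_load_ps(slot), 0x93);  /* mem[3,0,1,2] */
    return _mm_insert_ps(t, a, 0xc0);
}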
2015 Jan 30
4
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...0x68, %xmm0, %xmm3, %xmm0 ## xmm0 =
>>> xmm3[0,2],xmm0[1,2]
>>>
>>>
>>> Also, I see differences when some loads are shuffled, that I'm a bit
>>> conflicted about:
>>> vmovaps -0xXX(%rbp), %xmm3
>>> ...
>>> vinsertps $0xc0, %xmm4, %xmm3, %xmm5 ## xmm5 =
>>> xmm4[3],xmm3[1,2,3]
>>> becomes:
>>> vpermilps $-0x6d, -0xXX(%rbp), %xmm2 ## xmm2 = mem[3,0,1,2]
>>> ...
>>> vinsertps $0xc0, %xmm4, %xmm2, %xmm2 ## xmm2 =
>>> xmm4[3],xmm2[1,2,3]...
2015 Jan 29
0
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...0]
>> vshufps $-0x68, %xmm0, %xmm3, %xmm0 ## xmm0 =
>> xmm3[0,2],xmm0[1,2]
>>
>>
>> Also, I see differences when some loads are shuffled, that I'm a bit
>> conflicted about:
>> vmovaps -0xXX(%rbp), %xmm3
>> ...
>> vinsertps $0xc0, %xmm4, %xmm3, %xmm5 ## xmm5 = xmm4[3],xmm3[1,2,3]
>> becomes:
>> vpermilps $-0x6d, -0xXX(%rbp), %xmm2 ## xmm2 = mem[3,0,1,2]
>> ...
>> vinsertps $0xc0, %xmm4, %xmm2, %xmm2 ## xmm2 = xmm4[3],xmm2[1,2,3]
>>
>> Note that the second ver...
2015 Jan 30
0
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...0 =
>>>> xmm3[0,2],xmm0[1,2]
>>>>
>>>>
>>>> Also, I see differences when some loads are shuffled, that I'm a bit
>>>> conflicted about:
>>>> vmovaps -0xXX(%rbp), %xmm3
>>>> ...
>>>> vinsertps $0xc0, %xmm4, %xmm3, %xmm5 ## xmm5 =
>>>> xmm4[3],xmm3[1,2,3]
>>>> becomes:
>>>> vpermilps $-0x6d, -0xXX(%rbp), %xmm2 ## xmm2 = mem[3,0,1,2]
>>>> ...
>>>> vinsertps $0xc0, %xmm4, %xmm2, %xmm2 ## xmm2 =
>>>>...
2015 Jan 23
5
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
Greetings LLVM hackers and x86 vector shufflers!
I would like to flip on another chunk of the new vector shuffling,
specifically the logic to mark ~all shuffles as "legal".
This can be tested today with the flag
"-x86-experimental-vector-shuffle-legality". I would essentially like to
make this the default (by removing the "false" path). Doing this will allow
me to
2014 Sep 10
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...t;4 x i32> <i32 4,
> i32 5, i32 2, i32 7>
> ret <4 x float> %1
> }
> ;;;
>
> llc (-mcpu=corei7-avx):
> vblendps $11, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0,1],xmm1[2],xmm0[3]
>
> llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):
> vinsertps $-96, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1],xmm1[2],xmm0[3]
>
>
> 3) When a shuffle performs an insert at index 0 we always generate an
> insertps, while a movss would do a better job.
> ;;;
> define <4 x float> @baz(<4 x float> %A, <4 x float> %B) {
> %1...
2014 Sep 10
13
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...%B, <4 x i32> <i32 4,
>> i32 1, i32 2, i32 3>
>> ret <4 x float> %1
>> }
>> ;;;
>>
>> llc (-mcpu=corei7-avx):
>> vmovss %xmm1, %xmm0, %xmm0
>>
>> llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):
>> vinsertps $0, %xmm1, %xmm0, %xmm0 # xmm0 = xmm1[0],xmm0[1,2,3]
>
>
> So, this is hard. I think we should do this in MC after register allocation
> because movss is the worst instruction ever: it switches from blending with
> the destination to zeroing the destination when the source switches f...
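The asymmetry being described can be shown with intrinsics (a hedged sketch, not from the thread): the register form of movss keeps the upper lanes of the destination, while the load form zeroes them.

#include <xmmintrin.h>

__m128 movss_reg(__m128 a, __m128 b)
{
    return _mm_move_ss(a, b);   /* { b[0], a[1], a[2], a[3] } - a blend */
}

__m128 movss_load(const float *p)
{
    return _mm_load_ss(p);      /* { *p, 0, 0, 0 } - upper lanes zeroed */
}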
2013 Oct 15
0
[LLVMdev] [llvm-commits] r192750 - Enable MI Sched for x86.
...mm, this is base + 8 * 8 + 4.
>> +; STRESS-NEXT: vmovss 68([[BASE]]), [[OUT_Imm:%xmm[0-9]+]]
>> +; Add high slice: out[out_start].imm, this is base + 4.
>> +; STRESS-NEXT: vaddss 4([[BASE]]), [[OUT_Imm]], [[RES_Imm:%xmm[0-9]+]]
>> ; Swap Imm and Real.
>> ; STRESS-NEXT: vinsertps $16, [[RES_Imm]], [[RES_Real]], [[RES_Vec:%xmm[0-9]+]]
>> ; Put the results back into out[out_start].
>> @@ -32,14 +32,14 @@
>> ;
>> ; Same for REGULAR, we eliminate the register bank copy with each slice.
>> ; REGULAR-LABEL: t1:
>> -; Load out[out_start + 8].imm, t...