search for: vmovss

Displaying 20 results from an estimated 22 matches for "vmovss".

2020 Aug 31
2
Vectorization of math function failed?
...66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  da: 00 00 00
  dd: 0f 1f 00             nopl   (%rax)

00000000000000e0 <_Z4fct3Pf>:
  e0: 53                   push   %rbx
  e1: 48 83 ec 10          sub    $0x10,%rsp
  e5: 48 89 fb             mov    %rdi,%rbx
  e8: c5 fa 10 07          vmovss (%rdi),%xmm0
  ec: c5 fa 10 4f 04       vmovss 0x4(%rdi),%xmm1
  f1: c5 fa 11 4c 24 0c    vmovss %xmm1,0xc(%rsp)
  f7: e8 00 00 00 00       callq  fc <_Z4fct3Pf+0x1c>
  fc: c5 fa 11 03          vmovss %xmm0,(%rbx)
 100: c5 fa 10 44 24 0c    vmovss 0xc(%rsp),%xmm0
 106: e8 00 00 00 00...
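The mangled symbol _Z4fct3Pf demangles to fct3(float*), and both call targets are unresolved relocations, so the math function itself is not visible in the dump. A hypothetical LLVM IR reproducer with the same shape (my reconstruction, not from the thread; @sinf is only a stand-in for the unknown callee) — each element gets its own scalar call instead of one call to a vectorized math routine:

  ; stand-in scalar libm routine; the actual callee is elided in the dump
  declare float @sinf(float)

  define void @fct3(float* %p) {
    %q  = getelementptr float, float* %p, i64 1
    %a0 = load float, float* %p          ; vmovss (%rdi),%xmm0
    %a1 = load float, float* %q          ; vmovss 0x4(%rdi),%xmm1
    %r0 = call float @sinf(float %a0)    ; first scalar call
    %r1 = call float @sinf(float %a1)    ; second scalar call
    store float %r0, float* %p
    store float %r1, float* %q
    ret void
  }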
2016 Nov 23
4
RFC: code size reduction in X86 by replacing EVEX with VEX encoding
...ructions that use only the lower registers of XMM0-XMM15 or YMM0-YMM15, can be encoded by either the EVEX or the VEX format. For such cases, using the VEX encoding results in a code size reduction of ~2 bytes even though it is compiled with the AVX512F/AVX512VL features enabled. For example, "vmovss %xmm0, 32(%rsp,%rax,4)" has the following 2 possible encodings:

  EVEX encoding (8 bytes long): 62 f1 7e 08 11 44 84 08    vmovss %xmm0, 32(%rsp,%rax,4)
  VEX encoding  (6 bytes long): c5 fa 11 44 84 20          vmovss %xmm0, 32(%rsp,%rax,4)

See report...
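A quick way to look at both encodings with llvm-mc (a sketch using flags that appear elsewhere in these threads; which form the assembler picks for the AVX512VL target depends on your LLVM version and on whether the EVEX-to-VEX compression proposed in this RFC has landed in your build):

  $ echo "vmovss %xmm0, 32(%rsp,%rax,4)" | llvm-mc -assemble -arch=x86-64 -show-encoding -mattr=+avx
  $ echo "vmovss %xmm0, 32(%rsp,%rax,4)" | llvm-mc -assemble -arch=x86-64 -show-encoding -mattr=+avx512vl

The first command can only use the 6-byte VEX form; the second has both forms available, which is exactly the choice this RFC is about.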
2014 Sep 30
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...gen, I noticed that, when using the new
> shuffle lowering, we no longer emit a single vbroadcastss in the case
> where the shuffle performs a splat of a scalar float loaded from memory.
>
> For example (with -mcpu=corei7-avx -x86-experimental-vector-shuffle-lowering):
>   vmovss (%rdi), %xmm0
>   vpermilps $0, %xmm0, %xmm0 # xmm0 = xmm0[0,0,0,0]
>
> Instead of (with -mcpu=corei7-avx):
>   vbroadcastss (%rdi), %xmm0
>
> I have attached a small reproducer for it.
>
> Basically, the old shuffle lowering logic calls function
> 'NormalizeV...
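The attached reproducer is not shown in the snippet; an IR function with the shape being described — a splat of a float loaded from memory — would look something like this (my sketch, not the original attachment):

  define <4 x float> @splat_load(float* %p) {
    %f = load float, float* %p
    %v = insertelement <4 x float> undef, float %f, i32 0
    ; splat lane 0 across all four lanes
    %s = shufflevector <4 x float> %v, <4 x float> undef, <4 x i32> zeroinitializer
    ret <4 x float> %s
  }

Run through llc with -mcpu=corei7-avx, with and without -x86-experimental-vector-shuffle-lowering, this should reproduce the vbroadcastss vs. vmovss+vpermilps difference described above.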
2016 Nov 23
2
RFC: code size reduction in X86 by replacing EVEX with VEX encoding
...wer registers of XMM0-XMM15 or YMM0-YMM15, can be encoded by either
> the EVEX or the VEX format. For such cases, using the VEX encoding results
> in a code size reduction of ~2 bytes even though it is compiled with the
> AVX512F/AVX512VL features enabled.
>
> For example, “vmovss %xmm0, 32(%rsp,%rax,4)“ has the following 2 possible encodings:
>
> EVEX encoding (8 bytes long):
>   62 f1 7e 08 11 44 84 08    vmovss %xmm0, 32(%rsp,%rax,4)
>
> VEX encoding (6 bytes long):
>   c5 fa 11 44 84 20...
2014 Sep 23
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Tue, Sep 23, 2014 at 2:35 PM, Simon Pilgrim <llvm-dev at redking.me.uk> wrote:
> If you don’t want to spend time on this, I’d be happy to create a
> candidate patch for review? I’ve been unclear if you were taking patches
> for your shuffle work prior to it becoming the default.

While I'm happy to work on it, I'm even more happy to have patches. =D
2016 Nov 24
3
RFC: code size reduction in X86 by replacing EVEX with VEX encoding
...instructions that use only the lower registers of XMM0-XMM15 or YMM0-YMM15, can be encoded by either the EVEX or the VEX format. For such cases, using the VEX encoding results in a code size reduction of ~2 bytes even though it is compiled with the AVX512F/AVX512VL features enabled. For example, “vmovss %xmm0, 32(%rsp,%rax,4)“ has the following 2 possible encodings:

  EVEX encoding (8 bytes long): 62 f1 7e 08 11 44 84 08    vmovss %xmm0, 32(%rsp,%rax,4)
  VEX encoding  (6 bytes long): c5 fa 11 44 84 20          vmovss %xmm0, 32(%rsp,%rax,4)

See reported Bu...
2016 Nov 28
2
RFC: code size reduction in X86 by replacing EVEX with VEX encoding
...instructions that use only the lower registers of XMM0-XMM15 or YMM0-YMM15, can be encoded by either the EVEX or the VEX format. For such cases, using the VEX encoding results in a code size reduction of ~2 bytes even though it is compiled with the AVX512F/AVX512VL features enabled. For example, “vmovss %xmm0, 32(%rsp,%rax,4)“ has the following 2 possible encodings:

  EVEX encoding (8 bytes long): 62 f1 7e 08 11 44 84 08    vmovss %xmm0, 32(%rsp,%rax,4)
  VEX encoding  (6 bytes long): c5 fa 11 44 84 20          vmovss %xmm0, 32(%rsp,%rax,4)

See reported Bu...
2014 Sep 10
13
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...; @baz(<4 x float> %A, <4 x float> %B) {
>> %1 = shufflevector <4 x float> %A, <4 x float> %B, <4 x i32> <i32 4,
>> i32 1, i32 2, i32 3>
>> ret <4 x float> %1
>> }
>> ;;;
>>
>> llc (-mcpu=corei7-avx):
>>   vmovss %xmm1, %xmm0, %xmm0
>>
>> llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):
>>   vinsertps $0, %xmm1, %xmm0, %xmm0 # xmm0 = xmm1[0],xmm0[1,2,3]
>
> So, this is hard. I think we should do this in MC after register allocation
> because movss is the wors...
2014 Sep 19
4
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...I haven't observed any significant regression in our internal codebase. In one particular case I observed a slowdown (around 1%); here is what I found when investigating this slowdown.

1. With the new shuffle lowering, there is one case where we end up producing the following sequence:
     vmovss .LCPxx(%rip), %xmm1
     vxorps %xmm0, %xmm0, %xmm0
     vblendps $1, %xmm1, %xmm0, %xmm0

   Before, we used to generate a simpler:
     vmovss .LCPxx(%rip), %xmm1

   In this particular case, the 'vblendps' is redundant, since the vmovss would zero the upper bits in %xmm1. I am not sure why we get thi...
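The redundancy rests on an ISA detail worth spelling out: the memory-operand form of vmovss zeroes bits 127:32 of its destination, so blending lane 0 of %xmm1 into an all-zero register reproduces exactly the value %xmm1 already holds. Annotating the sequence (comments mine, with c standing for the constant at .LCPxx):

  vmovss   .LCPxx(%rip), %xmm1       # xmm1 = [c,0,0,0]; the load form zeroes bits 127:32
  vxorps   %xmm0, %xmm0, %xmm0       # xmm0 = [0,0,0,0]
  vblendps $1, %xmm1, %xmm0, %xmm0   # xmm0 = [c,0,0,0], the value xmm1 already contains

so the last two instructions could be folded away, leaving only the vmovss.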
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
...0x183(%rip),%xmm1,%xmm1 # 4006a0 <__dso_handle+0x28>
  40051d: vpsubd %xmm1,%xmm0,%xmm0
  400521: vmovq %xmm0,%rax
  400526: movslq %eax,%rcx
  400529: sar $0x20,%rax
  40052d: vpextrq $0x1,%xmm0,%rdx
  400533: movslq %edx,%rsi
  400536: sar $0x20,%rdx
  40053a: vmovss 0x4006c0(,%rcx,4),%xmm0
  400543: vinsertps $0x10,0x4006c0(,%rax,4),%xmm0,%xmm0
  40054e: vinsertps $0x20,0x4006c0(,%rsi,4),%xmm0,%xmm0
  400559: vinsertps $0x30,0x4006c0(,%rdx,4),%xmm0,%xmm0
  400564: vmulps 0x144(%rip),%xmm0,%xmm0 # 4006b0 <__dso_handle+0x38>
  40056c: vmov...
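A reading of the lane-extraction idiom in that dump (annotations mine; table stands in for the array at 0x4006c0): the four 32-bit indices are pulled out of %xmm0 two at a time through 64-bit GPR moves, and a four-lane gather is then assembled by hand with one vmovss plus three vinsertps:

  vmovq     %xmm0,%rax                         # rax = lanes 0 and 1, packed
  movslq    %eax,%rcx                          # rcx = index 0 (low 32 bits, sign-extended)
  sar       $0x20,%rax                         # rax = index 1 (high 32 bits)
  vpextrq   $0x1,%xmm0,%rdx                    # rdx = lanes 2 and 3, packed
  movslq    %edx,%rsi                          # rsi = index 2
  sar       $0x20,%rdx                         # rdx = index 3
  vmovss    table(,%rcx,4),%xmm0               # lane 0 of the gathered vector
  vinsertps $0x10,table(,%rax,4),%xmm0,%xmm0   # lane 1
  vinsertps $0x20,table(,%rsi,4),%xmm0,%xmm0   # lane 2
  vinsertps $0x30,table(,%rdx,4),%xmm0,%xmm0   # lane 3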
2014 Apr 10
3
[LLVMdev] Test failures with 3.4.1
...re/dev/debian/pkg-llvm/llvm-toolchain/branches/llvm-toolchain-3.4-3.4+205824/test/CodeGen/X86/fp-fast.ll:65:10: error: expected string not found in input
  ; CHECK: xorps
           ^
  <stdin>:78:2: note: scanning from here
   .align 4, 0x90
   ^
  <stdin>:83:3: note: possible intended match here
   vmovss LCPI5_0(%rip), %xmm1
   ^
/home/sylvestre/dev/debian/pkg-llvm/llvm-toolchain/branches/llvm-toolchain-3.4-3.4+205824/test/CodeGen/X86/fp-fast.ll:77:10: error: expected string not found in input
  ; CHECK: xorps
           ^
  <stdin>:95:2: note: scanning from here
   .align 4, 0x90
   ^
  <stdin>:100...
2014 Sep 09
5
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...ertps, while a movss would do a better job.

;;;
define <4 x float> @baz(<4 x float> %A, <4 x float> %B) {
  %1 = shufflevector <4 x float> %A, <4 x float> %B, <4 x i32> <i32 4, i32 1, i32 2, i32 3>
  ret <4 x float> %1
}
;;;

llc (-mcpu=corei7-avx):
  vmovss %xmm1, %xmm0, %xmm0

llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):
  vinsertps $0, %xmm1, %xmm0, %xmm0 # xmm0 = xmm1[0],xmm0[1,2,3]

I hope this is useful. We would be happy to contribute patches to improve some of the above cases, but we obviously know that this is still a work...
2014 Sep 05
3
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com> wrote:
> Unfortunately, another team, while doing internal testing, has seen the
> new path generating illegal insertps masks. A sample here:
>
>   vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
>   vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>
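Why $256 is illegal (my note, not from the thread): insertps takes an 8-bit immediate, so 256 (0x100) does not fit; an assembler that silently truncates it to a byte gets 0x00, which selects source lane 0, destination lane 0, and no zeroing — exactly the behaviour the comments above describe:

  # insertps imm8 layout: bits [7:6] source lane, [5:4] destination lane, [3:0] zero mask
  # $256 truncated to a byte is $0:
  vinsertps $0, %xmm0, %xmm13, %xmm4   # xmm4 = xmm0[0],xmm13[1,2,3]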
2014 Sep 05
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
..., float undef>, <4 x float> %1, <4 x i32> <i32 4, i32 1,
> i32 6, i32 7>
> ret <4 x float> %2
> }
>
> llc -march=x86-64 -mattr=+avx test.ll -o -
>
> test: # @test
>   vxorps %xmm2, %xmm2, %xmm2
>   vmovss %xmm0, %xmm2, %xmm2
>   vblendps $4, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
>   vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
>   retl
>
> test2: # @test2
>   vinsertps $48, %xmm1, %xmm0, %xmm0 # x...
2014 Apr 09
2
[LLVMdev] Test failures with 3.4.1
Hello, Trying the 3.4.1 branch, I get the following tests failing:
  LLVM :: CodeGen/X86/2009-06-05-VZextByteShort.ll
  LLVM :: CodeGen/X86/fma4-intrinsics-x86_64.ll
  LLVM :: CodeGen/X86/fp-fast.ll
  LLVM :: CodeGen/X86/vec_shift4.ll
  LLVM :: CodeGen/X86/vshift-4.ll
I am testing on Debian testing, 64-bit. Does it ring a bell? Sylvestre
2014 Sep 08
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...>> i32 6, i32 7>
>>> ret <4 x float> %2
>>> }
>>>
>>> llc -march=x86-64 -mattr=+avx test.ll -o -
>>>
>>> test: # @test
>>>   vxorps %xmm2, %xmm2, %xmm2
>>>   vmovss %xmm0, %xmm2, %xmm2
>>>   vblendps $4, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
>>>   vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
>>>   retl
>>>
>>> test2: # @test2
>>>...
2014 Sep 06
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...%1, <4 x i32> <i32 4, i32 1,
>> i32 6, i32 7>
>> ret <4 x float> %2
>> }
>>
>> llc -march=x86-64 -mattr=+avx test.ll -o -
>>
>> test: # @test
>>   vxorps %xmm2, %xmm2, %xmm2
>>   vmovss %xmm0, %xmm2, %xmm2
>>   vblendps $4, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
>>   vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
>>   retl
>>
>> test2: # @test2
>>   vinsertps $48,...
2013 Dec 13
0
[LLVMdev] broken LLVM-MC?
Well, you’ll probably need to specify which CPU you’re targeting for the instructions to be recognized as valid encodings. -mcpu=knl doesn’t seem sufficient, though, so there’s probably something more going on. Elena, do you know what’s happening here? It’s important that the disassembler work with the new instructions as well as the assembler. I looked but didn’t see any disassembler tests for avx512. -Jim On
2013 Dec 13
2
[LLVMdev] broken LLVM-MC?
Hi, It seems LLVM-MC is broken with AVX512?

  $ echo "vinserti32x4 \$1, %xmm21, %zmm5, %zmm17" | ./Release+Asserts/bin/llvm-mc -assemble -arch=x86-64 -show-encoding -x86-asm-syntax=att
   .text
   vinserti32x4 $1, %xmm21, %zmm5, %zmm17 # encoding: [0x62,0xa3,0x55,0x48,0x38,0xcd,0x01]

  $ echo "0x62,0xa3,0x55,0x48,0x38,0xcd,0x01" | ./Release+Asserts/bin/llvm-mc -disassemble
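The second command is cut off in the snippet. A plausible round-trip check (my reconstruction, hypothetical; the reply in the previous result suggests adding a CPU, and notes that even -mcpu=knl may not be sufficient):

  $ echo "0x62,0xa3,0x55,0x48,0x38,0xcd,0x01" | ./Release+Asserts/bin/llvm-mc -disassemble -arch=x86-64 -mcpu=knl

If the disassembler handled AVX512 correctly, this would print the original vinserti32x4 instruction back.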
2014 Sep 10
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...;;
> define <4 x float> @baz(<4 x float> %A, <4 x float> %B) {
> %1 = shufflevector <4 x float> %A, <4 x float> %B, <4 x i32> <i32 4,
> i32 1, i32 2, i32 3>
> ret <4 x float> %1
> }
> ;;;
>
> llc (-mcpu=corei7-avx):
>   vmovss %xmm1, %xmm0, %xmm0
>
> llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):
>   vinsertps $0, %xmm1, %xmm0, %xmm0 # xmm0 = xmm1[0],xmm0[1,2,3]
>
> So, this is hard. I think we should do this in MC after register allocation because movss is the worst instruction ever:...