Displaying 20 results from an estimated 22 matches for "vmovss".
2020 Aug 31 (2 replies): Vectorization of math function failed?
...66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
da: 00 00 00
dd: 0f 1f 00 nopl (%rax)
00000000000000e0 <_Z4fct3Pf>:
e0: 53 push %rbx
e1: 48 83 ec 10 sub $0x10,%rsp
e5: 48 89 fb mov %rdi,%rbx
e8: c5 fa 10 07 vmovss (%rdi),%xmm0
ec: c5 fa 10 4f 04 vmovss 0x4(%rdi),%xmm1
f1: c5 fa 11 4c 24 0c vmovss %xmm1,0xc(%rsp)
f7: e8 00 00 00 00 callq fc <_Z4fct3Pf+0x1c>
fc: c5 fa 11 03 vmovss %xmm0,(%rbx)
100: c5 fa 10 44 24 0c vmovss 0xc(%rsp),%xmm0
106: e8 00 00 00 00...
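The two scalar vmovss loads feeding separate callq instructions suggest the loop stayed scalar, with one libm call per element. A minimal sketch of source that produces this shape, assuming only that the mangled symbol _Z4fct3Pf demangles to fct3(float*); the specific math function is a guess:

$ cat fct3.cc        # hypothetical reconstruction; expf is a guess
#include <cmath>
void fct3(float *a) {
  for (int i = 0; i < 4; ++i)
    a[i] = std::exp(a[i]);  // stays a scalar call per element unless a vector math library is enabled
}
$ clang++ -O2 -mavx2 -S fct3.cc -o -   # with -fveclib=SVML or -fveclib=libmvec the vectorizer can emit vector calls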
2016 Nov 23 (4 replies): RFC: code size reduction in X86 by replacing EVEX with VEX encoding
...ructions that use only the lower registers of XMM0-XMM15 or YMM0-YMM15, can be encoded by either the EVEX or the VEX format. For such cases, using the VEX encoding results in a code size reduction of ~2 bytes even though it is compiled with the AVX512F/AVX512VL features enabled.
For example: "vmovss %xmm0, 32(%rsp,%rax,4)", has the following 2 possible encodings:
EVEX encoding (8 bytes long):
62 f1 7e 08 11 44 84 08 vmovss %xmm0, 32(%rsp,%rax,4)
VEX encoding (6 bytes long):
c5 fa 11 44 84 20 vmovss %xmm0, 32(%rsp,%rax,4)
See report...
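A quick way to compare the two forms is llvm-mc with -show-encoding; the {evex} prefix forces the longer form. A sketch assuming a recent llvm-mc, which prefers VEX here; the byte sequences are the ones quoted above:

$ echo 'vmovss %xmm0, 32(%rsp,%rax,4)' | llvm-mc -assemble -arch=x86-64 -mattr=+avx512vl -show-encoding
    vmovss %xmm0, 32(%rsp,%rax,4)  # encoding: [0xc5,0xfa,0x11,0x44,0x84,0x20]
$ echo '{evex} vmovss %xmm0, 32(%rsp,%rax,4)' | llvm-mc -assemble -arch=x86-64 -mattr=+avx512vl -show-encoding
    vmovss %xmm0, 32(%rsp,%rax,4)  # encoding: [0x62,0xf1,0x7e,0x08,0x11,0x44,0x84,0x08]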
2014 Sep 30 (2 replies): [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...gen, I noticed that, when using the new
> shuffle lowering, we no longer emit a single vbroadcastss in the case
> where the shuffle performs a splat of a scalar float loaded from
> memory.
>
> For example:
> (with -mcpu=corei7-avx -x86-experimental-vector-shuffle-lowering)
> vmovss (%rdi), %xmm0
> vpermilps $0, %xmm0, %xmm0 # xmm0 = xmm0[0,0,0,0]
>
> Instead of:
> (with -mcpu=corei7-avx)
> vbroadcastss (%rdi), %xmm0
>
> I have attached a small reproducible for it.
>
> Basically, the old shuffle lowering logic calls function
> 'NormalizeV...
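The attached reproducible is not part of the excerpt; a minimal guess at IR that exhibits the splat-of-a-loaded-scalar pattern (3.5-era syntax):

$ cat splat.ll       # hypothetical stand-in for the attachment
define <4 x float> @splat_load(float* %p) {
  %f = load float* %p
  %v = insertelement <4 x float> undef, float %f, i32 0
  %s = shufflevector <4 x float> %v, <4 x float> undef, <4 x i32> zeroinitializer
  ret <4 x float> %s
}
$ llc -mcpu=corei7-avx splat.ll -o -        # old lowering: vbroadcastss (%rdi), %xmm0
$ llc -mcpu=corei7-avx -x86-experimental-vector-shuffle-lowering splat.ll -o -
                                            # new lowering: vmovss + vpermilps, per the report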
2016 Nov 23 (2 replies): RFC: code size reduction in X86 by replacing EVEX with VEX encoding
...wer registers of XMM0-XMM15 or YMM0-YMM15, can be encoded by either
> the EVEX or the VEX format. For such cases, using the VEX encoding results
> in a code size reduction of ~2 bytes even though it is compiled with the
> AVX512F/AVX512VL features enabled.
>
> For example: "vmovss %xmm0, 32(%rsp,%rax,4)" has the following 2
> possible encodings:
>
> EVEX encoding (8 bytes long):
> 62 f1 7e 08 11 44 84 08 vmovss %xmm0, 32(%rsp,%rax,4)
>
> VEX encoding (6 bytes long):
> c5 fa 11 44 84 20...
2014 Sep 23 (2 replies): [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Tue, Sep 23, 2014 at 2:35 PM, Simon Pilgrim <llvm-dev at redking.me.uk>
wrote:
> If you don’t want to spend time on this, I’d be happy to create a
> candidate patch for review? I’ve been unclear if you were taking patches
> for your shuffle work prior to it becoming the default.
While I'm happy to work on it, I'm even more happy to have patches. =D
2016 Nov 24 (3 replies): RFC: code size reduction in X86 by replacing EVEX with VEX encoding
...instructions that use only the lower registers of XMM0-XMM15 or YMM0-YMM15, can be encoded by either the EVEX or the VEX format. For such cases, using the VEX encoding results in a code size reduction of ~2 bytes even though it is compiled with the AVX512F/AVX512VL features enabled.
For example: "vmovss %xmm0, 32(%rsp,%rax,4)" has the following 2 possible encodings:
EVEX encoding (8 bytes long):
62 f1 7e 08 11 44 84 08 vmovss %xmm0, 32(%rsp,%rax,4)
VEX encoding (6 bytes long):
c5 fa 11 44 84 20 vmovss %xmm0, 32(%rsp,%rax,4)
See reported Bu...
2016 Nov 28 (2 replies): RFC: code size reduction in X86 by replacing EVEX with VEX encoding
...instructions that use only the lower registers of XMM0-XMM15 or YMM0-YMM15, can be encoded by either the EVEX or the VEX format. For such cases, using the VEX encoding results in a code size reduction of ~2 bytes even though it is compiled with the AVX512F/AVX512VL features enabled.
For example: "vmovss %xmm0, 32(%rsp,%rax,4)" has the following 2 possible encodings:
EVEX encoding (8 bytes long):
62 f1 7e 08 11 44 84 08 vmovss %xmm0, 32(%rsp,%rax,4)
VEX encoding (6 bytes long):
c5 fa 11 44 84 20 vmovss %xmm0, 32(%rsp,%rax,4)
See reported Bu...
2014 Sep 10 (13 replies): [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...> @baz(<4 x float> %A, <4 x float> %B) {
>> %1 = shufflevector <4 x float> %A, <4 x float> %B, <4 x i32> <i32 4,
>> i32 1, i32 2, i32 3>
>> ret <4 x float> %1
>> }
>> ;;;
>>
>> llc (-mcpu=corei7-avx):
>> vmovss %xmm1, %xmm0, %xmm0
>>
>> llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):
>> vinsertps $0, %xmm1, %xmm0, %xmm0 # xmm0 = xmm1[0],xmm0[1,2,3]
>
>
> So, this is hard. I think we should do this in MC after register allocation
> because movss is the wors...
2014 Sep 19 (4 replies): [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...I haven't observed any significant regression in our internal codebase.
In one particular case I observed a slowdown (around 1%); here is what
I found when investigating this slowdown.
1. With the new shuffle lowering, there is one case where we end up
producing the following sequence:
vmovss .LCPxx(%rip), %xmm1
vxorps %xmm0, %xmm0, %xmm0
vblendps $1, %xmm1, %xmm0, %xmm0
Before, we used to generate the simpler:
vmovss .LCPxx(%rip), %xmm1
In this particular case, the 'vblendps' is redundant since the vmovss
would zero the upper bits in %xmm1. I am not sure why we get thi...
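For context, the shape being lowered here is lane 0 of a loaded value merged into an all-zeros vector; in IR it looks roughly like this (a sketch, not the poster's actual test case):

define <4 x float> @lane0_into_zeros(<4 x float> %x) {
  %r = shufflevector <4 x float> %x, <4 x float> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 7>
  ret <4 x float> %r
}

When %x is a load, the load form of vmovss already produces exactly this result: it writes lane 0 and zeroes the upper three lanes, so the vxorps/vblendps pair adds nothing.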
2014 Oct 13 (2 replies): [LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
...0x183(%rip),%xmm1,%xmm1 # 4006a0
<__dso_handle+0x28>
40051d: vpsubd %xmm1,%xmm0,%xmm0
400521: vmovq %xmm0,%rax
400526: movslq %eax,%rcx
400529: sar $0x20,%rax
40052d: vpextrq $0x1,%xmm0,%rdx
400533: movslq %edx,%rsi
400536: sar $0x20,%rdx
40053a: vmovss 0x4006c0(,%rcx,4),%xmm0
400543: vinsertps $0x10,0x4006c0(,%rax,4),%xmm0,%xmm0
40054e: vinsertps $0x20,0x4006c0(,%rsi,4),%xmm0,%xmm0
400559: vinsertps $0x30,0x4006c0(,%rdx,4),%xmm0,%xmm0
400564: vmulps 0x144(%rip),%xmm0,%xmm0 # 4006b0
<__dso_handle+0x38>
40056c: vmov...
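The sequence above is a manual gather: the lanes of an index vector are moved out to GPRs (vmovq/vpextrq) and used to load from a lookup table (vmovss plus three vinsertps). A hypothetical source shape (names invented) that lowers to this kind of code, using the clang/GCC vector extensions:

$ cat lut.c          # hypothetical reconstruction
typedef int   v4si __attribute__((vector_size(16)));
typedef float v4sf __attribute__((vector_size(16)));
extern float table[];
v4sf lookup(v4si idx, v4sf scale) {
  v4sf r = { table[idx[0]], table[idx[1]], table[idx[2]], table[idx[3]] };
  return r * scale;  // per-lane extraction; on some subtargets the index vector round-trips through the stack
}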
2014 Apr 10 (3 replies): [LLVMdev] Test failures with 3.4.1
...re/dev/debian/pkg-llvm/llvm-toolchain/branches/llvm-toolchain-3.4-3.4+205824/test/CodeGen/X86/fp-fast.ll:65:10:
error: expected string not found in input
; CHECK: xorps
^
<stdin>:78:2: note: scanning from here
.align 4, 0x90
^
<stdin>:83:3: note: possible intended match here
vmovss LCPI5_0(%rip), %xmm1
^
/home/sylvestre/dev/debian/pkg-llvm/llvm-toolchain/branches/llvm-toolchain-3.4-3.4+205824/test/CodeGen/X86/fp-fast.ll:77:10:
error: expected string not found in input
; CHECK: xorps
^
<stdin>:95:2: note: scanning from here
.align 4, 0x90
^
<stdin>:100...
2014 Sep 09 (5 replies): [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...ertps, while a movss would do a better job.
;;;
define <4 x float> @baz(<4 x float> %A, <4 x float> %B) {
%1 = shufflevector <4 x float> %A, <4 x float> %B, <4 x i32> <i32 4,
i32 1, i32 2, i32 3>
ret <4 x float> %1
}
;;;
llc (-mcpu=corei7-avx):
vmovss %xmm1, %xmm0, %xmm0
llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):
vinsertps $0, %xmm1, %xmm0, %xmm0 # xmm0 = xmm1[0],xmm0[1,2,3]
I hope this is useful. We would be happy to contribute patches to
improve some of the above cases, but we obviously know that this is
still a work...
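To re-run the comparison end to end, assuming baz.ll contains the @baz function quoted above (IR and flags are taken verbatim from the message):

$ llc -mcpu=corei7-avx baz.ll -o -
    vmovss %xmm1, %xmm0, %xmm0
$ llc -mcpu=corei7-avx -x86-experimental-vector-shuffle-lowering baz.ll -o -
    vinsertps $0, %xmm1, %xmm0, %xmm0 # xmm0 = xmm1[0],xmm0[1,2,3]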
2014 Sep 05 (3 replies): [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com>
wrote:
> Unfortunately, another team, while doing internal testing has seen the
> new path generating illegal insertps masks. A sample here:
>
> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>
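For context, the insertps immediate must fit in one byte; its layout (per the Intel SDM) and what the out-of-range value above degenerates to:

  imm8 = (count_s << 6) | (count_d << 4) | zmask   # source lane, destination lane, zero mask
  $256 = 0x100 does not fit in 8 bits; truncated to 0x00 it reads as
  count_s=0, count_d=0, zmask=0, i.e. dst[0] = src[0] with no zeroing,
  which is why the printed lane comment still looks sane even though
  the immediate itself is not a legal imm8.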
2014 Sep 05 (2 replies): [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
..., float undef>, <4 x float> %1, <4 x i32> <i32 4, i32 1,
> i32 6, i32 7>
> ret <4 x float> %2
> }
>
>
> llc -march=x86-64 -mattr=+avx test.ll -o -
>
> test: # @test
> vxorps %xmm2, %xmm2, %xmm2
> vmovss %xmm0, %xmm2, %xmm2
> vblendps $4, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
> vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
> retl
>
> test2: # @test2
> vinsertps $48, %xmm1, %xmm0, %xmm0 # x...
2014 Apr 09 (2 replies): [LLVMdev] Test failures with 3.4.1
Hello,
Trying the 3.4.1 branch, I get the following tests failing:
LLVM :: CodeGen/X86/2009-06-05-VZextByteShort.ll
LLVM :: CodeGen/X86/fma4-intrinsics-x86_64.ll
LLVM :: CodeGen/X86/fp-fast.ll
LLVM :: CodeGen/X86/vec_shift4.ll
LLVM :: CodeGen/X86/vshift-4.ll
I am testing on Debian testing, 64-bit.
Does it ring a bell?
Sylvestre
2014 Sep 08 (2 replies): [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...>> i32 6, i32 7>
>>> ret <4 x float> %2
>>> }
>>>
>>>
>>> llc -march=x86-64 -mattr=+avx test.ll -o -
>>>
>>> test: # @test
>>> vxorps %xmm2, %xmm2, %xmm2
>>> vmovss %xmm0, %xmm2, %xmm2
>>> vblendps $4, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
>>> vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
>>> retl
>>>
>>> test2: # @test2
>>>...
2014 Sep 06 (2 replies): [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...%1, <4 x i32> <i32 4, i32 1,
>> i32 6, i32 7>
>> ret <4 x float> %2
>> }
>>
>>
>> llc -march=x86-64 -mattr=+avx test.ll -o -
>>
>> test: # @test
>> vxorps %xmm2, %xmm2, %xmm2
>> vmovss %xmm0, %xmm2, %xmm2
>> vblendps $4, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
>> vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
>> retl
>>
>> test2: # @test2
>> vinsertps $48,...
2013 Dec 13 (0 replies): [LLVMdev] broken LLVM-MC?
Well, you’ll probably need to specify which CPU for the instructions to be recognized as valid encodings. -mcpu=knl doesn’t seem sufficient, though, so there’s probably something more going on.
Elena, do you know what’s happening here? It’s important that the disassembler work with the new instructions as well as the assembler. I looked but didn’t see any disassembler tests for avx512.
-Jim
On
2013 Dec 13 (2 replies): [LLVMdev] broken LLVM-MC?
Hi,
It seems LLVM-MC is broken with AVX-512?
$ echo "vinserti32x4 \$1, %xmm21, %zmm5,
%zmm17"|./Release+Asserts/bin/llvm-mc -assemble -arch=x86-64 -show-encoding
-x86-asm-syntax=att
.text
vinserti32x4 $1, %xmm21, %zmm5, %zmm17 # encoding:
[0x62,0xa3,0x55,0x48,0x38,0xcd,0x01]
$ echo "0x62,0xa3,0x55,0x48,0x38,0xcd,0x01" |./Release+Asserts/bin/llvm-mc
-disassemble
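The disassembler's output is cut off above; per the follow-up in this thread, the first thing to try is passing the same CPU or features to the disassembler as to the assembler (both flags exist on llvm-mc; whether they were sufficient at the time is exactly what was in question):

$ echo "0x62,0xa3,0x55,0x48,0x38,0xcd,0x01" | ./Release+Asserts/bin/llvm-mc -disassemble -arch=x86-64 -mcpu=knl
$ echo "0x62,0xa3,0x55,0x48,0x38,0xcd,0x01" | ./Release+Asserts/bin/llvm-mc -disassemble -arch=x86-64 -mattr=+avx512f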
2014 Sep 10 (2 replies): [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...;;
> define <4 x float> @baz(<4 x float> %A, <4 x float> %B) {
> %1 = shufflevector <4 x float> %A, <4 x float> %B, <4 x i32> <i32 4,
> i32 1, i32 2, i32 3>
> ret <4 x float> %1
> }
> ;;;
>
> llc (-mcpu=corei7-avx):
> vmovss %xmm1, %xmm0, %xmm0
>
> llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):
> vinsertps $0, %xmm1, %xmm0, %xmm0 # xmm0 = xmm1[0],xmm0[1,2,3]
>
> So, this is hard. I think we should do this in MC after register allocation because movss is the worst instruction ever:...