Displaying 20 results from an estimated 471 matches for "xmm0".
2014 Sep 05
3
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com>
wrote:
> Unfortunately, another team, while doing internal testing has seen the
> new path generating illegal insertps masks. A sample here:
>
> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 =
> xmm4[0,1],xmm1[2],xmm4[3]
>...
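A note on the sample above: insertps takes an 8-bit control immediate, which is why $256 and $416 are illegal. Below is a minimal Python sketch, assuming the standard field layout (bits 7:6 select the source element, bits 5:4 the destination element, bits 3:0 the zero mask). The helper names are hypothetical and are not part of LLVM.

```python
def fits_imm8(imm):
    """True if the value can be encoded as an unsigned 8-bit immediate."""
    return 0 <= imm <= 0xFF


def decode_insertps_imm(imm):
    """Split an insertps control byte into its three fields.

    Hypothetical helper for illustration only.
    bits 7:6 -> source element, bits 5:4 -> destination element,
    bits 3:0 -> zero mask (one bit per destination lane).
    """
    if not fits_imm8(imm):
        raise ValueError(f"insertps immediate {imm} does not fit in 8 bits")
    return {"src": (imm >> 6) & 3, "dst": (imm >> 4) & 3, "zmask": imm & 0xF}


# The immediates printed in the report are out of range.
for printed in (256, 416):
    print(printed, "encodable:", fits_imm8(printed))

# Their low 8 bits decode to the fields the asm comments describe.
print(decode_insertps_imm(256 & 0xFF))
print(decode_insertps_imm(416 & 0xFF))
```

Under that layout, the low 8 bits of $416 decode to "source element 2 into destination element 2", which agrees with the printed comment xmm4[0,1],xmm1[2],xmm4[3].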
2015 Jan 29
2
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...s, in case any of it rings a bell:
>
Very cool, and thanks for the analysis!
>
>
> Another problem I'm seeing is that in some cases we can't fold memory
> anymore:
> vpermilps $-0x6d, -0xXX(%rdx), %xmm2 ## xmm2 = mem[3,0,1,2]
> vblendps $0x1, %xmm2, %xmm0, %xmm0
> becomes:
> vmovaps -0xXX(%rdx), %xmm2
> vshufps $0x3, %xmm0, %xmm2, %xmm3 ## xmm3 = xmm2[3,0],xmm0[0,0]
> vshufps $-0x68, %xmm0, %xmm3, %xmm0 ## xmm0 = xmm3[0,2],xmm0[1,2]
>
>
> Also, I see differences when some loads are shuffled, that I...
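The negative immediates in the listing above are signed renderings of the 8-bit shuffle control byte. As a rough sketch (Python; `decode_shuffle_imm` is a hypothetical helper, not an LLVM or assembler API), masking with 0xFF and reading four 2-bit fields from the low bits upward recovers the element selectors shown in the asm comments. For vshufps, the first two selectors index the first source and the last two index the second source.

```python
def decode_shuffle_imm(imm):
    """Return the four 2-bit element selectors of a vpermilps/vshufps immediate.

    Hypothetical helper for illustration. The immediate may be printed as a
    signed byte (e.g. -0x6d), so it is reduced to its unsigned 8-bit form first.
    """
    imm &= 0xFF
    return [(imm >> (2 * lane)) & 3 for lane in range(4)]


print(decode_shuffle_imm(-0x6d))  # vpermilps: compare with mem[3,0,1,2]
print(decode_shuffle_imm(0x3))    # vshufps:   compare with xmm2[3,0],xmm0[0,0]
print(decode_shuffle_imm(-0x68))  # vshufps:   compare with xmm3[0,2],xmm0[1,2]
```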
2013 Jul 19
0
[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX
...The generated instruction varies, but
>>>>>> it seems to often be similar to (I
>>>>>> don't have it in front of me, sorry):
>>>>>> movapd xmm0, xmm[ecx+0x???????]
>>>>>> Where the xmm register changes, and
>>>>>> the second parameter is a memory access.
>>>>>> ECX is always set to 0x7ffffff -
>>>...
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
...sed.
Output from clang 3.4 for target corei7-avx:
$ clang++ test.cpp -O3 -fstrict-aliasing -funroll-loops -ffast-math
-march=native -mtune=native -DSPILLING_ENSUES=0 /* no spilling */
$ objdump -dC --no-show-raw-insn ./a.out
...
00000000004004f0 <main>:
4004f0: vmovdqa 0x2004c8(%rip),%xmm0 # 6009c0 <x>
4004f8: vpsrld $0x17,%xmm0,%xmm0
4004fd: vpaddd 0x17b(%rip),%xmm0,%xmm0 # 400680
<__dso_handle+0x8>
400505: vcvtdq2ps %xmm0,%xmm1
400509: vdivps 0x17f(%rip),%xmm1,%xmm1 # 400690
<__dso_handle+0x18>
400511: vcvttps2dq %xmm1,%xmm...
2014 Sep 05
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...14 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com>
>> wrote:
>>>
>>> Unfortunately, another team, while doing internal testing has seen the
>>> new path generating illegal insertps masks. A sample here:
>>>
>>> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
>>> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>>> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
>>> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 =
>>> xmm4[...
2015 Jan 30
4
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...or the analysis!
>>
>>
>>>
>>>
>>> Another problem I'm seeing is that in some cases we can't fold memory
>>> anymore:
>>> vpermilps $-0x6d, -0xXX(%rdx), %xmm2 ## xmm2 = mem[3,0,1,2]
>>> vblendps $0x1, %xmm2, %xmm0, %xmm0
>>> becomes:
>>> vmovaps -0xXX(%rdx), %xmm2
>>> vshufps $0x3, %xmm0, %xmm2, %xmm3 ## xmm3 = xmm2[3,0],xmm0[0,0]
>>> vshufps $-0x68, %xmm0, %xmm3, %xmm0 ## xmm0 =
>>> xmm3[0,2],xmm0[1,2]
>>>
>>>
>>...
2015 Jan 29
0
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...>
> Very cool, and thanks for the analysis!
>
>
>>
>>
>> Another problem I'm seeing is that in some cases we can't fold memory
>> anymore:
>> vpermilps $-0x6d, -0xXX(%rdx), %xmm2 ## xmm2 = mem[3,0,1,2]
>> vblendps $0x1, %xmm2, %xmm0, %xmm0
>> becomes:
>> vmovaps -0xXX(%rdx), %xmm2
>> vshufps $0x3, %xmm0, %xmm2, %xmm3 ## xmm3 = xmm2[3,0],xmm0[0,0]
>> vshufps $-0x68, %xmm0, %xmm3, %xmm0 ## xmm0 =
>> xmm3[0,2],xmm0[1,2]
>>
>>
>> Also, I see differences...
2015 Jan 30
0
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...>>>>
>>>>
>>>>
>>>> Another problem I'm seeing is that in some cases we can't fold memory
>>>> anymore:
>>>> vpermilps $-0x6d, -0xXX(%rdx), %xmm2 ## xmm2 = mem[3,0,1,2]
>>>> vblendps $0x1, %xmm2, %xmm0, %xmm0
>>>> becomes:
>>>> vmovaps -0xXX(%rdx), %xmm2
>>>> vshufps $0x3, %xmm0, %xmm2, %xmm3 ## xmm3 =
>>>> xmm2[3,0],xmm0[0,0]
>>>> vshufps $-0x68, %xmm0, %xmm3, %xmm0 ## xmm0 =
>>>> xmm3[0,2],xmm0...
2014 Sep 09
5
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...tructions.
Example:
;;;
define <4 x float> @foo(<4 x float> %A, <4 x float> %B) {
%1 = shufflevector <4 x float> %A, <4 x float> %B, <4 x i32> <i32 0,
i32 5, i32 2, i32 7>
ret <4 x float> %1
}
;;;
llc (-mcpu=corei7-avx):
vblendps $10, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0],xmm1[5],xmm0[2],xmm1[7]
llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):
vshufps $-40, %xmm0, %xmm1, %xmm0 # xmm0 = xmm1[0,2],xmm0[1,3]
vshufps $-40, %xmm0, %xmm0, %xmm0 # xmm0[0,2,1,3]
2) On SSE4.1, we should try not to emit an insertps if the shuf...
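For the shufflevector example above, one way to see why a single blend suffices: in the mask <0, 5, 2, 7>, every lane keeps its own position and only the source vector changes. A hedged Python sketch follows (the function name is hypothetical and this is not how LLVM's lowering is written); it derives the blend immediate from such a mask, using the shufflevector convention that indices 0-3 refer to the first operand and 4-7 to the second.

```python
def blend_imm_from_mask(mask):
    """Return a blend immediate for a two-input shuffle mask, or None.

    Hypothetical helper for illustration. With n lanes, mask[i] < n picks
    element mask[i] of the first vector and mask[i] >= n picks element
    mask[i] - n of the second. A blend requires each lane to stay in place.
    """
    n = len(mask)
    imm = 0
    for lane, idx in enumerate(mask):
        if not 0 <= idx < 2 * n or idx % n != lane:
            return None  # out of range, or the element changes lanes
        if idx >= n:
            imm |= 1 << lane  # take this lane from the second vector
    return imm


print(blend_imm_from_mask([0, 5, 2, 7]))  # matches the $10 in the vblendps line
print(blend_imm_from_mask([0, 2, 1, 3]))  # not a blend: elements change lanes
```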
2014 Sep 06
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...ri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com>
>> wrote:
>>
>>
>> Unfortunately, another team, while doing internal testing has seen the
>> new path generating illegal insertps masks. A sample here:
>>
>> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
>> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
>> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 =
>> xmm4[0,1],xmm1[2],xmm...
2014 Sep 08
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...il.com>>
>>>> wrote:
>>>>>
>>>>> Unfortunately, another team, while doing internal testing has seen the
>>>>> new path generating illegal insertps masks. A sample here:
>>>>>
>>>>> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
>>>>> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>>>>> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
>>>>> vinsertps $416, %xmm1, %xmm4, %xmm14 # xm...
2013 Jul 19
4
[LLVMdev] SIMD instructions and memory alignment on X86
...>>>>>>> fault raising mechanism appears.
>>>>>>>
>>>>>>> The generated instruction varies, but it seems to often be similar
>>>>>>> to (I don't have it in front of me, sorry):
>>>>>>> movapd xmm0, xmm[ecx+0x???????]
>>>>>>> Where the xmm register changes, and the second parameter is a memory
>>>>>>> access.
>>>>>>> ECX is always set to 0x7ffffff - however I don't know if this is
>>>>>>> part of the SSE e...
2014 Sep 19
4
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...t regression in our internal codebase.
In one particular case I observed a slowdown (around 1%); here is what
I found when investigating this slowdown.
1. With the new shuffle lowering, there is one case where we end up
producing the following sequence:
vmovss .LCPxx(%rip), %xmm1
vxorps %xmm0, %xmm0, %xmm0
vblendps $1, %xmm1, %xmm0, %xmm0
Before, we used to generate a simpler:
vmovss .LCPxx(%rip), %xmm1
In this particular case, the 'vblendps' is redundant since the vmovss
would zero the upper bits in %xmm1. I am not sure why we get this
poor-codegen with your new shuffle...
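The redundancy described above can be checked with a small lane-level model. This is only a sketch in Python with hypothetical helper names; it assumes the usual semantics that vmovss from memory loads lane 0 and zeroes lanes 1-3, and that vblendps takes lane i from its second source when bit i of the immediate is set. The value 1.5 is an arbitrary stand-in for the constant at .LCPxx.

```python
def movss_load(value):
    """Model vmovss from memory: lane 0 is loaded, lanes 1-3 are zeroed."""
    return [value, 0.0, 0.0, 0.0]


def blendps(imm, first, second):
    """Model vblendps: bit i of imm set takes lane i from `second`, else `first`."""
    return [second[i] if (imm >> i) & 1 else first[i] for i in range(4)]


xmm1 = movss_load(1.5)         # vmovss .LCPxx(%rip), %xmm1
xmm0 = [0.0] * 4               # vxorps %xmm0, %xmm0, %xmm0
xmm0 = blendps(1, xmm0, xmm1)  # vblendps $1, %xmm1, %xmm0, %xmm0

# The blended result is lane-for-lane identical to what vmovss already produced.
print(xmm0 == xmm1)
```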
2017 Mar 01
2
[Codegen bug in LLVM 3.8?] br following `fcmp une` is present in ll, absent in asm
...# %merge128
movq 184(%rsp), %rcx
movq %rax, 728(%rcx)
movq 184(%rsp), %rax
movq 728(%rax), %rcx
movq %rcx, 736(%rax)
movq 184(%rsp), %rax
movq $0, 744(%rax)
movq 184(%rsp), %rax
movq $0, 752(%rax)
movq 184(%rsp), %rax
movq $0, 760(%rax)
movq 176(%rsp), %rax
movsd 5608(%rax), %xmm0 # xmm0 = mem[0],zero
movq 184(%rsp), %rax
mulsd 648(%rax), %xmm0
movsd 160(%rsp), %xmm1 # 8-byte Reload
# xmm1 = mem[0],zero
addsd %xmm0, %xmm1
movsd %xmm1, 672(%rax)
movq 176(%rsp), %rax
movsd 5648(%rax), %xmm0 # xmm0 = mem[0],zero
movq 18...
2015 Jan 23
5
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
Greetings LLVM hackers and x86 vector shufflers!
I would like to flip on another chunk of the new vector shuffling,
specifically the logic to mark ~all shuffles as "legal".
This can be tested today with the flag
"-x86-experimental-vector-shuffle-legality". I would essentially like to
make this the default (by removing the "false" path). Doing this will allow
me to
2014 Sep 09
1
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...> %A, <4 x float> %B) {
>> %1 = shufflevector <4 x float> %A, <4 x float> %B, <4 x i32> <i32 0,
>> i32 5, i32 2, i32 7>
>> ret <4 x float> %1
>> }
>> ;;;
>>
>> llc (-mcpu=corei7-avx):
>> vblendps $10, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0],xmm1[5],xmm0[2],xmm1[7]
>>
>> llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):
>> vshufps $-40, %xmm0, %xmm1, %xmm0 # xmm0 = xmm1[0,2],xmm0[1,3]
>> vshufps $-40, %xmm0, %xmm0, %xmm0 # xmm0[0,2,1,3]
>>
>>
>> 2)...
2020 Aug 31
2
Vectorization of math function failed?
...: clang++ -O3 -march=native -mtune=native -c -o
vec.o vec.cc -lmvec -fno-math-errno
And here is what I get:
vec.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_Z4fct1Dv4_f>:
0: 48 83 ec 48 sub $0x48,%rsp
4: c5 f8 29 04 24 vmovaps %xmm0,(%rsp)
9: e8 00 00 00 00 callq e <_Z4fct1Dv4_f+0xe>
e: c5 f8 29 44 24 30 vmovaps %xmm0,0x30(%rsp)
14: c5 fa 16 04 24 vmovshdup (%rsp),%xmm0
19: e8 00 00 00 00 callq 1e <_Z4fct1Dv4_f+0x1e>
1e: c5 f8 29 44 24 20 vmovaps %xmm0,0x20(%rsp)
24: c4 e3...
2016 Apr 01
2
RFC: A proposal for vectorizing loops with calls to math functions using SVML
...index, 4, !dbg !6
%5 = icmp eq i64 %index.next, 1000, !dbg !6
br i1 %5, label %middle.block, label %vector.body, !dbg !6, !llvm.loop !15
.LBB0_1: # %vector.body
# =>This Inner Loop Header: Depth=1
movd %ebx, %xmm0
pshufd $0, %xmm0, %xmm0 # xmm0 = xmm0[0,0,0,0]
paddd .LCPI0_0(%rip), %xmm0
cvtdq2ps %xmm0, %xmm0
movaps %xmm0, 16(%rsp) # 16-byte Spill
shufps $231, %xmm0, %xmm0 # xmm0 = xmm0[3,1,2,3]
callq sinf
movaps %xmm0...
2015 Oct 02
2
Register Spill Caused by the Reassociation pass
This conflict arises with many optimizations, including copy propagation, coalescing, hoisting, etc. Each could increase register pressure, with similar impact. Attempts to control the register pressure locally (within an optimization pass) tend to get hard to tune and maintain. Would a better way be to describe, e.g. in metadata, how to undo an optimization? Optimizations that attempt to reduce pressure like
2013 Aug 22
2
New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16
...c_compute_autocorrelation_asm_ia32_sse_lag_16
cglobal FLAC__lpc_compute_autocorrelation_asm_ia32_3dnow
cglobal FLAC__lpc_compute_residual_from_qlp_coefficients_asm_ia32
cglobal FLAC__lpc_compute_residual_from_qlp_coefficients_asm_ia32_mmx
@@ -596,7 +597,7 @@
movss xmm3, xmm2
movss xmm2, xmm0
- ; xmm7:xmm6:xmm5 += xmm0:xmm0:xmm0 * xmm3:xmm3:xmm2
+ ; xmm7:xmm6:xmm5 += xmm0:xmm0:xmm0 * xmm4:xmm3:xmm2
movaps xmm1, xmm0
mulps xmm1, xmm2
addps xmm5, xmm1
@@ -619,6 +620,95 @@
ret
ALIGN 16
+cident FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16
+ ;[ebp + 20] == autoc[]
+...