Displaying 20 results from an estimated 51 matches for "xmm7".
2013 Aug 22
2
New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16
...autocorrelation_asm_ia32_sse_lag_16
cglobal FLAC__lpc_compute_autocorrelation_asm_ia32_3dnow
cglobal FLAC__lpc_compute_residual_from_qlp_coefficients_asm_ia32
cglobal FLAC__lpc_compute_residual_from_qlp_coefficients_asm_ia32_mmx
@@ -596,7 +597,7 @@
movss xmm3, xmm2
movss xmm2, xmm0
- ; xmm7:xmm6:xmm5 += xmm0:xmm0:xmm0 * xmm3:xmm3:xmm2
+ ; xmm7:xmm6:xmm5 += xmm0:xmm0:xmm0 * xmm4:xmm3:xmm2
movaps xmm1, xmm0
mulps xmm1, xmm2
addps xmm5, xmm1
@@ -619,6 +620,95 @@
ret
ALIGN 16
+cident FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16
+ ;[ebp + 20] == autoc[]
+ ;[ebp + 1...
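For reference, a minimal scalar sketch in C of the computation such lag-N autocorrelation routines vectorize (names and layout here are illustrative, not FLAC's actual reference code): autoc[l] accumulates data[i] * data[i-l] for each lag l, which the SSE versions keep as packed running sums.

```c
#include <stddef.h>

/* Illustrative scalar autocorrelation: autoc[l] = sum_i data[i] * data[i-l].
 * Hypothetical sketch of what the asm_ia32_sse_lag_16 routine computes,
 * not FLAC's actual implementation. */
static void autocorrelation(const float *data, size_t data_len,
                            size_t lag, float *autoc)
{
    for (size_t l = 0; l < lag; l++)
        autoc[l] = 0.0f;
    for (size_t i = 0; i < data_len; i++) {
        float d = data[i];
        size_t max_l = (i + 1 < lag) ? i + 1 : lag;
        for (size_t l = 0; l < max_l; l++)
            autoc[l] += d * data[i - l];
    }
}
```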
2014 Sep 05
3
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...internal testing has seen the
> new path generating illegal insertps masks. A sample here:
>
> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 =
> xmm4[0,1],xmm1[2],xmm4[3]
> vinsertps $416, %xmm13, %xmm6, %xmm13 # xmm13 =
> xmm6[0,1],xmm13[2],xmm6[3]
> vinsertps $416, %xmm0, %xmm7, %xmm0 # xmm0 =
> xmm7[0,1],xmm0[2],xmm7[3]...
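The immediates in that sample are indeed out of range: vinsertps encodes an 8-bit immediate, so $256 and $416 only make sense once truncated to a byte (0x00 and 0xA0, which is what the disassembler comments describe). A hedged C sketch of the usual imm8 interpretation for a register source:

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch of the insertps imm8 fields (register source): bits 7:6 pick the
 * source element, bits 5:4 the destination slot, bits 3:0 zero lanes.
 * 256 and 416 overflow the field; only the truncated byte is encoded. */
static void describe_insertps_imm(unsigned imm)
{
    uint8_t b = (uint8_t)imm;   /* what actually fits in the instruction */
    printf("imm %u -> 0x%02X: src elem %u, dst slot %u, zmask 0x%X\n",
           imm, b, (b >> 6) & 3, (b >> 4) & 3, b & 0xF);
}

int main(void)
{
    describe_insertps_imm(256);  /* truncates to 0x00 */
    describe_insertps_imm(416);  /* truncates to 0xA0 */
    return 0;
}
```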
2014 Sep 04
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Greetings all,
As you may have noticed, there is a new vector shuffle lowering path in the
X86 backend. You can try it out with the
'-x86-experimental-vector-shuffle-lowering' flag to llc, or '-mllvm
-x86-experimental-vector-shuffle-lowering' to clang. Please test it out!
There may be some correctness bugs; I'm still fuzz testing it to shake them
out, but I expect fairly few
2013 Jul 19
0
[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX
...rd ptr [eax+8]
002E00F8 movapd xmmword ptr [esp+70h],xmm0
002E00FE movddup xmm0,mmword ptr [eax]
002E0102 movapd xmmword ptr [esp+60h],xmm0
002E0108 xorpd xmm0,xmm0
002E010C movapd xmmword ptr [esp+0C0h],xmm0
002E0115 xorpd xmm1,xmm1
002E0119 xorpd xmm7,xmm7
002E011D movapd xmmword ptr [esp+0A0h],xmm1
002E0126 movapd xmmword ptr [esp+0B0h],xmm7
002E012F movapd xmm3,xmm1
002E0133 movlpd qword ptr [esp+0F0h],xmm3
002E013C movhpd qword ptr [esp+0E0h],xmm3
002E0145 movlpd qword ptr [esp+100h],xmm7
002E014E p...
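For context, the source pattern under discussion is a packed double-precision square root; a minimal sketch (assuming the usual SSE2 intrinsic) of code that would be expected to lower to a single sqrtpd rather than a library call:

```c
#include <emmintrin.h>

/* Expected to compile to one sqrtpd on an SSE2 target; the thread concerns
 * cases where llvm.x86.sse2.sqrt.pd ended up going through a call instead. */
__m128d sqrt_pair(__m128d v)
{
    return _mm_sqrt_pd(v);
}
```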
2017 Aug 04
2
Bug or incorrect use of inline asm?
...source_filename = "asanasm.d"
target datalayout = "e-m:w-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc"
@globVar = global [2 x i32] [i32 66051, i32 66051]
define void @_D7asanasm8offconstFZv() {
call void asm sideeffect "movdqa 4$0, %xmm7", "*m,~{xmm7}"([2 x i32]*
@globVar)
ret void
}
```
results in:
<inline asm>:1:10: error: unexpected token in argument list
movdqa 4globVar(%rip), %xmm7
So in that case, I do have to add the '+' to make it work ("4+$0").
So depending on the pointer...
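A C-level sketch of the working form, for illustration only (the array is padded relative to the original [2 x i32] so the 16-byte load stays in bounds, and movdqu stands in for movdqa since globVar+4 is not 16-byte aligned; the syntactic point is the '+' between the offset and the memory operand):

```c
#include <stdint.h>

int32_t globVar[8] = { 66051, 66051 };  /* padded so the offset load stays in bounds */

void offconst(void)
{
    /* "4+%0" expands to something like "4+globVar(%rip)", which assembles;
     * "4%0" would give "4globVar(%rip)" and fail exactly as in the error
     * message quoted above. */
    __asm__ __volatile__("movdqu 4+%0, %%xmm7" : : "m"(globVar[0]) : "xmm7");
}
```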
2012 Jul 06
0
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
...es more instructions and spills more
> registers compared to 3.1.
>
Actually, here is an occurrence of that behavior when compiling the
code with trunk:
[...]
movaps %xmm1, %xmm0 ### xmm1 mov'ed to xmm0
movaps %xmm1, %xmm14 ### xmm1 mov'ed to xmm14
addps %xmm7, %xmm0
movaps %xmm7, %xmm13
movaps %xmm0, %xmm1 ### and now other data is mov'ed into xmm1,
making one of the above movaps superfluous
[...]
amb
2014 Sep 05
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...> new path generating illegal insertps masks. A sample here:
>>>
>>> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
>>> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>>> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
>>> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 =
>>> xmm4[0,1],xmm1[2],xmm4[3]
>>> vinsertps $416, %xmm13, %xmm6, %xmm13 # xmm13 =
>>> xmm6[0,1],xmm13[2],xmm6[3]
>>> vinsertps $416, %xmm0, %xmm7, %xmm0 #...
2010 Nov 03
1
[LLVMdev] LLVM x86 Code Generator discards Instruction-level Parallelism
...l p2 = .. expressions and so
forth.
p1 = p1 * a
p1 = p1 * a
.
.
p2 = p2 * b
p2 = p2 * b
.
.
p3 = p3 * c
p3 = p3 * c
.
.
An actual excerpt of the generated x86 assembly follows:
mulss %xmm8, %xmm10
mulss %xmm8, %xmm10
.
. repeated 512 times
.
mulss %xmm7, %xmm9
mulss %xmm7, %xmm9
.
. repeated 512 times
.
mulss %xmm6, %xmm3
mulss %xmm6, %xmm3
.
. repeated 512 times
.
Since p1, p2, p3, and p4 are all independent, this reordering is correct. This would have
the possible advantage of reducing live ranges of va...
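As a hedged illustration of the pattern described (names are made up), the source is essentially a handful of independent multiply chains; the scheduler can either interleave them, exposing the parallelism, or emit each chain's multiplies back to back as in the excerpt, serializing on one register per chain:

```c
/* Illustrative only: four independent dependency chains. The original code
 * has each chain's multiply repeated (unrolled) 512 times in a row. */
float chains(float a, float b, float c, float d,
             float p1, float p2, float p3, float p4)
{
    for (int i = 0; i < 512; i++) {
        p1 *= a;
        p2 *= b;
        p3 *= c;
        p4 *= d;
    }
    return p1 + p2 + p3 + p4;
}
```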
2012 Jul 06
2
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
On Fri, Jul 6, 2012 at 6:39 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>
> On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote:
>
>> I've noticed that LLVM tends to generate suboptimal code and spill an
>> excessive number of registers in large functions, such as in those
>> that are automatically generated by FFTW.
>
2014 Sep 06
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...has seen the
>> new path generating illegal insertps masks. A sample here:
>>
>> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
>> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
>> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 =
>> xmm4[0,1],xmm1[2],xmm4[3]
>> vinsertps $416, %xmm13, %xmm6, %xmm13 # xmm13 =
>> xmm6[0,1],xmm13[2],xmm6[3]
>> vinsertps $416, %xmm0, %xmm7, %xmm0 # xmm0 =
>> xmm7...
2011 Nov 30
0
[PATCH 2/4] x86/emulator: add emulation of SIMD FP moves
...ops);
+ if ( (rc != X86EMUL_OKAY) || memcmp(res, res + 8, 32) )
+ goto fail;
+ printf("okay\n");
+ }
+ else
+ {
+ printf("skipped\n");
+ memset(res + 2, 0x66, 8);
+ }
+
+ printf("%-40s", "Testing movaps (%edx),%xmm7...");
+ if ( stack_exec && cpu_has_sse )
+ {
+ extern const unsigned char movaps_from_mem[];
+
+ asm volatile ( "xorps %%xmm7, %%xmm7\n"
+ ".pushsection .test, \"a\", @progbits\n"
+ "mova...
2013 Jul 19
4
[LLVMdev] SIMD instructions and memory alignment on X86
Hmm, I'm not able to get those .ll files to compile if I disable SSE, and I
end up with SSE instructions (including sqrtpd) if I don't disable it.
On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote:
> Is there something specifically required to enable SSE? If it's not
> detected as available (based from the target triple?) then I don't think
2015 Jun 26
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
...i64> %sextS54_D to i128
%mskS54_D = icmp ne i128 %BCS54_D, 0
br i1 %mskS54_D, label %middle.block, label %vector.ph
Now the assembly for the above IR code is:
# BB#4: # %for.cond.preheader
vmovdqa 144(%rsp), %xmm0 # 16-byte Reload
vpmuludq %xmm7, %xmm0, %xmm2
vpsrlq $32, %xmm7, %xmm4
vpmuludq %xmm4, %xmm0, %xmm4
vpsllq $32, %xmm4, %xmm4
vpaddq %xmm4, %xmm2, %xmm2
vpsrlq $32, %xmm0, %xmm4
vpmuludq %xmm7, %xmm4, %xmm4
vpsllq $32, %xmm4, %xmm4
vpaddq %xmm4, %xmm2, %xmm2
vpextrq $1, %xmm2, %rax...
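The vpmuludq/vpsrlq/vpsllq/vpaddq run above is the standard expansion of a 64-bit multiply out of 32x32->64 pieces, presumably because no packed 64-bit multiply is available on the target. A scalar sketch of what each 64-bit lane computes:

```c
#include <stdint.h>

/* Per-lane equivalent of the sequence above: lo*lo plus the two cross
 * products shifted into the high half; the hi*hi term cannot reach the
 * low 64 bits of the product. */
static uint64_t mul64_from_32(uint64_t x, uint64_t y)
{
    uint64_t x_lo = x & 0xffffffffu, x_hi = x >> 32;
    uint64_t y_lo = y & 0xffffffffu, y_hi = y >> 32;
    return x_lo * y_lo + ((x_lo * y_hi + x_hi * y_lo) << 32);
}
```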
2010 Aug 02
0
[LLVMdev] Register Allocation ERROR! Ran out of registers during register allocation!
...t:
CC libavcodec/x86/mpegvideo_mmx.o
fatal error: error in backend: Ran out of registers during register
allocation!
Please check your inline asm statement for invalid constraints:
INLINEASM <es:movd %eax, %xmm3
pshuflw $$0, %xmm3, %xmm3
punpcklwd %xmm3, %xmm3
pxor %xmm7, %xmm7
pxor %xmm4, %xmm4
movdqa ($2), %xmm5
pxor %xmm6, %xmm6
psubw ($3), %xmm6
mov $$-128, %eax
.align 1 << 4
1:
movdqa ($1, %eax), %xmm0
movdqa %xmm0, %xmm1
pabsw %xmm0, %xmm0
psubusw %xmm6, %xmm0...
2014 Sep 08
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...insertps masks. A sample here:
>>>>>
>>>>> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
>>>>> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>>>>> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
>>>>> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 =
>>>>> xmm4[0,1],xmm1[2],xmm4[3]
>>>>> vinsertps $416, %xmm13, %xmm6, %xmm13 # xmm13 =
>>>>> xmm6[0,1],xmm13[2],xmm6[3]
>>>>> v...
2008 Jul 14
5
[LLVMdev] Spilled variables using unaligned moves
...ems like an optimization opportunity.
The attached replacement of fibonacci.cpp generates x86 code like this:
03A70010 push ebp
03A70011 mov ebp,esp
03A70013 and esp,0FFFFFFF0h
03A70019 sub esp,1A0h
...
03A7006C movups xmmword ptr [esp+180h],xmm7
...
03A70229 mulps xmm1,xmmword ptr [esp+180h]
...
03A70682 movups xmm0,xmmword ptr [esp+180h]
Note how stores and loads use unaligned moves while they could use aligned
moves. It's also interesting that the multiply does correctly assume the
stack to be 16-byte aligned....
2012 Jan 10
0
[LLVMdev] Calling conventions for YMM registers on AVX
....scl 2;
.type 32;
.endef
.text
.globl test
.align 16, 0x90
test: # @test
# BB#0: # %entry
pushq %rbp
movq %rsp, %rbp
subq $64, %rsp
vmovaps %xmm7, -32(%rbp) # 16-byte Spill
vmovaps %xmm6, -16(%rbp) # 16-byte Spill
vmovaps %ymm3, %ymm6
vmovaps %ymm2, %ymm7
vaddps %ymm7, %ymm0, %ymm0
vaddps %ymm6, %ymm1, %ymm1
callq foo
vsubps %ymm7, %ymm0, %ymm0
vsubps %ymm6,...
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
...[eax+36]
+ movss xmm5, [eax+40]
+ mulss xmm4, xmm1
+ mulss xmm5, xmm1
+ movaps xmm6, [ebx+4]
+ subps xmm6, xmm2
+ movups [ebx], xmm6
+ movaps xmm7, [ebx+20]
+ subps xmm7, xmm3
+ movups [ebx+16], xmm7
+
+ movss xmm7, [ebx+36]
+ subss xmm7, xmm4
+ movss [ebx+32], xmm7
+ xorps xmm2, xmm2
+...
2012 Jan 09
3
[LLVMdev] Calling conventions for YMM registers on AVX
On Jan 9, 2012, at 10:00 AM, Jakob Stoklund Olesen wrote:
>
> On Jan 8, 2012, at 11:18 PM, Demikhovsky, Elena wrote:
>
>> I'll explain what we see in the code.
>> 1. The caller saves XMM registers across the call if needed (according to DEFS definition).
>> YMMs are not in the set, so the caller does not take care of them.
>
> This is not how the register allocator
2014 Sep 09
5
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...;> new path generating illegal insertps masks. A sample here:
>>>
>>> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
>>> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>>> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
>>> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 =
>>> xmm4[0,1],xmm1[2],xmm4[3]
>>> vinsertps $416, %xmm13, %xmm6, %xmm13 # xmm13 =
>>> xmm6[0,1],xmm13[2],xmm6[3]
>>> vinsertps $416, %xmm0, %xmm7, %xmm0 #...