Displaying 20 results from an estimated 51 matches for "xmm7".
2013 Aug 22
2
New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16
...autocorrelation_asm_ia32_sse_lag_16
cglobal FLAC__lpc_compute_autocorrelation_asm_ia32_3dnow
cglobal FLAC__lpc_compute_residual_from_qlp_coefficients_asm_ia32
cglobal FLAC__lpc_compute_residual_from_qlp_coefficients_asm_ia32_mmx
@@ -596,7 +597,7 @@
movss xmm3, xmm2
movss xmm2, xmm0
- ; xmm7:xmm6:xmm5 += xmm0:xmm0:xmm0 * xmm3:xmm3:xmm2
+ ; xmm7:xmm6:xmm5 += xmm0:xmm0:xmm0 * xmm4:xmm3:xmm2
movaps xmm1, xmm0
mulps xmm1, xmm2
addps xmm5, xmm1
@@ -619,6 +620,95 @@
ret
ALIGN 16
+cident FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16
+ ;[ebp + 20] == autoc[]
+ ;[ebp + 1...
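For reference, a minimal scalar sketch in C of the computation such lag-N autocorrelation routines vectorize (names and layout here are illustrative, not FLAC's actual reference code): autoc[l] accumulates data[i] * data[i-l] for each lag l, which the SSE versions keep as packed running sums.

```c
#include <stddef.h>

/* Illustrative scalar autocorrelation: autoc[l] = sum_i data[i] * data[i-l].
 * Hypothetical sketch of what the asm_ia32_sse_lag_16 routine computes,
 * not FLAC's actual implementation. */
static void autocorrelation(const float *data, size_t data_len,
                            size_t lag, float *autoc)
{
    for (size_t l = 0; l < lag; l++)
        autoc[l] = 0.0f;
    for (size_t i = 0; i < data_len; i++) {
        float d = data[i];
        size_t max_l = (i + 1 < lag) ? i + 1 : lag;
        for (size_t l = 0; l < max_l; l++)
            autoc[l] += d * data[i - l];
    }
}
```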
2014 Sep 05
3
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...internal testing has seen the
> new path generating illegal insertps masks. A sample here:
>
> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 =
> xmm4[0,1],xmm1[2],xmm4[3]
> vinsertps $416, %xmm13, %xmm6, %xmm13 # xmm13 =
> xmm6[0,1],xmm13[2],xmm6[3]
> vinsertps $416, %xmm0, %xmm7, %xmm0 # xmm0 =
> xmm7[0,1],xmm0[2],xmm7[3]...
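The immediates in that sample are indeed out of range: vinsertps encodes an 8-bit immediate, so $256 and $416 only make sense once truncated to a byte (0x00 and 0xA0, which is what the disassembler comments describe). A hedged C sketch of the usual imm8 interpretation for a register source:

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch of the insertps imm8 fields (register source): bits 7:6 pick the
 * source element, bits 5:4 the destination slot, bits 3:0 zero lanes.
 * 256 and 416 overflow the field; only the truncated byte is encoded. */
static void describe_insertps_imm(unsigned imm)
{
    uint8_t b = (uint8_t)imm;   /* what actually fits in the instruction */
    printf("imm %u -> 0x%02X: src elem %u, dst slot %u, zmask 0x%X\n",
           imm, b, (b >> 6) & 3, (b >> 4) & 3, b & 0xF);
}

int main(void)
{
    describe_insertps_imm(256);  /* truncates to 0x00 */
    describe_insertps_imm(416);  /* truncates to 0xA0 */
    return 0;
}
```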
2014 Sep 04
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Greetings all,
As you may have noticed, there is a new vector shuffle lowering path in the
X86 backend. You can try it out with the
'-x86-experimental-vector-shuffle-lowering' flag to llc, or '-mllvm
-x86-experimental-vector-shuffle-lowering' to clang. Please test it out!
There may be some correctness bugs; I'm still fuzz testing it to shake them
out, but I expect fairly few
2013 Jul 19
0
[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX
...rd ptr [eax+8]
002E00F8 movapd xmmword ptr [esp+70h],xmm0
002E00FE movddup xmm0,mmword ptr [eax]
002E0102 movapd xmmword ptr [esp+60h],xmm0
002E0108 xorpd xmm0,xmm0
002E010C movapd xmmword ptr [esp+0C0h],xmm0
002E0115 xorpd xmm1,xmm1
002E0119 xorpd xmm7,xmm7
002E011D movapd xmmword ptr [esp+0A0h],xmm1
002E0126 movapd xmmword ptr [esp+0B0h],xmm7
002E012F movapd xmm3,xmm1
002E0133 movlpd qword ptr [esp+0F0h],xmm3
002E013C movhpd qword ptr [esp+0E0h],xmm3
002E0145 movlpd qword ptr [esp+100h],xmm7
002E014E p...
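For context, the source pattern under discussion is a packed double-precision square root; a minimal sketch (assuming the usual SSE2 intrinsic) of code that would be expected to lower to a single sqrtpd rather than a library call:

```c
#include <emmintrin.h>

/* Expected to compile to one sqrtpd on an SSE2 target; the thread concerns
 * cases where llvm.x86.sse2.sqrt.pd ended up going through a call instead. */
__m128d sqrt_pair(__m128d v)
{
    return _mm_sqrt_pd(v);
}
```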
2017 Aug 04
2
Bug or incorrect use of inline asm?
...source_filename = "asanasm.d"
target datalayout = "e-m:w-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc"
@globVar = global [2 x i32] [i32 66051, i32 66051]
define void @_D7asanasm8offconstFZv() {
call void asm sideeffect "movdqa 4$0, %xmm7", "*m,~{xmm7}"([2 x i32]*
@globVar)
ret void
}
```
results in:
<inline asm>:1:10: error: unexpected token in argument list
movdqa 4globVar(%rip), %xmm7
So in that case, I do have to add the '+' to make it work ("4+$0").
So depending on the pointer...
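A C-level sketch of the working form, for illustration only (the array is padded relative to the original [2 x i32] so the 16-byte load stays in bounds, and movdqu stands in for movdqa since globVar+4 is not 16-byte aligned; the syntactic point is the '+' between the offset and the memory operand):

```c
#include <stdint.h>

int32_t globVar[8] = { 66051, 66051 };  /* padded so the offset load stays in bounds */

void offconst(void)
{
    /* "4+%0" expands to something like "4+globVar(%rip)", which assembles;
     * "4%0" would give "4globVar(%rip)" and fail exactly as in the error
     * message quoted above. */
    __asm__ __volatile__("movdqu 4+%0, %%xmm7" : : "m"(globVar[0]) : "xmm7");
}
```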
2012 Jul 06
0
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
...es more instructions and spills more
> registers compared to 3.1.
>
Actually, here is an occurrence of that behavior when compiling the
code with trunk:
[...]
movaps %xmm1, %xmm0 ### xmm1 mov'ed to xmm0
movaps %xmm1, %xmm14 ### xmm1 mov'ed to xmm14
addps %xmm7, %xmm0
movaps %xmm7, %xmm13
movaps %xmm0, %xmm1 ### and now other data is mov'ed into xmm1,
making one of the above movaps superfluous
[...]
amb
2014 Sep 05
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...> new path generating illegal insertps masks. A sample here:
>>>
>>> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
>>> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>>> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
>>> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 =
>>> xmm4[0,1],xmm1[2],xmm4[3]
>>> vinsertps $416, %xmm13, %xmm6, %xmm13 # xmm13 =
>>> xmm6[0,1],xmm13[2],xmm6[3]
>>> vinsertps $416, %xmm0, %xmm7, %xmm0 #...
2010 Nov 03
1
[LLVMdev] LLVM x86 Code Generator discards Instruction-level Parallelism
...l p2 = .. expressions and so
forth.
p1 = p1 * a
p1 = p1 * a
.
.
p2 = p2 * b
p2 = p2 * b
.
.
p3 = p3 * c
p3 = p3 * c
.
.
An actual excerpt of the generated x86 assembly follows:
mulss %xmm8, %xmm10
mulss %xmm8, %xmm10
.
. repeated 512 times
.
mulss %xmm7, %xmm9
mulss %xmm7, %xmm9
.
. repeated 512 times
.
mulss %xmm6, %xmm3
mulss %xmm6, %xmm3
.
. repeated 512 times
.
Since p1, p2, p3, and p4 are all independent, this reordering is correct. This would have
the possible advantage of reducing live ranges of va...
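As a hedged illustration of the pattern described (names are made up), the source is essentially a handful of independent multiply chains; the scheduler can either interleave them, exposing the parallelism, or emit each chain's multiplies back to back as in the excerpt, serializing on one register per chain:

```c
/* Illustrative only: four independent dependency chains. The original code
 * has each chain's multiply repeated (unrolled) 512 times in a row. */
float chains(float a, float b, float c, float d,
             float p1, float p2, float p3, float p4)
{
    for (int i = 0; i < 512; i++) {
        p1 *= a;
        p2 *= b;
        p3 *= c;
        p4 *= d;
    }
    return p1 + p2 + p3 + p4;
}
```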
2012 Jul 06
2
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
On Fri, Jul 6, 2012 at 6:39 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>
> On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote:
>
>> I've noticed that LLVM tends to generate suboptimal code and spill an
>> excessive number of registers in large functions, such as in those
>> that are automatically generated by FFTW.
>
2014 Sep 06
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...has seen the
>> new path generating illegal insertps masks. A sample here:
>>
>> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
>> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
>> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 =
>> xmm4[0,1],xmm1[2],xmm4[3]
>> vinsertps $416, %xmm13, %xmm6, %xmm13 # xmm13 =
>> xmm6[0,1],xmm13[2],xmm6[3]
>> vinsertps $416, %xmm0, %xmm7, %xmm0 # xmm0 =
>> xmm7...
2011 Nov 30
0
[PATCH 2/4] x86/emulator: add emulation of SIMD FP moves
...ops);
+ if ( (rc != X86EMUL_OKAY) || memcmp(res, res + 8, 32) )
+ goto fail;
+ printf("okay\n");
+ }
+ else
+ {
+ printf("skipped\n");
+ memset(res + 2, 0x66, 8);
+ }
+
+ printf("%-40s", "Testing movaps (%edx),%xmm7...");
+ if ( stack_exec && cpu_has_sse )
+ {
+ extern const unsigned char movaps_from_mem[];
+
+ asm volatile ( "xorps %%xmm7, %%xmm7\n"
+ ".pushsection .test, \"a\", @progbits\n"
+ "mova...
2013 Jul 19
4
[LLVMdev] SIMD instructions and memory alignment on X86
Hmm, I'm not able to get those .ll files to compile if I disable SSE, and I
end up with SSE instructions (including sqrtpd) if I don't disable it.
On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote:
> Is there something specifically required to enable SSE? If it's not
> detected as available (based from the target triple?) then I don't think
2015 Jun 26
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
...i64> %sextS54_D to i128
%mskS54_D = icmp ne i128 %BCS54_D, 0
br i1 %mskS54_D, label %middle.block, label %vector.ph
Now the assembly for the above IR code is:
# BB#4: # %for.cond.preheader
vmovdqa 144(%rsp), %xmm0 # 16-byte Reload
vpmuludq %xmm7, %xmm0, %xmm2
vpsrlq $32, %xmm7, %xmm4
vpmuludq %xmm4, %xmm0, %xmm4
vpsllq $32, %xmm4, %xmm4
vpaddq %xmm4, %xmm2, %xmm2
vpsrlq $32, %xmm0, %xmm4
vpmuludq %xmm7, %xmm4, %xmm4
vpsllq $32, %xmm4, %xmm4
vpaddq %xmm4, %xmm2, %xmm2
vpextrq $1, %xmm2, %rax...
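The vpmuludq/vpsrlq/vpsllq/vpaddq run above is the standard expansion of a 64-bit multiply out of 32x32->64 pieces, presumably because no packed 64-bit multiply is available on the target. A scalar sketch of what each 64-bit lane computes:

```c
#include <stdint.h>

/* Per-lane equivalent of the sequence above: lo*lo plus the two cross
 * products shifted into the high half; the hi*hi term cannot reach the
 * low 64 bits of the product. */
static uint64_t mul64_from_32(uint64_t x, uint64_t y)
{
    uint64_t x_lo = x & 0xffffffffu, x_hi = x >> 32;
    uint64_t y_lo = y & 0xffffffffu, y_hi = y >> 32;
    return x_lo * y_lo + ((x_lo * y_hi + x_hi * y_lo) << 32);
}
```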
2010 Aug 02
0
[LLVMdev] Register Allocation ERROR! Ran out of registers during register allocation!
...t:
CC libavcodec/x86/mpegvideo_mmx.o
fatal error: error in backend: Ran out of registers during register
allocation!
Please check your inline asm statement for invalid constraints:
INLINEASM <es:movd %eax, %xmm3
pshuflw $$0, %xmm3, %xmm3
punpcklwd %xmm3, %xmm3
pxor %xmm7, %xmm7
pxor %xmm4, %xmm4
movdqa ($2), %xmm5
pxor %xmm6, %xmm6
psubw ($3), %xmm6
mov $$-128, %eax
.align 1 << 4
1:
movdqa ($1, %eax), %xmm0
movdqa %xmm0, %xmm1
pabsw %xmm0, %xmm0
psubusw %xmm6, %xmm0...
2014 Sep 08
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...insertps masks. A sample here:
>>>>>
>>>>> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
>>>>> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>>>>> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
>>>>> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 =
>>>>> xmm4[0,1],xmm1[2],xmm4[3]
>>>>> vinsertps $416, %xmm13, %xmm6, %xmm13 # xmm13 =
>>>>> xmm6[0,1],xmm13[2],xmm6[3]
>>>>> v...
2008 Jul 14
5
[LLVMdev] Spilled variables using unaligned moves
...ems like an optimization opportunity.
The attached replacement of fibonacci.cpp generates x86 code like this:
03A70010 push ebp
03A70011 mov ebp,esp
03A70013 and esp,0FFFFFFF0h
03A70019 sub esp,1A0h
...
03A7006C movups xmmword ptr [esp+180h],xmm7
...
03A70229 mulps xmm1,xmmword ptr [esp+180h]
...
03A70682 movups xmm0,xmmword ptr [esp+180h]
Note how stores and loads use unaligned moves while they could use aligned
moves. It's also interesting that the multiply does correctly assume the
stack to be 16-byte aligned....
2012 Jan 10
0
[LLVMdev] Calling conventions for YMM registers on AVX
....scl 2;
.type 32;
.endef
.text
.globl test
.align 16, 0x90
test: # @test
# BB#0: # %entry
pushq %rbp
movq %rsp, %rbp
subq $64, %rsp
vmovaps %xmm7, -32(%rbp) # 16-byte Spill
vmovaps %xmm6, -16(%rbp) # 16-byte Spill
vmovaps %ymm3, %ymm6
vmovaps %ymm2, %ymm7
vaddps %ymm7, %ymm0, %ymm0
vaddps %ymm6, %ymm1, %ymm1
callq foo
vsubps %ymm7, %ymm0, %ymm0
vsubps %ymm6,...
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
...[eax+36]
+ movss xmm5, [eax+40]
+ mulss xmm4, xmm1
+ mulss xmm5, xmm1
+ movaps xmm6, [ebx+4]
+ subps xmm6, xmm2
+ movups [ebx], xmm6
+ movaps xmm7, [ebx+20]
+ subps xmm7, xmm3
+ movups [ebx+16], xmm7
+
+ movss xmm7, [ebx+36]
+ subss xmm7, xmm4
+ movss [ebx+32], xmm7
+ xorps xmm2, xmm2
+...
2012 Jan 09
3
[LLVMdev] Calling conventions for YMM registers on AVX
On Jan 9, 2012, at 10:00 AM, Jakob Stoklund Olesen wrote:
>
> On Jan 8, 2012, at 11:18 PM, Demikhovsky, Elena wrote:
>
>> I'll explain what we see in the code.
>> 1. The caller saves XMM registers across the call if needed (according to DEFS definition).
>> YMMs are not in the set, so the caller does not take care of them.
>
> This is not how the register allocator
2014 Sep 09
5
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...;> new path generating illegal insertps masks. A sample here:
>>>
>>> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
>>> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>>> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
>>> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 =
>>> xmm4[0,1],xmm1[2],xmm4[3]
>>> vinsertps $416, %xmm13, %xmm6, %xmm13 # xmm13 =
>>> xmm6[0,1],xmm13[2],xmm6[3]
>>> vinsertps $416, %xmm0, %xmm7, %xmm0 #...