search for: movss

Displaying 20 results from an estimated 89 matches for "movss".

2011 Oct 07
2
[LLVMdev] Aliasing confusion
...nction they do not. The source of BasicAliasAnalysis.cpp seems more than intelligent enough to detect both of these cases as not aliasing. More to the point, the X86 assembly I get from compiling this is:

opt -O3 test_alias.ll | llc -filetype=asm
...
_A:                                     ## @A
        movss   (%rdi), %xmm0
        movss   %xmm0, 4096(%rdi)
        movss   (%rdi), %xmm0
        ret
...
_B:                                     ## @B
        movss   (%rdi), %xmm0
        movss   %xmm0, (%rsi)
        ret
...

Note the redundant movss in A. Any idea what I'm doing wrong? I am using what I believe is the most recent svn version (llvm...
2011 Oct 07
0
[LLVMdev] Aliasing confusion
...Analysis.cpp seems more than intelligent enough to detect
> both of these cases as not aliasing.
>
> More to the point, the X86 assembly I get from compiling this is:
>
> opt -O3 test_alias.ll | llc -filetype=asm
> ...
> _A:                                     ## @A
>        movss   (%rdi), %xmm0
>        movss   %xmm0, 4096(%rdi)
>        movss   (%rdi), %xmm0
>        ret
>
> ...
> _B:                                     ## @B
>        movss   (%rdi), %xmm0
>        movss   %xmm0, (%rsi)
>        ret
> ...
>
> Note the redundant movss in...
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
...push ecx
+	push edx
+	*/
+	_asm
+	{
+		mov eax, num
+		mov ebx, den
+		mov ecx, mem
+
+		mov edx, in1
+		movss xmm0, [edx]
+
+		movss xmm1, [ecx]
+		addss xmm1, xmm0
+
+		mov edx, in2
+		movss [edx], xmm1
+
+		shufps xmm0, xmm0, 0x00
+		shufps xmm1, xmm1, 0x00
+
+...
2015 Jul 29
2
[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address
...is is unaligned
	xorl	%esi, %esi
	.align	16, 0x90
.LBB0_1:                                # %loop2
                                        # =>This Inner Loop Header: Depth=1
	movq	offset_array3(,%rsi,8), %rdi
	movq	offset_array2(,%rsi,8), %r10
	movss	-28(%rax), %xmm0
	movss	-8(%rax), %xmm1
	movss	-4(%rax), %xmm2
	unpcklps	%xmm0, %xmm2    # xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1]
	movss	(%rax), %xmm0
	unpcklps	%xmm0, %xmm1    # xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
	unpcklps...
2013 Sep 18
2
[LLVMdev] JIT compiled intrinsics calls is call to null pointer
...ntrinsic::pow, arg_types);
auto result=ir_builder->CreateCall(function, args);

When I try to execute the code generated by the JIT compiler, I see that the intrinsic is not compiled into a math coprocessor instruction, but in a call to a null address:

002300B8  sub         esp,8
002300BB  movss       xmm0,dword ptr ds:[2300B0h]
002300C3  movss       dword ptr [esp+4],xmm0
002300C9  movss       xmm0,dword ptr ds:[2300B4h]
002300D1  movss       dword ptr [esp],xmm0
002300D6  call        00000000
002300DB  add         esp,8
002300DE  ret

Is there anything special that I need to...
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
...0x183(%rip),%xmm1,%xmm1        # 4006a0 <__dso_handle+0x28>
  40051d: vpsubd %xmm1,%xmm0,%xmm0
  400521: vmovq  %xmm0,%rax
  400526: movslq %eax,%rcx
  400529: sar    $0x20,%rax
  40052d: vpextrq $0x1,%xmm0,%rdx
  400533: movslq %edx,%rsi
  400536: sar    $0x20,%rdx
  40053a: vmovss 0x4006c0(,%rcx,4),%xmm0
  400543: vinsertps $0x10,0x4006c0(,%rax,4),%xmm0,%xmm0
  40054e: vinsertps $0x20,0x4006c0(,%rsi,4),%xmm0,%xmm0
  400559: vinsertps $0x30,0x4006c0(,%rdx,4),%xmm0,%xmm0
  400564: vmulps 0x144(%rip),%xmm0,%xmm0        # 4006b0 <__dso_handle+0x38>
  40056c: vmov...
2011 Nov 02
5
[LLVMdev] About JIT by LLVM 2.9 or later
...[ebp+8]
}
002C13F0  pop         edi
002C13F1  pop         esi
002C13F2  pop         ebx
002C13F3  mov         esp,ebp
002C13F5  pop         ebp
002C13F6  ret

*Callee( 'fetch' LLVM ):*
010B0010  mov         eax,dword ptr [esp+4]
010B0014  mov         ecx,dword ptr [esp+8]
010B0018  movss       xmm0,dword ptr [ecx+1Ch]
010B001D  movss       dword ptr [eax+0Ch],xmm0
010B0022  movss       xmm0,dword ptr [ecx+18h]
010B0027  movss       dword ptr [eax+8],xmm0
010B002C  movss       xmm0,dword ptr [ecx+10h]
010B0031  movss       xmm1,dword ptr [ecx+14h]
010B0036  movss       dword ptr [ea...
2015 Jul 29
0
[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address
...xorl %esi, %esi
> 	.align	16, 0x90
> .LBB0_1:                                # %loop2
>                                         # =>This Inner Loop Header: Depth=1
> 	movq	offset_array3(,%rsi,8), %rdi
> 	movq	offset_array2(,%rsi,8), %r10
> 	movss	-28(%rax), %xmm0
> 	movss	-8(%rax), %xmm1
> 	movss	-4(%rax), %xmm2
> 	unpcklps	%xmm0, %xmm2    # xmm2 =
> xmm2[0],xmm0[0],xmm2[1],xmm0[1]
> 	movss	(%rax), %xmm0
> 	unpcklps	%xmm0, %xmm1    # xmm1 =
> xmm1[0],xmm0[0],xmm1[...
2008 May 27
3
[LLVMdev] Float compare-for-equality and select optimization opportunity
...that I think could be eliminated. In C syntax the code looks like this:

float x, y;
int a, b, c;
if(x == y)   // Rotate the integers
{
    int t;
    t = a;
    a = b;
    b = c;
    c = t;
}

This is the resulting x86 assembly code:

movss       xmm0,dword ptr [ecx+4]
ucomiss     xmm0,dword ptr [ecx+8]
sete        al
setnp       dl
test        dl,al
mov         edx,edi
cmovne      edx,ecx
cmovne      ecx,esi
cmovne      esi,edi

While I'm pleasantly surprised that my branch does get turned into several sele...
2010 Nov 20
2
[LLVMdev] Poor floating point optimizations?
I wanted to use LLVM for my math parser but it seems that floating point optimizations are poor. For example consider such C code:

float foo(float x) { return x+x+x; }

and here is the code generated with "optimized" live demo:

define float @foo(float %x) nounwind readnone {
entry:
  %0 = fmul float %x, 2.000000e+00      ; <float> [#uses=1]
  %1 = fadd float %0, %x
2010 Nov 20
0
[LLVMdev] Poor floating point optimizations?
And also the resulting assembly code is very poor:

00460013  movss       xmm0,dword ptr [esp+8]
00460019  movaps      xmm1,xmm0
0046001C  addss       xmm1,xmm1
00460020  pxor        xmm2,xmm2
00460024  addss       xmm2,xmm1
00460028  addss       xmm2,xmm0
0046002C  movss       dword ptr [esp],xmm2
00460031  fld         dword ptr [esp]

Especially pxor&...
2012 Mar 31
1
[LLVMdev] llvm.exp.f32 didn't work
...mented a function like

define inlinehint float "my_exp"(float %.value) {
.body:
  %0 = call float @llvm.exp.f32(float %.value)
  ret float %0
}
declare float @llvm.exp.f32(float) nounwind readonly

But it generates the following ASM:

00280072  movups      xmm0,xmmword ptr [esp+8]
00280077  movss       dword ptr [esp],xmm0
0028007C  call        00000000
00280081  pop         eax

As you can see, line 0028007C should call CRT exp I think, but it calls a NULL pointer. But sqrt is right:

005000D1  movss       xmm0,dword ptr [esp+0Ch]
005000D7  movss       dword ptr [esp],xmm0
005000DC  call...
2013 Aug 22
2
New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16
...a32_sse_lag_12
+cglobal FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16
 cglobal FLAC__lpc_compute_autocorrelation_asm_ia32_3dnow
 cglobal FLAC__lpc_compute_residual_from_qlp_coefficients_asm_ia32
 cglobal FLAC__lpc_compute_residual_from_qlp_coefficients_asm_ia32_mmx
@@ -596,7 +597,7 @@
 	movss	xmm3, xmm2
 	movss	xmm2, xmm0
-	; xmm7:xmm6:xmm5 += xmm0:xmm0:xmm0 * xmm3:xmm3:xmm2
+	; xmm7:xmm6:xmm5 += xmm0:xmm0:xmm0 * xmm4:xmm3:xmm2
 	movaps	xmm1, xmm0
 	mulps	xmm1, xmm2
 	addps	xmm5, xmm1
@@ -619,6 +620,95 @@
 	ret

ALIGN 16
+cident FLAC__lpc_compute_autocorrelation_asm_ia32_sse_la...
2009 Dec 03
4
[LLVMdev] Win64 Calling Convention problem
...ED  mov         rcx,4
0000000140067AF7  mov         eax,0CCCCCCCCh
0000000140067AFC  rep stos    dword ptr [rdi]
0000000140067AFE  mov         rcx,qword ptr [rsp+20h]
    return v.x + v.y;
0000000140067B03  mov         rax,qword ptr [v]
0000000140067B08  mov         rcx,qword ptr [v]
0000000140067B0D  movss       xmm0,dword ptr [rax]
0000000140067B11  addss       xmm0,dword ptr [rcx+4]
0000000140067B16  add         rsp,10h
0000000140067B1A  pop         rdi
0000000140067B1B  ret
}
--- snip ---

noise4 is supposed to be called by jitted LLVM code, just like in the following example.

--- snip ---
targe...
2008 May 27
1
[LLVMdev] Float compare-for-equality and select optimization opportunity
Both ZF and PF will be set if unordered, so the code below is IEEE correct... you want to generate 'fcmp ueq' instead of 'fcmp oeq'

This is the resulting x86 assembly code:

movss       xmm0,dword ptr [ecx+4]
ucomiss     xmm0,dword ptr [ecx+8]
sete        al
setnp       dl
test        dl,al
mov         edx,edi
cmovne      edx,ecx
cmovne      ecx,esi
cmovne      esi,edi

While I'm pleasantly surprised that my branch does get turned into several sele...
2013 Sep 19
1
[LLVMdev] JIT compiled intrinsics calls is call to null pointer
...nsic::pow, arg_types);
auto result=ir_builder->CreateCall(function, args);

When I try to execute the code generated by the JIT compiler, I see that the intrinsic is not compiled into a math coprocessor instruction, but in a call to a null address:

002300B8  sub         esp,8
002300BB  movss       xmm0,dword ptr ds:[2300B0h]
002300C3  movss       dword ptr [esp+4],xmm0
002300C9  movss       xmm0,dword ptr ds:[2300B4h]
002300D1  movss       dword ptr [esp],xmm0
002300D6  call        00000000
002300DB  add         esp,8
002300DE  ret

Is there anything special that I need...
2012 Feb 15
2
[LLVMdev] ASM appears to be incorrect from llc
...xe test.trunk.ll -f -o test.bc and then

llc -x86-asm-syntax=intel -o test.trunk.S test.bc

yields:

	.def	_main__i__v;
	.scl	2;
	.type	32;
	.endef
	.text
	.globl	_main__i__v
	.align	16, 0x90
_main__i__v:                            # @main__i__v
# BB#0:                                 # %locals
	sub	ESP, 20
	movss	XMM0, DWORD PTR [_t]
	movss	DWORD PTR [ESP + 8], XMM0
	fld	DWORD PTR [ESP + 8]
	fisttp	QWORD PTR [ESP]
	mov	EAX, DWORD PTR [ESP]
	mov	_x, EAX
	xor	EAX, EAX
	add	ESP, 20
	ret
	.data
	.globl	_t                      # @t
	.align	8
_t:
	.zero	8
	.globl	_x                      # @x
	.align	4
_x:
	.long	0...
2013 Feb 19
0
[LLVMdev] Is it a bug or am I missing something?
...location-model=pic -o shufxbug.s

	.file	"shufxbug.ll"
	.text
	.globl	sample_test
	.align	16, 0x90
	.type	sample_test,@function
sample_test:                            # @sample_test
# BB#0:                                 # %L.entry
	movl	4(%esp), %eax
	movss	304(%eax), %xmm0
	xorps	%xmm1, %xmm1
	movl	8(%esp), %eax
	movups	%xmm1, 624(%eax)
	pshufd	$65, %xmm0, %xmm0       # xmm0 = xmm0[1,0,0,1]
	movdqu	%xmm0, 608(%eax)
	ret
.Ltmp0:
	.size	sample_test, .Ltmp0-sample_test
	.section	".note.GNU-stack"...
2008 May 27
0
[LLVMdev] Float compare-for-equality and select optimization opportunity
...Mailing List'
Subject: Re: [LLVMdev] Float compare-for-equality and select optimization opportunity

Both ZF and PF will be set if unordered, so the code below is IEEE correct... you want to generate 'fcmp ueq' instead of 'fcmp oeq'

This is the resulting x86 assembly code:

movss       xmm0,dword ptr [ecx+4]
ucomiss     xmm0,dword ptr [ecx+8]
sete        al
setnp       dl
test        dl,al
mov         edx,edi
cmovne      edx,ecx
cmovne      ecx,esi
cmovne      esi,edi

While I'm pleasantly surprised that my branch does get turned into several sele...
2013 Dec 05
3
[LLVMdev] X86 - Help on fixing a poor code generation bug
...ends to emit unnecessary vector insert instructions immediately after sse scalar fp instructions like addss/mulss. For example:

/////////////////////////////////
__m128 foo(__m128 A, __m128 B) {
  return _mm_add_ss(A, B);
}
/////////////////////////////////

produces the sequence:

addss   %xmm0, %xmm1
movss   %xmm1, %xmm0

which could be easily optimized into

addss   %xmm1, %xmm0

The first step is to understand why the compiler produces the redundant instructions. For the example above, the DAG sequence looks like this:

a0 : f32 = extract_vector_elt ( A, 0)
b0 : f32 = extract_vector_elt...