thr3ads.net - search: "pxor"

2012 Jan 20

2

[LLVMdev] 128-bit PXOR requires SSE2

Hi all, I think I found a bug in LLVM 3.0: When compiling for a target without SSE2 support, there were some 128-bit PXOR instructions in the generated code. I traced it down to the following definition in X86InstrSSE.td: def FsFLD0SS : I<0xEF, MRMInitReg, (outs FR32:$dst), (ins), "", [(set FR32:$dst, fp32imm0)]>, Requires<[HasSSE1]>, TB, OpSize; I t...

2012 Jan 20

0

[LLVMdev] 128-bit PXOR requires SSE2

On Fri, Jan 20, 2012 at 2:47 PM, Nicolas Capens <nicolas.capens at gmail.com> wrote: > Hi all, > > I think I found a bug in LLVM 3.0: When compiling for a target without > SSE2 support, there were some 128-bit PXOR instructions in the generated > code. > > I traced it down to the following definition in X86InstrSSE.td: > > def FsFLD0SS : I<0xEF, MRMInitReg, (outs FR32:$dst), (ins), "", > [(set FR32:$dst, fp32imm0)]>, > Requires<[H...

2014 Jul 23

4

[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops

...di), %rax cmpq %rax, %rcx cmovaq %rcx, %rax movq %rdi, %rsi notq %rsi addq %rax, %rsi shrq $2, %rsi incq %rsi xorl %edx, %edx movabsq $9223372036854775800, %rax # imm = 0x7FFFFFFFFFFFFFF8 andq %rsi, %rax pxor %xmm0, %xmm0 je .LBB0_1 # BB#2: # %vector.body.preheader leaq (%rdi,%rax,4), %r8 addq $16, %rdi movq %rsi, %rdx andq $-8, %rdx pxor %xmm0, %xmm0 pxor %xmm1, %xmm1 .align 16, 0x90 .LBB0_3:...

2004 Sep 10

2

An assembly optimization and fix

...= total_error_3:total_error_2 - ; mm2 == 0:total_error_4 - ; mm3/4 == 0:unpackarea - ; mm5 == abs(error_1):abs(error_0) - ; mm5 == abs(error_3):abs(error_2) + ; mm1 == total_error_2:total_error_3 + ; mm2 == :total_error_4 + ; mm3 == last_error_1:last_error_0 + ; mm4 == last_error_2:last_error_3 - pxor mm0, mm0 ; total_error_1 = total_error_0 = 0 - pxor mm1, mm1 ; total_error_3 = total_error_2 = 0 - pxor mm2, mm2 ; total_error_4 = 0 - mov ebx, [esp + 36] ; ebx = data[] - mov ecx, [ebx - 4] ; ecx == data[-1] last_error_0 = data[-1] - mov eax, [ebx - 8] ; eax == data[-2] - mov ebp, [eb...

2010 Nov 14

1

[LLVMdev] Pesudo X86 instructions used for generating constants

Hi, I noticed a bunch of pseudo instructions used for creation of constants without generating loads. e.g. pxor xmm0, xmm0 Here is an example of what I am referring to snipped from X86InstrSSE.td: def FsFLD0SS : I<0xEF, MRMInitReg, (outs FR32:$dst), (ins), "", [(set FR32:$dst, fp32imm0)]>, Requires<[HasSSE1]>, TB, OpSize; My question is why was there...

2004 Aug 24

5

MMX/mmxext optimisations

quite some speed improvement indeed. attached the updated patch to apply to svn/trunk. j -------------- next part -------------- A non-text attachment was scrubbed... Name: theora-mmx.patch.gz Type: application/x-gzip Size: 8648 bytes Desc: not available Url : http://lists.xiph.org/pipermail/theora-dev/attachments/20040824/5a5f2731/theora-mmx.patch-0001.bin

2010 Aug 02

0

[LLVMdev] Register Allocation ERROR! Ran out of registers during register allocation!

...output: CC libavcodec/x86/mpegvideo_mmx.o fatal error: error in backend: Ran out of registers during register allocation! Please check your inline asm statement for invalid constraints: INLINEASM <es:movd %eax, %xmm3 pshuflw $$0, %xmm3, %xmm3 punpcklwd %xmm3, %xmm3 pxor %xmm7, %xmm7 pxor %xmm4, %xmm4 movdqa ($2), %xmm5 pxor %xmm6, %xmm6 psubw ($3), %xmm6 mov $$-128, %eax .align 1 << 4 1: movdqa ($1, %eax), %xmm0 movdqa %xmm0, %xmm1 pabsw %xmm0, %xmm0 psubusw %xmm6, %xmm0...

2012 Mar 28

2

[LLVMdev] Suboptimal code due to excessive spilling

...i_startproc # BB#0: pushl %ebx .Ltmp13: .cfi_def_cfa_offset 8 pushl %edi .Ltmp14: .cfi_def_cfa_offset 12 pushl %esi .Ltmp15: .cfi_def_cfa_offset 16 subl $88, %esp .Ltmp16: .cfi_def_cfa_offset 104 .Ltmp17: .cfi_offset %esi, -16 .Ltmp18: .cfi_offset %edi, -12 .Ltmp19: .cfi_offset %ebx, -8 pxor %xmm0, %xmm0 movl 112(%esp), %eax testl %eax, %eax je .LBB1_3 # BB#1: xorl %ebx, %ebx movl 108(%esp), %ecx movl 104(%esp), %edx xorl %esi, %esi .align 16, 0x90 .LBB1_2: # %.lr.ph.i # =>This Inner Loop Header: Depth=1...

2012 Apr 05

0

[LLVMdev] Suboptimal code due to excessive spilling

...i_startproc # BB#0: pushl %ebx .Ltmp13: .cfi_def_cfa_offset 8 pushl %edi .Ltmp14: .cfi_def_cfa_offset 12 pushl %esi .Ltmp15: .cfi_def_cfa_offset 16 subl $88, %esp .Ltmp16: .cfi_def_cfa_offset 104 .Ltmp17: .cfi_offset %esi, -16 .Ltmp18: .cfi_offset %edi, -12 .Ltmp19: .cfi_offset %ebx, -8 pxor %xmm0, %xmm0 movl 112(%esp), %eax testl %eax, %eax je .LBB1_3 # BB#1: xorl %ebx, %ebx movl 108(%esp), %ecx movl 104(%esp), %edx xorl %esi, %esi .align 16, 0x90 .LBB1_2: # %.lr.ph.i # =>This Inner Loop Header: Depth=1...

2011 Nov 11

3

[LLVMdev] Misaligned SSE store problem (with reduced source)

...lowing is produced. Note the 24 byte offset from ebp: .def _MisalignedStore; .scl 2; .type 32; .endef .text .globl _MisalignedStore .align 16, 0x90 _MisalignedStore: # @MisalignedStore # BB#0: # %entry pushl %ebp movl %esp, %ebp subl $24, %esp pxor %xmm0, %xmm0 movaps %xmm0, -24(%ebp) movl $8, %eax calll __alloca movl %ebp, %esp popl %ebp ret The code is trivial and useless, but it's a boiled down version of a real program. Am I doing something wrong in that IR? Note that removing the last alloca of %f or the jump to post-block both c...

2010 Nov 20

2

[LLVMdev] Poor floating point optimizations?

I wanted to use LLVM for my math parser but it seems that floating point optimizations are poor. For example consider such C code: float foo(float x) { return x+x+x; } and here is the code generated with "optimized" live demo: define float @foo(float %x) nounwind readnone { entry: %0 = fmul float %x, 2.000000e+00 ; <float> [#uses=1] %1 = fadd float %0, %x

2010 Nov 20

0

[LLVMdev] Poor floating point optimizations?

And also the resulting assembly code is very poor: 00460013 movss xmm0,dword ptr [esp+8] 00460019 movaps xmm1,xmm0 0046001C addss xmm1,xmm1 00460020 pxor xmm2,xmm2 00460024 addss xmm2,xmm1 00460028 addss xmm2,xmm0 0046002C movss dword ptr [esp],xmm2 00460031 fld dword ptr [esp] Especially pxor&and instead of movss (which is unnecessary anyway) is just pure madness. Bob D.

2010 Nov 20

3

[LLVMdev] Poor floating point optimizations?

On Nov 20, 2010, at 2:41 PM, Sdadsda Sdasdaas wrote: > And also the resulting assembly code is very poor: > > 00460013 movss xmm0,dword ptr [esp+8] > 00460019 movaps xmm1,xmm0 > 0046001C addss xmm1,xmm1 > 00460020 pxor xmm2,xmm2 > 00460024 addss xmm2,xmm1 > 00460028 addss xmm2,xmm0 > 0046002C movss dword ptr [esp],xmm2 > 00460031 fld dword ptr [esp] > > Especially pxor&and instead of movss (which is unnecessary anyway) is just pure > madness. X...

2006 Apr 19

0

[LLVMdev] floating point exception and SSE2 instructions

...e, it should emit something like: .text .align 4 .globl _sum_d _sum_d: subl $12, %esp movl 20(%esp), %eax movl 16(%esp), %ecx cmpl $0, %eax jne LBB_sum_d_2 # cond_true.preheader LBB_sum_d_1: # entry.bb9_crit_edge pxor %xmm0, %xmm0 jmp LBB_sum_d_5 # bb9 LBB_sum_d_2: # cond_true.preheader pxor %xmm0, %xmm0 xorl %edx, %edx LBB_sum_d_3: # cond_true addsd (%ecx), %xmm0 addl $8, %ecx incl %edx cmpl %eax, %edx jne LBB_sum_d_3 # cond_true LBB_...

2010 May 11

0

[LLVMdev] How does SSEDomainFix work?

...{ entry: %0 = add <4 x i32> %x, %z %not = xor <4 x i32> %z, <i32 -1, i32 -1, i32 -1, i32 -1> %1 = and <4 x i32> %not, %y %2 = xor <4 x i32> %0, %1 ret <4 x i32> %2 } _intfoo: movdqa %xmm0, %xmm3 paddd %xmm2, %xmm3 pandn %xmm1, %xmm2 movdqa %xmm2, %xmm0 pxor %xmm3, %xmm0 ret All the instructions moved to the int domain because the add forced them. > Please tell me if something would be wrong for me. You should measure if LLVM's code is actually slower than the code you want. If it is, I would like to hear. Our weakness is the shufflevector...

2011 Nov 11

0

[LLVMdev] Misaligned SSE store problem (with reduced source)

..._MisalignedStore; > .scl 2; > .type 32; > .endef > .text > .globl _MisalignedStore > .align 16, 0x90 > _MisalignedStore: # @MisalignedStore > # BB#0: # %entry > pushl %ebp > movl %esp, %ebp > subl $24, %esp > pxor %xmm0, %xmm0 > movaps %xmm0, -24(%ebp) > movl $8, %eax > calll __alloca > movl %ebp, %esp > popl %ebp > ret > > The code is trivial and useless, but it's a boiled down version of a real > program. Am I doing something wrong in that IR? It's a known issue that th...

2006 Apr 19

2

[LLVMdev] floating point exception and SSE2 instructions

Hi, I'm building a little JIT that creates functions to do array manipulations, e.g. sum all the elements of a double* array. I'm writing this in Python, generating LLVM assembly instructions and piping that through a call to ParseAssemblyString, ExecutionEngine, etc. It's working OK on integer values, but I'm getting nasty floating point exceptions when I try this on double*

2010 May 11

2

[LLVMdev] How does SSEDomainFix work?

Hello. This is my 1st post. I have tried SSE execution domain fixup pass. But I am not able to see any improvements. I expect for the example below to use MOVDQA, PAND &c. (On nehalem, ANDPS is extremely slower than PAND) Please tell me if something would be wrong for me. Thank you. Takumi Host: i386-mingw32 Build: trunk at 103373 foo.ll: define <4 x i32> @foo(<4 x i32> %x,

2015 Mar 27

2

[LLVMdev] LLVM fails for inline asm with Link Time Optimization

...1> mov esi, dword ptr 28(%esp) 1> ^ 1><inline asm>:4:21 : error 0: invalid token in expression 1> movq mm1, [edi+ebx-$8] 1> ^ 1><inline asm>:5:12 : error 0: invalid operand for instruction 1> pxor mm0, mm0 Thanks Ashish On Fri, Mar 27, 2015 at 8:21 PM, Rafael Espíndola < rafael.espindola at gmail.com> wrote: > If you are getting a parse error it is very likely a different bug. In > that bug the issue is that we don't parse the function bodies to find > if some inline...

2015 Nov 19

5

[RFC] Introducing a vector reduction add instruction.

...nstructions. Source code: const int N = 1024; unsigned char a[N], b[N]; int sad() { int s = 0; for (int i = 0; i < N; ++i) { int res = a[i] - b[i]; s += (res > 0) ? res : -res; } return s; } Emitted instructions on X86: # BB#0: # %entry pxor %xmm0, %xmm0 movq $-1024, %rax # imm = 0xFFFFFFFFFFFFFC00 pxor %xmm1, %xmm1 .align 16, 0x90 .LBB0_1: # %vector.body # =>This Inner Loop Header: Depth=1 movd b+1024(%rax), %xmm2 # xmm2 = mem[0],zero,zero,zero mo...
