Displaying 20 results from an estimated 35 matches for "pxor".
2012 Jan 20
2
[LLVMdev] 128-bit PXOR requires SSE2
Hi all,
I think I found a bug in LLVM 3.0: When compiling for a target without
SSE2 support, there were some 128-bit PXOR instructions in the generated
code.
I traced it down to the following definition in X86InstrSSE.td:
def FsFLD0SS : I<0xEF, MRMInitReg, (outs FR32:$dst), (ins), "",
[(set FR32:$dst, fp32imm0)]>,
Requires<[HasSSE1]>, TB, OpSize;
I t...
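The SSE2 requirement comes from the opcode in that definition. `0xEF` combined with the `TB` (0x0F escape) and `OpSize` (0x66 prefix) modifiers encodes PXOR, which is an SSE2 instruction. With SSE1 only, an XMM register can be zeroed with XORPS (0F 57 /r) instead. Below is a minimal C sketch of choosing the encoding by feature level. The helper `emit_zero_xmm` and its `has_sse2` flag are hypothetical and are not LLVM code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not LLVM API): write the machine code for
 * "zero XMM register reg" into out[] and return the byte count.
 *   SSE2:      PXOR  xmm, xmm = 66 0F EF /r  (what FsFLD0SS encodes)
 *   SSE1 only: XORPS xmm, xmm =    0F 57 /r
 * Only xmm0..xmm7 are handled, so no REX prefix is needed. */
size_t emit_zero_xmm(uint8_t out[4], unsigned reg, int has_sse2)
{
    /* ModRM: mod = 11 (register direct), reg field and rm field both = reg */
    uint8_t modrm = (uint8_t)(0xC0u | ((reg & 7u) << 3) | (reg & 7u));
    size_t n = 0;

    if (has_sse2) {
        out[n++] = 0x66;   /* OpSize prefix */
        out[n++] = 0x0F;   /* TB escape byte */
        out[n++] = 0xEF;   /* PXOR */
    } else {
        out[n++] = 0x0F;   /* escape byte */
        out[n++] = 0x57;   /* XORPS */
    }
    out[n++] = modrm;
    return n;
}
```

For example, `emit_zero_xmm(buf, 0, 1)` yields `66 0F EF C0` (pxor %xmm0, %xmm0), while `emit_zero_xmm(buf, 1, 0)` yields `0F 57 C9` (xorps %xmm1, %xmm1), which only needs SSE1.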
2012 Jan 20
0
[LLVMdev] 128-bit PXOR requires SSE2
On Fri, Jan 20, 2012 at 2:47 PM, Nicolas Capens
<nicolas.capens at gmail.com> wrote:
> Hi all,
>
> I think I found a bug in LLVM 3.0: When compiling for a target without
> SSE2 support, there were some 128-bit PXOR instructions in the generated
> code.
>
> I traced it down to the following definition in X86InstrSSE.td:
>
> def FsFLD0SS : I<0xEF, MRMInitReg, (outs FR32:$dst), (ins), "",
> [(set FR32:$dst, fp32imm0)]>,
> Requires<[H...
2014 Jul 23
4
[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops
...di), %rax
cmpq %rax, %rcx
cmovaq %rcx, %rax
movq %rdi, %rsi
notq %rsi
addq %rax, %rsi
shrq $2, %rsi
incq %rsi
xorl %edx, %edx
movabsq $9223372036854775800, %rax # imm = 0x7FFFFFFFFFFFFFF8
andq %rsi, %rax
pxor %xmm0, %xmm0
je .LBB0_1
# BB#2: # %vector.body.preheader
leaq (%rdi,%rax,4), %r8
addq $16, %rdi
movq %rsi, %rdx
andq $-8, %rdx
pxor %xmm0, %xmm0
pxor %xmm1, %xmm1
.align 16, 0x90
.LBB0_3:...
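In the preheader, two vector accumulators are zeroed (`pxor %xmm0` and `pxor %xmm1`), and the trip count is rounded down to a multiple of 8 (`andq $-8`), which matches two 4 x i32 registers per iteration. Below is a scalar C model of that shape. It assumes the loop body is a 32-bit integer sum, because the excerpt does not show the actual operation, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the vectorized loop's structure.
 * Assumption: the loop body is a 32-bit integer sum (elided in the post). */
uint32_t sum_like_vector_body(const uint32_t *p, size_t n)
{
    uint32_t acc0[4] = {0}, acc1[4] = {0}; /* pxor %xmm0 / pxor %xmm1 */
    size_t vec_n = n & ~(size_t)7;         /* andq $-8: whole groups of 8 */
    size_t i = 0;

    for (; i < vec_n; i += 8) {            /* %vector.body: 8 elements per trip */
        for (int lane = 0; lane < 4; lane++) {
            acc0[lane] += p[i + lane];
            acc1[lane] += p[i + 4 + lane];
        }
    }

    uint32_t sum = 0;
    for (int lane = 0; lane < 4; lane++)   /* reduce both accumulators */
        sum += acc0[lane] + acc1[lane];

    for (; i < n; i++)                     /* scalar remainder loop */
        sum += p[i];
    return sum;
}
```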
2004 Sep 10
2
An assembly optimization and fix
...= total_error_3:total_error_2
- ; mm2 == 0:total_error_4
- ; mm3/4 == 0:unpackarea
- ; mm5 == abs(error_1):abs(error_0)
- ; mm5 == abs(error_3):abs(error_2)
+ ; mm1 == total_error_2:total_error_3
+ ; mm2 == :total_error_4
+ ; mm3 == last_error_1:last_error_0
+ ; mm4 == last_error_2:last_error_3
- pxor mm0, mm0 ; total_error_1 = total_error_0 = 0
- pxor mm1, mm1 ; total_error_3 = total_error_2 = 0
- pxor mm2, mm2 ; total_error_4 = 0
- mov ebx, [esp + 36] ; ebx = data[]
- mov ecx, [ebx - 4] ; ecx == data[-1] last_error_0 = data[-1]
- mov eax, [ebx - 8] ; eax == data[-2]
- mov ebp, [eb...
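The register comments describe five running totals (total_error_0 through total_error_4) that are cleared with `pxor`, plus state seeded from data[-1] and data[-2]. Below is a scalar C sketch of that computation. It assumes that error_k is the k-th backward difference of data[] (the patch itself is truncated here), and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* total[k] = sum of |k-th order difference| over data[0..n-1], k = 0..4.
 * Assumption: error_k is the k-th backward difference, as the register
 * comments suggest. data[-1]..data[-4] must be readable. */
void compute_total_errors(const int *data, size_t n, unsigned long total[5])
{
    int last_error_0 = data[-1];
    int last_error_1 = data[-1] - data[-2];
    int last_error_2 = last_error_1 - (data[-2] - data[-3]);
    int last_error_3 = last_error_2 - (data[-2] - 2 * data[-3] + data[-4]);

    for (int k = 0; k < 5; k++)
        total[k] = 0;                    /* the pxor mm0/mm1/mm2 step */

    for (size_t i = 0; i < n; i++) {
        int error_0 = data[i];
        int error_1 = error_0 - last_error_0;
        int error_2 = error_1 - last_error_1;
        int error_3 = error_2 - last_error_2;
        int error_4 = error_3 - last_error_3;

        total[0] += (unsigned long)abs(error_0);
        total[1] += (unsigned long)abs(error_1);
        total[2] += (unsigned long)abs(error_2);
        total[3] += (unsigned long)abs(error_3);
        total[4] += (unsigned long)abs(error_4);

        last_error_0 = error_0;          /* shift state for the next sample */
        last_error_1 = error_1;
        last_error_2 = error_2;
        last_error_3 = error_3;
    }
}
```

On linear input only total[0] and total[1] are non-zero. On quadratic input total[3] and total[4] vanish.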
2010 Nov 14
1
[LLVMdev] Pseudo X86 instructions used for generating constants
Hi,
I noticed a bunch of pseudo instructions used for creating constants without
generating loads, e.g. pxor xmm0, xmm0.
Here is an example of what i am referring to snipped from X86InstrSSE.td:
def FsFLD0SS : I<0xEF, MRMInitReg, (outs FR32:$dst), (ins), "",
[(set FR32:$dst, fp32imm0)]>,
Requires<[HasSSE1]>, TB, OpSize;
My question is why was there...
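A pseudo instruction like this can stand in for `fp32imm0` because XORing any bit pattern with itself yields all-zero bits, and all-zero bits are exactly +0.0f in IEEE-754. That means no constant-pool load is needed. Below is a small C model of the low float lane. The function name is hypothetical.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Model of "pxor xmm0, xmm0" on the low float lane: whatever the
 * register held, x ^ x is all-zero bits, which reads back as +0.0f. */
float xor_zero_float(float any)
{
    uint32_t bits;
    memcpy(&bits, &any, sizeof bits);  /* reinterpret the float's bits */
    bits ^= bits;                      /* always 0, regardless of input */
    float out;
    memcpy(&out, &bits, sizeof out);
    return out;
}
```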
2004 Aug 24
5
MMX/mmxext optimisations
Quite a speed improvement indeed.
Attached is the updated patch, to apply against svn/trunk.
j
-------------- next part --------------
A non-text attachment was scrubbed...
Name: theora-mmx.patch.gz
Type: application/x-gzip
Size: 8648 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/theora-dev/attachments/20040824/5a5f2731/theora-mmx.patch-0001.bin
2010 Aug 02
0
[LLVMdev] Register Allocation ERROR! Ran out of registers during register allocation!
...output:
CC libavcodec/x86/mpegvideo_mmx.o
fatal error: error in backend: Ran out of registers during register
allocation!
Please check your inline asm statement for invalid constraints:
INLINEASM <es:movd %eax, %xmm3
pshuflw $$0, %xmm3, %xmm3
punpcklwd %xmm3, %xmm3
pxor %xmm7, %xmm7
pxor %xmm4, %xmm4
movdqa ($2), %xmm5
pxor %xmm6, %xmm6
psubw ($3), %xmm6
mov $$-128, %eax
.align 1 << 4
1:
movdqa ($1, %eax), %xmm0
movdqa %xmm0, %xmm1
pabsw %xmm0, %xmm0
psubusw %xmm6, %xmm0...
2012 Mar 28
2
[LLVMdev] Suboptimal code due to excessive spilling
...i_startproc
# BB#0:
pushl %ebx
.Ltmp13:
.cfi_def_cfa_offset 8
pushl %edi
.Ltmp14:
.cfi_def_cfa_offset 12
pushl %esi
.Ltmp15:
.cfi_def_cfa_offset 16
subl $88, %esp
.Ltmp16:
.cfi_def_cfa_offset 104
.Ltmp17:
.cfi_offset %esi, -16
.Ltmp18:
.cfi_offset %edi, -12
.Ltmp19:
.cfi_offset %ebx, -8
pxor %xmm0, %xmm0
movl 112(%esp), %eax
testl %eax, %eax
je .LBB1_3
# BB#1:
xorl %ebx, %ebx
movl 108(%esp), %ecx
movl 104(%esp), %edx
xorl %esi, %esi
.align 16, 0x90
.LBB1_2: # %.lr.ph.i
# =>This Inner Loop Header: Depth=1...
2012 Apr 05
0
[LLVMdev] Suboptimal code due to excessive spilling
...i_startproc
# BB#0:
pushl %ebx
.Ltmp13:
.cfi_def_cfa_offset 8
pushl %edi
.Ltmp14:
.cfi_def_cfa_offset 12
pushl %esi
.Ltmp15:
.cfi_def_cfa_offset 16
subl $88, %esp
.Ltmp16:
.cfi_def_cfa_offset 104
.Ltmp17:
.cfi_offset %esi, -16
.Ltmp18:
.cfi_offset %edi, -12
.Ltmp19:
.cfi_offset %ebx, -8
pxor %xmm0, %xmm0
movl 112(%esp), %eax
testl %eax, %eax
je .LBB1_3
# BB#1:
xorl %ebx, %ebx
movl 108(%esp), %ecx
movl 104(%esp), %edx
xorl %esi, %esi
.align 16, 0x90
.LBB1_2: # %.lr.ph.i
# =>This Inner Loop Header: Depth=1...
2011 Nov 11
3
[LLVMdev] Misaligned SSE store problem (with reduced source)
...lowing is produced. Note the 24-byte offset from ebp:
.def _MisalignedStore;
.scl 2;
.type 32;
.endef
.text
.globl _MisalignedStore
.align 16, 0x90
_MisalignedStore: # @MisalignedStore
# BB#0: # %entry
pushl %ebp
movl %esp, %ebp
subl $24, %esp
pxor %xmm0, %xmm0
movaps %xmm0, -24(%ebp)
movl $8, %eax
calll __alloca
movl %ebp, %esp
popl %ebp
ret
The code is trivial and useless, but it's a boiled down version of a real program. Am I doing something wrong in that IR? Note that removing the last alloca of %f or the jump to post-block both c...
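`movaps` faults unless its memory operand is 16-byte aligned, so the address of -24(%ebp) can be computed directly from the prologue. `calll` pushes a 4-byte return address, and `pushl %ebp` pushes 4 more before `movl %esp, %ebp`, so the slot sits 32 bytes below the caller's esp at the call. That slot is aligned only if the caller's esp was 16-byte aligned. If the target only guarantees 4-byte stack alignment at call sites (as 32-bit Windows conventionally does, and the .def/.scl directives suggest a Win32 target), the slot can therefore be misaligned. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Address of -24(%ebp) in the prologue above, given esp just before calll. */
uint32_t movaps_slot_addr(uint32_t esp_at_call)
{
    uint32_t ebp = esp_at_call - 4u   /* return address pushed by calll */
                               - 4u;  /* pushl %ebp; then movl %esp, %ebp */
    return ebp - 24u;                 /* == esp_at_call - 32 */
}

/* movaps requires a 16-byte aligned memory operand. */
int movaps_slot_aligned(uint32_t esp_at_call)
{
    return movaps_slot_addr(esp_at_call) % 16u == 0;
}
```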
2010 Nov 20
2
[LLVMdev] Poor floating point optimizations?
I wanted to use LLVM for my math parser but it seems that floating point
optimizations are poor.
For example, consider this C code:
float foo(float x) { return x+x+x; }
and here is the code generated by the "optimized" live demo:
define float @foo(float %x) nounwind readnone {
entry:
  %0 = fmul float %x, 2.000000e+00 ; <float> [#uses=1]
  %1 = fadd float %0, %x
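The IR above multiplies by 2.0 and then adds x. Here is a quick C check of what the three ways of writing this compute. Doubling a float is exact unless it overflows, so x + x + x, x * 2.0f + x, and x * 3.0f each round the exact value 3x once. When the result overflows, all three give infinity. The function names are hypothetical.

```c
/* Three spellings of foo(): the source form, the form in the IR above
 * (fmul by 2.0 then fadd), and a single multiply. For any non-NaN x
 * they produce the same float under default IEEE-754 rounding. */
float foo_source(float x)  { return x + x + x; }
float foo_emitted(float x) { return x * 2.0f + x; }
float foo_single(float x)  { return x * 3.0f; }
```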
2010 Nov 20
0
[LLVMdev] Poor floating point optimizations?
And also the resulting assembly code is very poor:
00460013 movss xmm0,dword ptr [esp+8]
00460019 movaps xmm1,xmm0
0046001C addss xmm1,xmm1
00460020 pxor xmm2,xmm2
00460024 addss xmm2,xmm1
00460028 addss xmm2,xmm0
0046002C movss dword ptr [esp],xmm2
00460031 fld dword ptr [esp]
Especially pxor&and instead of movss (which is unnecessary anyway) is just pure
madness.
Bob D.
2010 Nov 20
3
[LLVMdev] Poor floating point optimizations?
On Nov 20, 2010, at 2:41 PM, Sdadsda Sdasdaas wrote:
> And also the resulting assembly code is very poor:
>
> 00460013 movss xmm0,dword ptr [esp+8]
> 00460019 movaps xmm1,xmm0
> 0046001C addss xmm1,xmm1
> 00460020 pxor xmm2,xmm2
> 00460024 addss xmm2,xmm1
> 00460028 addss xmm2,xmm0
> 0046002C movss dword ptr [esp],xmm2
> 00460031 fld dword ptr [esp]
>
> Especially pxor&and instead of movss (which is unnecessary anyway) is just pure
> madness.
X...
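The reply is cut off here, so the reason for the zeroed register is not shown. One relevant detail: `pxor %xmm2, %xmm2` followed by `addss %xmm1, %xmm2` computes +0.0f + y, and that is not the same as y when y is -0.0f. Under the default rounding mode, (+0.0f) + (-0.0f) is +0.0f, so a compiler cannot simply drop that add unless signed zeros may be ignored. Adding -0.0f, by contrast, returns y unchanged. The function names below are hypothetical.

```c
#include <math.h>

/* +0.0f + y: flips a -0.0f input to +0.0f, so it is not an identity. */
float add_pos_zero(float y) { return 0.0f + y; }

/* -0.0f + y: returns y unchanged, including the sign of zero. */
float add_neg_zero(float y) { return -0.0f + y; }
```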
2006 Apr 19
0
[LLVMdev] floating point exception and SSE2 instructions
...e, it should emit something like:
.text
.align 4
.globl _sum_d
_sum_d:
subl $12, %esp
movl 20(%esp), %eax
movl 16(%esp), %ecx
cmpl $0, %eax
jne LBB_sum_d_2 # cond_true.preheader
LBB_sum_d_1: # entry.bb9_crit_edge
pxor %xmm0, %xmm0
jmp LBB_sum_d_5 # bb9
LBB_sum_d_2: # cond_true.preheader
pxor %xmm0, %xmm0
xorl %edx, %edx
LBB_sum_d_3: # cond_true
addsd (%ecx), %xmm0
addl $8, %ecx
incl %edx
cmpl %eax, %edx
jne LBB_sum_d_3 # cond_true
LBB_...
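Here is a C function that matches the emitted sum_d code. It assumes the signature sum_d(const double *a, int n): after `subl $12, %esp`, the first argument (16(%esp)) is loaded into %ecx and used as the array pointer, and the second (20(%esp)) is loaded into %eax and used as the count.

```c
/* Sum of a double array: the accumulator starts at +0.0 (pxor %xmm0, %xmm0)
 * and each element is added in order (addsd (%ecx), %xmm0; addl $8, %ecx). */
double sum_d(const double *a, int n)
{
    double acc = 0.0;
    for (int i = 0; i < n; i++)
        acc += a[i];
    return acc;
}
```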
2010 May 11
0
[LLVMdev] How does SSEDomainFix work?
...{
entry:
%0 = add <4 x i32> %x, %z
%not = xor <4 x i32> %z, <i32 -1, i32 -1, i32 -1, i32 -1>
%1 = and <4 x i32> %not, %y
%2 = xor <4 x i32> %0, %1
ret <4 x i32> %2
}
_intfoo:
movdqa %xmm0, %xmm3
paddd %xmm2, %xmm3
pandn %xmm1, %xmm2
movdqa %xmm2, %xmm0
pxor %xmm3, %xmm0
ret
All the instructions moved to the int domain because the add forced them.
> Please tell me if something would be wrong for me.
You should measure whether LLVM's code is actually slower than the code you want. If it is, I would like to hear about it.
Our weakness is the shufflevector...
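Here is a per-lane C model of @foo and _intfoo. It computes (x + z) ^ (~z & y) on each of the four i32 lanes. `pandn %xmm1, %xmm2` computes ~%xmm2 & %xmm1, i.e. ~z & y. As the reply explains, the vector add is what pulls the whole sequence into the integer domain. The function name and array-based signature are illustrative.

```c
#include <stdint.h>

/* Per-lane model of: %0 = add x, z; %1 = and (xor z, -1), y; ret xor %0, %1 */
void intfoo(const uint32_t x[4], const uint32_t y[4],
            const uint32_t z[4], uint32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        uint32_t sum = x[i] + z[i];      /* paddd */
        uint32_t masked = ~z[i] & y[i];  /* pandn */
        out[i] = sum ^ masked;           /* pxor */
    }
}
```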
2011 Nov 11
0
[LLVMdev] Misaligned SSE store problem (with reduced source)
..._MisalignedStore;
> .scl 2;
> .type 32;
> .endef
> .text
> .globl _MisalignedStore
> .align 16, 0x90
> _MisalignedStore: # @MisalignedStore
> # BB#0: # %entry
> pushl %ebp
> movl %esp, %ebp
> subl $24, %esp
> pxor %xmm0, %xmm0
> movaps %xmm0, -24(%ebp)
> movl $8, %eax
> calll __alloca
> movl %ebp, %esp
> popl %ebp
> ret
>
> The code is trivial and useless, but it's a boiled down version of a real
> program. Am I doing something wrong in that IR?
It's a known issue that th...
2006 Apr 19
2
[LLVMdev] floating point exception and SSE2 instructions
Hi,
I'm building a little JIT that creates functions to do array manipulations,
e.g. summing all the elements of a double* array. I'm writing this in Python, generating
LLVM assembly instructions and piping them through a call to ParseAssemblyString,
ExecutionEngine, etc.
It's working OK on integer values, but I'm getting nasty floating point exceptions
when I try this on double*.
2010 May 11
2
[LLVMdev] How does SSEDomainFix work?
Hello. This is my 1st post.
I have tried the SSE execution domain fixup pass,
but I do not see any improvement.
I expected the example below to use MOVDQA, PAND, etc.
(On Nehalem, ANDPS is much slower than PAND.)
Please tell me if I am doing something wrong.
Thank you.
Takumi
Host: i386-mingw32
Build: trunk at 103373
foo.ll:
define <4 x i32> @foo(<4 x i32> %x,
2015 Mar 27
2
[LLVMdev] LLVM fails for inline asm with Link Time Optimization
...1> mov esi, dword ptr 28(%esp)
1> ^
1><inline asm>:4:21 : error 0: invalid token in expression
1> movq mm1, [edi+ebx-$8]
1> ^
1><inline asm>:5:12 : error 0: invalid operand for instruction
1> pxor mm0, mm0
Thanks
Ashish
On Fri, Mar 27, 2015 at 8:21 PM, Rafael Espíndola <
rafael.espindola at gmail.com> wrote:
> If you are getting a parse error it is very likely a different bug. In
> that bug the issue is that we don't parse the function bodies to find
> if some inline...
2015 Nov 19
5
[RFC] Introducing a vector reduction add instruction.
...nstructions.
Source code:
const int N = 1024;
unsigned char a[N], b[N];
int sad() {
int s = 0;
for (int i = 0; i < N; ++i) {
int res = a[i] - b[i];
s += (res > 0) ? res : -res;
}
return s;
}
Emitted instructions on X86:
# BB#0: # %entry
pxor %xmm0, %xmm0
movq $-1024, %rax # imm = 0xFFFFFFFFFFFFFC00
pxor %xmm1, %xmm1
.align 16, 0x90
.LBB0_1: # %vector.body
# =>This Inner Loop Header: Depth=1
movd b+1024(%rax), %xmm2 # xmm2 = mem[0],zero,zero,zero
mo...
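The emitted code zeroes two vector accumulators (%xmm0 and %xmm1) before the loop. In effect, the sum is split into partial sums that are only added together at the end. Below is a C sketch of that reassociation next to the RFC's original loop, with the arrays passed as parameters instead of globals. Integer addition is associative, so both functions return the same value. The 8-lane split and the `i % 8` lane assignment are illustrative assumptions, not a description of LLVM's exact lane layout.

```c
#include <stdlib.h>

/* The RFC's loop, with a and b passed in instead of being globals. */
int sad_scalar(const unsigned char *a, const unsigned char *b, int n)
{
    int s = 0;
    for (int i = 0; i < n; ++i) {
        int res = a[i] - b[i];
        s += (res > 0) ? res : -res;
    }
    return s;
}

/* Same sum spread over 8 partial sums (two 4 x i32 accumulators, like the
 * zeroed %xmm0 and %xmm1), followed by a final horizontal reduction. */
int sad_lanes(const unsigned char *a, const unsigned char *b, int n)
{
    int acc[8] = {0};
    for (int i = 0; i < n; ++i)
        acc[i % 8] += abs(a[i] - b[i]);

    int s = 0;
    for (int lane = 0; lane < 8; ++lane)
        s += acc[lane];
    return s;
}
```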