Displaying 20 results from an estimated 1881 matches for "edi".
2014 Jan 18
2
[LLVMdev] Scheduling quirks
...<===
...I get the following result:
===>
.file "test.cpp"
.text
.globl _Z13test_registeri
.align 16, 0x90
.type _Z13test_registeri, @function
_Z13test_registeri: # @_Z13test_registeri
.cfi_startproc
# BB#0: # %entry
movl %edi, %eax
sarl $2, %eax
xorl %edi, %eax
movl %eax, %ecx
sarl $3, %ecx
xorl %eax, %ecx
movl %ecx, %edx
sarl $4, %edx
xorl %ecx, %edx
movl %edx, %eax
sarl $5, %eax
xorl %edx, %eax
retq
.Ltmp0:
.size _Z13test_registeri, .Ltmp0-_Z13test_registeri
.cfi_endproc
.globl _Z14test_scheduleri
.al...
2015 Feb 13
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
...var_34 = dword ptr -34h
push rbp
push r15
push r14
push r13
push r12
push rbx
sub rsp, 18h
mov ebx, 0FFFFFFFFh
cmp edi, 2
jnz loc_100000F29
mov rdi, [rsi+8] ; char *
xor r14d, r14d
xor esi, esi ; char **
mov edx, 0Ah ; int
call _strtol
mov r15, rax...
2018 Nov 20
2
A pattern for portable __builtin_add_overflow()
...nsigned types this is easy:
int uaddo_native(unsigned a, unsigned b, unsigned* s)
{
return __builtin_add_overflow(a, b, s);
}
int uaddo_portable(unsigned a, unsigned b, unsigned* s)
{
*s = a + b;
return *s < a;
}
We get exactly the same assembly:
uaddo_native: # @uaddo_native
xor eax, eax
add edi, esi
setb al
mov dword ptr [rdx], edi
ret
uaddo_portable: # @uaddo_portable
xor eax, eax
add edi, esi
setb al
mov dword ptr [rdx], edi
ret
But with signed types it is not so easy. I tried 2 versions, but the result
is quite far away from the optimal assembly.
int saddo_native(int a, int b, int*...
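For reference, one common portable way to detect signed overflow is to test the operands against INT_MAX and INT_MIN before adding, so the addition itself never overflows. The sketch below is an assumption for illustration only, not one of the two versions tried in the post (those are cut off above). The name saddo_precheck is hypothetical, and unlike __builtin_add_overflow it leaves *s untouched on overflow instead of storing the wrapped result.

```c
#include <limits.h>

/* Hypothetical helper, not taken from the thread.
 * Returns 1 if a + b would overflow int, otherwise stores the sum in *s
 * and returns 0. The range checks run before the addition, so no signed
 * overflow (undefined behaviour in C) is ever evaluated. */
int saddo_precheck(int a, int b, int *s)
{
    if (b > 0 && a > INT_MAX - b)
        return 1;               /* would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b)
        return 1;               /* would fall below INT_MIN */
    *s = a + b;                 /* safe: result is representable */
    return 0;
}
```

Whether a compiler folds these comparisons into a single add plus an overflow-flag test depends on the compiler and version; the sketch only aims at well-defined behaviour, not at the optimal assembly discussed in the thread.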
2015 Feb 14
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
...push r15
>> push r14
>> push r13
>> push r12
>> push rbx
>> sub rsp, 18h
>> mov ebx, 0FFFFFFFFh
>> cmp edi, 2
>> jnz loc_100000F29
>> mov rdi, [rsi+8] ; char *
>> xor r14d, r14d
>> xor esi, esi ; char **
>> mov edx, 0Ah ; int
>> call...
2015 Feb 14
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
...push r14
>>>> push r13
>>>> push r12
>>>> push rbx
>>>> sub rsp, 18h
>>>> mov ebx, 0FFFFFFFFh
>>>> cmp edi, 2
>>>> jnz loc_100000F29
>>>> mov rdi, [rsi+8] ; char *
>>>> xor r14d, r14d
>>>> xor esi, esi ; char **
>>>> mov edx, 0Ah...
2004 Sep 10
2
An assembly optimization and fix
I have optimized the FLAC__fixed_compute_best_predictor_asm_ia32_mmx_cmov
function and fixed a bug when data_len == 0. Now the function is about
50% faster and flac -5 is about 5% faster on my box. I have tested it
thoroughly, and I think it can go into flac 1.0.4.
--
Miroslav Lichvar
-------------- next part --------------
--- src/libFLAC/ia32/fixed_asm....
2012 Mar 28
2
[LLVMdev] Suboptimal code due to excessive spilling
...{
s += sum(&x[i], 18);
p[i] = 5; // xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
}
return s;
}
====== Output A ======
======================
foo: # @foo
.Ltmp12:
.cfi_startproc
# BB#0:
pushl %ebx
.Ltmp13:
.cfi_def_cfa_offset 8
pushl %edi
.Ltmp14:
.cfi_def_cfa_offset 12
pushl %esi
.Ltmp15:
.cfi_def_cfa_offset 16
subl $88, %esp
.Ltmp16:
.cfi_def_cfa_offset 104
.Ltmp17:
.cfi_offset %esi, -16
.Ltmp18:
.cfi_offset %edi, -12
.Ltmp19:
.cfi_offset %ebx, -8
pxor %xmm0, %xmm0
movl 112(%esp), %eax
testl %eax, %eax
je .LBB1_3
# BB#...
2011 Jul 11
4
[LLVMdev] RegAllocFast uses too much stack
...foo(0);
foo(1);
foo(2);
}
This doesn't just spill out all the registers to the stack before each call,
we also set up 0, 1 and 2 into regs first, then spill them and don't even
get a chance to reuse stack slots. That's just bad:
pushq %rax
movl $2, %edi
movl $1, %eax
movl $0, %ecx
movl %edi, 4(%rsp) # 4-byte Spill
movl %ecx, %edi
movl %eax, (%rsp) # 4-byte Spill
callq foo
movl (%rsp), %edi # 4-byte Reload
callq foo
movl...
2015 Nov 21
2
Recent -Os code size regressions
...%cl,%cl
28f: 0f 89 34 01 00 00 jns 3c9 <t_run_test+0x3c9>
After r252152:
Note that the OR $0x408 and OR $0x810 now come in reverse order.
35d: 81 c9 08 04 00 00 or $0x408,%ecx
363: 89 4c 24 28 mov %ecx,0x28(%esp)
367: 89 df mov %ebx,%edi
369: 83 e7 10 and $0x10,%edi
36c: 89 7c 24 20 mov %edi,0x20(%esp)
370: 0f 45 d1 cmovne %ecx,%edx
373: 89 d7 mov %edx,%edi
375: 81 cf 10 08 00 00 or $0x810,%edi
37b: 89 7c 24 14 mov %edi,0x14(%esp)
37f: 89 d9...
2018 May 09
3
Ignored branch predictor hints
...fine likely(x) __builtin_expect((x),1)
// switch like
char * b(int e) {
if (likely(e == 0))
return "0";
else if (e == 1)
return "1";
else return "f";
}
GCC correctly prefers the first case:
b(int):
mov eax, OFFSET FLAT:.LC0
test edi, edi
jne .L7
ret
But Clang seems to ignore __builtin_expect hints in this case.
b(int): # @b(int)
cmp edi, 1
mov eax, offset .L.str.1
mov ecx, offset .L.str.2
cmove rcx, rax
test edi, edi
mov eax, offset .L.str
cmovne rax, rcx
ret
https://godbolt.org/g/tuAVT7
-------------- ne...
2012 Apr 05
0
[LLVMdev] Suboptimal code due to excessive spilling
...{
s += sum(&x[i], 18);
p[i] = 5; // xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
}
return s;
}
====== Output A ======
======================
foo: # @foo
.Ltmp12:
.cfi_startproc
# BB#0:
pushl %ebx
.Ltmp13:
.cfi_def_cfa_offset 8
pushl %edi
.Ltmp14:
.cfi_def_cfa_offset 12
pushl %esi
.Ltmp15:
.cfi_def_cfa_offset 16
subl $88, %esp
.Ltmp16:
.cfi_def_cfa_offset 104
.Ltmp17:
.cfi_offset %esi, -16
.Ltmp18:
.cfi_offset %edi, -12
.Ltmp19:
.cfi_offset %ebx, -8
pxor %xmm0, %xmm0
movl 112(%esp), %eax
testl %eax, %eax
je .LBB1_3
# BB#...
2009 Mar 11
4
[LLVMdev] Bug in X86CompilationCallback_SSE
...%ebp)
0xb74b98e9 <X86CompilationCallback_SSE+9>: lea 0x17(%esp),%esi
0xb74b98ed <X86CompilationCallback_SSE+13>: and $0xfffffff0,%esi
0xb74b98f0 <X86CompilationCallback_SSE+16>: mov %ebx,-0xc(%ebp)
0xb74b98f3 <X86CompilationCallback_SSE+19>: mov %edi,-0x4(%ebp)
0xb74b98f6 <X86CompilationCallback_SSE+22>: lea 0x40(%esi),%edi
0xb74b98f9 <X86CompilationCallback_SSE+25>: call 0xb7315577 <__i686.get_pc_thunk.bx>
0xb74b98fe <X86CompilationCallback_SSE+30>: add $0x76d71e,%ebx
0xb74b9904 <X86CompilationCal...
2007 Apr 30
0
[LLVMdev] Boostrap Failure -- Expected Differences?
...> ./loop-iv.o differs
> ./loop.o differs
> ./loop-unroll.o differs
> ./loop-unswitch.o differs
> ./modulo-sched.o differs
> ./optabs.o differs
> ./opts.o differs
> ./params.o differs
> ./passes.o differs
> ./postreload-gcse.o differs
> ./postreload.o differs
> ./predict.o differs
> ./pretty-print.o differs
> ./print-rtl.o differs
> ./print-tree.o differs
> ./profile.o differs
> ./real.o differs
> ./recog.o differs
> ./regclass.o differs
> ./regmove.o differs
> ./regrename.o differs
> ./reg-stack.o differs
> ./reload1.o differs
>...
2007 Apr 27
2
[LLVMdev] Boostrap Failure -- Expected Differences?
...o differs
./loop-init.o differs
./loop-invariant.o differs
./loop-iv.o differs
./loop.o differs
./loop-unroll.o differs
./loop-unswitch.o differs
./modulo-sched.o differs
./optabs.o differs
./opts.o differs
./params.o differs
./passes.o differs
./postreload-gcse.o differs
./postreload.o differs
./predict.o differs
./pretty-print.o differs
./print-rtl.o differs
./print-tree.o differs
./profile.o differs
./real.o differs
./recog.o differs
./regclass.o differs
./regmove.o differs
./regrename.o differs
./reg-stack.o differs
./reload1.o differs
./reload.o differs
./resource.o differs
./rtlanal.o diffe...
2017 Aug 15
3
How to debug instruction selection
Hi there,
I am trying to JIT-compile some bitcode and am seeing the following error:
LLVM ERROR: Cannot select: 0x28ec830: ch,glue = X86ISD::CALL 0x28ec7c0, 0x28ef900, Register:i32 %EDI, Register:i8 %AL, RegisterMask:Untyped, 0x28ec7c0:1
0x28ef900: i32 = X86ISD::Wrapper TargetGlobalAddress:i32<void (i8*, ...)* @_ZN5FooBr7xprintfEPKcz> 0
0x28ec520: i32 = TargetGlobalAddress<void (i8*, ...)* @_ZN5FooBr7xprintfEPKcz> 0
0x28ec670: i32 = Register %EDI
0x28ec750: i...
2004 Sep 10
3
patch
...ion_asm_ia32
- ;[esp + 24] == autoc[]
- ;[esp + 20] == lag
- ;[esp + 16] == data_len
- ;[esp + 12] == data[]
+ ;[esp + 28] == autoc[]
+ ;[esp + 24] == lag
+ ;[esp + 20] == data_len
+ ;[esp + 16] == data[]
;ASSERT(lag > 0)
;ASSERT(lag <= 33)
@@ -71,21 +71,22 @@
.begin:
push esi
push edi
+ push ebx
; for(coeff = 0; coeff < lag; coeff++)
; autoc[coeff] = 0.0;
- mov edi, [esp + 24] ; edi == autoc
- mov ecx, [esp + 20] ; ecx = # of dwords (=lag) of 0 to write
+ mov edi, [esp + 28] ; edi == autoc
+ mov ecx, [esp + 24] ; ecx = # of dwords (=lag) of 0 to write
xor eax...
2009 Jun 21
1
Incidence Function Model in R help
...nction Model in R" and can't get past some error with my glm
arguments. I'm getting through
> attach(amphimedon_compressa)
> plot(x.crd,y.crd,asp=1,xlab="Easting",ylab="Northing",pch=21,col=p+1,bg=5*p)
> d<-dist(cbind(x.crd,y.crd))
> alpha<-1
> edis<-as.matrix(exp(-alpha*d))
> diag(edis)<-0
> edis<-sweep(edis,2,A,"*")
> S<-rowSums(edis[,p>0])
> mod<-glm(p~offset(2*log(S))+log(A),family=binomial)
before I get the error message
Error: NA/NaN/Inf in foreign function call (arg 4)
which is an argument con...
2017 Feb 13
4
[PATCH v2] x86/paravirt: Don't make vcpu_is_preempted() a callee-save function
...ed;"
> >> +".type __raw_callee_save___kvm_vcpu_is_preempted, @function;"
> >> +"__raw_callee_save___kvm_vcpu_is_preempted:"
> >> +FRAME_BEGIN
> >> +"push %rdi;"
> >> +"push %rdx;"
> >> +"movslq %edi, %rdi;"
> >> +"movq $steal_time+16, %rax;"
> >> +"movq __per_cpu_offset(,%rdi,8), %rdx;"
> >> +"cmpb $0, (%rdx,%rax);"
Could we not put the $steal_time+16 displacement as an immediate in the
cmpb and save a whole register here?...
2008 May 27
3
[LLVMdev] Float compare-for-equality and select optimization opportunity
...int t;
t = a;
a = b;
b = c;
c = t;
}
This is the resulting x86 assembly code:
movss xmm0,dword ptr [ecx+4]
ucomiss xmm0,dword ptr [ecx+8]
sete al
setnp dl
test dl,al
mov edx,edi
cmovne edx,ecx
cmovne ecx,esi
cmovne esi,edi
While I'm pleasantly surprised that my branch does get turned into several
select operations as intended (cmov - conditional move - in x86), I'm
confused why it uses the ucomiss instruction (unordered compare and set
flag...
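As background to the excerpt above: ucomiss reports an unordered result (either operand is NaN) by setting ZF, PF and CF together, so ZF alone cannot distinguish "equal" from "unordered". A plain `==` on floats must be false when a NaN is involved, which is why the generated code combines sete (ZF set) with setnp (PF clear). The C sketch below models that flag logic; the function name ordered_equal and its local variables are hypothetical and only illustrate the idea.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical model of the sete/setnp pair that follows ucomiss.
 * It returns the same value as the C expression (a == b). */
bool ordered_equal(float a, float b)
{
    bool unordered = isnan(a) || isnan(b); /* ucomiss sets PF = 1 here */
    bool zf = unordered || (a == b);       /* ZF = 1 for equal OR unordered */
    bool pf = unordered;
    /* sete gives zf, setnp gives !pf; `test dl,al` ANDs the two. */
    return zf && !pf;
}
```

For example, ordered_equal(1.0f, 1.0f) and ordered_equal(0.0f, -0.0f) are true, while ordered_equal(NAN, NAN) is false, matching IEEE 754 equality.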