Displaying 20 results from an estimated 330 matches for "retq".
2020 Aug 31
2
Should llvm optimize 1.0 / x ?
...ed to:
vec.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_Z4fct1Dv4_f>:
0: c4 e2 79 18 0d 00 00 vbroadcastss 0x0(%rip),%xmm1 # 9
<_Z4fct1Dv4_f+0x9>
7: 00 00
9: c5 f0 5e c0 vdivps %xmm0,%xmm1,%xmm0
d: c3 retq
e: 66 90 xchg %ax,%ax
0000000000000010 <_Z4fct2Dv4_f>:
10: c5 f8 53 c0 vrcpps %xmm0,%xmm0
14: c3 retq
As you can see, 1.0 / x is not turned into vrcpps. Is it because of
precision or a missing optimization?
Regards,
--
Alexandre Bique
2020 Sep 01
2
Should llvm optimize 1.0 / x ?
...;:
140: c5 f8 53 c8 vrcpps %xmm0,%xmm1
144: c4 e2 79 18 15 00 00 vbroadcastss 0x0(%rip),%xmm2 # 14d
<_Z4fct4Dv4_f+0xd>
14b: 00 00
14d: c4 e2 71 ac c2 vfnmadd213ps %xmm2,%xmm1,%xmm0
152: c4 e2 71 98 c1 vfmadd132ps %xmm1,%xmm1,%xmm0
157: c3 retq
158: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
15f: 00
0000000000000160 <_Z4fct5Dv4_f>:
160: c5 f8 53 c0 vrcpps %xmm0,%xmm0
164: c3 retq
As you can see, fct4 is not equivalent to fct5.
Regards,
Alexandre Bique
On Tue, Sep 1, 2020 at 12:59 AM Quentin Colo...
2017 Mar 10
3
[ELF] [RFC] Padding between executable sections
...ections on some targets, 0x00 forms part of an
executable instruction that is not a nop. In particular, for x86_64 targets
at least, the sequence 0x00 0x00 is an add instruction. This can result in
confusing disassembly.
For example, on x86_64, given a simple InputSection that is a single "0xc3
retq" instruction, and given an alignment of 16 bytes, 15 null bytes are
inserted between the end of that InputSection and the next. In the
disassembly I then see the retq instruction followed by a series of adds,
the last of which actually consumes 1 or more bytes of the next section to
form a val...
2016 May 24
5
Liveness of AL, AH and AX in x86 backend
....cfi_startproc
# BB#0: # %entry
movb (%rdi), %al
movzbl 1(%rdi), %ecx
movb %al, z(%rip)
movb %cl, z+1(%rip)
incb %al
shll $8, %ecx
movzbl %al, %eax
orl %ecx, %eax
retq
I was hoping it would do something along the lines of
movb (%rdi), %al
movb 1(%rdi), %ah
movw %ax, z(%rip)
incb %al
retq
Why is the x86 backend not getting this code? Does it know that AH:AL = AX?
-Krzysztof
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Fo...
2016 Jul 29
2
XRay: Demo on x86_64/Linux almost done; some questions.
Thanks for pointing this out, Tim. Then maybe this approach is not the best
choice for x86, though ideally it should be measured. On ARM, the current
x86 approach is not applicable because ARM doesn't have a single return
instruction (such as RETQ on x86_64); furthermore, ARM return instructions
can be conditional.
I have another question: what happens if the instrumented function (or its
callees) throws an exception and doesn't catch? I understood that currently
XRay will not report an exit from this function in such case becaus...
2017 Jan 24
7
[X86][AVX512] RFC: make i1 illegal in the Codegen
...i32> @llvm.masked.gather.v8i32(<8 x i32*> %p, i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i32> undef)
ret <8 x i32> %r
}
Can be lowered to
# BB#0:
kxnorw %k0, %k0, %k1
vpgatherqd (,%zmm1), %ymm0 {%k1}
retq
Legal vectors of i1's require support for BUILD_VECTOR(i1, i1, .., i1), i1 EXTRACT_VEC_ELEMENT (...) and INSERT_VEC_ELEMENT(i1, ...) , so making i1 legal seemed like a sensible decision, and this is the current state in the top of trunk.
However, making i1 legal affected instruction sel...
2018 Apr 12
3
[RFC] __builtin_constant_p() Improvements
...ltin_constant_p(37))
return 927;
return 0;
}
int bar(int a) {
if (a)
return foo(42);
else
return mux();
}
Now outputs this code at -O1:
bar:
.cfi_startproc
# %bb.0: # %entry
testl %edi, %edi
movl $927, %ecx # imm = 0x39F
movl $1, %eax
cmovel %ecx, %eax
retq
And this code at -O0:
bar: # @bar
.cfi_startproc
# %bb.0: # %entry
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
movl %edi, -16(%rbp)
cmpl $0, -16(%rbp)
je .LBB0_2
# %bb.1:...
2014 Oct 10
3
[LLVMdev] Stange behavior in fp arithmetics on x86 (bug possibly)
...<http://docs.oracle.com/cd/E19253-01/817-5477/817-5477.pdf>. It contains no mention of retl.
This seems to be the commit that added support for it <http://lists.cs.uiuc.edu/pipermail/llvm-branch-commits/2010-May/003229.html>.
I'm not sure I understand the distinction between retl/retq. x86 has 4 return instructions (cribbing from the Intel manual):
C3 RET Near return
CB RET Far return
C2 iw RET imm16 Near return + pop imm16 bytes
CA iw RET imm16 Far return + pop imm16 bytes
(And I think that's been true since the 8086.)
Distinguishing between near and far (e.g., ret vs....
2016 Oct 25
3
RFC: Absolute or "fixed address" symbols as immediate operands
...will receive PIC and GOT treatment.
If we use a declaration as you suggest, this example will compile awkwardly:
@foo = external global i8
define i64 @addfoo(i64 %v) {
%cast = ptrtoint i8* @foo to i64
%v1 = add i64 %v, %cast
ret i64 %v1
}
The ideal code is:
addfoo:
leaq foo(%rdi), %rax
retq
Today we select:
addfoo:
addq foo at GOTPCREL(%rip), %rdi
movq %rdi, %rax
retq
We could use attributes to try to convince ISel not to form GlobalAddress
nodes, but we might also want to consider just adding a new kind of global.
Chris's proposal of giving this n...
2015 Feb 25
1
[PATCH 2/2] nouveau: Do not add most bo's to the global bo list.
...je 400412 <main+0x12>
400409: 83 3d 40 0c 20 00 01 cmpl $0x1,0x200c40(%rip) # 601050 <x>
400410: 75 06 jne 400418 <main+0x18>
400412: b8 01 00 00 00 mov $0x1,%eax
400417: c3 retq
400418: 83 c8 ff or $0xffffffff,%eax
40041b: c3 retq
Hey, my second check didn't get compiled away... magic.
And to show that a random function call does the same, replace the barrier with random():
0000000000400440 <main>:...
2013 Mar 12
6
[LLVMdev] help decompiling x86 ASM to LLVM IR
...everly removing any branches):
0000000000000000 <test61>:
0: 83 ff 01 cmp $0x1,%edi
3: 19 c0 sbb %eax,%eax
5: 83 e0 df and $0xffffffdf,%eax
8: 83 c0 61 add $0x61,%eax
b: c3 retq
How would I represent the SBB instruction in LLVM IR?
Would I have to first convert the ASM to something like:
0000000000000000 <test61>:
0: cmp $0x1,%edi Block A
1: jb 4: Block A
2: mov 0x61,%ea...
2009 Jan 19
6
[LLVMdev] Load from abs address generated bad code on LLVM 2.4
...a problem where an absolute memory load
define i32 @foo() {
entry:
%0 = load i32* inttoptr (i64 12704196 to i32*) ; <i32> [#uses=1]
ret i32 %0
}
generates incorrect code on LLVM 2.4:
0x7ffff6d54010: mov 0xc1d9c4(%rip),%eax # 0x7ffff79719da
0x7ffff6d54016: retq
should be
0x7ffff6d54010: mov 0xc1d9c4, %eax
0x7ffff6d54016: retq
i.e. the IP-relative addressing mode is incorrect.
The current LLVM trunk does not have this bug. This seems quite a nasty
bug; is there any chance of a bug-fix release for LLVM 2.4, or should I
just use LLVM trunk until LLVM...
2017 Apr 05
2
Deopt operand bundle behavior
...() ]
ret void
}
We get this output machine code for x86_64:
_testFunc: ## @testFunc
.cfi_startproc
## BB#0: ## %entry
pushq %rax
Lcfi0:
.cfi_def_cfa_offset 16
callq _getCode
callq *%rax
Ltmp0:
popq %rax
retq
Without the deopt operand bundle:
_testFunc: ## @testFunc
.cfi_startproc
## BB#0: ## %entry
pushq %rax
Lcfi0:
.cfi_def_cfa_offset 16
callq _getCode
callq *%rdx
popq %rax
retq
For some reason with the d...
2016 Nov 17
4
RFC: Insertion of nops for performance stability
...5: 8b 00 movl (%rax), %eax
7: 01 c8 addl %ecx, %eax
9: 44 39 c0 cmpl %r8d, %eax
c: 75 0f jne 15 <foo+0x1D>
e: ff 05 00 00 00 00 incl (%rip)
14: ff 05 00 00 00 00 incl (%rip)
1a: 31 c0 xorl %eax, %eax
1c: c3 retq
1d: 44 39 c9 cmpl %r9d, %ecx
20: 74 ec je -20 <foo+0xE>
22: 48 8b 44 24 30 movq 48(%rsp), %rax
27: 2b 08 subl (%rax), %ecx
29: 39 d1 cmpl %edx, %ecx
2b: 7f e1 jg -31 <foo+0xE>
2d: 31 c0 xorl %eax, %eax...
2015 Feb 25
2
[PATCH 2/2] nouveau: Do not add most bo's to the global bo list.
Hey,
On 25-02-15 18:05, Ilia Mirkin wrote:
> On Wed, Feb 25, 2015 at 11:59 AM, Patrick Baggett
> <baggett.patrick at gmail.com> wrote:
>>> If code like
>>>
>>> x = *a;
>>> pthread_mutex_lock or unlock or __memory_barrier()
>>> y = *a;
>>>
>>> doesn't cause a to get loaded twice, then the compiler's in serious
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
...vinsertps $0x20,0x4006c0(,%rsi,4),%xmm0,%xmm0
400559: vinsertps $0x30,0x4006c0(,%rdx,4),%xmm0,%xmm0
400564: vmulps 0x144(%rip),%xmm0,%xmm0 # 4006b0
<__dso_handle+0x38>
40056c: vmovaps %xmm0,0x20046c(%rip) # 6009e0 <r>
400574: xor %eax,%eax
400576: retq
$ clang++ test.cpp -O3 -fstrict-aliasing -funroll-loops -ffast-math
-march=native -mtune=native -DSPILLING_ENSUES=1 /* spilling */
$ objdump -dC --no-show-raw-insn ./a.out
...
00000000004004f0 <main>:
4004f0: vmovdqa 0x2004c8(%rip),%xmm0 # 6009c0 <x>
4004f8: vpsrld $0...
2019 Apr 04
3
question about --emit-relocs with lld
...48 8b 04 25 d8 00 40 mov 0x4000d8,%rax
4000c7: 00
4000c4: R_X86_64_32S .L__const._start.instance+0x8
4000c8: 48 89 45 f8 mov %rax,-0x8(%rbp)
4000cc: 5d pop %rbp
4000cd: c3 retq
$ objdump -Sdr minimal.lld
...
0000000000201000 <_start>:
201000: 55 push %rbp
201001: 48 89 e5 mov %rsp,%rbp
201004: 48 8b 04 25 20 01 20 mov 0x200120,%rax
20100b: 00
201008: R_X86_64_32S...
2016 Jul 29
0
XRay: Demo on x86_64/Linux almost done; some questions.
...t;llvm-dev at lists.llvm.org> wrote:
> Can I ask you why you chose to patch both function entrances and exits,
> rather than just patching the entrances and (in the patches) pushing on the
> stack the address of __xray_FunctionExit , so that the user function returns
> normally (with RETQ or POP RIP or whatever else instruction) rather than
> jumping into __xray_FunctionExit?
> This approach should also be faster because smaller code better fits in CPU
> cache, and patching itself should run faster (because there is less code to
> modify).
It may well be slower. Larger...
2019 Sep 14
2
Side-channel resistant values
...npredictable() works at all. Even if we ignore cmp/br into switch conversion, it still doesn’t work:
int test_cmov(int left, int right, int *alt) {
return __builtin_unpredictable(left < right) ? *alt : 999;
}
Should generate:
test_cmov:
movl $999, %eax
cmpl %esi, %edi
cmovll (%rdx), %eax
retq
But currently generates:
test_cmov:
movl $999, %eax
cmpl %esi, %edi
jge .LBB0_2
movl (%rdx), %eax
.LBB0_2:
retq
> On Sep 14, 2019, at 12:18 AM, Sanjay Patel <spatel at rotateright.com> wrote:
>
> I'm not sure if this is the entire problem, but SimplifyCFG loses the ...
2015 Dec 01
2
Expected behavior of __builtin_return_address() with inlining?
...s inlined into bar(), then __builtin_return_address(0)
will return &bar, instead of the instruction address of the call to foo().
I compiled with GCC 4.8.2 and Clang 3.8.0, and I got the following result:
# clang 3.8.0 trunk 253253
# clang -S -O2 test.c
foo:
movq (%rsp), %rax
retq
foobar:
movq (%rsp), %rax
retq
# end assembly
# GCC 4.8.2-19ubuntu1
# gcc -S -O2 test.c
foo:
movq (%rsp), %rax
ret
bar:
movq (%rsp), %rax
ret
foobar:
movq (%rsp), %rax
ret
# end assembly
So with both compilers, an inlined __builtin_retur...