Displaying 20 results from an estimated 330 matches for "retq".
2020 Aug 31
2
Should llvm optimize 1.0 / x ?
...ed to:
vec.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_Z4fct1Dv4_f>:
0: c4 e2 79 18 0d 00 00 vbroadcastss 0x0(%rip),%xmm1 # 9
<_Z4fct1Dv4_f+0x9>
7: 00 00
9: c5 f0 5e c0 vdivps %xmm0,%xmm1,%xmm0
d: c3 retq
e: 66 90 xchg %ax,%ax
0000000000000010 <_Z4fct2Dv4_f>:
10: c5 f8 53 c0 vrcpps %xmm0,%xmm0
14: c3 retq
As you can see, 1.0 / x is not turned into vrcpps. Is it because of
precision or a missing optimization?
Regards,
--
Alexandre Bique
2020 Sep 01
2
Should llvm optimize 1.0 / x ?
...;:
140: c5 f8 53 c8 vrcpps %xmm0,%xmm1
144: c4 e2 79 18 15 00 00 vbroadcastss 0x0(%rip),%xmm2 # 14d
<_Z4fct4Dv4_f+0xd>
14b: 00 00
14d: c4 e2 71 ac c2 vfnmadd213ps %xmm2,%xmm1,%xmm0
152: c4 e2 71 98 c1 vfmadd132ps %xmm1,%xmm1,%xmm0
157: c3 retq
158: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
15f: 00
0000000000000160 <_Z4fct5Dv4_f>:
160: c5 f8 53 c0 vrcpps %xmm0,%xmm0
164: c3 retq
As you can see, fct4 is not equivalent to fct5.
Regards,
Alexandre Bique
On Tue, Sep 1, 2020 at 12:59 AM Quentin Colo...
2017 Mar 10
3
[ELF] [RFC] Padding between executable sections
...ections on some targets, 0x00 forms part of an
executable instruction that is not a nop. In particular, for x86_64 targets
at least, the sequence 0x00 0x00 is an add instruction. This can result in
confusing disassembly.
For example, on x86_64, given a simple InputSection that is a single "0xc3
retq" instruction, and given an alignment of 16 bytes, 15 null bytes are
inserted between the end of that InputSection and the next. In the
disassembly I then see the retq instruction followed by a series of adds,
the last of which actually consumes 1 or more bytes of the next section to
form a val...
2016 May 24
5
Liveness of AL, AH and AX in x86 backend
....cfi_startproc
# BB#0: # %entry
movb (%rdi), %al
movzbl 1(%rdi), %ecx
movb %al, z(%rip)
movb %cl, z+1(%rip)
incb %al
shll $8, %ecx
movzbl %al, %eax
orl %ecx, %eax
retq
I was hoping it would do something along the lines of
movb (%rdi), %al
movb 1(%rdi), %ah
movw %ax, z(%rip)
incb %al
retq
Why is the x86 backend not getting this code? Does it know that AH:AL = AX?
-Krzysztof
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Fo...
2016 Jul 29
2
XRay: Demo on x86_64/Linux almost done; some questions.
Thanks for pointing this out, Tim. Then maybe this approach is not the best
choice for x86, though ideally it should be measured. On ARM, the current
x86 approach is not applicable because ARM doesn't have a single return
instruction (such as RETQ on x86_64); furthermore, ARM return instructions
can be conditional.
I have another question: what happens if the instrumented function (or its
callees) throws an exception and doesn't catch? I understood that currently
XRay will not report an exit from this function in such case becaus...
2017 Jan 24
7
[X86][AVX512] RFC: make i1 illegal in the Codegen
...i32> @llvm.masked.gather.v8i32(<8 x i32*> %p, i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i32> undef)
ret <8 x i32> %r
}
Can be lowered to
# BB#0:
kxnorw %k0, %k0, %k1
vpgatherqd (,%zmm1), %ymm0 {%k1}
retq
Legal vectors of i1's require support for BUILD_VECTOR(i1, i1, .., i1), i1 EXTRACT_VEC_ELEMENT (...) and INSERT_VEC_ELEMENT(i1, ...) , so making i1 legal seemed like a sensible decision, and this is the current state in the top of trunk.
However, making i1 legal affected instruction sel...
2018 Apr 12
3
[RFC] __builtin_constant_p() Improvements
...ltin_constant_p(37))
return 927;
return 0;
}
int bar(int a) {
if (a)
return foo(42);
else
return mux();
}
Now outputs this code at -O1:
bar:
.cfi_startproc
# %bb.0: # %entry
testl %edi, %edi
movl $927, %ecx # imm = 0x39F
movl $1, %eax
cmovel %ecx, %eax
retq
And this code at -O0:
bar: # @bar
.cfi_startproc
# %bb.0: # %entry
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
movl %edi, -16(%rbp)
cmpl $0, -16(%rbp)
je .LBB0_2
# %bb.1:...
2014 Oct 10
3
[LLVMdev] Stange behavior in fp arithmetics on x86 (bug possibly)
...<http://docs.oracle.com/cd/E19253-01/817-5477/817-5477.pdf>. It contains no mention of retl.
This seems to be the commit that added support for it <http://lists.cs.uiuc.edu/pipermail/llvm-branch-commits/2010-May/003229.html>.
I'm not sure I understand the distinction between retl/retq. x86 has 4 return instructions (cribbing from the Intel manual):
C3 RET Near return
CB RET Far return
C2 iw RET imm16 Near return + pop imm16 bytes
CA iw RET imm16 Far return + pop imm16 bytes
(And I think that's been true since the 8086.)
Distinguishing between near and far (e.g., ret vs....
2016 Oct 25
3
RFC: Absolute or "fixed address" symbols as immediate operands
...will receive PIC and GOT treatment.
If we use a declaration as you suggest, this example will compile awkwardly:
@foo = external global i8
define i64 @addfoo(i64 %v) {
%cast = ptrtoint i8* @foo to i64
%v1 = add i64 %v, %cast
ret i64 %v1
}
The ideal code is:
addfoo:
leaq foo(%rdi), %rax
retq
Today we select:
addfoo:
addq foo at GOTPCREL(%rip), %rdi
movq %rdi, %rax
retq
We could use attributes to try to convince ISel not to form GlobalAddress
nodes, but we might also want to consider just adding a new kind of global.
Chris's proposal of giving this n...
2015 Feb 25
1
[PATCH 2/2] nouveau: Do not add most bo's to the global bo list.
...je 400412 <main+0x12>
400409: 83 3d 40 0c 20 00 01 cmpl $0x1,0x200c40(%rip) # 601050 <x>
400410: 75 06 jne 400418 <main+0x18>
400412: b8 01 00 00 00 mov $0x1,%eax
400417: c3 retq
400418: 83 c8 ff or $0xffffffff,%eax
40041b: c3 retq
Hey, my second check didn't get compiled away... magic.
And to show that a random function call does the same, replace the barrier with random():
0000000000400440 <main>:...
2013 Mar 12
6
[LLVMdev] help decompiling x86 ASM to LLVM IR
...everly removing any branches):
0000000000000000 <test61>:
0: 83 ff 01 cmp $0x1,%edi
3: 19 c0 sbb %eax,%eax
5: 83 e0 df and $0xffffffdf,%eax
8: 83 c0 61 add $0x61,%eax
b: c3 retq
How would I represent the SBB instruction in LLVM IR?
Would I have to first convert the ASM to something like:
0000000000000000 <test61>:
0: cmp $0x1,%edi Block A
1: jb 4: Block A
2: mov 0x61,%ea...
2009 Jan 19
6
[LLVMdev] Load from abs address generated bad code on LLVM 2.4
...a problem where an absolute memory load
define i32 @foo() {
entry:
%0 = load i32* inttoptr (i64 12704196 to i32*) ; <i32> [#uses=1]
ret i32 %0
}
generates incorrect code on LLVM 2.4:
0x7ffff6d54010: mov 0xc1d9c4(%rip),%eax # 0x7ffff79719da
0x7ffff6d54016: retq
should be
0x7ffff6d54010: mov 0xc1d9c4, %eax
0x7ffff6d54016: retq
i.e. the IP-relative addressing mode is incorrect.
The current LLVM trunk does not have this bug. This seems quite a nasty
bug; is there any chance of a bug-fix release for LLVM 2.4, or should I
just use LLVM trunk until LLVM...
2017 Apr 05
2
Deopt operand bundle behavior
...() ]
ret void
}
We get this output machine code for x86_64:
_testFunc: ## @testFunc
.cfi_startproc
## BB#0: ## %entry
pushq %rax
Lcfi0:
.cfi_def_cfa_offset 16
callq _getCode
callq *%rax
Ltmp0:
popq %rax
retq
Without the deopt operand bundle:
_testFunc: ## @testFunc
.cfi_startproc
## BB#0: ## %entry
pushq %rax
Lcfi0:
.cfi_def_cfa_offset 16
callq _getCode
callq *%rdx
popq %rax
retq
For some reason with the d...
2016 Nov 17
4
RFC: Insertion of nops for performance stability
...5: 8b 00 movl (%rax), %eax
7: 01 c8 addl %ecx, %eax
9: 44 39 c0 cmpl %r8d, %eax
c: 75 0f jne 15 <foo+0x1D>
e: ff 05 00 00 00 00 incl (%rip)
14: ff 05 00 00 00 00 incl (%rip)
1a: 31 c0 xorl %eax, %eax
1c: c3 retq
1d: 44 39 c9 cmpl %r9d, %ecx
20: 74 ec je -20 <foo+0xE>
22: 48 8b 44 24 30 movq 48(%rsp), %rax
27: 2b 08 subl (%rax), %ecx
29: 39 d1 cmpl %edx, %ecx
2b: 7f e1 jg -31 <foo+0xE>
2d: 31 c0 xorl %eax, %eax...
2015 Feb 25
2
[PATCH 2/2] nouveau: Do not add most bo's to the global bo list.
Hey,
On 25-02-15 18:05, Ilia Mirkin wrote:
> On Wed, Feb 25, 2015 at 11:59 AM, Patrick Baggett
> <baggett.patrick at gmail.com> wrote:
>>> If code like
>>>
>>> x = *a;
>>> pthread_mutex_lock or unlock or __memory_barrier()
>>> y = *a;
>>>
>>> doesn't cause a to get loaded twice, then the compiler's in serious
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
...vinsertps $0x20,0x4006c0(,%rsi,4),%xmm0,%xmm0
400559: vinsertps $0x30,0x4006c0(,%rdx,4),%xmm0,%xmm0
400564: vmulps 0x144(%rip),%xmm0,%xmm0 # 4006b0
<__dso_handle+0x38>
40056c: vmovaps %xmm0,0x20046c(%rip) # 6009e0 <r>
400574: xor %eax,%eax
400576: retq
$ clang++ test.cpp -O3 -fstrict-aliasing -funroll-loops -ffast-math
-march=native -mtune=native -DSPILLING_ENSUES=1 /* spilling */
$ objdump -dC --no-show-raw-insn ./a.out
...
00000000004004f0 <main>:
4004f0: vmovdqa 0x2004c8(%rip),%xmm0 # 6009c0 <x>
4004f8: vpsrld $0...
2019 Apr 04
3
question about --emit-relocs with lld
...48 8b 04 25 d8 00 40 mov 0x4000d8,%rax
4000c7: 00
4000c4: R_X86_64_32S .L__const._start.instance+0x8
4000c8: 48 89 45 f8 mov %rax,-0x8(%rbp)
4000cc: 5d pop %rbp
4000cd: c3 retq
$ objdump -Sdr minimal.lld
...
0000000000201000 <_start>:
201000: 55 push %rbp
201001: 48 89 e5 mov %rsp,%rbp
201004: 48 8b 04 25 20 01 20 mov 0x200120,%rax
20100b: 00
201008: R_X86_64_32S...
2016 Jul 29
0
XRay: Demo on x86_64/Linux almost done; some questions.
...t;llvm-dev at lists.llvm.org> wrote:
> Can I ask you why you chose to patch both function entrances and exits,
> rather than just patching the entrances and (in the patches) pushing on the
> stack the address of __xray_FunctionExit , so that the user function returns
> normally (with RETQ or POP RIP or whatever else instruction) rather than
> jumping into __xray_FunctionExit?
> This approach should also be faster because smaller code better fits in CPU
> cache, and patching itself should run faster (because there is less code to
> modify).
It may well be slower. Larger...
2019 Sep 14
2
Side-channel resistant values
...npredictable() works at all. Even if we ignore cmp/br into switch conversion, it still doesn’t work:
int test_cmov(int left, int right, int *alt) {
return __builtin_unpredictable(left < right) ? *alt : 999;
}
Should generate:
test_cmov:
movl $999, %eax
cmpl %esi, %edi
cmovll (%rdx), %eax
retq
But currently generates:
test_cmov:
movl $999, %eax
cmpl %esi, %edi
jge .LBB0_2
movl (%rdx), %eax
.LBB0_2:
retq
> On Sep 14, 2019, at 12:18 AM, Sanjay Patel <spatel at rotateright.com> wrote:
>
> I'm not sure if this is the entire problem, but SimplifyCFG loses the ...
2015 Dec 01
2
Expected behavior of __builtin_return_address() with inlining?
...s inlined into bar(), then __builtin_return_address(0)
will return &bar, instead of the instruction address of the call to foo().
I compiled with GCC 4.8.2 and Clang 3.8.0, and I got the following result:
# clang 3.8.0 trunk 253253
# clang -S -O2 test.c
foo:
movq (%rsp), %rax
retq
foobar:
movq (%rsp), %rax
retq
# end assembly
# GCC 4.8.2-19ubuntu1
# gcc -S -O2 test.c
foo:
movq (%rsp), %rax
ret
bar:
movq (%rsp), %rax
ret
foobar:
movq (%rsp), %rax
ret
# end assembly
So with both compilers, an inlined __builtin_retur...