similar to: Optimizing assembly generated for tail call

Displaying 20 results from an estimated 1000 matches similar to: "Optimizing assembly generated for tail call"

2010 Sep 01
0
[LLVMdev] equivalent IR, different asm
On Sep 1, 2010, at 6:25 AM, Argyrios Kyrtzidis wrote:
> The attached .ll files seem equivalent, but the resulting asm from 'opt-fail.ll' causes a crash in WebKit.
> I suspect the usage of registers is wrong, can someone take a look?
The difference is that there is a shift right after the multiply, before the divide. In IR, the difference is:
    %5 = mul nsw i32 %4, %tmp1
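For illustration (this sketch is mine, not code from the thread), a shift sandwiched between the multiply and the divide computes a different value, which is why the two .ll files were not actually equivalent:

    // Illustration only: these two functions are not equivalent, for the
    // same reason the two IR files differed.
    int mul_shift_div(int a, int b, int c) { return ((a * b) >> 2) / c; }
    int mul_div(int a, int b, int c)       { return (a * b) / c; }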
2013 Aug 19
2
[LLVMdev] Duplicate loading of double constants
Hi, I found that in some cases llvm generates duplicate loads of double constants, e.g.
$ cat t.c
    double f(double* p, int n)
    {
      double s = 0;
      if (n)
        s += *p;
      return s;
    }
$ clang -S -O3 t.c -o -
...
    f:                                  # @f
        .cfi_startproc
    # BB#0:
        xorps   %xmm0, %xmm0
        testl   %esi, %esi
        je      .LBB0_2
    # BB#1:
        xorps
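One hedged reading of the redundancy: the 0.0 constant is materialized separately on each control-flow path. A source-level rewrite that makes the two paths explicit (sketch only, under that assumption):

    // Sketch: making both paths explicit shows where a single
    // materialization (one xorps) of the 0.0 constant would suffice.
    double f2(double* p, int n) {
      double zero = 0.0;            // ideally zeroed once for both paths
      return n ? zero + *p : zero;
    }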
2010 Sep 01
5
[LLVMdev] equivalent IR, different asm
The attached .ll files seem equivalent, but the resulting asm from 'opt-fail.ll' causes a crash in WebKit. I suspect the usage of registers is wrong, can someone take a look?
$ llc opt-pass.ll -o -
        .section __TEXT,__text,regular,pure_instructions
        .globl  __ZN7WebCore6kolos1ERiS0_PKNS_20RenderBoxModelObjectEPNS_10StyleImageE
        .align  4, 0x90
2020 Oct 03
2
Another tail call optimization question
Hello, Could anyone kindly explain to me why 'g()' in the following function cannot have tail call optimization?
> void f(int* x);
> void g();
> void h(int v) {
>   f(&v);
>   g();
> }
A while ago I was taught that tail call optimization cannot apply if local variables need to be kept alive, but 'g()' doesn't seem to require anything to be
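The snippet is cut off here. The usual explanation (mine, not the thread's answer) is that once &v escapes into f, v's stack slot must stay valid until h returns, so the frame cannot be discarded before the final call. A contrasting sketch where no local escapes:

    // My illustration: with no escaping local, nothing in the frame must
    // survive the last call, so 'g()' can be lowered as a tail jump.
    void g();
    void h_no_escape(int v) {
      (void)v;   // address never taken
      g();       // eligible for tail-call optimization
    }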
2018 Apr 12
3
[RFC] __builtin_constant_p() Improvements
Hello again! I took a stab at PR4898[1]. The attached patch improves Clang's __builtin_constant_p support so that the Linux kernel is happy. With this improvement, Clang can determine whether __builtin_constant_p is true or false after inlining. As an example:
    static __attribute__((always_inline)) int foo(int x)
    {
      if (__builtin_constant_p(x))
        return 1;
      return 0;
    }
    static
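To make the intended post-inlining folding concrete, hypothetical callers (my example, not part of the patch):

    // Hypothetical callers: after always_inline inlining, the argument is
    // either a literal constant or unknown, so the builtin can fold both ways.
    static __attribute__((always_inline)) int foo(int x) {
      return __builtin_constant_p(x) ? 1 : 0;
    }
    int bar()      { return foo(42); }  // expected to fold to 1
    int baz(int y) { return foo(y); }   // expected to fold to 0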
2015 Sep 01
2
[RFC] New pass: LoopExitValues
On Mon, Aug 31, 2015 at 5:52 PM, Jake VanAdrighem <jvanadrighem at gmail.com> wrote:
> Do you have some specific performance measurements?
Averaging 4 runs of 10000 iterations each of Coremark on my X86_64 desktop showed:
    -O2 performance: +2.9% faster with the L.E.V. pass
    -Os size: 1.5% smaller with the L.E.V. pass
In the case of Coremark, the benefit comes mainly from the matrix
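For readers new to the pass, a sketch of the loop-exit-value idea as I understand it (example mine, not from the thread):

    // The value of i at the loop exit has a closed form (n when n >= 0,
    // else 0), so a pass can rewrite the use after the loop with that
    // expression instead of keeping the increment chain live.
    int last_index(int n) {
      int i = 0;
      for (; i < n; ++i) { /* ... */ }
      return i;   // exit value: max(n, 0)
    }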
2016 May 27
2
Handling post-inc users in LSR
Hello, For a very simple loop where all IV users are post-inc users, I observed redundant add instructions in AArch64. From the LSR debug output, I can see that the initial formula for the icmp is the one that was transformed to a post-inc form in OptimizeLoopTermCond() and later expanded in post-inc mode. Based on the observation that the icmp is already a post-inc user, I hacked LSR to prevent the icmp from being
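A minimal loop of the shape I take to be involved (my reconstruction; the original test case is not shown):

    // Reconstruction only: in a loop like this, LSR's
    // OptimizeLoopTermCond() rewrites the latch compare to test the
    // incremented (post-inc) IV, making the icmp a post-inc user.
    void store_iota(int* p, int n) {
      for (int i = 0; i < n; ++i)
        p[i] = i;
    }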
2019 Sep 14
2
Side-channel resistant values
I'm struggling to find cases where __builtin_unpredictable() works at all. Even if we ignore cmp/br into switch conversion, it still doesn't work:
    int test_cmov(int left, int right, int *alt) {
      return __builtin_unpredictable(left < right) ? *alt : 999;
    }
Should generate:
    test_cmov:
        movl    $999, %eax
        cmpl    %esi, %edi
        cmovll  (%rdx), %eax
        retq
But currently generates:
    test_cmov:
        movl    $999,
2012 Jan 12
1
[LLVMdev] A question of Sparc assembly generated by llc
Hi, Here is some generated Sparc assembly code:
    main:                               ! @main
    ! BB#0:
        save %sp, -112, %sp
        sethi 0, %l0
        or %g0, 5, %l1
        st %l0, [%fp+-4]
        st %l1, [%fp+-8]
        st %l1, [%fp+-12]
        sethi %hi(.L.str), %l1
        ld [%fp+-8], %o1
        add %l1, %lo(.L.str), %l1
        or %g0, %l1, %o0
        call printf
        nop
        ld [%fp+-12], %o2
        ld [%fp+-8], %l2
        sethi %hi(.L.strQ521), %l3
        add
2019 Sep 02
3
AVX2 codegen - question reg. FMA generation
Hello, On the appended, reasonably simple test case that has an fmul/fadd sequence on <8 x float> vector types, I don't see the x86-64 code generator (with the CPU set to haswell or later) turning it into AVX2 FMA instructions. Here's the snippet in the output it generates:
$ llc -O3 -mcpu=skylake
---------------------
    .LBB0_2:                            # =>This Inner
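A hedged guess at the kind of kernel involved, with one caveat I am confident of: llc may only fuse a separate fmul/fadd pair into an FMA when contraction is permitted (e.g. the IR carries 'contract' or 'fast' fast-math flags, or the source used FP_CONTRACT/fma()):

    // Guess at the test case's shape (not the original): an elementwise
    // multiply-accumulate. With -mcpu=skylake and contraction allowed,
    // the fmul+fadd can become vfmadd231ps.
    void mac(float* a, const float* b, const float* c, int n) {
      for (int i = 0; i < n; ++i)
        a[i] += b[i] * c[i];   // fusable only if contraction is allowed
    }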
2018 Apr 13
0
[RFC] __builtin_constant_p() Improvements
I actually was working on an updated patch for the LLVM-side of this, also. :) I was just working on some test cases; I'll post it soon. It's somewhat different than yours. I haven't touched the clang side yet, but I think it needs to be more complex than what you have there. I think it actually needs to be able to evaluate the intrinsic as a constant _false_ in the front-end in some
2016 May 27
0
Handling post-inc users in LSR
> On May 27, 2016, at 2:50 PM, via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Hello,
>
> For a very simple loop where all IV users are post-inc users, I observed redundant add instructions in AArch64.
>
> From LSR debug, I can see initial formula for icmp is the one that transformed to a post-inc form in OptimizeLoopTermCond() and later expanded in post-inc
2019 Sep 14
2
Side-channel resistant values
Hi Chandler, I feel like this conversation has come full circle. So to ask again: how does one force CMOV to be emitted? You suggested “__builtin_unpredictable()” but that gets lost in various optimization passes. Given other architectures have CMOV like instructions, and given the usefulness of the instruction for performance tuning, it seems like a direct intrinsic would be best. What am I
2020 Sep 04
2
Performance of JIT execution
Hello, I recently noticed a performance issue of JIT execution vs native code of the following simple logic which computes the Fibonacci sequence:
    uint64_t fib(int n) {
      if (n <= 2) {
        return 1;
      } else {
        return fib(n-1) + fib(n-2);
      }
    }
When compiled natively using clang++ with -O3, it took 0.17s to compute fib(40). However, when executing using LLJIT, fed with the IR output of "clang++
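The message is cut off, but one knob worth knowing about (a sketch under the assumption that the codegen level, not the IR, explains the gap; ORC API names as of the LLVM 10/11 era):

    // Sketch, not a complete program: build an LLJIT instance whose
    // backend optimizes aggressively (the default is less aggressive).
    // Error handling abbreviated.
    #include "llvm/ExecutionEngine/Orc/LLJIT.h"
    #include "llvm/Support/TargetSelect.h"

    llvm::Expected<std::unique_ptr<llvm::orc::LLJIT>> makeJIT() {
      llvm::InitializeNativeTarget();
      llvm::InitializeNativeTargetAsmPrinter();
      auto JTMB = llvm::orc::JITTargetMachineBuilder::detectHost();
      if (!JTMB)
        return JTMB.takeError();
      JTMB->setCodeGenOptLevel(llvm::CodeGenOpt::Aggressive);  // match -O3 codegen
      return llvm::orc::LLJITBuilder()
          .setJITTargetMachineBuilder(std::move(*JTMB))
          .create();
    }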
2020 Aug 07
2
JIT interaction with linkonce_odr global variables
Hello, I recently hit an issue when JIT'ing my generated IR using llvm::orc::LLJIT. My IR contains the following definition of a global variable:
> $_ZZ23TestStaticVarInFunctionbE1x = comdat any
> @_ZZ23TestStaticVarInFunctionbE1x = linkonce_odr dso_local global i32 123, comdat, align 4
And in my host process, there exists the same symbol. I would expect LLJIT to resolve the
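For reference, _ZZ23TestStaticVarInFunctionbE1x demangles to TestStaticVarInFunction(bool)::x, so a plausible reconstruction of the source that produced this IR (my guess, not quoted from the post) is a function-local static in an inline function:

    // Plausible source: a local static in an inline function is emitted
    // as a linkonce_odr global inside a comdat.
    inline int TestStaticVarInFunction(bool b) {
      static int x = 123;   // -> @_ZZ23TestStaticVarInFunctionbE1x
      if (b) ++x;
      return x;
    }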
2014 Sep 02
3
[LLVMdev] LICM promoting memory to scalar
All, If we can speculatively execute a load instruction, why isn't it safe to hoist it out by promoting it to a scalar in the LICM pass? There is a comment in the LICM pass that if a load/store is conditional then it is not safe, because it would break the LLVM concurrency model (see commit 73bfa4a). There is an IR test checking this in test/Transforms/LICM/scalar-promote-memmodel.ll. However, I have
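The classic illustration of that memory-model argument, for the store side (example mine; the commit's own test is the .ll file named above):

    // If 'g = i' were promoted to a register with an unconditional store
    // after the loop, the program would write to 'g' even when 'cond' is
    // false, introducing a store (and a potential data race with another
    // thread) that the original program never performed.
    int g;
    void f(int n, bool cond) {
      for (int i = 0; i < n; ++i)
        if (cond)
          g = i;   // conditional store: unsafe to promote unconditionally
    }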
2015 Mar 03
2
[LLVMdev] Need a clue to improve the optimization of some C code
Hi, I have some inline-function C code that LLVM could be optimizing better. Since I am new to this, I wonder if someone could give me a few pointers on how to approach this in LLVM. Should I try to change the IR code somehow to get the code generator to generate better code, or should I rather go to the code generator and try to add an optimization pass? Thanks for any feedback. Ciao
2020 Sep 25
2
Understanding tail call
Hi friendly LLVM Devs, I'm trying to understand the technical details of tail call optimization, but unfortunately I hit some issues that I couldn't figure out myself. So I tried the following two really simple functions:
> extern void g(int*);
> void f1() {
>   int x;
>   g(&x);
> }
> void f2(int* x) {
>   g(x);
> }
It turns out that 'f1'
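The snippet ends here; my annotation of the expected difference (reasoning mine, not the thread's):

    // In f1, the address of a local escapes into the callee, so f1's
    // frame must stay live across the call and 'g(&x)' cannot be a tail
    // jump. f2 merely forwards a pointer it does not own, so 'g(x)' can
    // lower to a plain jump to g.
    extern void g(int*);
    void f1() { int x; g(&x); }   // no tail call
    void f2(int* x) { g(x); }     // tail call: jmp g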
2017 Nov 20
2
Nowaday Scalar Evolution's Problem.
The problem? Today, SCEV ("Scalar Evolution") only evaluates instructions whose operands are predictable, i.e. constant-based, so that the result can be evaluated as a constant; anything else cannot be modeled as a SCEV node and becomes SCEVUnknown. An important thing to remember is that we do not use SCEV only for Loop Deletion, which isn't really needed on natural loops
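An illustration of the distinction being drawn, as I read it (example mine):

    // SCEV models the IV as the add recurrence {0,+,1}<loop>, but a value
    // loaded from memory each iteration has no closed form and is
    // classified as SCEVUnknown.
    int sum(const int* p, int n) {
      int s = 0;
      for (int i = 0; i < n; ++i)  // i: SCEVAddRecExpr {0,+,1}
        s += p[i];                 // p[i]: SCEVUnknown
      return s;
    }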
2020 Jul 20
2
[ARM] Should Use Load and Store with Register Offset
Hello LLVM Community (specifically anyone working with ARM Cortex-M), While trying to compile the Newlib C library I found that Clang10 was generating slightly larger binaries than the libc from the prebuilt gcc-arm-none-eabi toolchain. I looked at a few specific functions (memcpy, strcpy, etc.) and noticed that LLVM does not tend to generate load/store instructions with a register offset (e.g.
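The message is truncated at the example; a hedged sketch of the pattern presumably meant (register-offset addressing such as ldrb r3, [r1, r2]):

    // Sketch (mine): a byte-copy loop where indexed addressing with a
    // register offset (ldrb/strb [base, index]) is denser than
    // recomputing each address with a separate add.
    void copy_bytes(char* dst, const char* src, int n) {
      for (int i = 0; i < n; ++i)
        dst[i] = src[i];   // candidate for ldrb/strb with register offset
    }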