similar to: [LLVMdev] Need a clue to improve the optimization of some C code

Displaying 20 results from an estimated 5000 matches similar to: "[LLVMdev] Need a clue to improve the optimization of some C code"

2015 Mar 03
2
[LLVMdev] Need a clue to improve the optimization of some C code
On 03.03.2015 at 19:49, Philip Reames <listmail at philipreames.com> wrote: Hi Philip, first, thanks for your response. > You'll need to provide a bit more information to get any useful response. Questions: > 1) What's your use case? Are you using clang to compile C code? Are you manually generating LLVM IR? Yes, the "inline function C code" will be compiled
2016 Feb 11
3
Expected constant simplification not happening
Hi the appended IR code does not optimize to my liking :) this is the interesting part in x86_64, that got produced via clang -Os: --- movq -16(%r12), %rax movl -4(%rax), %ecx andl $2298949, %ecx ## imm = 0x231445 cmpq $2298949, (%rax,%rcx) ## imm = 0x231445 leaq 8(%rax,%rcx), %rax cmovneq %r15, %rax movl $2298949, %esi ## imm = 0x231445 movq %r12, %rdi movq %r14,
2016 Dec 07
1
Expected constant simplification not happening
Hello, has there been any progress on this topic? I just looked, and the 3.9 optimizer output is still the same. https://llvm.org/bugs/show_bug.cgi?id=24448 Ciao Nat! Sanjay Patel wrote: > [cc'ing Zia] > > We have this transform with -Os for some cases after: > http://reviews.llvm.org/rL244601 > http://reviews.llvm.org/D11363 > > but something in this example is
2017 Dec 19
4
A code layout related side-effect introduced by rL318299
Hi, Recently a 10% performance regression on an important benchmark showed up after we integrated https://reviews.llvm.org/rL318299. The analysis showed that rL318299 triggered loop rotation on a multi-exit loop, and the loop rotation introduced a code layout issue. The performance regression is a side-effect of rL318299. I have attached two testcases, a.ll and b.ll, to illustrate the problem. a.ll
2017 Dec 19
2
A code layout related side-effect introduced by rL318299
On Mon, Dec 18, 2017 at 5:46 PM Xinliang David Li <davidxl at google.com> wrote: > The introduction of the cleanup.cond block in b.ll without loop-rotation > already makes the layout worse than a.ll. > > > Without introducing the cleanup.cond block, the layout is > > entry->while.cond -> while.body->ret > > All the arrows are hot fall through edges which is
2014 Jul 23
4
[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops
The clang 3.5 loop optimizer seems to jump in unintentionally for simple loops. The very simple example: ---- const int SIZE = 3; int the_func(int* p_array) { int dummy = 0; #if defined(ITER) for(int* p = &p_array[0]; p < &p_array[SIZE]; ++p) dummy += *p; #else for(int i = 0; i < SIZE; ++i) dummy += p_array[i]; #endif return dummy; } int main(int argc, char** argv) {
2017 Nov 20
2
Nowaday Scalar Evolution's Problem.
The problem? Nowadays SCEV ("Scalar Evolution") only evolves instructions that have predictable, constant-based operands, that is, ones that can be evolved to a constant. Anything else cannot be evolved into a SCEV node and ends up as SCEVUnknown. The important thing to remember is that we do not use SCEV only for Loop Deletion, which isn't really needed on natural loops
2018 Nov 06
4
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
Hi @ll, while clang/LLVM recognizes common bit-twiddling idioms/expressions like unsigned int rotate(unsigned int x, unsigned int n) { return (x << n) | (x >> (32 - n)); } and typically generates "rotate" machine instructions for this expression, it fails to recognize other also common bit-twiddling idioms/expressions. The standard IEEE CRC-32 for "big
2011 Feb 18
0
[LLVMdev] Adding "S" suffixed ARM/Thumb2 instructions
On Feb 17, 2011, at 10:35 PM, Вадим Марковцев wrote: > Hello everyone, > > I've added the "S" suffixed versions of ARM and Thumb2 instructions to tablegen. Those are, for example, "movs" or "muls". > Of course, some instructions already had their twins, such as add/adds, and I left them untouched. Adding separate "s" instructions is
2016 Aug 01
2
LLVM Loop vectorizer - 2 vector.body blocks appear
Hello. Mikhail, with the more recent version of the LoopVectorize.cpp code (retrieved at the beginning of July 2016) I ran the following piece of C code: void foo(long *A, long *B, long *C, long N) { for (long i = 0; i < N; ++i) { C[i] = A[i] + B[i]; } } The vectorized LLVM program I obtain contains 2 vector.body blocks - one named
2011 Feb 18
2
[LLVMdev] Adding "S" suffixed ARM/Thumb2 instructions
Hello everyone, I've added the "S" suffixed versions of ARM and Thumb2 instructions to tablegen. Those are, for example, "movs" or "muls". Of course, some instructions already had their twins, such as add/adds, and I left them untouched. Besides, I propose a codegen optimization based on them, which removes the redundant comparison in patterns like orr
2014 Sep 02
3
[LLVMdev] LICM promoting memory to scalar
All, If we can speculatively execute a load instruction, why isn’t it safe to hoist it out by promoting it to a scalar in LICM pass? There is a comment in LICM pass that if a load/store is conditional then it is not safe because it would break the LLVM concurrency model (See commit 73bfa4a). It has an IR test for checking this in test/Transforms/LICM/scalar-promote-memmodel.ll However, I have
2018 Nov 27
2
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
"Sanjay Patel" <spatel at rotateright.com> wrote: > IIUC, you want to use x86-specific bit-hacks (sbb masking) in cases like > this: > unsigned int foo(unsigned int crc) { > if (crc & 0x80000000) > crc <<= 1, crc ^= 0xEDB88320; > else > crc <<= 1; > return crc; > } To document this for x86 too: rewrite the function
2014 Sep 02
2
[LLVMdev] LICM promoting memory to scalar
I think gcc is right. It inserted a branch for n == 0 (the cbz at the top), so that's not a problem. In all other regards, this is safe: if you examine the sequence of loads and stores, it eliminated all but the first load and all but the last store. How's that unsafe? If I had to guess, the bug here is that LLVM doesn't want to hoist the load over the condition (which it is right
2018 Mar 23
5
RFC: Speculative Load Hardening (a Spectre variant #1 mitigation)
Hello all, I've been working for the last month or so on a comprehensive mitigation approach to variant #1 of Spectre. There are a bunch of reasons why this is desirable: - Critical software that is unlikely to be easily hand-mitigated (or where the performance tradeoff isn't worth it) will have a compelling option. - It gives us a baseline on performance for hand-mitigation. - Combined
2014 Sep 03
3
[LLVMdev] LICM promoting memory to scalar
Thanks for the background on the concurrent memory model. So, is it sufficient that the loop entry is guarded by a condition (cbz at top) to prevent the race? The loop entry will be guarded by a condition if the loop has been rotated by the loop rotate pass. Since LICM runs after loop rotate, we can use ScalarEvolution::isLoopEntryGuardedByCond to check if we can speculatively execute the load without
2010 Sep 01
5
[LLVMdev] equivalent IR, different asm
The attached .ll files seem equivalent, but the resulting asm from 'opt-fail.ll' causes a crash in WebKit. I suspect the usage of registers is wrong; can someone take a look? $ llc opt-pass.ll -o - .section __TEXT,__text,regular,pure_instructions .globl __ZN7WebCore6kolos1ERiS0_PKNS_20RenderBoxModelObjectEPNS_10StyleImageE .align 4, 0x90
2010 Sep 01
0
[LLVMdev] equivalent IR, different asm
On Sep 1, 2010, at 6:25 AM, Argyrios Kyrtzidis wrote: > The attached .ll files seem equivalent, but the resulting asm from 'opt-fail.ll' causes a crash in WebKit. > I suspect the usage of registers is wrong; can someone take a look? The difference is that there is a shift right after the multiply, before the divide. In IR, the difference is: %5 = mul nsw i32 %4, %tmp1
2014 May 11
2
[LLVMdev] [cfe-dev] Code generation for noexcept functions
On Sun, May 11, 2014 at 8:19 AM, Stephan Tolksdorf <st at quanttec.com> wrote: > Hi, > > When clang/LLVM can't prove that a noexcept function only contains > non-throwing code, it seems to insert an explicit exception handler that > calls std::terminate. Why doesn't clang leave it to the eh personality > function to call std::terminate when an exception is thrown
2016 Jun 30
4
Help required regarding IPRA and Local Function optimization
Hello Mentors, I am currently tracking down a bug in the Local Function related optimization that causes runtime failures in some test cases. Because those test cases contain very large functions with recursion and object-oriented code, I have not been able to find the pattern that causes the failure. So I tried the following simple case to understand the expected behavior of this optimization. Consider