Displaying 20 results from an estimated 1000 matches similar to: "How can I tell llvm, that a branch is preferred ?"
2017 Dec 19
4
A code layout related side-effect introduced by rL318299
Hi,
Recently 10% performance regression on an important benchmark showed up
after we integrated https://reviews.llvm.org/rL318299. The analysis showed
that rL318299 triggered loop rotation on a multi-exit loop, and the loop
rotation introduced code layout issue. The performance regression is a
side-effect of rL318299. I got two testcases a.ll and b.ll attached to
illustrate the problem. a.ll
2017 Dec 19
2
A code layout related side-effect introduced by rL318299
On Mon, Dec 18, 2017 at 5:46 PM Xinliang David Li <davidxl at google.com>
wrote:
> The introduction of cleanup.cond block in b.ll without loop-rotation
> already makes the layout worse than a.ll.
>
>
> Without introducing the cleanup.cond block, the layout is
>
> entry->while.cond -> while.body->ret
>
> All the arrows are hot fall through edges which is
2017 Nov 20
2
Scalar Evolution's Problem Nowadays
The Problem?
Currently, SCEV ("Scalar Evolution") only evolves instructions that have
predictable, constant-based operands, i.e. operands that can be evolved as constants.
Otherwise the instruction cannot be evolved into a SCEV node and is represented as SCEVUnknown.
The important thing to remember is that we do not use SCEV only for Loop
Deletion,
which isn't really needed on natural loops
2017 May 30
3
[atomics][AArch64] Possible bug in cmpxchg lowering
Currently the AtomicExpandPass will lower the following IR:
define i1 @foo(i32* %obj, i32 %old, i32 %new) {
entry:
%v0 = cmpxchg weak volatile i32* %obj, i32 %old, i32 %new release acquire
%v1 = extractvalue { i32, i1 } %v0, 1
ret i1 %v1
}
to the equivalent of the following on AArch64:
ldxr w8, [x0]
cmp w8, w1
b.ne .LBB0_3
// BB#1:
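For readers who want to reproduce this from source, a C11 function along the following lines is expected to produce IR of this shape (an assumption: the exact IR clang emits, for example how the pointer type is spelled, varies by version). The name `foo` simply mirrors the IR above; the `release` success ordering and `acquire` failure ordering map to `memory_order_release` and `memory_order_acquire`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Weak, volatile compare-and-swap with release ordering on success and
   acquire ordering on failure, mirroring the cmpxchg in the IR above.
   Returns only the success flag (the IR's extractvalue ..., 1). */
bool foo(volatile atomic_int *obj, int old, int new_val)
{
    /* On failure C11 writes the current value back into `old`; here it is
       a local copy, so the caller only observes the success flag. */
    return atomic_compare_exchange_weak_explicit(obj, &old, new_val,
                                                 memory_order_release,
                                                 memory_order_acquire);
}
```

Because a weak compare-exchange may fail spuriously even when the values match, callers normally retry it in a loop.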
2018 Nov 06
4
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
Hi @ll,
while clang/LLVM recognizes common bit-twiddling idioms/expressions
like
unsigned int rotate(unsigned int x, unsigned int n)
{
return (x << n) | (x >> (32 - n));
}
and typically generates "rotate" machine instructions for this
expression, it fails to recognize other also common bit-twiddling
idioms/expressions.
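One caveat on the quoted `rotate`: when `n == 0` it evaluates `x >> 32`, which is undefined behaviour in C for a 32-bit `unsigned int`. A commonly used UB-free form masks both shift counts into the range 0..31. The sketch below uses a hypothetical name `rotl32`; whether a particular clang version folds it into a single rotate instruction is not something this thread establishes.

```c
#include <stdint.h>

/* Hypothetical UB-free rotate-left: masking keeps both shift counts in
   [0, 31], so n == 0 (or n == 32) no longer shifts by 32. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}
```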
The standard IEEE CRC-32 for "big
2018 Nov 27
2
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
"Sanjay Patel" <spatel at rotateright.com> wrote:
> IIUC, you want to use x86-specific bit-hacks (sbb masking) in cases like
> this:
> unsigned int foo(unsigned int crc) {
> if (crc & 0x80000000)
> crc <<= 1, crc ^= 0xEDB88320;
> else
> crc <<= 1;
> return crc;
> }
To document this for x86 too: rewrite the function
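As a sketch of the masking idea under discussion, the quoted function can be written branch-free in portable C: `crc >> 31` is 0 or 1, so `0u - (crc >> 31)` is an all-zeros or all-ones mask that selects whether the polynomial is XORed in. The names `crc_step_branchy` and `crc_step_branchless` are hypothetical; this says nothing about which instructions (e.g. `sbb`) clang actually emits for either form.

```c
#include <stdint.h>

/* Branchy form, equivalent to the function quoted above. */
static uint32_t crc_step_branchy(uint32_t crc)
{
    if (crc & 0x80000000u)
        crc = (crc << 1) ^ 0xEDB88320u;
    else
        crc <<= 1;
    return crc;
}

/* Branch-free form: mask is 0xFFFFFFFF when the top bit is set, else 0. */
static uint32_t crc_step_branchless(uint32_t crc)
{
    uint32_t mask = 0u - (crc >> 31);
    return (crc << 1) ^ (mask & 0xEDB88320u);
}
```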
2016 Jun 28
2
Instruction selection problem with type i64 - mistaken as v8i64?
Hello.
I am writing a back end in which I combined the existing BPF LLVM back end with the
Mips MSA vector extensions (from the Mips back end)
I have encountered an error when compiling with llc: the instruction selector uses a
vector register instead of a scalar register with type i64.
I have the following part of LLVM IR program:
vector.body.preheader:
2013 Aug 19
3
[LLVMdev] Issue with X86FrameLowering __chkstk on Windows 8 64-bit / Visual Studio 2012
Hi,
I'm using LLVM to convert expressions to native assembly; the problem
is that when LLVM compiles this code:
define void @fn_0000000000000000(i8*, i8*, i8*) {
bb:
%res = alloca i32
%3 = load i32* %res
%4 = bitcast i8* %0 to i32*
%5 = load i32* %4
%6 = bitcast i8* %0 to i32*
%7 = load i32* %6
%8 = xor i32 %5, %7
store volatile i32 %8, i32* %res
%9 = load i32* %res
%10 = icmp
2014 May 11
2
[LLVMdev] [cfe-dev] Code generation for noexcept functions
On Sun, May 11, 2014 at 8:19 AM, Stephan Tolksdorf <st at quanttec.com> wrote:
> Hi,
>
> When clang/LLVM can't prove that a noexcept function only contains
> non-throwing code, it seems to insert an explicit exception handler that
> calls std::terminate. Why doesn't clang leave it to the eh personality
> function to call std::terminate when an exception is thrown
2015 Mar 03
2
[LLVMdev] Need a clue to improve the optimization of some C code
Hi
I have some inline function C code that LLVM could be optimizing better.
Since I am new to this, I wonder if someone could give me a few pointers, how to approach this in LLVM.
Should I try to change the IR code -somehow- to get the code generator to generate better code, or should I rather go to the code generator and try to add an optimization pass ?
Thanks for any feedback.
Ciao
2018 Mar 23
5
RFC: Speculative Load Hardening (a Spectre variant #1 mitigation)
Hello all,
I've been working for the last month or so on a comprehensive mitigation
approach to variant #1 of Spectre. There are a bunch of reasons why this is
desirable:
- Critical software that is unlikely to be easily hand-mitigated (or where
the performance tradeoff isn't worth it) will have a compelling option.
- It gives us a baseline on performance for hand-mitigation.
- Combined
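For context on the hand-mitigation baseline mentioned above, a common manual pattern for variant #1 clamps the index with a data-dependent mask so that even a mispredicted bounds check cannot load out of bounds. This is a sketch of that manual pattern, not the compiler transformation the RFC proposes, and `load_masked` is a hypothetical name. C gives no guarantee that the compiler keeps this branch-free (it may legally turn the comparison back into a branch), which is part of why real hand-mitigations often rely on inline assembly or compiler-specific barriers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical branch-free bounds-checked read.
   Requires size >= 1: an out-of-range index is redirected to table[0]
   instead of being skipped, and the loaded value is then zeroed. */
static uint8_t load_masked(const uint8_t *table, size_t size, size_t idx)
{
    /* All ones when idx < size, all zeros otherwise. */
    size_t mask = (size_t)0 - (size_t)(idx < size);
    return (uint8_t)(table[idx & mask] & mask);
}
```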
2010 Oct 04
2
[LLVMdev] missing blocks
I suspect this is a straightforward problem, so I thought I'd ask.
I'm developing a new backend. I recently updated from the LLVM
repository and now my output assembly is branching to labels/blocks that
have been removed. It had been working fine two weeks ago. What looks
suspicious is the following message:
TryTailMergeBlocks: BB#1, BB#3, BB#4
Looking for common tails of
2013 Aug 06
1
[LLVMdev] Patching jump tables at run-time
I am looking for guidance on how to:
1.
2014 Jul 23
4
[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentionally for simple loops
the clang 3.5 loop optimizer seems to jump in unintentionally for simple loops
the very simple example
----
const int SIZE = 3;
int the_func(int* p_array)
{
int dummy = 0;
#if defined(ITER)
for(int* p = &p_array[0]; p < &p_array[SIZE]; ++p) dummy += *p;
#else
for(int i = 0; i < SIZE; ++i) dummy += p_array[i];
#endif
return dummy;
}
int main(int argc, char** argv)
{
2010 Oct 07
2
[LLVMdev] [Q] x86 peephole deficiency
Hi all,
I am slowly working on a SwitchInst optimizer (http://llvm.org/PR8125)
and now I am running into a deficiency of the x86
peephole optimizer (or jump-threader?). Here is what I get:
andl $3, %edi
je .LBB0_4
# BB#2: # %nz
# in Loop: Header=BB0_1 Depth=1
cmpl $2, %edi
2016 Aug 05
3
enabling interleaved access loop vectorization
Hi Michael,
Some time back I did some experiments with the interleave vectorizer and did not find any degradation,
probably because my tests/benchmarks are not extensive enough to cover much.
Elina is the right person to comment on it as she already experienced cases where it hinders performance.
For the interleave vectorizer on X86 we do not have any specific costing; it falls back to BasicTTI, where the costing is not
2015 Mar 03
2
[LLVMdev] Need a clue to improve the optimization of some C code
On 03.03.2015 at 19:49, Philip Reames <listmail at philipreames.com> wrote:
Hi Philip
first, thanks for your response,
> You'll need to provide a bit more information to get any useful response. Questions:
> 1) What's your use case? Are you using clang to compile C code? Are you manually generating LLVM IR?
yes the "inline function C code" will be compiled
2013 Aug 27
0
[LLVMdev] Issue with X86FrameLowering __chkstk on Windows 8 64-bit / Visual Studio 2012
It's not a solution to the actual bug (which is, as the thread you linked
discusses, a problem with the assumption on LLVM's part that the __chkstk
function lies within 2GB of the emitted code's address space) but there is
a simple workaround: hoist all allocas to the first basic block of your
function. This allows the JIT to perform all stack allocations in a single
adjustment of the
2017 Aug 02
3
[InstCombine] Simplification sometimes only transforms but doesn't simplify instruction, causing side effect in other pass
Hi,
We recently found a testcase showing that simplifications in
instcombine sometimes change the instruction without reducing the
instruction cost, but causing problems in TwoAddressInstruction pass.
And it looks like the problem is generic and other simplification may
have the same issue. I want to get some ideas about what is the best
way to fix such kind of problem.
The testcase:
2020 Jun 01
3
Aarch64: unaligned access despite -mstrict-align
Hi,
I experienced a crash in code compiled with Clang 10.0.0 due to a
misaligned 64-bit data access. The (ARMv8) CPU is configured with SCTLR.A
== 1 (alignment check enabled). With SCTLR.A == 0 the code runs as expected.
After some investigation I came up with the following reproducer:
---8<-------8<-------8<-------8<-------8<-------8<-------8<-------
$ cat test.c
extern char