thr3ads.net - similar to: "[LLVMdev] [Q] x86 peephole deficiency"

Displaying 20 results from an estimated 600 matches similar to: "[LLVMdev] [Q] x86 peephole deficiency"

2010 Oct 07

[LLVMdev] [Q] x86 peephole deficiency

On Oct 6, 2010, at 6:16 PM, Gabor Greif wrote:
> Hi all,
>
> I am slowly working on a SwitchInst optimizer (http://llvm.org/PR8125)
> and now I am running into a deficiency of the x86
> peephole optimizer (or jump-threader?). Here is what I get:
>
> andl $3, %edi
> je .LBB0_4
> # BB#2: # %nz
>

2010 Oct 13

[LLVMdev] [Q] x86 peephole deficiency

On 07.10.2010 at 19:50, Chris Lattner wrote:
>
> On Oct 6, 2010, at 6:16 PM, Gabor Greif wrote:
>
>> Hi all,
>>
>> I am slowly working on a SwitchInst optimizer (http://llvm.org/PR8125)
>> and now I am running into a deficiency of the x86
>> peephole optimizer (or jump-threader?). Here is what I get:
>>
>> andl $3,

2017 Dec 19

A code layout related side-effect introduced by rL318299

Hi, Recently a 10% performance regression on an important benchmark showed up after we integrated https://reviews.llvm.org/rL318299. The analysis showed that rL318299 triggered loop rotation on a multi-exit loop, and the loop rotation introduced a code layout issue. The performance regression is a side-effect of rL318299. I got two testcases a.ll and b.ll attached to illustrate the problem. a.ll

2017 Dec 19

A code layout related side-effect introduced by rL318299

On Mon, Dec 18, 2017 at 5:46 PM Xinliang David Li <davidxl at google.com> wrote:
> The introduction of cleanup.cond block in b.ll without loop-rotation
> already makes the layout worse than a.ll.
>
> Without introducing cleanup.cond block, the layout is
>
> entry->while.cond -> while.body->ret
>
> All the arrows are hot fall through edges which is

2017 Nov 21

question about xray tls data initialization

With some dirty hacks, I've made the xray runtime 'build' on Windows, but unfortunately I don't have enough knowledge about the linker and the runtime, and the final built executable didn't run. I'd like to share my changes here, hoping somebody can help me make it run on Windows. In AsmPrinter, copy/paste xray for coff target
InstMap =

2013 Aug 19

[LLVMdev] Issue with X86FrameLowering __chkstk on Windows 8 64-bit / Visual Studio 2012

Hi, I'm using LLVM to convert expressions to native assembly, the problem is when LLVM compiles this code:
define void @fn_0000000000000000(i8*, i8*, i8*) {
bb:
  %res = alloca i32
  %3 = load i32* %res
  %4 = bitcast i8* %0 to i32*
  %5 = load i32* %4
  %6 = bitcast i8* %0 to i32*
  %7 = load i32* %6
  %8 = xor i32 %5, %7
  store volatile i32 %8, i32* %res
  %9 = load i32* %res
  %10 = icmp

2017 Nov 16

question about xray tls data initialization

I'm learning the xray library and trying to see whether it can be built on Windows. In xray_fdr_logging_impl.h, at line 152, the comment reads:
// Using pthread_once(...) to initialize the thread-local data structures
but at lines 175 and 183, the code reads:
thread_local pthread_key_t key;
// Ensure that we only actually ever do the pthread initialization once.
thread_local bool UNUSED Unused = [] {

2010 Dec 14

[LLVMdev] Branch delay slots broken.

The Sparc, Microblaze, and Mips code generators implement branch delay slots. They all seem to exhibit the same bug, which is not surprising since the code is very similar. If I compile code with this snippet:
while (n--)
    *s++ = (char) c;
I get this (for the Microblaze):
swi r19, r1, 0
add r3, r0, r0
cmp r3, r3, r7
beqid r3,

2012 Apr 25

[LLVMdev] Not enough optimisations in the SelectionDAG phase?

For the following code fragment,
; <label>:27 ; preds = %27, %entry
  %28 = load volatile i32* inttoptr (i64 2149581832 to i32*), align 8
  %29 = icmp slt i32 %28, 0
  br i1 %29, label %27, label %loop.exit
loop.exit: ; preds = %27
llc will generate following MIPS code,
$BB0_1:
  lui $3, 32800
  ori $3, $3, 1032
  lw

2012 Apr 29

[LLVMdev] Not enough optimisations in the SelectionDAG phase?

On Apr 24, 2012, at 11:48 PM, Fan Dawei wrote:
> For the following code fragment,
>
> ; <label>:27 ; preds = %27, %entry
> %28 = load volatile i32* inttoptr (i64 2149581832 to i32*), align 8
> %29 = icmp slt i32 %28, 0
> br i1 %29, label %27, label %loop.exit
>
> loop.exit: ; preds = %27

2013 Feb 20

[LLVMdev] Is va_arg correct on Mips backend?

I don't have a Mips board. I compiled with the commands below and checked the asm output below.
1. Question: The distance between caller arg[4] and arg[5] is 4 bytes, but the callee gets every arg[] at an 8-byte offset (arg_ptr1+8 or arg_ptr2+8). I assume that #BB#4 and #BB#5 are the arg_ptr, which is the pointer used to access the stack arguments.
2. Question: Stack memory 28($sp) has no initial value. If

2013 Feb 20

[LLVMdev] Is va_arg correct on Mips backend?

Does it make a difference if you give the "-target" option to clang?
$ clang -target mips-linux-gnu ch8_3.cpp -o ch8_3.bc -emit-llvm -c
The .s file generated this way looks quite different from the one in your email.
On Tue, Feb 19, 2013 at 5:06 PM, Jonathan <gamma_chen at yahoo.com.tw> wrote:
> I didn't have Mips board. I compile as the commands and check the asm
>

2011 Oct 19

[LLVMdev] Question regarding basic-block placement optimization

On Tue, Oct 18, 2011 at 6:58 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>
> On Oct 18, 2011, at 5:22 PM, Chandler Carruth wrote:
>
>> As for why it should be an IR pass, mostly because once the selection dag
>> runs through the code, we can never recover all of the freedom we have at
>> the IR level. To start with, splicing MBBs around requires known about

2018 Sep 20

Comparing Clang and GCC: only clang stores updated value in each iteration.

Hi, I have a benchmark (mcf) that is currently slower when compiled with clang compared to gcc 8 (~10%). It seems that a hot loop has a few differences, where one interesting one is that while clang stores an incremented value in each iteration, gcc waits and just stores the final value just once after the loop. The value is a global variable. I wonder if this is something clang does not do

2013 Sep 02

[LLVMdev] .globl

Hi Reed,
Still catching up on email, so hope this isn't already covered...
reed kotler <rkotler at mips.com> writes:
> I have a strange issue that I encountered with mips16 hard float.
>
> Part of mips16 hard float is to emit calls to runtime routines with the
> same signature as usual soft float routines, except that they are
> implemented using mips32 code which uses

2012 Apr 29

[LLVMdev] Not enough optimisations in the SelectionDAG phase?

On 04/29/2012 01:19 PM, Evan Cheng wrote:
> On Apr 24, 2012, at 11:48 PM, Fan Dawei wrote:
>
>> For the following code fragment,
>>
>> ;<label>:27 ; preds = %27, %entry
>> %28 = load volatile i32* inttoptr (i64 2149581832 to i32*), align 8
>> %29 = icmp slt i32 %28, 0
>> br i1 %29, label %27, label

2014 Mar 26

[LLVMdev] [cfe-dev] computing a conservatively rounded square of a double

On 03/26/2014 11:36 AM, Geoffrey Irving wrote:
> I am trying to compute conservative lower and upper bounds for the
> square of a double. I have set the rounding mode to FE_UPWARDS
> elsewhere, so the code is
>
> struct Interval {
>   double nlo, hi;
> };
>
> Interval inspect_singleton_sqr(const double x) {
>   Interval s;
>   s.nlo = x * -x;
>   s.hi = x *

2010 Dec 14

[LLVMdev] Branch delay slots broken.

On 12/14/2010 04:28 PM, Wesley Peck wrote:
> On Dec 14, 2010, at 3:46 PM, Richard Pennington wrote:
>> Notice that the label $BB0_1 is missing. If I disable filling in the
>> branch delay slots, I get:
>
> Is this with the latest SVN HEAD version of LLVM or some other version? The delay slot filler and many other things have been updated for the Microblaze backend. In

2010 Dec 14

[LLVMdev] Branch delay slots broken.

On Dec 14, 2010, at 3:46 PM, Richard Pennington wrote:
> Notice that the label $BB0_1 is missing. If I disable filling in the
> branch delay slots, I get:
Is this with the latest SVN HEAD version of LLVM or some other version? The delay slot filler and many other things have been updated for the Microblaze backend. In particular, the commit r120095 for the MBlaze backend fixed some issues

2010 Oct 13

[LLVMdev] [Q] x86 peephole deficiency

On Oct 13, 2010, at 11:22 AM, Gabor Greif wrote:
> Hi Chris,
>
> I had a look into MachineCSE, but it looks like MBB-oriented.
> The above problem is an inter-block one. Also MCSE seems
> to perform value numbering on virtual/physical registers, which
> does not map very well to status register bits that are implicitly
> defined.
> Any chance to recast this issue as a
