thr3ads.net - similar to: "RFC: Insertion of nops for performance stability"

Displaying 20 results from an estimated 5000 matches similar to: "RFC: Insertion of nops for performance stability"

RFC: Insertion of nops for performance stability

2016 Nov 20

Hi Hal, A pre-emit pass would indeed be preferable. I originally thought of it, too; however, I could not figure out how such a pass could have access to information on instruction sizes and block alignments. I know that for X86, at least, the branch relaxation is happening during the layout phase in the Assembler, where I plan to integrate the nop insertion such that the new MCPerfNopFragment

RFC: Insertion of nops for performance stability

2016 Nov 21

Hi Hal, Thanks for the reference. I’ve looked at PPCBranchSelector and the PowerPC backend. It is very different from the X86 architecture and unfortunately the way branch relaxation and alignment related issues are handled in PPC cannot be copied to X86. This is because: 1. PPC instructions are of fixed length while X86 instructions are of variable length, and their length can change

more reassociation in IR

2018 May 09

When you say that distribution shouldn't be used, do you mean within instcombine rather than some other pass? Or not at all as an IR optimization? A dedicated optimization pass that looks for and makes factoring/distribution folds to eliminate instructions seems like it would solve the problems that I'm seeing. I.e., I'm leaning towards the proposal here: https://reviews.llvm.org/D41574

[RFC] New pass: LoopExitValues

2015 Aug 31

Hello LLVM, This is a proposal for a new pass that improves performance and code size in some nested loop situations. The pass is target independent. From the description in the file header: This optimization finds loop exit values reevaluated after the loop execution and replaces them by the corresponding exit values if they are available. Such sequences can arise after the

[RFC] New pass: LoopExitValues

2015 Sep 01

On Mon, Aug 31, 2015 at 5:52 PM, Jake VanAdrighem <jvanadrighem at gmail.com> wrote: > Do you have some specific performance measurements? Averaging 4 runs of 10000 iterations each of Coremark on my X86_64 desktop showed: -O2 performance: +2.9% faster with the L.E.V. pass -Os size: 1.5% smaller with the L.E.V. pass In the case of Coremark, the benefit comes mainly from the matrix

more reassociation in IR

2018 May 09

> On May 8, 2018, at 9:50 AM, Daniel Berlin via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > 1. The reassociate pass that exists right now was *originally* (AFAIK) written to enable CSE/GVN to do better. Agreed. The original mindset included a (naive) belief that going with a canonical form was better than teaching redundancy elimination to handle abstractions (as a matter

[LLVMdev] Poor register allocations vs gcc

2015 Jul 13

Hello, I have an issue with the llvm optimizations. I need to create object codes. the -ON PURPOSE poor && useless- code : --------------------------------------------------- #include <stdio.h> #include <stdlib.h> int ci(int a){ return 23; } int flop(int a, char ** c){ a += 71; int b = 0; if (a == 56){ b = 69; b += ci(a); } puts("ok"); return a +

[X86][AVX512] RFC: make i1 illegal in the Codegen

2017 Jan 24

Hi All, AVX-512 introduced the K mask registers and masked operations which make a natural choice for legalizing vectors of i1's. For example, define <8 x i32> @foo(<8 x i32>%a, <8 x i32*> %p) { %r = call <8 x i32> @llvm.masked.gather.v8i32(<8 x i32*> %p, i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>,

[LLVMdev] nested function's static link gets clobbered

2008 Oct 31

Fellow developers, I'm parallelizing loops to be called by pthread. The thread body that I pass to pthread_create looks like define i8* @loop1({ i32*, i32* }* nest %parent_frame, i8* %arg) parent_frame is pointer to shared variables in original function 0x00007f0de11c41f0: mov (%r10),%rax 0x00007f0de11c41f3: cmpl $0x63,(%rax) 0x00007f0de11c41f6: jg 0x7f0de11c420c

more reassociation in IR

2018 May 18

I mentioned this earlier in the thread - I would like to see something like D41574 in the optimizer. It's optimizing code that no other pass does currently, and I don't see any other near-term proposal that gets us those optimizations. Omer, can you rebase that to trunk? I think a header has moved, so it doesn't build as-is. I'd like to know if it can catch the cases in D45842. If

RFC: a more detailed design for ThinLTO + vcall CFI

2016 Oct 26

Hi all, As promised, here is a brain dump on how I see CFI for vcalls working under ThinLTO. Most of this has been prototyped, so the design does appear to be sound. For context on how CFI currently works under regular LTO, please read: http://llvm.org/docs/TypeMetadata.html http://clang.llvm.org/docs/ControlFlowIntegrityDesign.html http://clang.llvm.org/docs/LTOVisibility.html ==== Summary

more reassociation in IR

2018 May 14

On Fri, May 11, 2018 at 7:20 PM Hal Finkel <hfinkel at anl.gov> wrote: > > On 05/11/2018 08:40 PM, Daniel Berlin via llvm-dev wrote: > > > > On Fri, May 11, 2018 at 2:37 PM, Hiroshi Yamauchi <yamauchi at google.com> > wrote: > >> >> >> On Thu, May 10, 2018 at 12:49 PM Daniel Berlin <dberlin at dberlin.org> >> wrote: >>

[LLVMdev] Tight overlapping loops and performance

2009 Mar 02

On Mon, Mar 2, 2009 at 2:45 PM, Jonathan Turner <probata at hotmail.com> wrote: > For which version of gcc? I should mention I'm on OS X and using the LLVM > SVN. gcc 4.3. It's also possible this is processor-sensitive. >> First, try looking at the generated code... the code LLVM generates is >> probably not what you're expecting. I'm getting the

[LLVMdev] Suboptimal code due to excessive spilling

2012 Mar 28

Hi, I have run into the following strange behavior and wanted to ask for some advice. For the C program below, function sum() gets inlined in foo() but the code generated looks very suboptimal (the code is an extract from a larger program). Below I show the 32-bit x86 assembly as produced by the demo page on the llvm home page ("Output A"). As you can see from the assembly, after

[LLVMdev] Poor register allocations vs gcc

2015 Jul 13

Hello, Ecx is a problem because you have to xor it. Which is avoided in the gcc compilation. -fomit-frame-pointer helps. Now llvm is one instruction from gcc. If ecx was not used, it would be as fast. -- Sent from Yandex.Mail for mobile 20:03, 13 July 2015, Matthias Braun <mbraun@apple.com>:

RFC: a more detailed design for ThinLTO + vcall CFI

2016 Oct 28

Hi Peter, Thanks for sending this and sorry for the slow response. Some questions below. Teresa On Tue, Oct 25, 2016 at 5:27 PM, Peter Collingbourne <peter at pcc.me.uk> wrote: > Hi all, > > As promised, here is a brain dump on how I see CFI for vcalls working > under ThinLTO. Most of this has been prototyped, so the design does appear > to be sound. For context on how CFI

[LLVMdev] Tight overlapping loops and performance

2009 Mar 02

> Date: Mon, 2 Mar 2009 13:41:45 -0800 > From: eli.friedman at gmail.com > To: llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] Tight overlapping loops and performance > > Hmm, on my computer, I get around 2.5 seconds with both gcc -O3 and > llvm-gcc -O3 (using llvm-gcc from svn). Not sure what you're doing > differently; I wouldn't be surprised if it's

[LLVMdev] Suboptimal code due to excessive spilling

2012 Apr 05

I don't know much about this, but maybe -mllvm -unroll-count=1 can be used as a workaround? /Patrik Hägglund -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Brent Walker Sent: 28 March 2012 03:18 To: llvmdev Subject: [LLVMdev] Suboptimal code due to excessive spilling Hi, I have run into the following strange behavior

[LLVMdev] trunk's optimizer generates slower code than 3.5

2015 Feb 13

I submitted the problem report to clang's bugzilla but no one seems to care, so I have to send it to the mailing list. clang 3.7 svn (trunk 229055 at the time I reported this problem) generates slower code than 3.5 (Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)) for the following code. It is an "8 queens puzzle" solver written as an educational example. As

[LLVMdev] nested function's static link gets clobbered

2008 Nov 01

Hi, > I'm parallelizing loops to be called by pthread. The thread body that I pass > to pthread_create looks like > > define i8* @loop1({ i32*, i32* }* nest %parent_frame, i8* %arg) > parent_frame is pointer to shared variables in original function > > 0x00007f0de11c41f0: mov (%r10),%rax > 0x00007f0de11c41f3: cmpl $0x63,(%rax) > 0x00007f0de11c41f6:
