similar to: RFC: Insertion of nops for performance stability

Displaying 20 results from an estimated 5000 matches similar to: "RFC: Insertion of nops for performance stability"

2016 Nov 20
3
RFC: Insertion of nops for performance stability
Hi Hal, A pre-emit pass will indeed be preferable. I originally thought of it, too, however I could not figure out how can such a pass have an access to information on instruction sizes and block alignments. I know that for X86, at least, the branch relaxation is happening during the layout phase in the Assembler, where I plan to integrate the nop insertion such that the new MCPerfNopFragment
2016 Nov 21
2
RFC: Insertion of nops for performance stability
Hi Hal, Thanks for the reference. I’ve looked at PPCBranchSelector and the PowerPC backend. It is very different from the X86 architecture and unfortunately the way branch relaxation and alignment related issues are handled in PPC cannot be copied to X86. This is because: 1. PPC instructions are of fixed length while X86 instructions are of variable length, and their length can change
2018 May 09
0
more reassociation in IR
When you say that distribution shouldn't be used, do you mean within instcombine rather than some other pass? Or not all as an IR optimization? A dedicated optimization pass that looks for and makes factoring/distribution folds to eliminate instructions seems like it would solve the problems that I'm seeing. Ie, I'm leaning towards the proposal here: https://reviews.llvm.org/D41574
2015 Aug 31
2
[RFC] New pass: LoopExitValues
Hello LLVM, This is a proposal for a new pass that improves performance and code size in some nested loop situations. The pass is target independent. >From the description in the file header: This optimization finds loop exit values reevaluated after the loop execution and replaces them by the corresponding exit values if they are available. Such sequences can arise after the
2015 Sep 01
2
[RFC] New pass: LoopExitValues
On Mon, Aug 31, 2015 at 5:52 PM, Jake VanAdrighem <jvanadrighem at gmail.com> wrote: > Do you have some specific performance measurements? Averaging 4 runs of 10000 iterations each of Coremark on my X86_64 desktop showed: -O2 performance: +2.9% faster with the L.E.V. pass -Os size: 1.5% smaller with the L.E.V. pass In the case of Coremark, the benefit comes mainly from the matrix
2018 May 09
4
more reassociation in IR
> On May 8, 2018, at 9:50 AM, Daniel Berlin via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > 1. The reassociate pass that exists right now was *originally* (AFAIK) written to enable CSE/GVN to do better. Agreed. The original mindset included a (naive) belief that going with a canonical form was better than teaching redundancy elimination to handle abstractions (as a matter
2015 Jul 13
5
[LLVMdev] Poor register allocations vs gcc
Hello, I have an issue with the llvm optimizations. I need to create object codes. the -ON PURPOSE poor && useless- code : --------------------------------------------------- #include <stdio.h> #include <stdlib.h> int ci(int a){ return 23; } int flop(int a, char ** c){ a += 71; int b = 0; if (a == 56){ b = 69; b += ci(a); } puts("ok"); return a +
2017 Jan 24
7
[X86][AVX512] RFC: make i1 illegal in the Codegen
Hi All, AVX-512 introduced the K mask registers and masked operations which make a natural choice for legalizing vectors of i1's. For example, define <8 x i32> @foo(<8 x i32>%a, <8 x i32*> %p) { %r = call <8 x i32> @llvm.masked.gather.v8i32(<8 x i32*> %p, i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>,
2008 Oct 31
3
[LLVMdev] nested function's static link gets clobbered
Fellow developers, I'm parallelizing loops to be called by pthread. The thread body that I pass to pthread_create looks like define i8* @loop1({ i32*, i32* }* nest %parent_frame, i8* %arg) parent_frame is pointer to shared variables in original function 0x00007f0de11c41f0: mov (%r10),%rax 0x00007f0de11c41f3: cmpl $0x63,(%rax) 0x00007f0de11c41f6: jg 0x7f0de11c420c
2018 May 18
0
more reassociation in IR
I mentioned this earlier in the thread - I would like to see something like D41574 in the optimizer. It's optimizing code that no other pass does currently, and I don't see any other near-term proposal that gets us those optimizations. Omer, can you rebase that to trunk? I think a header has moved, so it doesn't build as-is. I'd like to know if it can catch the cases in D45842. If
2016 Oct 26
2
RFC: a more detailed design for ThinLTO + vcall CFI
Hi all, As promised, here is a brain dump on how I see CFI for vcalls working under ThinLTO. Most of this has been prototyped, so the design does appear to be sound. For context on how CFI currently works under regular LTO, please read: http://llvm.org/docs/TypeMetadata.html http://clang.llvm.org/docs/ControlFlowIntegrityDesign.html http://clang.llvm.org/docs/LTOVisibility.html ==== Summary
2018 May 14
3
more reassociation in IR
On Fri, May 11, 2018 at 7:20 PM Hal Finkel <hfinkel at anl.gov> wrote: > > On 05/11/2018 08:40 PM, Daniel Berlin via llvm-dev wrote: > > > > On Fri, May 11, 2018 at 2:37 PM, Hiroshi Yamauchi <yamauchi at google.com> > wrote: > >> >> >> On Thu, May 10, 2018 at 12:49 PM Daniel Berlin <dberlin at dberlin.org> >> wrote: >>
2009 Mar 02
0
[LLVMdev] Tight overlapping loops and performance
On Mon, Mar 2, 2009 at 2:45 PM, Jonathan Turner <probata at hotmail.com> wrote: > For which version of gcc?  I should mention I'm on OS X and using the LLVM > SVN. gcc 4.3. It's also possible this is processor-sensitive. >> First, try looking at the generated code... the code LLVM generates is >> probably not what you're expecting. I'm getting the
2012 Mar 28
2
[LLVMdev] Suboptimal code due to excessive spilling
Hi, I have run into the following strange behavior and wanted to ask for some advice. For the C program below, function sum() gets inlined in foo() but the code generated looks very suboptimal (the code is an extract from a larger program). Below I show the 32-bit x86 assembly as produced by the demo page on the llvm home page ("Output A"). As you can see from the assembly, after
2015 Jul 13
2
[LLVMdev] Poor register allocations vs gcc
<br />Hello, <br />Ecx is a problem because you have to xor it. Which is avoided in the gcc compilation. Fomit-pointer-frame helps.<br /><br />Now llvm is one instruction from gcc. If ecx was not used, it would be as fast.<br />-- <br />Sent from Yandex.Mail for mobile<br /><br />20:03, 13 July 2015, Matthias Braun <mbraun@apple.com>:<br
2016 Oct 28
0
RFC: a more detailed design for ThinLTO + vcall CFI
Hi Peter, Thanks for sending this and sorry for the slow response. Some questions below. Teresa On Tue, Oct 25, 2016 at 5:27 PM, Peter Collingbourne <peter at pcc.me.uk> wrote: > Hi all, > > As promised, here is a brain dump on how I see CFI for vcalls working > under ThinLTO. Most of this has been prototyped, so the design does appear > to be sound. For context on how CFI
2009 Mar 02
3
[LLVMdev] Tight overlapping loops and performance
> Date: Mon, 2 Mar 2009 13:41:45 -0800 > From: eli.friedman at gmail.com > To: llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] Tight overlapping loops and performance > > Hmm, on my computer, I get around 2.5 seconds with both gcc -O3 and > llvm-gcc -O3 (using llvm-gcc from svn). Not sure what you're doing > differently; I wouldn't be surprised if it's
2012 Apr 05
0
[LLVMdev] Suboptimal code due to excessive spilling
I don't know much about this, but maybe -mllvm -unroll-count=1 can be used as a workaround? /Patrik Hägglund -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Brent Walker Sent: den 28 mars 2012 03:18 To: llvmdev Subject: [LLVMdev] Suboptimal code due to excessive spilling Hi, I have run into the following strange behavior
2015 Feb 13
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
I submitted the problem report to clang's bugzilla but no one seems to care so I have to send it to the mailing list. clang 3.7 svn (trunk 229055 as the time I was to report this problem) generates slower code than 3.5 (Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)) for the following code. It is a "8 queens puzzle" solver written as an educational example. As
2008 Nov 01
0
[LLVMdev] nested function's static link gets clobbered
Hi, > I'm parallelizing loops to be called by pthread. The thread body that I pass > to pthread_create looks like > > define i8* @loop1({ i32*, i32* }* nest %parent_frame, i8* %arg) > parent_frame is pointer to shared variables in original function > > 0x00007f0de11c41f0: mov (%r10),%rax > 0x00007f0de11c41f3: cmpl $0x63,(%rax) > 0x00007f0de11c41f6: