thr3ads.net - similar to: "Heroic LLVM optimizations"

Displaying 20 results from an estimated 1000 matches similar to: "Heroic LLVM optimizations"

2017 Aug 16

Heroic LLVM optimizations

Hi Tobias, The loop fusion you mention is the one in libquantum/cpu2006? Or something else in cpu2017? -Thx Dibyendu -----Original Message----- From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Tobias Grosser via llvm-dev Sent: Wednesday, August 16, 2017 10:10 AM To: renau at uncore.io; llvm-dev at lists.llvm.org Subject: Re: [llvm-dev] Heroic LLVM optimizations Hi

Heroic LLVM optimizations

2017 Aug 16

I'll be interested in seeing the improvements. As a reference, this is what I get on an Intel 6700K when I compare gcc 5.4 (-Ofast -flto) against published Intel results: 23x in libquantum, and over 40% in many benchmarks. I think that it is mostly from AoS vs SoA and loop transformations. 5.4
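A rough sketch of the AoS-to-SoA idea mentioned above, not libquantum's actual code: the `Node` record and its `key`/`value` fields are hypothetical. In an array of structures, a loop that updates one field conceptually strides over whole records; in a structure of arrays, that loop walks one contiguous array and never touches the other field, which in a compiled language helps caches and vectorization. Python only shows the data reshaping here, not the performance effect.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """One record in the array-of-structures (AoS) layout (hypothetical fields)."""
    key: int
    value: float


@dataclass
class NodesSoA:
    """Structure-of-arrays (SoA) layout: one list per field."""
    keys: List[int]
    values: List[float]


def aos_to_soa(nodes: List[Node]) -> NodesSoA:
    """Split a list of records into per-field arrays."""
    return NodesSoA(keys=[n.key for n in nodes], values=[n.value for n in nodes])


def flip_bit_aos(nodes: List[Node], bit: int) -> None:
    """AoS loop: in a C-like layout each iteration strides over a whole
    record (key and value) even though only `key` is updated."""
    for n in nodes:
        n.key ^= 1 << bit


def flip_bit_soa(soa: NodesSoA, bit: int) -> None:
    """SoA loop: only the `keys` array is walked; `values` is never read."""
    mask = 1 << bit
    soa.keys = [k ^ mask for k in soa.keys]
```

Both loops compute the same keys; the difference a compiler cares about is the memory access pattern, which is why an AoS-to-SoA rewrite is a data-layout transformation rather than a loop transformation.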

[RFC] Enable Partial Inliner by default

2017 Nov 02

Forgot to add that all experiments were done with '-O3 -m64 -fexperimental-new-pass-manager'. Graham Yiu LLVM Compiler Development IBM Toronto Software Lab Office: (905) 413-4077 C2-707/8200/Markham Email: gyiu at ca.ibm.com From: Graham Yiu/Toronto/IBM To: llvm-dev at lists.llvm.org Cc: junbuml at codeaurora.org, xinliangli at gmail.com Date: 11/02/2017 05:26 PM Subject: [RFC]

[RFC] Enable Partial Inliner by default

2017 Nov 10

Hi Graham, Thank you for offering help. I am trying to create a reproducer. The problem is that the crashes happen whilst LTO is used. One thing I am sure about: the IR is broken at compile time. Thanks, Evgeny From: Graham Yiu <gyiu at ca.ibm.com> Date: Friday, 10 November 2017 at 16:09 To: Evgeny Astigeevich <Evgeny.Astigeevich at arm.com> Cc: "junbuml at codeaurora.org"

[RFC] Enable Partial Inliner by default

2017 Nov 10

Hi Evgeny, I just realized that if these are compile-time errors I can help investigate on my end. Do you have something I can use to reproduce? Cheers, Graham Yiu LLVM Compiler Development IBM Toronto Software Lab Office: (905) 413-4077 C2-707/8200/Markham Email: gyiu at ca.ibm.com From: Graham Yiu/Toronto/IBM To: Evgeny Astigeevich <Evgeny.Astigeevich at arm.com> Cc:

[CodeGen] CodeSize - TailMerging and BlockPlacement

2016 Mar 29

Hi everyone, The code layout that TailMerging (inside BranchFolding) works on is not the final layout optimized based on the branch probability. Generally, after BlockPlacement, many new merging opportunities emerge. I did an experiment of adding additional BranchFolding and BlockPlacement after the existing BlockPlacement (i.e., -block-placement -branch-folder -block-placement) targeting

[LLVMdev] proof of concept for a loop fusion pass

2015 Jan 16

Hi, We are proposing a loop fusion pass that tries to proactively fuse loops across function call boundaries and arbitrary control flow. http://reviews.llvm.org/D7008 With this pass, we get 103 loop fusions in SPECCPU INT 2006 462.libquantum, with rate performance improving by close to 2.5X on x86 (results from AMD A10-6700). I took some liberties in patching up some of the code in
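A minimal sketch of what loop fusion does, assuming two adjacent loops over the same index range with no dependence that fusion would break. The function names are hypothetical, and the proposed pass in D7008 also fuses across call boundaries and arbitrary control flow, which this does not model. The fused version makes one pass over the data instead of two, so `b[i]` is consumed right after it is produced.

```python
from typing import List, Tuple


def unfused(a: List[int]) -> Tuple[List[int], List[int]]:
    """Two separate loops over the same index range."""
    n = len(a)
    b = [0] * n
    c = [0] * n
    for i in range(n):  # loop 1: produce b
        b[i] = a[i] * 2
    for i in range(n):  # loop 2: consume b
        c[i] = b[i] + 1
    return b, c


def fused(a: List[int]) -> Tuple[List[int], List[int]]:
    """Same work in one loop. Legal here because iteration i of loop 2
    reads only b[i], which iteration i of loop 1 has already written."""
    n = len(a)
    b = [0] * n
    c = [0] * n
    for i in range(n):
        b[i] = a[i] * 2
        c[i] = b[i] + 1
    return b, c
```

If loop 2 instead read `b[i + 1]`, the fused loop would read that element before loop 1 had written it, so a fusion pass has to prove there is no such dependence before merging the loop bodies.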

2015 Oct 02

This conflict is with many optimizations, incl. copy prop, coalescing, hoisting, etc. Each could increase register pressure with similar impact. Attempts to control register pressure locally (within an optimization pass) tend to be hard to tune and maintain. Would it be better to describe, e.g. in metadata, how to undo an optimization? Optimizations that attempt to reduce pressure like

2015 Oct 01

Hi Sanjay, I observed some extra register spills when applying the reassociation pass on spec2006 benchmarks, and I would like your advice. For example, function get_new_point_on_quad() of tria_boundary.cc in spec2006/dealII has a sequence of code like this: X = a + b; Y = X + c; Z = Y + d. There are many other instructions between these float adds. The reassociation
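The chain in the snippet (X = a + b, Y = X + c, Z = Y + d) computes ((a + b) + c) + d. Reassociating it into another order changes which values must stay live at the same time, which can affect register pressure, and for floating-point adds it can also change the result, which is why LLVM only reassociates FP adds when fast-math flags permit it. A small sketch of the second point, with hypothetical function names and inputs chosen to expose rounding:

```python
def chain_sum(a: float, b: float, c: float, d: float) -> float:
    """Order from the snippet: X = a + b; Y = X + c; Z = Y + d."""
    x = a + b
    y = x + c
    return y + d


def reassociated_sum(a: float, b: float, c: float, d: float) -> float:
    """One alternative order, a + (b + (c + d)). Hypothetical: not
    necessarily what LLVM's reassociation pass would emit."""
    return a + (b + (c + d))
```

With `a = 1e16, b = -1e16, c = 1.0, d = 0.0`, the chain gives 1.0, while the reassociated order gives 0.0 because `-1e16 + 1.0` rounds back to `-1e16` in double precision. For inputs such as small integers both orders agree, which is exactly the case where reassociation is harmless.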

[PATCH 02/10] x86/cpufeature: Kill cpu_has_hypervisor

2016 Mar 29

From: Borislav Petkov <bp at suse.de> Use boot_cpu_has() instead. Signed-off-by: Borislav Petkov <bp at suse.de> Cc: virtualization at lists.linux-foundation.org Cc: sparmaintainer at unisys.com --- arch/x86/events/intel/cstate.c | 2 +- arch/x86/events/intel/uncore.c | 2 +- arch/x86/include/asm/cpufeature.h | 1 -

RFC: a practical mechanism for applying Machine Learning for optimization policies in LLVM

2020 Apr 09

+Yundi Qian <yundi at google.com> +Eugene Brevdo <ebrevdo at google.com> , our team members from the ML side. To avoid formatting issues, here is a link to the RFC <https://docs.google.com/document/d/1BoSGQlmgAh-yUZMn4sCDoWuY6KWed2tV58P4_472mDE/edit?usp=sharing>, open to comments. Thanks! On Wed, Apr 8, 2020 at 2:34 PM Mircea Trofin <mtrofin at google.com> wrote: >

[LLVMdev] BasicAA unable to analyze recursive PHI nodes

2015 Jun 11

----- Original Message ----- > From: "Tobias Edler von Koch" <tobias at codeaurora.org> > To: "Daniel Berlin" <dberlin at dberlin.org> > Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu> > Sent: Thursday, June 11, 2015 10:02:37 AM > Subject: Re: [LLVMdev] BasicAA unable to analyze recursive PHI nodes > > Hi Daniel,

[RFC] Enable Partial Inliner by default

2017 Nov 13

Hi Graham, I created a bug report with a reproducer for the failures I’ve got: https://bugs.llvm.org/show_bug.cgi?id=35288 I have also found that LTO reverts everything the partial inliner has done. Maybe the partial inliner should not be used at the first LTO phase (compilation). I hope I’ll have a chance to look at the code size regressions this week. Thanks, Evgeny Astigeevich From:

RFC: a practical mechanism for applying Machine Learning for optimization policies in LLVM

2020 Apr 08

It turns out it's me, sorry. Let me see how I can sort this out. In the meantime, here is the csv:

SPEC2006 data:
binary,base -Oz size,ML -Oz size,ML size shrink by,,perf: base -Oz scores,perf: ML -Oz scores,ML improvement by
400.perlbench,2054200,2086776,-1.59%,,2.9,2.9,0.00%
401.bzip2,1129976,1095544,3.05%,,6.4,6.2,-3.13%
403.gcc,4078488,4130840,-1.28%,,11.6,11.7,0.86%
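The two percentage columns appear to follow from the raw numbers as size shrink = (base - ML) / base and perf improvement = (ML - base) / base. That formula is inferred from the three rows shown, not stated in the message. A quick check, using simplified column names (an assumption for readability) and only the rows that survive in the archive:

```python
import csv
import io

# The three rows quoted in the message, with simplified headers;
# the rest of the table is truncated in the archive.
DATA = """binary,base_size,ml_size,base_score,ml_score
400.perlbench,2054200,2086776,2.9,2.9
401.bzip2,1129976,1095544,6.4,6.2
403.gcc,4078488,4130840,11.6,11.7
"""


def percentages(text: str) -> dict:
    """Return {binary: (size_shrink_pct, perf_improvement_pct)}."""
    out = {}
    for row in csv.DictReader(io.StringIO(text)):
        base_size, ml_size = int(row["base_size"]), int(row["ml_size"])
        base_score, ml_score = float(row["base_score"]), float(row["ml_score"])
        shrink = (base_size - ml_size) / base_size * 100      # positive: ML binary is smaller
        improve = (ml_score - base_score) / base_score * 100  # positive: ML binary scores higher
        out[row["binary"]] = (shrink, improve)
    return out
```

Printing each entry with `f"{shrink:.2f}%"` and `f"{improve:.2f}%"` should line up with the rounded figures in the CSV; a negative shrink means the ML-built binary is larger than the baseline.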

[LLVMdev] LLVM and Spec2006

2010 Jul 20

Hi, What are the best options to compile Spec2006 with LLVM compilers to get the best performance numbers on x86? Has anybody compared LLVM Spec2006 numbers with GCC 4.5 base? reza

RFC: a practical mechanism for applying Machine Learning for optimization policies in LLVM

2020 Apr 09

Sorry, I wasn't aware of that. I can make the google doc view-only, keeping the current comments. I'll wait a bit (few hrs) to see if there's any pushback to that. On Thu, Apr 9, 2020 at 9:57 AM Xinliang David Li <xinliangli at gmail.com> wrote: > One suggestion : should we consolidate the discussion into the main > thread? I know some folks are not willing to comment in

LoopSimplify pass prevents loop unrolling

2017 Jun 30

Hi All, In the attached test case, there is an unnested loop with 2 iterations. The loop latch block is terminated by an unconditional branch, so simplifycfg folds the almost-empty latch block into its predecessor, which is the loop header. This results in an additional backedge in the CFG, so when the LoopRotate pass is called it canonicalizes the loop into a nested loop. However, now the loop

[RFC] Using Intel MPX to harden SafeStack

2017 Feb 18

On 2/7/2017 20:02, Kostya Serebryany wrote: > ... > > My understanding is that BNDCU is the cheapest possible instruction, > just like XOR or ADD, > so the overhead should be relatively small. > Still my guesstimate would be >= 5% since stores are very numerous. > And such overhead will be on top of whatever overhead SafeStack has. > Do you have any measurements to

[LLVMdev] LLVM and Spec2006

2010 Jul 20

Hi Reza, -O4 is the highest level of LLVM optimization that I know of. But, I don't know if it has been tried on Spec2006. IIRC, Dan Gohman has run Spec. tests with LLVM, so he can provide more info. - fariborz On Jul 19, 2010, at 6:06 PM, Reza Yazdani wrote: > Hi, > > What are the best options to compile Spec2006 with LLVM compilers to > get the best performance numbers
