thr3ads.net - similar to: "[LLVMdev] [Polly] Performance comparison between Cloog and ISL code generation"

Displaying 20 results from an estimated 300 matches similar to: "[LLVMdev] [Polly] Performance comparison between Cloog and ISL code generation"

Compare test-suite benchmarks performance complied without TBAA, with default TBAA and with new TBAA struct path

2018 Apr 26

Compare test-suite benchmarks performance complied without TBAA, with default TBAA and with new TBAA struct path

Hello, I was interested in how much Type-Based Alias Analysis helps to optimize code. For that purpose, I've compared three sets of benchmarks: compiled without TBAA, compiled with a default TBAA metadata format, and compiled with new TBAA metadata format. As a set of benchmarks, I've used the LLVM test suite (http://llvm.org/docs/TestingGuide.html#test-suite-overview) which has a lot of

[LLVMdev] Proposal: change LNT’s regression detection algorithm and how it is used to reduce false positives

2015 May 18

[LLVMdev] Proposal: change LNT’s regression detection algorithm and how it is used to reduce false positives

Hi Chris and others! I totally support any work in this direction. In the current state LNT’s regression detection system is too noisy, which makes it almost impossible to use in some cases. If after each run a developer gets a dozen of ‘regressions’, none of which happens to be real, he/she won’t care about such reports after a while. We clearly need to filter out as much noise as we can - and

[LLVMdev] IR Passes and TargetTransformInfo: Straw Man

2013 Jul 28

[LLVMdev] IR Passes and TargetTransformInfo: Straw Man

Hi, Sean: I'm sorry I lie. I didn't mean to lie. I did try to avoid making a *BIG* change to the IPO pass-ordering for now. However, when I make a minor change to populateLTOPassManager() by separating module-pass and non-module-passes, I saw quite a few performance difference, most of them are degradations. Attacking these degradations one by one in a piecemeal manner is wasting

[LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?

2015 Feb 26

[LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?

Hi all, I've started looking at the GlobalMerge pass, enabled by default on ARM and AArch64. I think we should reconsider that, at least for AArch64. As is, the pass just merges all globals together, in groups of 4KB (AArch64, 128B on ARM). At the time it was enabled, the general thinking was "it's almost free, it doesn't affect performance much, we might as well use it".

[LLVMdev] Enabling the SLP-vectorizer by default for -O3

2013 Jul 28

[LLVMdev] Enabling the SLP-vectorizer by default for -O3

Sorry for not posting sooner. I forgot to send an update with the results. I also have some benchmark data. It confirms much of what you posted -- binary size increase is essentially 0, performance increases across the board. It looks really good to me. However, there was one crash that I'd like to check if it still fires. Will update later today (feel free to ping me if you don't hear

[LLVMdev] Enabling the SLP-vectorizer by default for -O3

2013 Jul 28

[LLVMdev] Enabling the SLP-vectorizer by default for -O3

Hi, Below you can see the updated benchmark results for the new SLP-vectorizer. As you can see, there is a small number of compile time regressions, a single major runtime *regression, and many performance gains. There is a tiny increase in code size: 30k for the whole test-suite. Based on the numbers below I would like to enable the SLP-vectorizer by default for -O3. Please let me know if you

[LLVMdev] Proposal: change LNT’s regression detection algorithm and how it is used to reduce false positives

2015 May 15

[LLVMdev] Proposal: change LNT’s regression detection algorithm and how it is used to reduce false positives

tl;dr in low data situations we don’t look at past information, and that increases the false positive regression rate. We should look at the possibly incorrect recent past runs to fix that. Motivation: LNT’s current regression detection system has false positive rate that is too high to make it useful. With test suites as large as the llvm “test-suite” a single report will show hundreds of

[LLVMdev] Enabling the SLP vectorizer by default for -O3

2013 Jul 14

[LLVMdev] Enabling the SLP vectorizer by default for -O3

Cool! What changes have you seen to generated code size? I'll take it for a spin on our benchmarks. On Sat, Jul 13, 2013 at 11:30 PM, Nadav Rotem <nrotem at apple.com> wrote: > Hi, > > LLVM’s SLP-vectorizer is a new pass that combines similar independent > instructions in a straight-line code. It is currently not enabled by > default, and people who want to

[LLVMdev] Enabling the SLP vectorizer by default for -O3

2013 Jul 15

[LLVMdev] Enabling the SLP vectorizer by default for -O3

On Jul 13, 2013, at 11:30 PM, Nadav Rotem <nrotem at apple.com> wrote: > Hi, > > LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in a straight-line code. It is currently not enabled by default, and people who want to experiment with it can use the clang command line flag “-fslp-vectorize”. I ran LLVM’s test suite with and without the SLP

[LLVMdev] Enabling the SLP vectorizer by default for -O3

2013 Jul 14

[LLVMdev] Enabling the SLP vectorizer by default for -O3

Hi, LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in a straight-line code. It is currently not enabled by default, and people who want to experiment with it can use the clang command line flag “-fslp-vectorize”. I ran LLVM’s test suite with and without the SLP vectorizer on a Sandybridge mac (using SSE4, w/o AVX). Based on my performance measurements

[RFC] Add IR level interprocedural outliner for code size.

2017 Jul 22

[RFC] Add IR level interprocedural outliner for code size.

Hi Andrey, Questions and feedback are very much welcome. - The explanation as to why the improvements can vary between the IR and MIR outliner mainly boil down to the level of abstraction that each are working at. The MIR level has very accurate heuristics and is effectively the last post ISel target independent code gen pass. The IR outliner on the other hand has more estimation in the cost

[LLVMdev] Enabling the SLP vectorizer by default for -O3

2013 Jul 23

[LLVMdev] Enabling the SLP vectorizer by default for -O3

Hi, Sorry for the delay in response. I measured the code size change and noticed small changes in both directions for individual programs. I found a 30k binary size growth for the entire testsuite + SPEC. I attached an updated performance report that includes both compile time and performance measurements. Thanks, Nadav On Jul 14, 2013, at 10:55 PM, Nadav Rotem <nrotem at apple.com>

[LLVMdev] cloog/isl required for polly 3.2?

2012 Dec 18

[LLVMdev] cloog/isl required for polly 3.2?

Tobi, From your previous comments off-list, I assume that the polly 3.2 branch still needs changes backported from polly trunk in order to solve the polly-test testsuite failures against current cloog git and isl 0.11. Also, we still need the cloog 0.18 tarballs created for polly 3.2 as it is incompatible with the current cloog release. Any chance of these issues being resolved before the llvm

[LLVMdev] Enabling the SLP vectorizer by default for -O3

2013 Jul 15

[LLVMdev] Enabling the SLP vectorizer by default for -O3

On Jul 14, 2013, at 9:52 PM, Chris Lattner <clattner at apple.com> wrote: > > On Jul 13, 2013, at 11:30 PM, Nadav Rotem <nrotem at apple.com> wrote: > >> Hi, >> >> LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in a straight-line code. It is currently not enabled by default, and people who want to experiment with it

[LLVMdev] IR Passes and TargetTransformInfo: Straw Man

2013 Jul 18

[LLVMdev] IR Passes and TargetTransformInfo: Straw Man

Andy and I briefly discussed this the other day, we have not yet got chance to list a detailed pass order for the pre- and post- IPO scalar optimizations. This is wish-list in our mind: pre-IPO: based on the ordering he propose, get rid of the inlining (or just inline tiny func), get rid of all loop xforms... post-IPO: get rid of inlining, or maybe we still need it, only

always allow canonicalizing to 8- and 16-bit ops?

2018 Jan 22

always allow canonicalizing to 8- and 16-bit ops?

Hello Thanks for looking into this. I can't be very confident what the knock on result of a change like that would be, especially on architectures that are not Arm. What I can do though, is run some benchmarks and look at that results. Using this patch: --- a/lib/Transforms/InstCombine/InstructionCombining.cpp +++ b/lib/Transforms/InstCombine/InstructionCombining.cpp @@ -150,6 +150,9 @@

[LLVMdev] [FastPolly]: Update of Polly's performance on LLVM test-suite

2013 Aug 12

[LLVMdev] [FastPolly]: Update of Polly's performance on LLVM test-suite

At 2013-08-12 01:18:30,"Tobias Grosser" <tobias at grosser.es> wrote: >On 08/10/2013 06:59 PM, Star Tan wrote: >> Hi all, >> >> I have evaluated Polly's performance on LLVM test-suite with latest LLVM (r188054) and Polly (r187981). Results can be viewed on: http://188.40.87.11:8000. > >Hi Star Tan, > >thanks for the update. >

always allow canonicalizing to 8- and 16-bit ops?

2018 Jan 22

always allow canonicalizing to 8- and 16-bit ops?

Thanks for the perf testing. I assume that DAG legalization is equipped to handle these cases fairly well, or someone would've complained by now... FWIW (and at least some of this can be blamed on me), instcombine already does the narrowing transforms without checking shouldChangeType() for binops like and/or/xor/udiv. The justification was that narrower ops are always better for

RFC: Switching to the new pass manager by default

2017 Oct 25

RFC: Switching to the new pass manager by default

On 10/25/2017 12:10 PM, Evgeny Astigeevich via llvm-dev wrote: > > Hi Chandler, > > I ran the LNT benchmarks and SPEC2k6.train on AArch64 Cortex-A57. I > used revisions: Clang 316561, LLVM 316563. > > Options: -O3 -mcpu=cortex-a57 -fomit-frame-pointer > -fexperimental-new-pass-manager > > Regressions: execution time increase > > LNT > >

RFC: Switching to the new pass manager by default

2017 Oct 25

RFC: Switching to the new pass manager by default

On 10/25/2017 12:32 PM, Evgeny Astigeevich wrote: > > Hi Hal, > > I quickly checked the execution profile. It is real. The code changed > significantly. A number of the hottest regions changed. I’ll compare IRs. > Thanks. Obviously a 1000% execution performance regression seems problematic. -Hal > JFYI FreeBench/fourinarow time graph: >

similar to: [LLVMdev] [Polly] Performance comparison between Cloog and ISL code generation