Displaying 20 results from an estimated 1000 matches similar to: "[LLVMdev] Enabling the SLP vectorizer by default for -O3"
2013 Jul 15
3
[LLVMdev] Enabling the SLP vectorizer by default for -O3
On Jul 14, 2013, at 9:52 PM, Chris Lattner <clattner at apple.com> wrote:
>
> On Jul 13, 2013, at 11:30 PM, Nadav Rotem <nrotem at apple.com> wrote:
>
>> Hi,
>>
>> LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in a straight-line code. It is currently not enabled by default, and people who want to experiment with it
2013 Jul 15
0
[LLVMdev] Enabling the SLP vectorizer by default for -O3
On Jul 13, 2013, at 11:30 PM, Nadav Rotem <nrotem at apple.com> wrote:
> Hi,
>
> LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in a straight-line code. It is currently not enabled by default, and people who want to experiment with it can use the clang command line flag “-fslp-vectorize”. I ran LLVM’s test suite with and without the SLP
2013 Jul 23
0
[LLVMdev] Enabling the SLP vectorizer by default for -O3
Hi,
Sorry for the delay in response. I measured the code size change and noticed small changes in both directions for individual programs. I found a 30k binary size growth for the entire testsuite + SPEC. I attached an updated performance report that includes both compile time and performance measurements.
Thanks,
Nadav
On Jul 14, 2013, at 10:55 PM, Nadav Rotem <nrotem at apple.com>
2013 Jul 28
2
[LLVMdev] Enabling the SLP-vectorizer by default for -O3
Hi,
Below you can see the updated benchmark results for the new SLP-vectorizer. As you can see, there is a small number of compile time regressions, a single major runtime *regression, and many performance gains. There is a tiny increase in code size: 30k for the whole test-suite. Based on the numbers below I would like to enable the SLP-vectorizer by default for -O3. Please let me know if you
2015 Feb 26
5
[LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?
Hi all,
I've started looking at the GlobalMerge pass, enabled by default on
ARM and AArch64. I think we should reconsider that, at least for
AArch64.
As is, the pass just merges all globals together, in groups of 4KB
(AArch64, 128B on ARM).
At the time it was enabled, the general thinking was "it's almost
free, it doesn't affect performance much, we might as well use it".
2013 Jul 14
0
[LLVMdev] Enabling the SLP vectorizer by default for -O3
Cool!
What changes have you seen to generated code size?
I'll take it for a spin on our benchmarks.
On Sat, Jul 13, 2013 at 11:30 PM, Nadav Rotem <nrotem at apple.com> wrote:
> Hi,
>
> LLVM’s SLP-vectorizer is a new pass that combines similar independent
> instructions in a straight-line code. It is currently not enabled by
> default, and people who want to
2013 Jul 28
0
[LLVMdev] Enabling the SLP-vectorizer by default for -O3
Sorry for not posting sooner. I forgot to send an update with the results.
I also have some benchmark data. It confirms much of what you posted --
binary size increase is essentially 0, performance increases across the
board. It looks really good to me.
However, there was one crash that I'd like to check if it still fires. Will
update later today (feel free to ping me if you don't hear
2013 Jul 18
3
[LLVMdev] IR Passes and TargetTransformInfo: Straw Man
Andy and I briefly discussed this the other day, we have not yet got
chance to list a detailed pass order
for the pre- and post- IPO scalar optimizations.
This is wish-list in our mind:
pre-IPO: based on the ordering he propose, get rid of the inlining (or
just inline tiny func), get rid of
all loop xforms...
post-IPO: get rid of inlining, or maybe we still need it, only
2018 Aug 14
3
[RFC] Delaying phi-to-select transformation until later in the pass pipeline
Summary
=======
I'm planning on adjusting SimplifyCFG so that it doesn't turn two-entry phi
nodes into selects until later in the pass pipeline, to give passes which can
understand phis but not selects more opportunity to optimize. The thing I'm
trying to do which made me think of doing this is described below, but from the
benchmarking I've done it looks like this is overall a
2015 May 15
6
[LLVMdev] Proposal: change LNT’s regression detection algorithm and how it is used to reduce false positives
tl;dr in low data situations we don’t look at past information, and that increases the false positive regression rate. We should look at the possibly incorrect recent past runs to fix that.
Motivation: LNT’s current regression detection system has false positive rate that is too high to make it useful. With test suites as large as the llvm “test-suite” a single report will show hundreds of
2018 Aug 15
2
[RFC] Delaying phi-to-select transformation until later in the pass pipeline
I'm concerned that we're focusing on one side of this. Let me point out
a few concerns w/changing the canonical form here:
1. LICM does not know how to hoist or sink regions. It does know how
to hoist and sink selects.
2. InstCombine has limited support for triangles/diamonds, but fairly
extensive support for selects.
3. EarlyCSE and GVN do not know how to eliminate fully
2018 Aug 17
2
[RFC] Delaying phi-to-select transformation until later in the pass pipeline
> On Aug 15, 2018, at 10:57 PM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>
> On 08/15/2018 02:38 PM, Philip Reames via llvm-dev wrote:
>> I'm concerned that we're focusing on one side of this. Let me point out a few concerns w/changing the canonical form here:
>>
>> LICM does not know how to hoist or sink regions. It does know
2011 Jul 24
2
[LLVMdev] [llvm-testresults] bwilson__llvm-gcc_PROD__i386 nightly tester results
A big compile time regression. Any ideas?
Ciao, Duncan.
On 22/07/11 19:13, llvm-testresults at cs.uiuc.edu wrote:
>
> bwilson__llvm-gcc_PROD__i386 nightly tester results
>
> URL http://llvm.org/perf/db_default/simple/nts/253/
> Nickname bwilson__llvm-gcc_PROD__i386:4
> Name curlew.apple.com
>
> Run ID Order Start Time End Time
> Current 253 0 2011-07-22 16:22:04
2018 Apr 26
0
Compare test-suite benchmarks performance complied without TBAA, with default TBAA and with new TBAA struct path
Hello,
I was interested in how much Type-Based Alias Analysis helps to optimize code. For that purpose, I've compared
three sets of benchmarks: compiled without TBAA, compiled with a default TBAA metadata format, and compiled
with new TBAA metadata format.
As a set of benchmarks, I've used the LLVM test suite (http://llvm.org/docs/TestingGuide.html#test-suite-overview)
which has a lot of
2013 Sep 08
2
[LLVMdev] [Polly] Compile-time and Execution-time analysis for the SCEV canonicalization
Hello all,
I have done some basic experiments about Polly canonicalization passes and I found the SCEV canonicalization has significant impact on both compile-time and execution-time performance.
Detailed results for SCEV and default canonicalization can be viewed on: http://188.40.87.11:8000/db_default/v4/nts/32 (or 33, 34)
*pNoGen with SCEV canonicalization (run 32): -O3 -Xclang -load
2014 Sep 09
5
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Hi Chandler,
Thanks for fixing the problem with the insertps mask.
Generally the new shuffle lowering looks promising, however there are
some cases where the codegen is now worse causing runtime performance
regressions in some of our internal codebase.
You have already mentioned how the new shuffle lowering is missing
some features; for example, you explicitly said that we currently lack
of
2013 Jul 28
0
[LLVMdev] IR Passes and TargetTransformInfo: Straw Man
Hi, Sean:
I'm sorry I lie. I didn't mean to lie. I did try to avoid making a
*BIG* change
to the IPO pass-ordering for now. However, when I make a minor change to
populateLTOPassManager() by separating module-pass and non-module-passes, I
saw quite a few performance difference, most of them are degradations.
Attacking
these degradations one by one in a piecemeal manner is wasting
2013 Sep 08
0
[LLVMdev] [Polly] Compile-time and Execution-time analysis for the SCEV canonicalization
On 09/08/2013 08:03 PM, Star Tan wrote:
> Hello all,
>
>
> I have done some basic experiments about Polly canonicalization passes and I found the SCEV canonicalization has significant impact on both compile-time and execution-time performance.
Interesting.
> Detailed results for SCEV and default canonicalization can be viewed on: http://188.40.87.11:8000/db_default/v4/nts/32 (or
2013 Sep 14
0
[LLVMdev] [Polly] Compile-time and Execution-time analysis for the SCEV canonicalization
Hello all,
I have evaluated the compile-time and execution-time performance of Polly canonicalization passes. Details can be referred to http://188.40.87.11:8000/db_default/v4/nts/recent_activity. There are four runs:
pollyBasic (run 45): clang -O3 -Xclang -load -Xclang LLVMPolly.so
pollyNoGenSCEV (run 44): clang -O3 -Xclang -load -Xclang LLVMPolly.so -mllvm -polly -mllvm -polly-codegen-scev
2013 Sep 25
0
[LLVMdev] [Polly] Performance comparison between Cloog and ISL code generation
Hello all,
The performance comparison between Polly's Cloog and ISL code generator is posted on http://188.40.87.11:8000/db_default/v4/nts/59?compare_to=58&baseline=58
It seems their execution-time performance are comparable:
Performance Regressions - Execution Time (ISL over Cloog)
MultiSource/Benchmarks/TSVC/ControlFlow-flt/ControlFlow-flt 8.49%