Displaying 20 results from an estimated 50000 matches similar to: "[LLVMdev] Enabling the vectorizer for -Os"
2013 Jun 06 · 0 replies · [LLVMdev] Enabling the vectorizer for -Os
Hi,
Thanks for the feedback. I think that we agree that vectorization on -Os can benefit many programs. Regarding -O2 vs -O3, maybe we should set a higher cost threshold for O2 to increase the likelihood of improving performance? We have very few regressions on -O3 as is, and with better cost models I believe we can bring them close to zero, so I am not sure that it would help much.
2013 Jun 05 · 0 replies · [LLVMdev] Enabling the vectorizer for -Os
On 5 June 2013 13:32, David Tweed <david.tweed at arm.com> wrote:
> This is what I'd like to know about: what specific potential to change
> results have you seen in the vectorizer?
>
No changes, just conceptual. AFAIK, the differences between the passes run at O2 and O3 are minimal (looking at the code where this is chosen) and they don't seem to be particularly amazing to
2013 Jun 05 · 0 replies · [LLVMdev] Enabling the vectorizer for -Os
On 5 June 2013 11:59, David Tweed <david.tweed at arm.com> wrote:
> (I've very rarely had O3 optimization, rather than some program-specific
> subset of the options, achieve any non-noise-level speed-up over O2 with
> gcc/g++.)
>
Hi David,
You surely remember this:
http://plasma.cs.umass.edu/emery/stabilizer
"We find that, while -O2 has a significant impact relative
2013 Jun 06 · 2 replies · [LLVMdev] Enabling the vectorizer for -Os
On Wed, Jun 5, 2013 at 5:51 PM, Nadav Rotem <nrotem at apple.com> wrote:
> Hi,
>
> Thanks for the feedback. I think that we agree that vectorization on -Os
> can benefit many programs. Regarding -O2 vs -O3, maybe we should set a
> higher cost threshold for O2 to increase the likelihood of improving
> performance? We have very few regressions on -O3 as is and with
2013 Jun 05 · 0 replies · [LLVMdev] Enabling the vectorizer for -Os
On 5 June 2013 04:26, Nadav Rotem <nrotem at apple.com> wrote:
> I would like to start a discussion about enabling the loop vectorizer by
> default for -Os. The loop vectorizer can accelerate many workloads and
> enabling it for -Os and -O2 has obvious performance benefits.
Hi Nadav,
As it stands, O2 is very similar to O3, with a few more aggressive optimizations running,
2013 Jun 06 · 0 replies · [LLVMdev] Enabling the vectorizer for -Os
Hi Chandler,
> FWIW, I don't yet agree.
>
> Your tables show many programs growing in code size by over 20%. While there are associated performance improvements, it isn't clear that this is a good tradeoff. Historically, optimizations which gain speed as a direct result of growing code size have *not* been an acceptable tradeoff in -Os.
I am
2012 Nov 06 · 2 replies · [LLVMdev] Regarding BOF: Vectorization in LLVM
Hi Nadav,
Unfortunately I'm not attending the dev meeting, but the BoF looks
interesting. One thing that I'd like to throw into the mix is that, while
dealing with autovectorisation of LLVM compiled down from C-like languages
(or maybe Fortran-like languages) is clearly a very big area for fruitful
work both algorithmically and in terms of practical relevance, it'd also be
interesting
2012 Nov 06 · 0 replies · [LLVMdev] Regarding BOF: Vectorization in LLVM
Hi David!
On Nov 6, 2012, at 3:23 AM, David Tweed <david.tweed at gmail.com> wrote:
> Hi Nadav,
>
> Unfortunately I'm not attending the dev meeting, but the BoF looks interesting. One thing that I'd like to throw into the mix is that, while dealing with autovectorisation of LLVM compiled down from C-like languages (or maybe Fortran-like languages) is clearly a very big
2013 Jul 15 · 3 replies · [LLVMdev] Enabling the SLP vectorizer by default for -O3
On Jul 14, 2013, at 9:52 PM, Chris Lattner <clattner at apple.com> wrote:
> On Jul 13, 2013, at 11:30 PM, Nadav Rotem <nrotem at apple.com> wrote:
>> Hi,
>>
>> LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in straight-line code. It is currently not enabled by default, and people who want to experiment with it
2012 Nov 06 · 2 replies · [LLVMdev] Regarding BOF: Vectorization in LLVM
----- Original Message -----
> From: "Nadav Rotem" <nrotem at apple.com>
> To: "David Tweed" <david.tweed at gmail.com>
> Cc: llvmdev at cs.uiuc.edu
> Sent: Tuesday, November 6, 2012 11:08:23 AM
> Subject: Re: [LLVMdev] Regarding BOF: Vectorization in LLVM
>
> Hi David!
>
> On Nov 6, 2012, at 3:23 AM, David Tweed <david.tweed at
2017 Feb 17 · 2 replies · (RFC) Adjusting default loop fully unroll threshold
> On Feb 16, 2017, at 4:41 PM, Xinliang David Li via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On Thu, Feb 16, 2017 at 3:45 PM, Chandler Carruth via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> First off, I just want to say wow and thank you. This kind of data is amazing. =D
>
> On Thu, Feb 16, 2017 at
2013 Jul 28 · 2 replies · [LLVMdev] Enabling the SLP-vectorizer by default for -O3
Hi,
Below you can see the updated benchmark results for the new SLP-vectorizer. As you can see, there is a small number of compile-time regressions, a single major runtime regression, and many performance gains. There is a tiny increase in code size: 30k for the whole test-suite. Based on the numbers below I would like to enable the SLP-vectorizer by default for -O3. Please let me know if you
2015 Jul 15 · 5 replies · [LLVMdev] Improving loop vectorizer support for loops with a volatile iteration variable
Hi all,
I would like to propose an improvement of the “almost dead” block
elimination in Transforms/Local.cpp so that it will preserve the canonical
loop form for loops with a volatile iteration variable.
*** Problem statement
Nested loops in LCALS Subset B (https://codesign.llnl.gov/LCALS.php) are
not vectorized with LLVM -O3 because the LLVM loop vectorizer fails the
test whether the loop
2013 Jul 23 · 0 replies · [LLVMdev] Enabling the SLP vectorizer by default for -O3
Hi,
Sorry for the delay in response. I measured the code size change and noticed small changes in both directions for individual programs. I found a 30k binary size growth for the entire testsuite + SPEC. I attached an updated performance report that includes both compile time and performance measurements.
Thanks,
Nadav
On Jul 14, 2013, at 10:55 PM, Nadav Rotem <nrotem at apple.com>
2017 Feb 16 · 4 replies · (RFC) Adjusting default loop fully unroll threshold
First off, I just want to say wow and thank you. This kind of data is
amazing. =D
On Thu, Feb 16, 2017 at 2:46 AM Kristof Beyls <Kristof.Beyls at arm.com> wrote:
> The biggest relative code size increases indeed didn't happen for the
> biggest programs, but instead for a few programs weighing in at about 100KB.
> I'm assuming the Google benchmark set covers much bigger
2015 Aug 13 · 2 replies · [LLVMdev] Improving loop vectorizer support for loops with a volatile iteration variable
Hi Gerolf,
I think we have several (perhaps separable) issues here:
1. Do we have a canonical form for loops, preserved through the optimizer, that allows naturally-constructed loop nests to remain separable?
2. Do we forbid non-lowering transformations that turn vectorizable loops into non-vectorizable loops?
3. How do we detect cases where transformations cause a negative answer to either
2013 Jan 14 · 17 replies · [LLVMdev] RFC: Codifying (but not formalizing) the optimization levels in LLVM and Clang
This has been an idea floating around in my head for a while, and after several discussions with others it continues to hold up, so I thought I would mail it out. Sorry for cross-posting to both lists, but this is an issue that would significantly impact both LLVM and Clang.
Essentially, LLVM provides canned optimization "levels" for frontends to
re-use. This is nothing new. However, we
2015 Jul 30 · 4 replies · [LLVMdev] RFC: Callee speedup estimation in inline cost analysis
TLDR: The proposal below is intended to allow inlining of larger callees when such inlining is expected to reduce the dynamic instruction count.
Proposal
-------------
LLVM inlines a function if the size growth (in the given context) is less
than a threshold. The threshold is increased based on certain
characteristics of the called function (inline keyword and the fraction of
vector
2013 Jul 14 · 6 replies · [LLVMdev] Enabling the SLP vectorizer by default for -O3
Hi,
LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in straight-line code. It is currently not enabled by default, and people who want to experiment with it can use the clang command line flag “-fslp-vectorize”. I ran LLVM’s test suite with and without the SLP vectorizer on a Sandybridge mac (using SSE4, w/o AVX). Based on my performance measurements
2017 Feb 15 · 2 replies · (RFC) Adjusting default loop fully unroll threshold
Thanks for running these Kristof!
I'd still like to hear from Apple, and if we can get a few more x86
micro-architectures covered that'd be great, but it looks like -O3 is
uncontroversial, and the question is whether this makes sense at O2...
To me, it would help a lot to know the actual breakdown of benchmarks such
as yours Kristof (as they seem to have more codesize impact than others