similar to: [LLVMdev] Enabling the loop vectorizer for -O2 and -Os

Displaying 20 results from an estimated 10000 matches similar to: "[LLVMdev] Enabling the loop vectorizer for -O2 and -Os"

2013 Jun 17
0
[LLVMdev] Enabling the loop vectorizer for -O2 and -Os
> I enabled the loop vectorizer for -O2 and -Os in r184084. Congratulations! A lot of hard work went into getting us here. Tim.
2013 Jun 16
3
[LLVMdev] Enabling the vectorizer for -Os -- ping
All I have to say is *wow*. The vectorizer performs *remarkably* better now than it did the last time I benchmarked it. I'm stunned. I measured -O2 and -Os, as well as -march=x86-64 and -march=corei7-avx. My hope with the latter two was to cover both worst-case and best-case in terms of the quality of the vector ISA available. First, binary size growth. This is measured on average across a
2013 Jun 14
0
[LLVMdev] Enabling the vectorizer for -Os -- ping
Sorry for the delays here. I am running our benchmark suite and will have data in a day or so. On Jun 13, 2013 9:40 PM, "Nadav Rotem" <nrotem at apple.com> wrote: > Hi, > > Last week I wrote llvm-dev and presented data that shows how enabling the > vectorizer on -Os can improve the performance of many workloads and that it > has negligible effects on code size. I
2013 Jun 14
5
[LLVMdev] Enabling the vectorizer for -Os -- ping
Hi, Last week I wrote llvm-dev and presented data that shows how enabling the vectorizer on -Os can improve the performance of many workloads and that it has negligible effects on code size. I also added a command line switch to make it easier for people to benchmark the vectorizer using -Os directly from clang without changing LLVM. Has anyone done any benchmarks on -Os + vectorization ?
2013 Jun 14
1
[LLVMdev] Enabling the vectorizer for -Os -- ping
Hi Nadav, No noticeable difference between "-Os" and "-Os -fvectorize" in code size or compilation times in my tests, and only minimal performance improvements (small enough to be ignored). cheers, --renato On 14 June 2013 09:29, Renato Golin <renato.golin at linaro.org> wrote: > On 14 June 2013 05:37, Nadav Rotem <nrotem at apple.com> wrote: > >>
2013 Jun 14
0
[LLVMdev] Enabling the vectorizer for -Os -- ping
On 14 June 2013 05:37, Nadav Rotem <nrotem at apple.com> wrote: > Last week I wrote llvm-dev and presented data that shows how enabling the > vectorizer on -Os can improve the performance of many workloads and that it > has negligible effects on code size. I also added a command line switch to > make it easier for people to benchmark the vectorizer using -Os directly > from
2013 Jul 23
0
[LLVMdev] Enabling the SLP vectorizer by default for -O3
Hi, Sorry for the delay in response. I measured the code size change and noticed small changes in both directions for individual programs. I found a 30k binary size growth for the entire testsuite + SPEC. I attached an updated performance report that includes both compile time and performance measurements. Thanks, Nadav On Jul 14, 2013, at 10:55 PM, Nadav Rotem <nrotem at apple.com>
2013 Jul 15
3
[LLVMdev] Enabling the SLP vectorizer by default for -O3
On Jul 14, 2013, at 9:52 PM, Chris Lattner <clattner at apple.com> wrote: > > On Jul 13, 2013, at 11:30 PM, Nadav Rotem <nrotem at apple.com> wrote: > >> Hi, >> >> LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in a straight-line code. It is currently not enabled by default, and people who want to experiment with it
2013 Jul 27
1
[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.
Hi Nadav, Okay. 1. The comment doesn't make this clear. I would suggest, at a minimum, updating it to mention pairs specifically, to avoid the issue in #2 2. If the day comes when the selectiondag store vectorizer handles more than pairs, and does so better, is anyone really going to remember this random 3 exists in the other vectorizer? I would posit, based on experience, the answer is
2013 Jul 15
0
[LLVMdev] Enabling the SLP vectorizer by default for -O3
On Jul 13, 2013, at 11:30 PM, Nadav Rotem <nrotem at apple.com> wrote: > Hi, > > LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in a straight-line code. It is currently not enabled by default, and people who want to experiment with it can use the clang command line flag “-fslp-vectorize”. I ran LLVM’s test suite with and without the SLP
2013 Jun 05
0
[LLVMdev] Enabling the vectorizer for -Os
On 5 June 2013 04:26, Nadav Rotem <nrotem at apple.com> wrote: > I would like to start a discussion about enabling the loop vectorizer by > default for -Os. The loop vectorizer can accelerate many workloads and > enabling it for -Os and -O2 has obvious performance benefits. Hi Nadav, As it stands, O2 is very similar to O3 with a few, more aggressive, optimizations running,
2013 Jul 27
0
[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.
Hi Daniel, Maybe my commit message was not clear. The idea is that the SelectionDAG store vectorizer can only handle pairs. So, the number three means "more than a pair". Thanks, Nadav Sent from my iPhone > On Jul 26, 2013, at 17:48, Daniel Berlin <dberlin at dberlin.org> wrote: > > Hey Nadav, > I'd humbly suggest that rather than use 3 directly, you should
2013 Jul 27
2
[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.
Hey Nadav, I'd humbly suggest that rather than use 3 directly, you should add a shared constant between these two passes, so when one changes, the other doesn't need to be updated. It would also ensure this bit of info about what needs to be updated isn't only contained in the comments.. On Fri, Jul 26, 2013 at 4:07 PM, Nadav Rotem <nrotem at apple.com> wrote: > Author:
2014 Dec 13
2
[LLVMdev] Vectorization factor limitation in Loop Vectorizer
So IMO, if we modify the VF calculation for targets/subtargets using TTI where higher VF is supported The vectorizer’s scope will become wider. Did/do you foresee any issue with this? Thanks, Shahid From: Nadav Rotem [mailto:nrotem at apple.com] Sent: Saturday, December 13, 2014 2:47 AM To: Shahid, Asghar-ahmad Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Vectorization factor limitation in
2013 Nov 15
0
[LLVMdev] Limit loop vectorizer to SSE
Nadav, I believe aligned accesses to unaligned pointers is precisely the issue. Consider the function `add_u8S` before[1] and after[2] the loop vectorizer pass. There is no alignment assumption associated with %kernel_data prior to vectorization. I can't tell if it's the loop vectorizer or the codegen at fault, but the alignment assumption seems to sneak in somewhere. v/r, Josh [1]
2012 Dec 08
1
[LLVMdev] Enabling the Loop Vectorizer for a few hours - heads up
Hi, I am going to enable the loop vectorizer on by default this afternoon for a few hours. I want to be able to test the vectorizer on a large number of configurations and get some performance numbers. I plan to enable the vectorizer at around 5pm (PST) and disable it before 9pm. Thanks, Nadav
2013 Nov 15
2
[LLVMdev] Limit loop vectorizer to SSE
A fix for this is in r194876. Thanks for reporting this! On Nov 15, 2013, at 3:49 PM, Joshua Klontz <josh.klontz at gmail.com> wrote: > Nadav, > > I believe aligned accesses to unaligned pointers is precisely the issue. Consider the function `add_u8S` before[1] and after[2] the loop vectorizer pass. There is no alignment assumption associated with %kernel_data prior to
2013 Jan 03
1
[LLVMdev] Does loop vectorizer inquire about target's SIMD capabilities?
Hi Nadav, On Thu, Jan 3, 2013 at 1:53 PM, Nadav Rotem <nrotem at apple.com> wrote: > Hi Akira! > > > > > Does the current loop vectorizer inquire about the SIMD capabilities of > the target architecture when it decides whether it is profitable to > vectorize a loop? > > Yes, it uses a cost model to determine the profitability of vectorization. > At the
2013 Nov 15
4
[LLVMdev] Limit loop vectorizer to SSE
Something like: index 6db7f68..68564cb 100644 --- a/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -1208,6 +1208,8 @@ void InnerLoopVectorizer::vectorizeMemoryInstruction(Instr Type *DataTy = VectorType::get(ScalarDataTy, VF); Value *Ptr = LI ? LI->getPointerOperand() : SI->getPointerOperand(); unsigned Alignment = LI ?
2013 Nov 15
0
[LLVMdev] Limit loop vectorizer to SSE
----- Original Message ----- > From: "Arnold Schwaighofer" <aschwaighofer at apple.com> > To: "Joshua Klontz" <josh.klontz at gmail.com> > Cc: "LLVM Dev" <llvmdev at cs.uiuc.edu> > Sent: Friday, November 15, 2013 4:05:53 PM > Subject: Re: [LLVMdev] Limit loop vectorizer to SSE > > > Something like: > > index